See the New Standard for Data Virtualization in Action!

By Shira Sarid
July 7, 2020

Varada offers the new standard for data virtualization, with a smart indexing layer that runs directly on the customer data lake. Varada enables data teams to operationalize the entire data lake while ensuring interactive performance, without the need to move data, model or manually optimize.

Our secret sauce is our ability to automatically and dynamically index relevant data, at the structure and granularity of the source. Varada enables any query to meet continuously evolving performance and concurrency requirements for users and analytics API calls, while keeping costs predictable and under control.

Serve Queries Directly on Your Data Lake. No Need to Move Data!

Varada is deployed directly in your cloud VPC, so you can meet security and governance requirements because data doesn’t need to be moved or duplicated.

Varada connects directly to a wide range of data sources, including:

  • Public / Private Cloud Storage: on-prem Hadoop, AWS S3, GCP (coming soon), BigQuery (coming soon), Azure object storage (coming soon)
  • Data Formats: ORC, Parquet, JSON, CSV, and more
  • Data Catalogs: Hive Metastore, AWS Glue
  • Additional Data Sources: PostgreSQL, MySQL, and more

Varada includes out-of-the-box native support for all community-supported Presto connectors to access a wide array of data sources. The Varada query engine also expands upon the open source Presto query engine by adding enterprise-grade support for high availability in the Coordinator and Workers, so both can withstand node failures. Varada’s cost-based optimizer extends the basic optimizer with knowledge of how and when to accelerate queries with adaptive indexes. Varada Workers can auto-scale based on dynamic workload and administrator configuration.

The Power of Big Data Indexing

Varada’s unique indexing efficiently indexes data directly from the data lake across selected columns so that every query is optimized automatically. Varada indexes adapt to changes in data over time, taking advantage of Presto’s vectorized columnar processing by splitting columns into small chunks, called nanoblocks™. Based on the data type, structure, and distribution of data in each nanoblock, Varada automatically creates an optimal index. To ensure fast performance for every query and each nanoblock, Varada automatically selects from a set of indexing algorithms and indexing parameters that adapt and evolve as data changes, so that each nanoblock gets the best-fit index.
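To make the idea of per-chunk index selection concrete, here is a minimal Python sketch. It is not Varada's implementation: the chunk size, the function names (`split_into_chunks`, `choose_index_kind`), the index labels, and the thresholds are all hypothetical, chosen only to show how an index type could be picked from the data type and value distribution of each chunk.

```python
from typing import Any, List, Sequence

CHUNK_ROWS = 4  # hypothetical chunk size, kept tiny for illustration


def split_into_chunks(column: Sequence[Any], chunk_rows: int = CHUNK_ROWS) -> List[List[Any]]:
    """Split one column into fixed-size chunks (a stand-in for nanoblocks)."""
    return [list(column[i:i + chunk_rows]) for i in range(0, len(column), chunk_rows)]


def choose_index_kind(chunk: Sequence[Any]) -> str:
    """Pick an index label from the chunk's type and distribution.

    The rules and thresholds are assumptions made for this sketch only.
    Values within a chunk are assumed to share one comparable type.
    """
    values = [v for v in chunk if v is not None]
    if not values:
        return "none"  # nothing to index in an all-null chunk
    distinct = len(set(values))
    if all(isinstance(v, str) for v in values):
        # Few distinct strings: a dictionary works; otherwise index the text.
        return "dictionary" if distinct <= len(values) // 2 else "text"
    if distinct <= 2:
        return "bitmap"  # very low cardinality
    if values == sorted(values):
        return "min_max"  # already ordered, so a range summary is enough
    return "sorted_values"  # general case: keep a sorted copy for lookups


column = [1, 1, 1, 2, 10, 20, 30, 40, 7, 3, 9, 5]
kinds = [choose_index_kind(chunk) for chunk in split_into_chunks(column)]
print(kinds)
```

The point of the sketch is only that the decision is made per chunk, so the same column can end up with different index types in different regions as its data changes.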

Varada’s Indexing technology is used for:

  • Filters – any SQL WHERE clause, on any column, within an SQL statement can use an index. Indexes are used for point lookups, range queries and string matching of data in nanoblocks
  • Joins – any SQL JOIN statement uses the index of the key column. The index can be used for dimensional JOINs (combining a fact table with a filtered dimension table), for self-joins of fact tables based on time or any other dimension as an ID, and for joins between materialized (indexed) data and virtualized data sources. Varada will automatically detect and use the index for applicable JOINs
  • Aggregations (coming soon) – SQL aggregations and grouping can leverage nanoblock indexes to accelerate performance

At query time, when queries run through the Varada endpoint, users see transparent performance benefits when filtering, joining and aggregating data. Varada transparently applies indexes to any SQL WHERE clause, on indexed columns, within a SQL statement. Indexes are used for point lookups, range queries and string matching of data in nanoblocks. Varada automatically detects and uses indexes to accelerate JOINs using the index of the key column. Varada indexes can be used for dimensional JOINs combining a fact table with a filtered dimension table, for self-joins of fact tables based on time or any other dimension as an ID, and for joins between indexed data and federated data sources. SQL aggregations and grouping are accelerated using nanoblock indexes as well.
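As an illustration of how a chunk-level index can speed up a WHERE clause, the following Python sketch answers a range filter by skipping chunks whose minimum and maximum values cannot match. This is a generic min/max pruning technique, not a description of Varada's internals; `ChunkSummary`, `build_summaries`, and `range_filter` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ChunkSummary:
    start_row: int     # row id of the first value in the chunk
    values: List[int]  # the chunk's values (stand-in for stored data)
    lo: int            # smallest value in the chunk
    hi: int            # largest value in the chunk


def build_summaries(column: Sequence[int], chunk_rows: int) -> List[ChunkSummary]:
    """Split a column into chunks and record each chunk's min and max."""
    summaries = []
    for start in range(0, len(column), chunk_rows):
        values = list(column[start:start + chunk_rows])
        summaries.append(ChunkSummary(start, values, min(values), max(values)))
    return summaries


def range_filter(summaries: Sequence[ChunkSummary], low: int, high: int) -> Tuple[List[int], int]:
    """Return row ids where low <= value <= high, plus the number of chunks read.

    Chunks whose [lo, hi] range does not overlap [low, high] are skipped
    without looking at their values.
    """
    rows: List[int] = []
    scanned = 0
    for s in summaries:
        if s.hi < low or s.lo > high:
            continue  # pruned: no value in this chunk can match
        scanned += 1
        rows.extend(s.start_row + i for i, v in enumerate(s.values) if low <= v <= high)
    return rows, scanned


# Roughly: SELECT ... WHERE col BETWEEN 55 AND 75
column = [1, 2, 3, 4, 50, 60, 70, 80, 5, 6, 7, 8]
rows, scanned = range_filter(build_summaries(column, chunk_rows=4), 55, 75)
print(rows, scanned)
```

In this toy run only one of the three chunks is read. A point lookup is the same call with `low == high`.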

A Single Data Platform for Any SQL Application

Varada enables you to serve all your SQL applications, including internal dashboards and apps, customer-facing apps, BI tools, and more, out of the box, making the process easy and worry-free.

Varada enables you to use SQL (JOIN, UNION, etc.) to combine data from directly connected data sources (such as a relational database or data lake) with materialized indexed datasets, using their inline indexes.

Virtual views can seamlessly mix data sources and materialized indexed datasets, making it possible to transparently serve data applications and users from different data tiers.
