Self-Optimizing Cloud Data Virtualization

Varada’s dynamic and adaptive indexing solution enables you to
balance performance and cost with zero data-ops.


Varada’s data virtualization platform runs in the customer’s cloud environment (VPC) and serves as a smart acceleration layer on top of your data lake, which remains the single source of truth. Varada enables data teams to operationalize the entire data lake with interactive performance, without moving data, modeling it, or optimizing manually.

Our secret sauce is our ability to automatically and dynamically index relevant data, at the structure and granularity of the source. Varada enables any query to meet continuously evolving performance and concurrency requirements for users and analytics API calls, while keeping costs predictable and under control.

Zero Data-Ops.

Varada automatically accelerates queries based on workload behavior and on automatic detection of hot data and bottlenecks. The platform also enables data teams to define business priorities and adjust performance and budgets accordingly, eliminating the need to build a separate silo for each use case. The platform seamlessly defines the semantic layer and chooses which queries to accelerate and which data to index. Varada uses machine learning to elastically adjust the cluster to meet demand and optimize cost and performance.

Adaptive & Dynamic Indexing.

Our indexing technology breaks the data in any column into nanoblocks. Varada automatically chooses the most effective index for each nanoblock based on its data content and structure. We use a variety of index types, such as bitmap, dictionary, tree, and text-analysis indexes, and tailor the choice to each nanoblock. This unique indexing technology is what makes all your data available and interactive.
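The following minimal Python sketch shows the per-nanoblock idea: cut a column into fixed-size chunks and pick an index kind for each chunk from its own content. The block size, the thresholds, and the selection rules in `choose_index` are all hypothetical. Varada’s actual heuristics are not described here, and real nanoblocks are much larger than four values.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence

NANOBLOCK_SIZE = 4               # hypothetical; chosen small to keep the demo readable
BITMAP_MAX_DISTINCT_RATIO = 0.5  # hypothetical cardinality threshold
TEXT_MIN_AVG_LENGTH = 20         # hypothetical cutoff for free-form text


@dataclass
class Nanoblock:
    values: List[Any]
    index_type: str


def choose_index(values: Sequence[Any]) -> str:
    """Pick an index kind from one block's content (illustrative rules only)."""
    if all(isinstance(v, str) for v in values):
        avg_len = sum(len(v) for v in values) / len(values)
        # Long free-form strings suit a text index; short repeated codes a dictionary.
        return "text" if avg_len >= TEXT_MIN_AVG_LENGTH else "dictionary"
    distinct_ratio = len(set(values)) / len(values)
    # Few distinct values: one bitmap per value stays compact. Many: use a tree.
    return "bitmap" if distinct_ratio <= BITMAP_MAX_DISTINCT_RATIO else "tree"


def split_into_nanoblocks(column: Sequence[Any], size: int = NANOBLOCK_SIZE) -> List[Nanoblock]:
    """Cut a column into fixed-size chunks and index each chunk independently."""
    blocks = []
    for start in range(0, len(column), size):
        chunk = list(column[start:start + size])
        blocks.append(Nanoblock(values=chunk, index_type=choose_index(chunk)))
    return blocks


prices = [7, 7, 7, 8, 3, 91, 42, 15, 5, 5]
print([b.index_type for b in split_into_nanoblocks(prices)])
# ['bitmap', 'tree', 'bitmap']
```

Each chunk is judged on its own values, so the same column can mix index kinds. That is the property the paragraph above describes.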

Varada Data Virtualization Platform

Query Orchestrator

Manages queries and cluster resources according to budgets and workload priorities, and elastically grows and shrinks the cluster’s resources based on load.
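The following minimal sketch shows budget-aware elastic sizing. It assumes a hypothetical policy: the orchestrator targets a fixed number of pending queries per Worker, and the cluster is capped at however many Workers an hourly budget can pay for. The names `ScalingPolicy` and `desired_workers` and all the numbers are illustrative. They are not Varada’s API or configuration.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical orchestrator settings (not Varada's real configuration)."""
    min_workers: int
    max_workers: int
    queries_per_worker: int       # target concurrent load per Worker
    budget_per_hour: float        # spend limit set by the data team
    worker_cost_per_hour: float   # price of one Worker instance


def desired_workers(pending_queries: int, policy: ScalingPolicy) -> int:
    """Size the cluster to the load, then clamp it between the floor and the budget cap."""
    needed = math.ceil(pending_queries / policy.queries_per_worker)
    budget_cap = math.floor(policy.budget_per_hour / policy.worker_cost_per_hour)
    upper = min(policy.max_workers, budget_cap)
    return max(policy.min_workers, min(upper, needed))


policy = ScalingPolicy(min_workers=2, max_workers=20, queries_per_worker=5,
                       budget_per_hour=100.0, worker_cost_per_hour=10.0)
print(desired_workers(0, policy))    # 2: idle, shrink to the floor
print(desired_workers(23, policy))   # 5: ceil(23 / 5)
print(desired_workers(200, policy))  # 10: 40 wanted, but the budget pays for 10
```

In this sketch the budget acts as a hard ceiling and `min_workers` as a floor. A real orchestrator would also weigh workload priorities and smooth out rapid scaling changes.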

Distributed Query Engine (Presto-based)

Includes a Coordinator node that optimizes and distributes queries, and Workers that execute queries using massively parallel processing.

Acceleration Engine

Optimizes queries on the fly using adaptive indexing, data materialization, and intermediate result calculation based on workload insights.

Workload Monitoring & Learning Engine

Uses machine learning to detect repeating patterns and hotspots in queries and to adaptively choose which acceleration to apply. The engine exposes this information to data teams, giving them full visibility so they can explore workloads and set priorities according to business needs.
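The following deliberately simple sketch shows hotspot detection. It swaps in a plain frequency count for the machine learning described above, to illustrate what "hot" means. It assumes a hypothetical query log that is already parsed into (table, filtered columns) entries. It flags any column referenced by at least a given share of queries as a candidate for acceleration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical pre-parsed log entry: (table, columns referenced in WHERE/JOIN clauses).
LogEntry = Tuple[str, Tuple[str, ...]]


def find_hot_columns(log: Iterable[LogEntry], min_share: float = 0.3) -> List[Tuple[str, str]]:
    """Return (table, column) pairs used by at least `min_share` of logged queries."""
    counts: Counter = Counter()
    total = 0
    for table, columns in log:
        total += 1
        for column in set(columns):  # count each column once per query
            counts[(table, column)] += 1
    if total == 0:
        return []
    # most_common() yields the hottest candidates first.
    return [key for key, n in counts.most_common() if n / total >= min_share]


query_log = [
    ("events", ("user_id", "ts")),
    ("events", ("user_id",)),
    ("events", ("country",)),
    ("orders", ("user_id",)),
]
print(find_hot_columns(query_log))  # [('events', 'user_id')]
```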

Platform Overview

Varada includes out-of-the-box native support for all community-supported Presto connectors, giving access to a wide array of data sources. The Varada query engine also extends the open source Presto query engine with enterprise-grade high availability for the Coordinator and Workers, so both can withstand node failures. Varada’s cost-based optimizer extends the basic optimizer with knowledge of how and when to accelerate queries with adaptive indexes. Varada Workers can auto-scale based on dynamic workload and administrator configuration.

All Your Available Data Becomes Instantly Operational

Varada’s big data infrastructure platform is deployed within your VPC to ensure optimal control, security and governance. Varada connects directly to a wide range of data sources, including:

  • Public / Private Cloud Storage: on-prem Hadoop, AWS S3, GCP (coming soon), BigQuery (coming soon), Azure object storage (coming soon)
  • Data Formats: ORC, Parquet, JSON, CSV, and more
  • Data Catalogs: Hive Metastore, AWS Glue
  • Additional Data Sources: PostgreSQL, MySQL, and more
[Figure: Varada connects to any data source on the data lake]

The Power of Adaptive Indexing

Varada’s unique indexing efficiently indexes data directly from the data lake across selected columns, so that every query is optimized automatically. Varada indexes adapt to changes in the data over time and take advantage of Presto’s vectorized columnar processing by splitting columns into small chunks called nanoblocks™. Based on the data type, structure, and distribution of the data in each nanoblock, Varada automatically creates an optimal index. To ensure fast performance for every query, Varada selects from a set of indexing algorithms and parameters that evolve as the data changes, so that each nanoblock always gets its best-fit index.

[Figure: Varada dynamic and adaptive indexing for big data]

At query time, when queries run through the Varada endpoint, users see transparent performance benefits when filtering, joining, and aggregating data. Varada transparently applies indexes to any SQL WHERE clause on indexed columns within a SQL statement. Indexes are used for point lookups, range queries, and string matching of data in nanoblocks. Varada automatically detects and uses indexes to accelerate JOINs using the index of the key column. Varada indexes can be used for dimensional JOINs that combine a fact table with a filtered dimension table, for self-joins of fact tables on time or on any other dimension used as an ID, and for joins between indexed data and federated data sources. SQL aggregations and grouping are accelerated using nanoblock indexes as well.
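The following minimal sketch shows how a bitmap index on a single nanoblock can answer the point lookups and range queries mentioned above without scanning rows. It uses a generic bitmap-index technique with Python integers as bitsets. The function names are hypothetical, and this is not Varada’s internal format. A count of matching rows falls out of the bitmask directly, which hints at how aggregations benefit too.

```python
from typing import Dict, List, Sequence


def build_bitmap_index(values: Sequence[int]) -> Dict[int, int]:
    """Map each distinct value in a nanoblock to a bitmask of the rows holding it."""
    index: Dict[int, int] = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def point_lookup(index: Dict[int, int], value: int) -> int:
    """WHERE col = value: a single dictionary hit, no row scan."""
    return index.get(value, 0)


def range_lookup(index: Dict[int, int], low: int, high: int) -> int:
    """WHERE col BETWEEN low AND high: OR together the bitmaps of qualifying values."""
    mask = 0
    for value, bits in index.items():
        if low <= value <= high:
            mask |= bits
    return mask


def rows(mask: int) -> List[int]:
    """Decode a bitmask into the row positions it marks."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


block = [5, 3, 5, 9, 3, 7]
idx = build_bitmap_index(block)
print(rows(point_lookup(idx, 5)))               # [0, 2]
print(rows(range_lookup(idx, 4, 8)))            # [0, 2, 5]
print(bin(range_lookup(idx, 4, 8)).count("1"))  # 3, i.e. COUNT(*) without reading rows
```

A lookup that returns `0` means no row in the nanoblock matches, so the block can be skipped entirely.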

Optimized accelerated query example

This example highlights the different techniques Varada leverages to optimize and accelerate queries:

[Figure: Varada big data indexing and dynamic filtering]
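Dynamic filtering is a general join technique. The engine first filters the small dimension table, then collects the join keys that survived. It uses that key set to prune the large fact table before the join runs. The following Python sketch illustrates the idea on hypothetical in-memory tables. The table and column names are made up, and this is not Varada’s implementation.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def join_with_dynamic_filter(fact: List[Row], dim: List[Row],
                             dim_predicate: Callable[[Row], bool], key: str) -> List[Row]:
    # 1. Build side: filter the (small) dimension table first.
    build = [row for row in dim if dim_predicate(row)]
    # 2. Dynamic filter: the join keys that survived the dimension predicate.
    allowed = {row[key] for row in build}
    # 3. Probe side: drop fact rows whose key cannot match, before the join runs.
    #    An index on the fact table's key column could answer this lookup directly.
    probe = [row for row in fact if row[key] in allowed]
    # 4. Ordinary hash join on the pruned input.
    by_key: Dict[Any, List[Row]] = {}
    for row in build:
        by_key.setdefault(row[key], []).append(row)
    return [{**f, **d} for f in probe for d in by_key[f[key]]]


stores = [{"store_id": 1, "region": "EU"}, {"store_id": 2, "region": "US"}]
sales = [{"store_id": 1, "amount": 10}, {"store_id": 2, "amount": 20},
         {"store_id": 1, "amount": 5}]
print(join_with_dynamic_filter(sales, stores, lambda r: r["region"] == "EU", "store_id"))
# [{'store_id': 1, 'amount': 10, 'region': 'EU'}, {'store_id': 1, 'amount': 5, 'region': 'EU'}]
```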

Resource-Aware Intelligent Cost-Based Optimizer

Varada takes Presto’s built-in Cost-Based Optimizer (CBO) to the next level by automatically analyzing and introducing indexes for filtering, joins, and aggregations, continuously reanalyzing query performance on the fly, and balancing resources across the entire system. Varada uses machine learning to decide when and what to optimize. Because its indexes are lightweight, Varada can allocate resources intelligently and elastically and reuse intermediate results. The resulting cost model is exposed to administrators and users, who can then prioritize specific user queries.
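The following toy model shows the kind of decision a cost-based optimizer makes when an index is available. It compares the estimated cost of a full scan with the cost of probing an index and fetching only the matching rows, given the predicate’s selectivity. The cost constants and the `CostModel` class are hypothetical and far simpler than a real CBO.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical unit costs; a real optimizer derives these from statistics."""
    scan_cost_per_row: float = 1.0         # read and evaluate one row in a full scan
    index_probe_cost: float = 50.0         # fixed overhead of consulting the index
    index_fetch_cost_per_row: float = 2.0  # fetch one matching row via the index

    def plan(self, total_rows: int, selectivity: float) -> str:
        """Return "index" or "scan", whichever the model estimates is cheaper."""
        scan = total_rows * self.scan_cost_per_row
        index = self.index_probe_cost + total_rows * selectivity * self.index_fetch_cost_per_row
        return "index" if index < scan else "scan"


model = CostModel()
print(model.plan(1_000_000, 0.01))  # index: 20,050 vs 1,000,000
print(model.plan(1_000_000, 0.9))   # scan: 1,800,050 vs 1,000,000
```

Selective predicates favor the index, and broad ones favor a scan. The fixed probe cost also makes the index a poor choice for very small inputs.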

Varada’s cluster is deployed as a private managed service, within your VPC environment. This deployment model ensures that data remains within your secured VPC. Varada integrates with standard AWS data access and governance solutions. Varada includes a control center that enables easy and ongoing monitoring of Varada clusters. Just as with the Varada cluster, the control center is deployed in your VPC and uses cross-account permissions to provision and set up instances in your VPC.

Highly Available with Streamlined DevOps
