Dynamically Accelerate Queries Across Any Dataset with Machine Learning

Varada’s platform autonomously identifies which datasets to accelerate and how to optimally balance performance and price.
The ability to dynamically accelerate different datasets is at the core of Varada’s solution. Our big data indexing technology continuously monitors the data lake and learns which datasets are frequently used or are needed to meet the performance requirements of high-priority workloads. Through this feedback loop, datasets are dynamically and automatically accelerated with indexing, caching, intermediate results, or any combination that delivers the best balance of performance and price.

Varada’s Query Acceleration Engine

Varada’s Query Acceleration Engine is based on a machine learning feedback loop that continuously monitors and evaluates query behavior and performance, and dynamically configures query acceleration strategies.

The Collector

Varada continuously collects query execution metadata from the query engine. Query execution metadata is then transformed into a summarized model that is optimized for insight extraction. The modelled metadata is stored in columnar ORC format in an admin-defined S3 bucket.
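
As a rough illustration of the summarization step, the sketch below collapses raw query events into per-column usage summaries. It is not Varada’s implementation: the QueryEvent record, its fields and the summarize function are hypothetical, and writing the result to ORC in S3 is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEvent:
    # Hypothetical record shape; the real metadata schema is not described here.
    query_id: str
    table: str
    columns: tuple
    elapsed_ms: float


def summarize(events):
    """Collapse raw query events into per-(table, column) usage summaries."""
    summary = defaultdict(lambda: {"query_count": 0, "total_elapsed_ms": 0.0})
    for event in events:
        # set() so a column referenced twice in one query is counted once.
        for column in set(event.columns):
            row = summary[(event.table, column)]
            row["query_count"] += 1
            row["total_elapsed_ms"] += event.elapsed_ms
    return dict(summary)


events = [
    QueryEvent("q1", "orders", ("customer_id", "total"), 120.0),
    QueryEvent("q2", "orders", ("customer_id",), 80.0),
]
model = summarize(events)
```

A summarized model of this kind is far smaller than the raw event stream, which is what makes later insight extraction cheap.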

The Accelerator

Using historical query and data usage patterns, based on the Collector output, Varada creates an actionable set of insights:

  • Usage, statistics and aggregators of specific columns
  • Query usage, statistics, column relations and selectivity levels
  • Operators used on specific tables
  • Cross-table relations, identifying common join keys
  • And more…
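
To make one of these insights concrete, here is a minimal sketch of how common join keys could be identified from observed join conditions. The input shape and the common_join_keys function are assumptions for illustration only, not the platform’s actual logic.

```python
from collections import Counter


def common_join_keys(joins, min_count=2):
    """Count how often each pair of (table, column) keys is joined.

    joins: iterable of ((table, column), (table, column)) pairs, one per
    join condition seen in the query history (hypothetical input shape).
    """
    counts = Counter()
    for left, right in joins:
        # Normalize order so A = B and B = A count as the same relation.
        counts[tuple(sorted((left, right)))] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


observed = [
    (("orders", "customer_id"), ("customers", "id")),
    (("customers", "id"), ("orders", "customer_id")),
    (("orders", "product_id"), ("products", "id")),
]
frequent = common_join_keys(observed)
```

Here only the orders/customers relation passes the threshold, so it would be the first candidate for join acceleration.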

Acceleration Strategies

Insights are continuously revised based on real-time usage and query performance, and translated into two types of acceleration strategies: cache and index. These strategies are used to automatically create acceleration instructions that specify which data to index and how, and which data to cache.

Cache Strategies:

Based on the frequency of data usage and its business priority, the platform uses SSD columnar nanoblock caching to speed up data access.
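
One simple way to combine usage frequency with business priority is a weighted score, sketched below. The scoring formula, the weights and the rank_for_cache function are illustrative assumptions; the platform’s real ranking is not described here.

```python
def rank_for_cache(usage, priority_weight):
    """Order columns by access count scaled by a business-priority weight.

    usage: {column: number of accesses}
    priority_weight: {column: weight}; columns without a weight default to 1.0.
    Both inputs and the multiplicative score are hypothetical.
    """
    scored = {col: n * priority_weight.get(col, 1.0) for col, n in usage.items()}
    # Highest score first; ties broken by column name for a stable order.
    return sorted(scored, key=lambda col: (-scored[col], col))


usage = {"orders.total": 50, "orders.customer_id": 40, "logs.message": 90}
weights = {"orders.customer_id": 3.0, "logs.message": 0.5}
ranking = rank_for_cache(usage, weights)
```

In this example the most frequently read column (logs.message) ranks last because its workload has a low priority weight.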

Indexing Strategies:

The platform adapts to the data and uses different indexing technologies to speed up data searches, filters and joins. The impact of each index is evaluated separately, based on data type and level of selectivity, so that the platform can use the optimal index.
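
The sketch below shows the general idea of picking an index from data type and selectivity. The index kinds ("bitmap", "hash", "sorted"), the thresholds and the choose_index function are placeholders chosen for illustration; they are not the index types or rules Varada uses.

```python
def choose_index(data_type, distinct_ratio, avg_filter_selectivity):
    """Return a hypothetical index kind, or None when an index is unlikely to help.

    distinct_ratio: distinct values / total rows, between 0 and 1.
    avg_filter_selectivity: average fraction of rows a filter keeps, between 0 and 1.
    All thresholds below are illustrative assumptions.
    """
    if avg_filter_selectivity > 0.5:
        # Filters keep most rows, so scanning is about as cheap as an index lookup.
        return None
    if distinct_ratio < 0.01:
        # Few distinct values: a compact per-value structure fits well.
        return "bitmap"
    if data_type == "string":
        return "hash"
    return "sorted"


choice = choose_index("string", distinct_ratio=0.9, avg_filter_selectivity=0.001)
```

Evaluating each column separately in this way lets two columns of the same table end up with different index types, or with no index at all.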

Keeping Instructions as Granular as Possible

Acceleration instructions are extremely granular and are based on specific tables, columns and even partitions. To ensure acceleration meets additional business considerations, acceleration strategies are also directly influenced by workload priorities, as determined by administrators and data consumers. Each new set of instructions is automatically configured and implemented according to the budget caps and allocated resources set by administrators.
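
A minimal sketch of budget-capped planning follows, assuming each candidate instruction carries an estimated benefit (already weighted by workload priority) and an estimated cost. The Instruction fields, the greedy benefit-per-cost rule and the plan function are assumptions for illustration, not the platform’s actual planner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    # Hypothetical shape of one granular acceleration instruction.
    table: str
    column: str
    partition: str
    action: str      # "index" or "cache"
    benefit: float   # estimated benefit, weighted by workload priority
    cost: float      # estimated cost (> 0), in the same unit as the budget cap


def plan(candidates, budget_cap):
    """Greedily pick instructions by benefit per unit cost within the budget cap."""
    chosen, spent = [], 0.0
    for inst in sorted(candidates, key=lambda i: i.benefit / i.cost, reverse=True):
        if spent + inst.cost <= budget_cap:
            chosen.append(inst)
            spent += inst.cost
    return chosen, spent


candidates = [
    Instruction("orders", "customer_id", "2024-01", "index", benefit=90.0, cost=30.0),
    Instruction("orders", "total", "2024-01", "cache", benefit=40.0, cost=20.0),
    Instruction("logs", "message", "2024-01", "index", benefit=60.0, cost=60.0),
]
chosen, spent = plan(candidates, budget_cap=60.0)
```

With a cap of 60, the two orders instructions are selected and the expensive logs index is skipped; raising the cap or the workload priority would change the outcome.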

Varada Control Center

Based on the query execution metadata, Varada delivers actionable workload-level observability that enables administrators to easily understand how data is used by different workloads and users, how resources are allocated among different workloads and users, how and why bottlenecks occur, etc.
This deep observability also enables data teams to gain control by effectively identifying and optimizing high-priority workloads, instead of optimizing each query one by one.

Administrators maintain full control over query acceleration

Business-Driven Priorities

Administrators and data consumers can prioritize workloads and set budget caps to ensure the platform meets business requirements across different use cases. Workload prioritization is used by the platform to drive cache and indexing strategies.

Instruction Management

Although acceleration instructions are generated automatically by the platform, administrators retain full control: they can view, manage and override specific instructions via Varada’s Control Center, and determine which datasets to accelerate and which strategies to apply.
