Elastic Indexing at Petabyte Scale

By Ori Reshef
August 4, 2021

Varada Ships Version 3.0, Adding Elastic Scaling to the Power of Indexing for Big Data Analytics, Extending TCO and Performance Advantages

See announcement on BusinessWire.

Agility applies not only to the types of queries but also to their volume, so the compute needed for query processing can be highly volatile. Data teams are often measured on how quickly they can react to spikes in demand.

Varada announces a new set of features designed to add elastic scaling to our autonomous indexing platform.

Indexing is now not only adaptive and dynamic but also highly elastic, enabling a true zero-DataOps experience on the data lake. Effective separation of compute and storage makes it possible to scale elastically and add clusters as query traffic fluctuates, avoiding overprovisioning and idle resources.

Delivering The Most Effective Query Path

Varada’s platform includes three data and index layers:

  1. Hot data and index: nodes with attached NVMe SSDs, running in the customer’s VPC, process queries and store hot data, indexes, and cache for optimal performance.
  2. Warm data and index: an object storage bucket in the customer’s data lake stores all indexes for scaling purposes. This layer is shared by all clusters, so that minimal resources go to indexing when scaling out or adding new clusters.
  3. Cold data: the customer’s data lake remains the single source of truth.


Identifying the Optimal Query Path

The platform uses consistent hashing to split each query that reaches a cluster into uniformly distributed work tasks. Each task identifies the optimal data path for its part of the query: hot data and index on the cluster’s SSDs, warm data and index shared by all clusters, or the data lake, which remains the single source of truth.
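The post does not describe Varada’s actual hashing scheme, so the following is only a generic sketch of the technique it names: a consistent-hash ring with virtual nodes that assigns each data split of a query to a worker node. The names `HashRing`, `split_query`, the MD5-based ring positions, and the `vnodes=64` default are all hypothetical choices made for illustration.

```python
import bisect
import hashlib


def _ring_position(key: str) -> int:
    """Map a key to a point on the hash ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns several virtual points to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_ring_position(f"{node}#{i}"), node))

    def remove_node(self, node):
        # Filtering keeps the list sorted; only this node's points disappear.
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        """Return the owner of the first ring point at or after the key's position."""
        idx = bisect.bisect_left(self._ring, (_ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


def split_query(splits, ring):
    """Group a query's data splits into per-node work tasks."""
    tasks = {}
    for split in splits:
        tasks.setdefault(ring.node_for(split), []).append(split)
    return tasks
```

Because a given split key always hashes to the same node, repeated queries over the same files tend to land on the node whose SSD already holds their hot data, and removing a node reassigns only the splits that node owned.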

The top priority is, of course, the SSDs, which deliver the best performance. Which data and indexes reside on SSD is determined by the platform’s machine-learning-based automated acceleration instructions or by priorities set by platform admins. If the data or index is not available on SSD, the task uses the warm data and index, and falls back to the data lake if those are not available either.
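The fallback order described above can be sketched as a simple tier check per work task. This is a simplification under stated assumptions: the `Tier` enum and `choose_path` function are hypothetical, and each tier is modeled as a plain set of split keys that are indexed there, whereas the real platform tracks placement per column and per index.

```python
from enum import Enum


class Tier(Enum):
    HOT = "hot"    # data/index on the cluster's NVMe SSDs
    WARM = "warm"  # indexes in the shared object-storage folder
    LAKE = "lake"  # raw data in the data lake (single source of truth)


def choose_path(split, hot, warm):
    """Pick the fastest tier that can serve this split (hypothetical sketch).

    `hot` and `warm` are sets of split keys currently indexed in each tier.
    """
    if split in hot:
        return Tier.HOT
    if split in warm:
        return Tier.WARM
    return Tier.LAKE  # always available, but slowest
```

For example, with `hot = {"p1"}` and `warm = {"p1", "p2"}`, split `p1` is served from SSD, `p2` from the warm layer, and any other split from the data lake.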

Scale Effectively & Elastically

When clusters are scaled in or eliminated, indexes are not lost and remain available. This index-once approach makes warm-up 10x-20x faster than indexing the data from scratch:

Varada Elastic Scaling Illustration
  • Indexes are created based on actual demand by queries, or to meet priorities and performance requirements defined by data admins. 
  • As the platform creates new indexes, they are also stored in a designated folder on the customer’s data lake (“warm data”), in addition to the cluster’s SSDs (“hot data”).
  • When a cluster is scaled in or eliminated and some (or all) nodes are shut down, indexes remain available as warm data.
  • Warm indexes enable fast warm-up when scaling back out, adding new clusters, or adding SSD resources to existing clusters.
  • Customers can continue using their existing scaling and auto-scaling policies, scaling groups, and tools.
  • After scaling in, data admins retain the ability to start a cluster with the state and acceleration instructions of previously live clusters.
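The index lifecycle in the list above (write-through to hot and warm tiers, loss of SSD state on scale-in, warm-up from the warm layer on scale-out) can be modeled with a toy class. This is an illustrative sketch, not Varada’s implementation: the `Cluster` class, its methods, and the string placeholders used as “indexes” are hypothetical, and the warm folder is modeled as an ordinary dict.

```python
class Cluster:
    """Toy model of one cluster whose nodes cache indexes on local SSD.

    `warm` is a dict standing in for the designated index folder in the
    customer's bucket; it outlives any node or cluster.
    """

    def __init__(self, warm):
        self.warm = warm
        self.ssd = {}    # node name -> {split key: index}
        self.builds = 0  # indexes built from raw lake data

    def add_node(self, node):
        self.ssd[node] = {}

    def build_index(self, node, split):
        """Build an index from lake data and write it to the hot and warm tiers."""
        index = f"index:{split}"  # placeholder for a real index structure
        self.builds += 1
        self.ssd[node][split] = index
        self.warm[split] = index
        return index

    def scale_in(self, node):
        # The node's SSD contents are lost; the warm copies are untouched.
        del self.ssd[node]

    def warm_up(self, node, splits):
        """Fill a new node's SSD, copying warm indexes and building only the rest.

        Returns a (copied, built) tuple of counts.
        """
        copied = built = 0
        for split in splits:
            if split in self.warm:
                self.ssd[node][split] = self.warm[split]  # copy, no re-indexing
                copied += 1
            else:
                self.build_index(node, split)
                built += 1
        return copied, built
```

For example, after indexing three splits on one node, scaling that node in, and adding a replacement node that needs four splits, `warm_up` copies the three warm indexes and builds only the fourth. Copying an existing index instead of rebuilding it is what the index-once approach relies on for faster warm-up.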

Varada’s platform is based on a multi-cluster approach, in which different clusters share warm indexed data by accessing the designated bucket on the data lake. This lets different analytics use cases run effectively on the data lake while sharing the “investment” in indexing. Any new use case, even one set up on a separate cluster, benefits from indexes that were already created.
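Sharing the indexing “investment” across clusters amounts to a get-or-build lookup against the shared warm folder. The sketch below assumes a hypothetical `get_or_build_index` helper, models the shared bucket as a dict, and ignores concurrency (two clusters building the same index at the same moment), which a real system would have to handle.

```python
from typing import Callable, Dict, Tuple


def get_or_build_index(warm: Dict[str, str], split: str,
                       build: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (index, reused) for a split (hypothetical sketch).

    `warm` stands in for the shared index folder that every cluster can read;
    an index built by any cluster is published there for all the others.
    """
    if split in warm:
        return warm[split], True  # another cluster already paid for this index
    index = build(split)
    warm[split] = index  # publish so later clusters can reuse it
    return index, False
```

If one cluster queries splits `p1` and `p2`, and a cluster set up later for a different use case queries `p2` and `p3`, the second cluster reuses the `p2` index and only `p3` costs new indexing work.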

Be Proactive!

In addition to behavior-based indexing, data platform teams can opt to index in the background on low-cost nodes (e.g., spot instances). These indexes are stored in the “warm data” layer for fast warm-up later. This option can be used to prepare in advance for upcoming spikes in analytics demand or to significantly reduce TCO.

Data teams can also use the “warm data” layer to recover indexes quickly after a node failure, significantly reducing recovery time.

To see Varada’s autonomous indexing platform in action, schedule a live demo!
