See announcement on BusinessWire.
Agility applies not only to the types of queries but also to their volume, so the compute needed for query processing is expected to be highly volatile. Data teams are often measured on how quickly they can react to spikes in demand.
Varada announces a new set of features designed to add elastic scaling to our autonomous indexing platform.
Indexing is now not only adaptive and dynamic but also highly elastic, enabling a true zero-DataOps experience on the data lake. Effective separation of compute and storage lets the platform scale elastically and add clusters as query traffic fluctuates, avoiding overprovisioning and idle resources.
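Varada’s platform includes three data and index layers: hot data and index on each cluster’s SSDs, warm data and index shared by all clusters, and the data lake itself.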
The platform uses consistent hashing to split each query that hits a specific cluster into uniformly distributed work tasks. Each task identifies the optimal data path for the query: hot data and index on the cluster’s SSDs, warm data and index shared by all clusters, or the data lake, which remains the single source of truth.
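To make the task-distribution idea concrete, here is a minimal sketch of a generic consistent-hash ring that assigns a query's work tasks to nodes. It is not Varada's implementation, which this post does not describe; the names `HashRing`, `assign_tasks`, the `vnodes` parameter, and the node and split identifiers are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring (hypothetical, for illustration only)."""

    def __init__(self, nodes, vnodes=100):
        # Each node gets several virtual points so load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point past the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


def assign_tasks(ring, splits):
    """Group a query's splits (work tasks) by the node that owns each one."""
    assignment = {}
    for split in splits:
        assignment.setdefault(ring.node_for(split), []).append(split)
    return assignment


ring = HashRing(["node-a", "node-b", "node-c"])
tasks = assign_tasks(ring, [f"split-{i}" for i in range(300)])
```

A useful property of this scheme is that removing a node only reassigns the tasks that node owned; every other task keeps its owner, so the data and indexes already cached on the remaining nodes stay relevant.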
The top priority is, of course, the SSDs, which deliver the best performance. Which data and indexes reside on SSDs is determined by the platform’s machine-learning-based automated acceleration instructions or by priorities set by platform admins. If the data or index is not available on SSD, the query path falls back to the warm data and index, and then to the data lake if the warm layer does not have them either.
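The fallback order described above can be sketched as a simple tiered lookup. This is an illustrative model only: the `Tier` enum, the `choose_path` function, and the use of plain sets to represent what each layer holds are assumptions, not part of Varada's platform.

```python
from enum import Enum


class Tier(Enum):
    HOT_SSD = "hot"        # data and index on the cluster's local SSDs
    WARM_SHARED = "warm"   # data and index shared by all clusters
    DATA_LAKE = "lake"     # single source of truth, always available


def choose_path(split: str, hot: set, warm: set) -> Tier:
    """Pick the fastest layer that holds the split (hypothetical helper)."""
    if split in hot:
        return Tier.HOT_SSD
    if split in warm:
        return Tier.WARM_SHARED
    # Nothing cached or pre-indexed: read the raw data from the lake.
    return Tier.DATA_LAKE


hot_splits = {"orders/part-0"}
warm_splits = {"orders/part-0", "orders/part-1"}
paths = {
    split: choose_path(split, hot_splits, warm_splits)
    for split in ("orders/part-0", "orders/part-1", "orders/part-2")
}
```

Because the data lake is always the last resort, a query never fails for lack of acceleration; it only runs slower on the splits that are not yet hot or warm.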
When clusters are scaled in or eliminated, indexes are not lost and remain available. An index-once approach speeds up warm-up by 10x-20x compared to indexing data from scratch.
Varada’s platform is based on a multi-cluster approach, which lets different clusters share warm indexed data by accessing a designated bucket on the data lake. This approach lets different analytics use cases run effectively on the data lake while sharing the “investment” in indexing. Any new use case that is added, even if it is set up on a separate cluster, benefits from any indexes that were already created.
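As a rough illustration of the index-once idea, the sketch below models two clusters sharing one warm store: the first cluster to need an index builds and publishes it, and the second simply loads it. The `WarmStore` and `Cluster` classes, the `build_index` helper, and the toy inverted index are all hypothetical stand-ins; the real shared layer is a bucket on the data lake.

```python
class WarmStore:
    """Stand-in for the shared bucket on the data lake (hypothetical)."""

    def __init__(self):
        self._indexes = {}

    def get(self, key):
        return self._indexes.get(key)

    def put(self, key, index):
        self._indexes[key] = index


def build_index(rows):
    # Toy inverted index: value -> list of row positions.
    index = {}
    for pos, value in enumerate(rows):
        index.setdefault(value, []).append(pos)
    return index


class Cluster:
    def __init__(self, name, warm):
        self.name = name
        self.warm = warm
        self.hot = {}      # indexes held locally, modeling the SSD layer
        self.builds = 0    # how many indexes this cluster built from scratch

    def load_index(self, key, rows):
        if key in self.hot:
            return self.hot[key]
        index = self.warm.get(key)
        if index is None:
            # Nobody has indexed this yet: build once and share it.
            index = build_index(rows)
            self.builds += 1
            self.warm.put(key, index)
        self.hot[key] = index
        return index


warm = WarmStore()
rows = ["us", "eu", "us", "apac"]
bi_cluster = Cluster("bi", warm)
adhoc_cluster = Cluster("adhoc", warm)
bi_cluster.load_index(("orders", "region"), rows)
adhoc_cluster.load_index(("orders", "region"), rows)
```

In this model the second cluster warms up by copying an existing index rather than rebuilding it, which is the effect the index-once approach is aiming for.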
In addition to behavior-based indexing, data platform teams can opt to index in the background on low-cost nodes (i.e., spot instances). The resulting indexes are stored in the “warm data” layer for fast warm-up later. This option can be used to prepare in advance for upcoming spikes in analytics requirements or to significantly reduce TCO.
Data teams can also leverage the “warm data” layer for fast index recovery after a node failure, significantly reducing recovery time.
To see Varada’s autonomous indexing platform in action, schedule a live demo!