The Power of Inline Indexing

Inline Indexing™ is at the heart of Varada’s approach to enabling interactive analytics on large and complex datasets. Inline Indexing represents data as a mesh of nanoblocks™, each independently indexed and interconnected. Any SQL query on Varada runs across nanoblocks to filter, join and aggregate, using indexes across a variety of domains.

Varada clusters run within your Virtual Private Cloud (VPC) and serve as an operational data tier between data consumers and the existing data sources such as columnar data on a data lake. Existing data sources continue to serve as a single source of truth.

Data consumers include any application that generates SQL, such as interactive BI tools, data APIs and custom decision systems, as well as analysts and data scientists.

Varada Data Virtualization Platform for SQL queries directly on the data lake

  • Environment and Sources – Varada is deployed within your VPC and connects directly to a wide range of data sources, including cloud storage, various data formats and data catalogs
  • Defining the Indexed Materialized View (Last-Mile ETL) – operational datasets are defined using the CREATE MATERIALIZED VIEW SQL command, which also applies last-mile ETL
  • Automatic Synchronization – Varada fetches data from the underlying data sources, according to the operational dataset defined by users, and keeps it up to date
  • Inline Indexing Engine – the operational dataset is adaptively indexed as it is loaded, across all dimensions
  • Distributed SQL Engine – to enable workloads to work out of the box, Varada embeds Presto (community edition), a “SQL-on-anything” engine that supports different connectors to access different data sources

Environment and Sources

Varada’s big data infrastructure platform is deployed within your VPC to ensure optimal control, security and governance. Varada connects directly to a wide range of data sources, including:

  • Public / Private Cloud Storage: on-prem Hadoop, AWS S3, GCP (coming soon), BigQuery (coming soon), Azure object storage (coming soon)
  • Data Formats: ORC, Parquet, JSON, CSV, and more
  • Data Catalogs: Hive Metastore, AWS Glue
  • Additional Data Sources: PostgreSQL, MySQL, and more

Defining the Indexed Materialized View (Last-Mile ETL)

Operational datasets are defined using the CREATE MATERIALIZED VIEW SQL command, which determines the dataset definition and lifecycle and applies last-mile ETL. The ability to create operational datasets in a single click enables:

  • Simple control and management of data lifecycle policies and access control
  • Flexible and live semantic layer to define indexed data
  • Indexed view insights that help you understand the usage patterns and cost of any materialized view
  • Easy and effective last-mile ETLs

Bottom line, the materialized view approach shifts the focus from building custom data pipelines and optimization to creating a flexible live definition. The materialized view can be easily updated or changed over time to accommodate evolving business needs.
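
The idea of a definition that carries its own last-mile ETL can be sketched in a few lines of Python. This is an illustration only, not Varada's CREATE MATERIALIZED VIEW syntax, which this page does not show. It uses SQLite, which has no materialized views, so CREATE TABLE ... AS SELECT plus an index stands in for one. The table and column names (raw_events, ops_events) and the refresh_ops_events helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw source table, standing in for data on the data lake.
conn.executescript("""
    CREATE TABLE raw_events (ts TEXT, country TEXT, amount_cents INTEGER);
    INSERT INTO raw_events VALUES
        ('2021-01-01', 'us', 1250),
        ('2021-01-01', 'de', 300),
        ('2021-01-02', 'us', NULL);
""")


def refresh_ops_events(conn):
    """Rebuild the stand-in 'materialized view' from its definition.

    The SELECT is the last-mile ETL: it drops incomplete rows, normalizes
    the country code and converts cents to currency units.
    """
    conn.executescript("""
        DROP TABLE IF EXISTS ops_events;
        CREATE TABLE ops_events AS
            SELECT ts,
                   UPPER(country) AS country,
                   amount_cents / 100.0 AS amount
            FROM raw_events
            WHERE amount_cents IS NOT NULL;
        CREATE INDEX ops_events_country ON ops_events (country);
    """)


refresh_ops_events(conn)
rows = conn.execute(
    "SELECT country, amount FROM ops_events ORDER BY amount"
).fetchall()
print(rows)
```

Changing the dataset later means editing the SELECT and refreshing again, rather than rebuilding a separate pipeline, which is the shift in focus described above.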


Unified semantic layer to all your data

All the data in the data lake, as well as external sources, can also be accessed via a unified semantic layer leveraging query pushdown capabilities.

Varada lets you use SQL (via JOIN, UNION, etc.) to combine data from directly connected data sources (such as a relational database or data lake) with materialized indexed datasets, using their inline indexes.

Virtual views can seamlessly mix data sources and materialized indexed datasets, so data applications and users can be served transparently from different data tiers.


Automatic Synchronization

Varada keeps the operational data continuously synchronized, enabling easy data lifecycle management directly on the data lake. Varada leverages native cloud services such as AWS Glue, SQS, and customer managed catalogs such as Hive Metastore to keep data continuously fresh. Varada supports different synchronization and update scenarios:

  • Control refresh behavior – set per view whether refresh is on-demand, scheduled or automatic on any detected change
  • Incremental updates – any view that has an incremental definition will update only on delta changes
  • Automatic change tracking – keep your incremental indexed view up-to-date automatically using S3 notifications and metastore polling to detect changes
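
As a rough illustration of an incremental update, the Python sketch below loads only the files it has not seen before. It is a simplified model under stated assumptions, not Varada's implementation: the IncrementalView class and the part-000N file names are hypothetical, and change detection is reduced to comparing file names, whereas the text above describes S3 notifications and metastore polling.

```python
class IncrementalView:
    """Hypothetical stand-in for an incrementally refreshed view."""

    def __init__(self):
        self.rows = []             # materialized rows
        self.loaded_files = set()  # names of source files already loaded

    def refresh(self, source_files):
        """Load only files not seen before; return the names that were loaded."""
        delta = sorted(
            name for name in source_files if name not in self.loaded_files
        )
        for name in delta:
            self.rows.extend(source_files[name])
            self.loaded_files.add(name)
        return delta


# Hypothetical data lake contents: file name -> rows in that file.
lake = {"part-0001": [1, 2], "part-0002": [3]}

view = IncrementalView()
first = view.refresh(lake)    # initial load picks up both files

lake["part-0003"] = [4, 5]    # a new file lands on the lake
second = view.refresh(lake)   # only the delta is loaded

print(first, second, view.rows)
```

An on-demand or scheduled policy would differ only in what triggers the call to refresh; the delta logic stays the same.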

Inline Indexing Engine

During the data materialization process, Varada loads and indexes the data across all dimensions.

The operational dataset is indexed as it is loaded, at the rate of the data ingest, without any user intervention or post-processing. The result is that any query on an inline indexed dataset will find an index ready for it.

The inline index adapts to the data: each dimension (column) is split into very small pieces, called nanoblocks, which are then stored on NVMe SSDs.

To ensure fast performance for every query and each nanoblock, Varada uses a variety of indexing algorithms and indexing parameters that adapt and evolve as the data changes, so that each nanoblock gets the best-fit index.

Inline Indexing is used for:

  • Filters – any SQL WHERE clause, on any column, within an SQL statement can use an index. Indexes are used for point lookups, range queries and string matching of data in nanoblocks
  • Joins – any SQL JOIN statement uses the index of the key column. The index can be used for dimensional JOINs (combining a fact table with a filtered dimension table), for self-joins of fact tables based on time or any other dimension as an ID, and for joins between materialized (indexed) data and virtualized data sources. Varada automatically detects and uses the index for applicable JOINs
  • Aggregations (coming soon) – SQL aggregations and grouping can leverage nanoblock indexes to accelerate performance
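
The filter case can be pictured with a small Python sketch. It is a toy model, not Varada's index format or algorithms, which this page does not describe. It assumes a column split into fixed-size blocks, each carrying its own min/max summary and a sorted list of values; the names Nanoblock, build_index and filter_rows and the BLOCK_SIZE of 4 are hypothetical.

```python
from bisect import bisect_left, bisect_right

BLOCK_SIZE = 4  # tiny on purpose; an assumption for illustration only


class Nanoblock:
    """One small slice of a column with its own index (toy model)."""

    def __init__(self, start_row, values):
        self.start_row = start_row
        self.lo, self.hi = min(values), max(values)
        # Sorted (value, row offset) pairs allow binary search inside the block.
        self.sorted_pairs = sorted((v, i) for i, v in enumerate(values))
        self.sorted_keys = [v for v, _ in self.sorted_pairs]

    def range_lookup(self, lo, hi):
        """Return absolute row ids whose value satisfies lo <= value <= hi."""
        if hi < self.lo or lo > self.hi:
            return []  # the min/max summary rules out the whole block
        left = bisect_left(self.sorted_keys, lo)
        right = bisect_right(self.sorted_keys, hi)
        return sorted(
            self.start_row + off for _, off in self.sorted_pairs[left:right]
        )


def build_index(column):
    """Index the column block by block, in the order the data arrives."""
    return [
        Nanoblock(i, column[i:i + BLOCK_SIZE])
        for i in range(0, len(column), BLOCK_SIZE)
    ]


def filter_rows(blocks, lo, hi):
    """Evaluate WHERE lo <= col <= hi using only the per-block indexes."""
    rows = []
    for block in blocks:
        rows.extend(block.range_lookup(lo, hi))
    return rows


prices = [5, 7, 6, 8, 40, 42, 41, 45, 9, 43, 7, 44]
blocks = build_index(prices)
matches = filter_rows(blocks, 40, 43)
print(matches)
```

Because each block is indexed independently, a different index type could be chosen per block depending on its data, which is the adaptive behavior described above; this sketch uses a single index type throughout for brevity.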

Distributed SQL Engine

To enable workloads to work out of the box, Varada embeds Presto (community edition), a “SQL-on-anything” engine that supports different connectors to access different data sources. Varada offers every Presto capability out of the box, including support for Presto connectors. All community-supported connectors are included by default. The Varada SQL engine extends the Presto community edition with:

  • High Availability – coordinator and worker high availability to sustain service on node failure and other issues
  • Cost-Based Optimizer – Varada’s cost-based optimizer knows how and when to apply the usage of inline indexes
  • Elastic Scaling – Worker nodes on the Varada cluster can auto-scale according to workloads and configuration
  • Materialized View Insights – Varada collects data and usage statistics on all materialized views
  • Query Insights – Varada analyzes the usage across clusters, data sources, data consumers and materialized indexed views to track frequency and identify trends