We are so excited to unveil Varada’s data virtualization platform, which helps organizations instantly monetize all of their available data within a predictable, controlled budget. Using dynamic indexing technology, the Varada Data Platform enables data teams to balance query performance and query cost at massive scale, without ceding control of their data to third-party vendors.
The beta period for our product has proven two things. First, organizations are desperate for a way to simplify data ops management while bringing the cost of query acceleration under control. Second, the path we’ve chosen is striking a chord: Varada is a ‘zero data ops’ approach that eliminates data silos by serving many workloads from one platform. Because all queries run directly on the data lake, there is a single source of truth, which eliminates the need to move or model data. With several dozen early users on the platform, it’s time to bring this innovative approach to a market that’s ready for it.
The Varada Data Platform, available today, offers several advantages over other data virtualization tools:
It allows organizations to retain full control of their data and avoid vendor lock-in. Because the Varada Data Platform sits atop a customer’s existing data lake, there is no need to move data or budget for additional ETL pipelines and storage. This reduces both cost and complexity, and lets data teams keep data secure under consistent policies.
Data teams get deep visibility into workload performance and cluster utilization. They can easily define workload priorities, business requirements and budgets, and Varada automatically optimizes workloads to meet those performance and budget requirements. Even without input from data architects, Varada continuously monitors workloads to identify heavy users, hotspots, bottlenecks and other issues, and uses machine learning to elastically adjust the compute and storage cluster. Alternatively, data teams can exercise fine-grained control over budgets and business requirements for full control and flexibility.
The Varada Data Platform drastically reduces query execution time and the compute resources required. The key is Varada’s proprietary indexing technology, which splits the data in any column into nanoblocks and automatically chooses the most effective index for each nanoblock based on its data content and structure. This indexing technology is what makes queries extremely fast without the need to model data or move it to optimized data platforms.
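The post does not describe how Varada picks an index for each nanoblock. As a rough illustration of the general idea of per-block adaptive indexing, here is a minimal Python sketch. It assumes a hypothetical fixed block size (BLOCK_SIZE) and a hypothetical cardinality heuristic in choose_index. It is not Varada’s algorithm. Each block of a column is inspected on its own and gets whichever simple index fits its contents.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical block size for illustration; the real nanoblock size is not stated.
BLOCK_SIZE = 4


@dataclass
class BlockIndex:
    kind: str      # "constant", "bitmap" or "sorted"
    payload: Any   # the index structure built for this block


def choose_index(values: List[Any]) -> BlockIndex:
    """Pick an index type for one block from simple content statistics.

    Hypothetical heuristic: a block with a single distinct value needs no
    real index; low cardinality gets a value -> block-local positions map
    (bitmap-style); anything else gets a sorted list of (value, position)
    pairs that supports range scans.
    """
    distinct = set(values)
    if len(distinct) == 1:
        return BlockIndex("constant", values[0])
    if len(distinct) <= len(values) // 2:
        positions: Dict[Any, List[int]] = defaultdict(list)
        for pos, v in enumerate(values):
            positions[v].append(pos)
        return BlockIndex("bitmap", dict(positions))
    return BlockIndex("sorted", sorted((v, pos) for pos, v in enumerate(values)))


def index_column(column: List[Any]) -> List[BlockIndex]:
    """Split a column into fixed-size blocks and index each block independently."""
    return [choose_index(column[i:i + BLOCK_SIZE])
            for i in range(0, len(column), BLOCK_SIZE)]
```

For example, indexing the column `["US"] * 4 + ["US", "DE", "US", "DE"] + ["FR", "IT", "ES", "PT"]` yields one block of each kind: constant, bitmap and sorted. Because blocks are indexed independently, a single column can mix index types as its data changes.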
At query time, when queries run through the Varada endpoint, users transparently see performance benefits when filtering, joining and aggregating data. Varada transparently applies indexes to any WHERE clause predicate on an indexed column within a SQL statement. Indexes are used for point lookups, range queries and string matching of data in nanoblocks. Varada automatically detects and uses indexes to accelerate JOINs using the index of the key column. Varada indexes can be used for dimensional JOINs that combine a fact table with a filtered dimension table, for self-joins of fact tables on time or any other dimension used as an ID, and for joins between indexed data and federated data sources. SQL aggregations and grouping are accelerated using nanoblock indexes as well.
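To make the query-time behavior above more concrete, here is a minimal Python sketch of how per-block indexes can serve a point lookup, a range predicate and a dimensional join. It is only an illustration under stated assumptions: the block size, the per-block (min, max) skipping, and the names BlockedSortedIndex and index_join are all hypothetical and do not come from Varada. The join filters the dimension table first and then probes the fact table’s key-column index, so it avoids scanning every fact row.

```python
import bisect
from typing import Dict, List, Set, Tuple

# Hypothetical block size chosen for readability.
BLOCK_SIZE = 4


class BlockedSortedIndex:
    """Hypothetical per-block index over one integer column of a fact table.

    Each block stores (min, max) so non-matching blocks can be skipped, plus a
    sorted list of (value, row_id) pairs for binary search inside the block.
    """

    def __init__(self, column: List[int]) -> None:
        self.blocks: List[Tuple[int, int, List[Tuple[int, int]]]] = []
        for start in range(0, len(column), BLOCK_SIZE):
            chunk = column[start:start + BLOCK_SIZE]
            entries = sorted((v, start + i) for i, v in enumerate(chunk))
            self.blocks.append((entries[0][0], entries[-1][0], entries))

    def range(self, lo: int, hi: int) -> List[int]:
        """Row ids where lo <= value <= hi (a WHERE range predicate)."""
        rows: List[int] = []
        for bmin, bmax, entries in self.blocks:
            if bmax < lo or bmin > hi:
                continue  # block's min/max rules out any match: skip it
            left = bisect.bisect_left(entries, (lo, -1))
            right = bisect.bisect_right(entries, (hi, float("inf")))
            rows.extend(row for _, row in entries[left:right])
        return sorted(rows)

    def lookup(self, value: int) -> List[int]:
        """Row ids where the column equals value (a WHERE point lookup)."""
        return self.range(value, value)


def index_join(fact_index: BlockedSortedIndex,
               dim_rows: Dict[int, str],
               dim_filter: Set[str]) -> List[Tuple[int, int]]:
    """Dimensional join: filter the dimension table, then probe the fact
    table's key-column index for each surviving key.
    Returns sorted (fact_row_id, dim_key) pairs."""
    matches: List[Tuple[int, int]] = []
    for key, attr in dim_rows.items():
        if attr in dim_filter:
            matches.extend((row, key) for row in fact_index.lookup(key))
    return sorted(matches)
```

With a fact-table key column `[3, 1, 2, 3, 7, 8, 9, 7, 1, 5, 5, 2]`, the lookup for 3 returns rows `[0, 3]` and skips the middle block entirely because its minimum is 7. A join against a dimension `{1: "gold", 2: "silver", 3: "gold", 5: "bronze"}` filtered to `{"gold"}` only probes keys 1 and 3.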
Varada’s big data analytics platform is deployed within your VPC to ensure optimal control, security and governance. The platform is available on AWS Marketplace with integrated billing through AWS, or as an AMI.
Varada supports any SQL for data analytics and connects directly to a wide range of data sources, including:
Data Formats: ORC, Parquet, JSON, CSV and more
Data Catalogs: Hive Metastore, AWS Glue
Additional Data Sources: PostgreSQL, MySQL and more
Stay tuned! Support for GCP, Azure and Kubernetes is coming soon.
Read the full press release here.