Athena Full Scans Often Result in Spiraling & Unpredictable Costs

David Krakov
August 18, 2020

As data virtualization gains popularity, administrators are running into new challenges. Query engines such as Presto are among the best solutions for virtualization: Presto is highly versatile and available in hosted implementations such as AWS Athena. AWS Athena is a DevOps-light implementation of Presto that integrates natively with the rest of the AWS ecosystem and significantly reduces operational overhead. Yet at scale, this popular approach doesn’t give administrators the tools they need to tune problem queries efficiently or to manage costs and resources for an ever-growing user base.

As Athena administrators know, Athena works exceptionally well out of the box until users run into performance issues. Even though core Presto has powerful tools for optimization, a zero-DevOps solution such as Athena doesn’t include any tooling for analyzing performance issues. As a result, when an Athena deployment gains adoption in an organization, users run into roadblocks trying to move the system into production. For example, some queries incur expensive and time-consuming full scans, which means users can’t reliably power real-time dashboards. Users also struggle to get predictable query times when issuing complicated joins.
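To make the cost of a full scan concrete, here is a minimal sketch of how a per-query charge scales with bytes scanned. The price per terabyte, the per-query billing floor, and the table sizes are assumptions chosen for illustration, not figures from this article; check current AWS Athena pricing before relying on them.

```python
# Illustrative only: the price per TB and the per-query minimum below are
# assumptions; verify them against current AWS Athena pricing.
ASSUMED_USD_PER_TB = 5.0
ASSUMED_MIN_BYTES_PER_QUERY = 10 * 1024**2  # assumed 10 MB billing floor
BYTES_PER_TB = 1024**4


def estimate_scan_cost(bytes_scanned: int,
                       usd_per_tb: float = ASSUMED_USD_PER_TB,
                       min_bytes: int = ASSUMED_MIN_BYTES_PER_QUERY) -> float:
    """Estimate the charge for one query from the bytes it scanned."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Queries that scan less than the floor are billed at the floor.
    billable = max(bytes_scanned, min_bytes)
    return billable / BYTES_PER_TB * usd_per_tb


# Hypothetical dashboard query that refreshes once a minute for a day.
full_scan = estimate_scan_cost(2 * BYTES_PER_TB)   # reads a whole 2 TB table
pruned_scan = estimate_scan_cost(20 * 1024**3)     # reads only 20 GB of it
refreshes_per_day = 24 * 60

print(f"full scan:   ${full_scan * refreshes_per_day:,.2f} per day")
print(f"pruned scan: ${pruned_scan * refreshes_per_day:,.2f} per day")
```

Under these assumptions the cost is linear in bytes scanned, so a dashboard that triggers a full scan on every refresh multiplies its bill by the refresh rate. That is why unpredictable scans translate so directly into unpredictable spend.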

The New Standard for Data Virtualization Architecture

Varada has introduced a Presto-based query engine that gives data lake administrators the power to optimize their Athena-based data virtualization architecture. Varada lets you run a full-scale, production-grade data virtualization solution without resorting to an add-on data warehouse or hand-optimizing every query. Best of all, Varada runs directly in your VPC through its AWS Marketplace offering. Users can access everything in the data lake through Varada via the shared catalog, using AWS Glue or the Hive metastore. Administrators simply need to make Varada available to users via a standard Presto endpoint.

Make Your Users (and CFO) Happy!
Augment AWS Athena to enable price and performance optimizations

Instead of leaving you stuck with a black box that’s difficult to tune, Varada takes cost-based optimization to the next level. By automatically and continuously analyzing queries and workloads, Varada offers deep visibility into how your workloads perform:

  • Learn how resources are used on an hourly or weekly basis
  • Identify heavy spenders and improve the pipeline
  • Improve predicate pushdown and significantly reduce IO & CPU
  • Identify your “hottest” data
  • Improve JOIN performance
  • Smooth production rollouts and identify upgrade risks up front

This smart, continuous monitoring enables Varada to automatically balance resources across the entire system.
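As a rough illustration of the "identify heavy spenders" idea from the list above, the sketch below aggregates bytes scanned per user and ranks the results. It is not Varada's implementation; the record layout, user names, and numbers are hypothetical, and in practice the records would come from whatever query history your engine exposes.

```python
from collections import defaultdict

# Hypothetical query-history records: (user, bytes_scanned).
query_log = [
    ("etl_service", 900 * 1024**3),
    ("alice", 40 * 1024**3),
    ("dashboard", 300 * 1024**3),
    ("alice", 10 * 1024**3),
    ("dashboard", 350 * 1024**3),
]


def heaviest_scanners(records, top_n=3):
    """Return (user, total_bytes, share_of_total) tuples, largest first."""
    totals = defaultdict(int)
    for user, bytes_scanned in records:
        totals[user] += bytes_scanned
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # nothing scanned, so there is nothing to rank
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(user, total, total / grand_total) for user, total in ranked[:top_n]]


for user, total, share in heaviest_scanners(query_log):
    print(f"{user:12s} {total / 1024**3:8.0f} GiB  {share:6.1%}")
```

Ranking by share of total bytes scanned, rather than by query count, points at the workloads where pruning or indexing is most likely to pay off.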

The Power of Indexing on the Data Lake

Varada also introduces adaptive indexes for filtering, joins and aggregates, and leverages machine learning to decide when and what to optimize. With the benefit of lightweight indexing, Varada is able to allocate resources intelligently and elastically and to leverage intermediate results. The resulting cost model is exposed to administrators and users, who can then prioritize specific user queries.

Whether you’re considering more cost-effective architectures for a cloud data lake or have already gotten started with Presto and Athena, you’ll find a lot of success with an AWS Athena-based solution paired with Varada. Athena brings the reality of a no-DevOps query engine to AWS-based data lakes, enabling true data virtualization without costly data duplication and brittle data movement. For large-scale use cases, Varada enables data architects to seamlessly accelerate and optimize workloads to meet specific performance and cost requirements with zero data-ops and effective resource utilization.

To learn more about how to close the data virtualization gap, download the whitepaper.
