Presto Workload Analyzer: Tips to Get Started

By Guy Mast
June 11, 2020

Data platform teams constantly need to monitor their Presto cluster’s performance and identify bottlenecks in order to meet data workers’ demands.

Presto users usually optimize queries with the EXPLAIN ANALYZE command or use the query plan in Presto’s Web UI to get detailed insights into individual queries. Presto collects these query execution details and exposes them in JSON format via the coordinator’s REST API.
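To make the coordinator REST API concrete, here is a minimal Python sketch that fetches query details. It assumes the endpoints `GET /v1/query` (summaries of the queries the coordinator still remembers) and `GET /v1/query/{queryId}` (full execution details), an unauthenticated plain-HTTP coordinator, and the `X-Presto-User` request header (newer Trino releases expect `X-Trino-User` instead). The user name `workload-analyzer` and all helper names are hypothetical; this is not the Workload Analyzer's own code.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a plain-HTTP coordinator without authentication. Secured clusters
# additionally need TLS and credentials, which this sketch does not handle.
ANALYZER_USER = "workload-analyzer"  # hypothetical user name for the request header


def query_list_url(coordinator: str) -> str:
    """Endpoint listing summaries of the queries the coordinator still remembers."""
    return coordinator.rstrip("/") + "/v1/query"


def query_info_url(coordinator: str, query_id: str) -> str:
    """Endpoint returning the full execution details of one query."""
    return f"{coordinator.rstrip('/')}/v1/query/{urllib.parse.quote(query_id, safe='')}"


def fetch_json(url: str, user: str = ANALYZER_USER, timeout: float = 30.0):
    """GET a coordinator endpoint and decode its JSON response body."""
    request = urllib.request.Request(
        url, headers={"X-Presto-User": user, "Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage against a hypothetical coordinator at http://coordinator:8080:
#   summaries = fetch_json(query_list_url("http://coordinator:8080"))
#   details = fetch_json(query_info_url("http://coordinator:8080", summaries[0]["queryId"]))
```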

However, Presto falls short on ease of use, and its insights are often intuitive only to highly experienced Presto experts. Moreover, Presto lacks a “big picture” view of how workloads perform. It is extremely challenging to apply a resolution found for one query to other, similar queries, or to identify workload optimization opportunities.

Most challenges center on increasing data availability, improving the delicate cost/performance balance and speeding up time-to-market.

Varada’s Workload Analyzer is a free, easy-to-use tool that offers deep, actionable insights and unprecedented visibility into analytics workloads running on Presto:

  • Resource utilization – get workload-level and cluster-level analysis of CPU, RAM and I/O utilization patterns.
  • Identify heavy spenders – learn which users and tables consume the most CPU and I/O resources, so you can scale accordingly and eliminate bottlenecks.
  • Deep insights on workload characteristics – Varada analyzes how your workloads use Presto operators such as aggregations, JOIN and LIKE, and which data sources they access. Furthermore, you’ll see the impact this usage has on overall performance, so you can improve where possible.

For example, you’ll discover where to apply JOIN reordering and where to change the JOIN distribution type. A deep understanding of query selectivity will also enable you to apply acceleration technologies.
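Once an analysis points at a problematic JOIN, one common way to act on it is through Presto's cost-based optimizer session properties `join_distribution_type` (`AUTOMATIC`, `PARTITIONED` or `BROADCAST`) and `join_reordering_strategy` (`NONE`, `ELIMINATE_CROSS_JOINS` or `AUTOMATIC`). The Python sketch below only builds the corresponding `SET SESSION` statements and rejects unknown values. The helper name is hypothetical, it is not part of the Workload Analyzer, and the accepted values should be checked against the documentation of your Presto version.

```python
# Allowed values as documented for Presto's cost-based optimizer session
# properties; verify them against your Presto version before relying on them.
JOIN_PROPERTIES = {
    "join_distribution_type": {"AUTOMATIC", "PARTITIONED", "BROADCAST"},
    "join_reordering_strategy": {"NONE", "ELIMINATE_CROSS_JOINS", "AUTOMATIC"},
}


def join_session_statements(**properties: str) -> list:
    """Build SET SESSION statements for the JOIN-related properties given as kwargs."""
    statements = []
    for name, value in properties.items():
        if name not in JOIN_PROPERTIES:
            raise ValueError(f"not a JOIN session property: {name}")
        value = value.upper()  # accept lowercase input, emit canonical values
        if value not in JOIN_PROPERTIES[name]:
            raise ValueError(f"{name} must be one of {sorted(JOIN_PROPERTIES[name])}")
        statements.append(f"SET SESSION {name} = '{value}'")
    return statements


# Example: broadcast the build side and let the optimizer choose the join order.
for statement in join_session_statements(
    join_distribution_type="broadcast", join_reordering_strategy="automatic"
):
    print(statement)
```

A broadcast join replicates the build (right-hand) side of the join to every worker, which pays off when that side is small; a partitioned join hash-distributes both sides, which scales better when both are large.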

Let’s Get Started!

Step 1: Download the script (it’s free!)

The script runs on your Presto cluster in your VPC. It will not slow down your cluster and no data will be sent to Varada or any third parties.


The New call-to-action is also available for a deep dive.

Step 2: Collect

Fetch JSONs from the Presto coordinator for the analyzed / recorded period.
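The collection step can be sketched in Python as follows, assuming the summaries returned by the coordinator's `GET /v1/query` endpoint carry `queryId`, `state` and an ISO-8601 `queryStats.createTime` (for example `2020-06-11T10:12:34.567Z`). It keeps only completed queries created inside the recorded period and writes each query's JSON to its own file. Because the coordinator keeps only a limited history of completed queries (bounded by settings such as `query.max-history`), a real collector would typically poll repeatedly during the recorded period. The function names and file layout are hypothetical.

```python
import json
import pathlib
from datetime import datetime

# Assumption: summaries expose "queryId", "state" and "queryStats.createTime".
DONE_STATES = {"FINISHED", "FAILED"}


def parse_timestamp(text: str) -> datetime:
    """Parse an ISO-8601 timestamp; fromisoformat() accepts a bare "Z" only from Python 3.11."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def select_queries(summaries, start: datetime, end: datetime):
    """Return ids of completed queries created in the half-open window [start, end)."""
    selected = []
    for summary in summaries:
        if summary.get("state") not in DONE_STATES:
            continue  # still queued or running: its statistics are not final yet
        created = parse_timestamp(summary["queryStats"]["createTime"])
        if start <= created < end:  # start and end must be timezone-aware
            selected.append(summary["queryId"])
    return selected


def save_query_json(output_dir: str, query_id: str, info: dict) -> pathlib.Path:
    """Write one query's JSON to <output_dir>/<query_id>.json and return the path."""
    directory = pathlib.Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{query_id}.json"
    path.write_text(json.dumps(info))
    return path
```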

Step 3: Extract & Analyze

The script generates a detailed customized report based on collected JSONs. 
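To illustrate the kind of analysis involved, here is a minimal Python sketch over already-loaded query JSONs (a list of dicts). It ranks users by total CPU time, matching the “heavy spenders” view, and computes per-operator selectivity as output rows divided by input rows. The field names (`session.user`, `queryStats.totalCpuTime`, and `queryStats.operatorSummaries` entries with `operatorType`, `inputPositions` and `outputPositions`) and the duration format (`350.00ms`, `1.50s`, `2.00m`) are assumptions based on Presto's query JSON and should be verified against your version; the functions are hypothetical and not the Workload Analyzer's actual implementation.

```python
import re
from collections import defaultdict

# Assumed duration format: a number followed by a unit, e.g. "350.00ms" or "2.00m".
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3,
                 "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}
_DURATION_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-z]+)\s*$")


def duration_seconds(text: str) -> float:
    """Convert a duration string like "350.00ms" or "2.00m" to seconds."""
    match = _DURATION_RE.match(text)
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def cpu_by_user(queries):
    """Heavy spenders: total CPU seconds per user as (user, seconds), heaviest first."""
    totals = defaultdict(float)
    for query in queries:
        user = query["session"]["user"]
        totals[user] += duration_seconds(query["queryStats"]["totalCpuTime"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def operator_selectivity(queries):
    """Per operator type: (input_rows, output_rows, output/input), ratio None if no input."""
    rows_in = defaultdict(int)
    rows_out = defaultdict(int)
    for query in queries:
        for op in query.get("queryStats", {}).get("operatorSummaries", []):
            rows_in[op["operatorType"]] += op.get("inputPositions", 0)
            rows_out[op["operatorType"]] += op.get("outputPositions", 0)
    return {
        op_type: (rows_in[op_type], rows_out[op_type],
                  rows_out[op_type] / rows_in[op_type] if rows_in[op_type] else None)
        for op_type in rows_in
    }
```

A low ratio for a scan-and-filter operator means most rows read are discarded, a hint that the query is highly selective.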

See the Workload Analyzer in action!
Check out this sample report, which is based on real customer data that has been anonymized.

Extend Presto-Based Workflows to Support Low Latency Analytics

Varada offers a unique solution that delivers a significant performance uplift and cost optimization for latency-sensitive applications. Presto offers unprecedented agility, so data-driven applications can run any query, anytime. However, the cost/performance balance is a real challenge that has limited Presto usage in these use cases.

Varada embeds Presto, so you enjoy all its benefits, including the ability to run any SQL-based application out of the box. On top of that, Varada adds a hot tier of indexed data that is responsible for the uplift. Varada indexes data on all dimensions, so you don’t lose data granularity.

Varada operates directly on top of existing sources of truth, just like Presto, and automatically synchronizes changes and rapidly indexes new data. Queries can be served less than 60 seconds after data arrival, accommodating operational SLAs. By eliminating and modeling, Varada enables any query to run while supporting various SLAs on response time, concurrency for users and analytics API calls, and data freshness.


Ready to see Varada in action? Schedule a short demo.
