Presto / Trino Workload Analyzer: Tips to Get Started

By Guy Mast
June 1, 2021

Data platform teams constantly need to monitor the performance of their Presto / Trino clusters and identify bottlenecks in order to meet data workers’ demand.

Presto users usually optimize queries with the EXPLAIN ANALYZE command, or use the query plan in Presto’s Web UI to get detailed insights into individual queries. These query execution details are collected by Presto and exposed in JSON format via the coordinator’s REST API.

However, Presto falls short on ease of use, and its insights are often intuitive only to highly experienced Presto experts. Moreover, Presto lacks a “big picture” view of how workloads perform. It is extremely challenging to carry a resolution found for one query over to other similar queries, or to identify workload-level optimization opportunities.

Most challenges center on increasing data availability, improving the delicate cost / performance balance, and speeding up time-to-market.


Varada’s Workload Analyzer is free and available on GitHub. It offers deep, actionable insights and unprecedented observability for Presto clusters, including Trino (FKA PrestoSQL), PrestoDB, Starburst Enterprise and Dataproc. Insights include:

  • Resource utilization – get workload-level / cluster analysis on CPU, RAM and I/O utilization patterns.
  • Identify heavy spenders – learn which users and tables take up the most resources in terms of CPU and I/O so you can scale accordingly and eliminate bottlenecks.
  • Deep insights on workload characteristics – Varada analyzes how your workloads use Presto operators such as aggregations, JOIN and LIKE, and which data sources they access. Furthermore, you’ll see the impact this usage has on overall performance so you can improve where possible.

For example, you’ll discover where to apply JOIN reordering and where to change the JOIN distribution type. A deep understanding of query selectivity will also help you decide where to apply acceleration technologies.
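To make the “heavy spenders” idea concrete, here is a minimal Python sketch of one way to rank tables by CPU time using the per-query JSON that the coordinator exposes. It is not the Analyzer’s own code. It assumes Trino’s query JSON layout: an `inputs` list whose entries carry `catalogName`, `schema` and `table`, and a `queryStats.totalCpuTime` duration string such as "2.00s". Verify these field names against your Presto / Trino version. The function names are hypothetical, and splitting a multi-table query’s CPU time evenly across its tables is an illustrative simplification.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Assumed duration format in the query JSON: a number plus a unit, e.g. "250.00ms", "2.00m".
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")
_SECONDS_PER_UNIT = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0,
                     "m": 60.0, "h": 3600.0, "d": 86400.0}


def to_seconds(duration: str) -> float:
    """Convert a duration string such as "1.50s" or "2.00m" to seconds."""
    match = _DURATION.match(duration)
    if match is None or match.group(2) not in _SECONDS_PER_UNIT:
        raise ValueError(f"unrecognized duration: {duration!r}")
    return float(match.group(1)) * _SECONDS_PER_UNIT[match.group(2)]


def cpu_seconds_by_table(query_infos: Iterable[dict]) -> Dict[str, float]:
    """Attribute each query's totalCpuTime to the tables it read, heaviest first.

    A query that reads N distinct tables contributes 1/N of its CPU time to
    each of them (a simplifying assumption), so the per-table figures add up
    to the CPU time of all queries that read at least one table.
    """
    totals: Dict[str, float] = defaultdict(float)
    for info in query_infos:
        # Assumed field names from Trino's query JSON ("inputs" entries).
        tables = {f"{i['catalogName']}.{i['schema']}.{i['table']}"
                  for i in info.get("inputs", [])}
        if not tables:
            continue  # queries that read no tables are not attributed
        share = to_seconds(info["queryStats"]["totalCpuTime"]) / len(tables)
        for table in tables:
            totals[table] += share
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Grouping the same totals by `session.user` instead of by table gives the per-user view of heavy spenders.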


Before we get started, don’t forget to join the Analyzer Slack workspace.



Step 1: Clone the repository!

The script runs against your Presto / Trino cluster inside your VPC. It will not slow down your cluster, and no data is sent to Varada or any third party.

Go to GitHub!

The README file is also available for a deep dive.

Step 2: Collect

Fetch the query JSONs from the Presto coordinator for the analyzed / recorded period.
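The collection step can be sketched in Python using only the standard library. This is a hypothetical illustration, not the Analyzer’s collection script: it assumes the coordinator’s `GET /v1/query` endpoint lists recent queries (with `queryId` and `state`) and `GET /v1/query/{queryId}` returns the full per-query JSON, and that the coordinator accepts an `X-Trino-User` header (PrestoDB and older PrestoSQL releases use `X-Presto-User`). The function names and the one-file-per-query layout are assumptions.

```python
import json
import os
import urllib.request
from typing import Callable, List

# Query states after which the per-query JSON no longer changes.
TERMINAL_STATES = {"FINISHED", "FAILED"}


def http_get_json(url: str, user: str = "workload-analyzer") -> object:
    """GET a coordinator REST endpoint and decode its JSON body.

    Assumption: Trino expects the X-Trino-User header; PrestoDB and older
    PrestoSQL releases use X-Presto-User instead.
    """
    request = urllib.request.Request(url, headers={"X-Trino-User": user})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def collect_once(coordinator: str, out_dir: str,
                 get_json: Callable[[str], object] = http_get_json) -> List[str]:
    """Save the full JSON of every completed query not yet on disk.

    Returns the IDs of the queries saved during this call.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    # /v1/query lists basic info for queries still held by the coordinator.
    for summary in get_json(f"{coordinator}/v1/query"):
        query_id = summary["queryId"]
        if summary.get("state") not in TERMINAL_STATES:
            continue  # still running; pick it up on a later call
        path = os.path.join(out_dir, f"{query_id}.json")
        if os.path.exists(path):
            continue  # already collected
        # /v1/query/{queryId} returns the detailed per-query JSON.
        details = get_json(f"{coordinator}/v1/query/{query_id}")
        with open(path, "w") as f:
            json.dump(details, f)
        saved.append(query_id)
    return saved
```

In practice you would call `collect_once` repeatedly (for example in a loop with `time.sleep`) for the whole recorded period, because the coordinator only retains a bounded history of completed queries (see Trino’s `query.max-history` and `query.min-expire-age` settings). Follow the README for the Analyzer’s actual collection command.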

Step 3: Extract & Analyze

The script generates a detailed, customized report based on the collected JSONs.
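As a rough illustration of the kind of aggregation such a report is built from, the Python sketch below loads a directory of per-query JSON files and totals scheduled time per user. It is hypothetical and not the Analyzer’s implementation. It assumes each file holds one query’s JSON with `session.user` and a `queryStats.totalScheduledTime` duration string such as "1.50s" (field names as in Trino’s query JSON; verify them for your version).

```python
import json
import re
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, Iterator

# Assumed duration format in the query JSON: a number plus a unit, e.g. "250.00ms", "2.00m".
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")
_SECONDS_PER_UNIT = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0,
                     "m": 60.0, "h": 3600.0, "d": 86400.0}


def to_seconds(duration: str) -> float:
    """Convert a duration string such as "1.50s" to seconds."""
    match = _DURATION.match(duration)
    if match is None or match.group(2) not in _SECONDS_PER_UNIT:
        raise ValueError(f"unrecognized duration: {duration!r}")
    return float(match.group(1)) * _SECONDS_PER_UNIT[match.group(2)]


def load_query_infos(json_dir: str) -> Iterator[dict]:
    """Yield each per-query JSON document saved in json_dir (one file per query)."""
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open() as f:
            yield json.load(f)


def scheduled_seconds_by_user(query_infos: Iterable[dict]) -> Dict[str, float]:
    """Sum totalScheduledTime per session user, heaviest user first."""
    totals: Counter = Counter()
    for info in query_infos:
        user = info["session"]["user"]
        totals[user] += to_seconds(info["queryStats"]["totalScheduledTime"])
    return dict(totals.most_common())
```

For example, `scheduled_seconds_by_user(load_query_infos("./JSONs"))` would return a dict ordered from the user with the most scheduled time to the least, which is the data behind a “scheduled time by user” chart.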

See the Workload Analyzer in action!
Check out this sample report, which is based on real customer data that has been anonymized.

Here are examples of insights the Analyzer delivers:

Scheduled time by user


Wall time usage by operator type

Extend Presto-Based Workflows to Support Low Latency Analytics

Varada offers a unique query acceleration platform that delivers a significant performance uplift and cost optimization for latency-sensitive applications. Presto offers unprecedented agility, so data-driven applications can run any query, anytime. However, the cost / performance trade-off is a real challenge that has limited Presto usage in these use cases.

Varada connects directly to existing Trino clusters, so you enjoy all of Trino’s benefits, including the ability to run any SQL-based application out of the box. With Varada’s autonomous indexing, you can also expect 10x-100x faster performance! And you can say goodbye to concurrency issues…


Ready to see Varada in action? Schedule a short demo.


Initially published in June 2020.
