Tips to Optimize Presto & Trino with Deep Observability

Guy Mast
By Guy Mast
I
June 15, 2021
June 15, 2021

Varada recently released a new open source tool that enables data platform teams to easily analyze, monitor and optimize presto clusters, both PrestoDB and Trino. The Presto Analyzer offers workload-level insights on how clusters perform, and enables to instantly identify heavy users and bottlenecks, and effectively optimize performance and concurrency.


Clone the Presto Analyzer from Github


Varada’s Presto Analyzer continuously collects and stores QueryInfo JSONs, in the background without impacting query performance, and summarizes key query metrics to a summary.jsonl file.

The tool also generates a detailed analysis report that includes:

  • Query details – query peak memory, input data read by query, and joins distribution.
  • Table activities – wall time utilization, and input bytes read, by table scans.
  • Analysis of Presto operators- wall time usage, and input bytes read, by operator.

>> Check out this Presto Analyzer sample report. The report is based on real customer data that has been anonymized.

You can use the Presto Analyzer on the these versions:

  • Trino (previously known as PrestoSQL) – 351 and older
  • Presto DB – 0.245.1 and older
  • Starburst Data Enterprise – 323e and older
  • Dataproc – 1.5.x and older

Schedule A Demo

Optimize Presto Queries with Observability

The Presto Analyzer report offers dozens of charts that will help you truly understand how you cluster is performing and offer actionable insights on how to optimize resources, performance and concurrency. 

Here are some examples of useful charts to start with:

Scheduled Time by Hour

Presto Optimizations
  • Identify Presto cluster scheduled time by the hour
  • Analyze the cluster’s resources usage on an hourly and weekly basis
  • Effectively create cluster scaling rules

Scheduled Time by User

Presto Optimizations
  • Identify fraction of scheduled time by user
  • Get visibility into Presto users’ resource consumption patterns
  • Identify Presto heavy spenders

Wall Time Utilization by Table Scan

Presto Optimizations
  • Identify the fraction of wall time used to scan the top tables
  • Identify tables that consume the most resources in your Presto cluster
  • Identify key tables that may benefit the most from partitioning, based on highlighted query patterns, and reduce the amount of data scanned

Tip #1: Identify Resource-Heavy Presto Operators

The Presto Analyzer instantly identifies which Presto operations are the most resource-intensive and expensive. You can also identify which Presto query types consume most of the cluster resources (create table, insert, select, etc.).

In most cases ScanFilterAndProject takes most of the resources. With better partitioning strategy or with indexing you can optimize queries and reduce the amount of data scanned, which of course will lead to much lower resource consumption and will significantly improve query performance.

Tip #2: Identify Presto Query Patterns and Outliers

It’s critical to identify patterns in queries executions and spot problematic queries, identify fraction of queries that are memory, i/o intensive, and fraction of queries that are not interactive.

The Presto Analyzer enables copying query id to investigate further the root cause and check whether filters are not optimally pushed down or if joins are executed effectively. In addition you can implement big data partitioning or indexing to reduce the amount of data scanned.

Tip #3: Optimize Presto Joins

Presto joins operations often require specific attention and optimizations. Varada’s Presto Analyzer enables to easily identify if there are issues with join ordering and when it is effective to use broadcast and partitioned joins. 

You can update statistics by running the ANALYZE command and validate using the show stats command. You can also ensure Presto cost-based optimizer (CBO) is performing effectively and ordering tables correctly.


See how you can accelerate Presto queries by 10x-100x with autonomous indexing.

We use cookies to improve your experience. To learn more, please see our Privacy Policy
Accept