Varada recently released a new open source tool that enables data platform teams to easily analyze, monitor and optimize Presto clusters, both PrestoDB and Trino. The Presto Analyzer offers workload-level insights into how clusters perform, and lets teams instantly identify heavy users and bottlenecks and effectively optimize performance and concurrency.
Clone the Presto Analyzer from GitHub
Varada’s Presto Analyzer continuously collects and stores QueryInfo JSONs in the background, without impacting query performance, and summarizes key query metrics in a summary.jsonl file.
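As a rough illustration of how a summary file in JSON Lines format could be used to spot heavy users, here is a minimal Python sketch. The field names `user` and `cpu_time_seconds` are assumptions made for this example and are not taken from the tool's actual schema; adjust them to whatever keys your summary.jsonl contains.

```python
import json
from collections import defaultdict


def cpu_seconds_by_user(lines, user_key="user", cpu_key="cpu_time_seconds"):
    """Sum CPU seconds per user from JSON Lines records.

    The key names are hypothetical defaults, not the analyzer's real schema.
    """
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # one JSON object per line
        totals[record[user_key]] += float(record[cpu_key])
    # Heaviest users first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Because the function accepts any iterable of lines, an open file handle for summary.jsonl can be passed in directly.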
The tool also generates a detailed analysis report that includes:
>> Check out this Presto Analyzer sample report. The report is based on real customer data that has been anonymized.
You can use the Presto Analyzer on these versions:
The Presto Analyzer report offers dozens of charts that will help you truly understand how your cluster is performing and offer actionable insights on how to optimize resources, performance and concurrency.
Here are some examples of useful charts to start with:
The Presto Analyzer instantly identifies which Presto operations are the most resource-intensive and expensive. You can also identify which Presto query types consume most of the cluster resources (create table, insert, select, etc.).
In most cases ScanFilterAndProject takes most of the resources. With a better partitioning strategy or with indexing, you can optimize queries and reduce the amount of data scanned, which lowers resource consumption and significantly improves query performance.
It’s critical to identify patterns in query executions and spot problematic queries: the fraction of queries that are memory- or I/O-intensive, and the fraction of queries that are not interactive.
The Presto Analyzer lets you copy a query ID to investigate the root cause further and check whether filters are pushed down optimally and whether joins are executed effectively. In addition, you can implement big data partitioning or indexing to reduce the amount of data scanned.
Presto join operations often require specific attention and optimization. Varada’s Presto Analyzer makes it easy to identify issues with join ordering and to see when it is effective to use broadcast or partitioned joins.
You can update statistics by running the ANALYZE command and validate them using the SHOW STATS command. You can also ensure that Presto’s cost-based optimizer (CBO) is performing effectively and ordering tables correctly.
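To make the idea of a "fraction of queries" concrete, the sketch below computes, for each metric, the share of queries that exceed a chosen threshold. The metric names (`peak_memory_bytes`, `elapsed_seconds`) and the thresholds are hypothetical, chosen only for illustration; what counts as memory-intensive or non-interactive depends on your cluster and workload.

```python
def workload_fractions(records, thresholds):
    """Return, per metric, the fraction of records whose value exceeds the limit.

    records: list of dicts, one per query (keys are assumed metric names).
    thresholds: dict mapping a metric name to its limit.
    """
    total = len(records)
    if total == 0:
        return {metric: 0.0 for metric in thresholds}
    return {
        metric: sum(1 for r in records if r.get(metric, 0) > limit) / total
        for metric, limit in thresholds.items()
    }


# Hypothetical per-query metrics and thresholds
queries = [
    {"peak_memory_bytes": 100, "elapsed_seconds": 1},
    {"peak_memory_bytes": 900, "elapsed_seconds": 40},
    {"peak_memory_bytes": 50, "elapsed_seconds": 2},
    {"peak_memory_bytes": 950, "elapsed_seconds": 3},
]
fractions = workload_fractions(
    queries, {"peak_memory_bytes": 500, "elapsed_seconds": 30}
)
```

Queries that land above a threshold are natural candidates for the per-query investigation described above.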
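The statements involved can be sketched as follows. This small Python helper only builds the SQL text; it does not connect to a cluster. `ANALYZE`, `SHOW STATS FOR` and the `join_distribution_type` session property are part of Presto and Trino, while the table name `hive.sales.orders` and the helper functions themselves are hypothetical.

```python
import re

# Accept plain identifiers such as table, schema.table or catalog.schema.table
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")

_JOIN_TYPES = {"AUTOMATIC", "BROADCAST", "PARTITIONED"}


def stats_statements(table):
    """Build the statements that refresh and then display table statistics."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return [f"ANALYZE {table}", f"SHOW STATS FOR {table}"]


def join_distribution_statement(kind):
    """Build a SET SESSION statement that selects the join distribution type."""
    kind = kind.upper()
    if kind not in _JOIN_TYPES:
        raise ValueError(f"unknown join distribution type: {kind!r}")
    return f"SET SESSION join_distribution_type = '{kind}'"


# Hypothetical table name, for illustration only
for statement in stats_statements("hive.sales.orders"):
    print(statement)
print(join_distribution_statement("broadcast"))
```

Leaving the join distribution type at `AUTOMATIC` lets the cost-based optimizer choose between broadcast and partitioned joins based on the collected statistics, which is why up-to-date statistics matter here.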
See how you can accelerate Presto queries by 10x-100x with autonomous indexing.