Data platform teams consistently need to monitor the performance of their Presto / Trino clusters and identify bottlenecks in order to meet the demands of data workers.
Presto users usually optimize queries with the EXPLAIN ANALYZE command or use the query plan in Presto’s Web UI to get detailed insights into individual queries. These query execution details are collected by Presto and exposed in JSON format via the coordinator REST API.
However, Presto falls short on ease of use, and its insights are often intuitive only to highly experienced Presto experts. Moreover, Presto lacks a “big picture” view of how workloads perform. It is extremely challenging to apply a resolution found for one query to other, similar queries, or to identify workload-wide optimization opportunities.
Most challenges center on increasing data availability, improving the delicate cost / performance balance, and speeding up time-to-market.
Varada’s Workload Analyzer is free and available on GitHub. It offers deep, actionable insights and unprecedented observability for Presto clusters, including Trino (FKA PrestoSQL), PrestoDB, Starburst Enterprise and Dataproc. Insights include:
For example, you’ll discover where to apply JOIN reordering and where to change the JOIN distribution type. A deep understanding of query selectivity also shows where acceleration technologies can be applied.
The script runs on your Presto / Trino cluster in your VPC. It does not slow down your cluster, and no data is sent to Varada or any third party.
The README file is also available for a deep dive.
Fetch the query JSONs from the Presto coordinator for the analyzed (recorded) period.
Generate a detailed, customized report based on the collected JSONs.
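The Analyzer’s own collection script is described in its README. As an independent illustration of the first step, here is a minimal Python sketch that pulls the list of recent queries from a coordinator’s REST API and saves it as JSON. It assumes the coordinator exposes `/v1/query` and identifies the caller via the `X-Trino-User` header (PrestoDB uses `X-Presto-User`); the function names, the coordinator URL and the output path are hypothetical.

```python
import json
import urllib.request


def build_query_list_request(coordinator_url: str, user: str) -> urllib.request.Request:
    """Build a GET request for the coordinator's query list.

    Assumption: the coordinator serves recent queries at /v1/query and
    reads the caller's identity from X-Trino-User (X-Presto-User on PrestoDB).
    """
    url = coordinator_url.rstrip("/") + "/v1/query"
    return urllib.request.Request(
        url,
        headers={"X-Trino-User": user, "Accept": "application/json"},
    )


def fetch_query_list(coordinator_url: str, user: str, timeout: float = 10.0) -> list:
    """Fetch and decode the JSON list of queries from the coordinator."""
    request = build_query_list_request(coordinator_url, user)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def save_query_list(queries: list, path: str) -> None:
    """Persist the raw JSON so a report can be generated from it later."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(queries, handle)
```

For example, `save_query_list(fetch_query_list("http://localhost:8080", "analyst"), "queries.json")` would store one snapshot. Keeping the raw JSON on disk separates collection from reporting, so the report can be regenerated without querying the cluster again.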
See the Workload Analyzer in action!
Check out this sample report; it is based on real customer data that has been anonymized.
Here are examples of insights the Analyzer delivers:
Scheduled time by user
Wall time usage by operator type
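To illustrate the kind of aggregation behind an insight such as “Scheduled time by user”, here is a hedged Python sketch that sums scheduled time per user over a query list fetched from the coordinator. It assumes each entry carries `session.user` and `queryStats.totalScheduledTime`, with durations serialized as strings such as `1.50s` or `250.00ms`. These field names, the duration format and the helper names are assumptions, not the Workload Analyzer’s actual implementation.

```python
import re
from collections import defaultdict

# Assumed duration suffixes (e.g. "1.50s", "250.00ms") mapped to seconds.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0,
}
_DURATION_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*$")


def parse_duration(text: str) -> float:
    """Convert a duration string such as '1.50s' into seconds."""
    match = _DURATION_RE.match(text)
    if not match or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


def scheduled_time_by_user(queries: list) -> dict:
    """Sum scheduled seconds per user, largest consumer first.

    Assumption: user is at session.user and scheduled time at
    queryStats.totalScheduledTime; entries without that stat are skipped.
    """
    totals = defaultdict(float)
    for query in queries:
        user = query.get("session", {}).get("user", "<unknown>")
        raw = query.get("queryStats", {}).get("totalScheduledTime")
        if raw is not None:
            totals[user] += parse_duration(raw)
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

Sorting by total puts the heaviest users first, which is usually the first question when looking for workload-level optimization opportunities.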
Varada offers a unique query acceleration platform that delivers a significant performance uplift and cost optimization for latency-sensitive applications. Presto offers unprecedented agility, allowing data-driven applications to run any query, anytime. However, the cost / performance balance is a real challenge that has limited Presto usage in these use cases.
Varada connects directly to existing Trino clusters, so you keep all of Trino’s benefits, including the ability to run any SQL-based application out of the box. But with Varada’s autonomous indexing, you can expect 10x-100x faster performance! You can also say goodbye to concurrency issues…
Ready to see Varada in action? Click to schedule a short demo.
Initially published in June 2020.