Data platform teams constantly need to monitor the performance of their Presto clusters and identify bottlenecks in order to meet data workers' demands.
Presto users usually optimize queries with the EXPLAIN ANALYZE command or use the query plan in Presto’s Web UI to get detailed insights on queries. These query execution details are collected by Presto and exposed via the coordinator REST API in JSON format.
But Presto falls short on ease of use, and its insights are often intuitive only to highly experienced Presto experts. Moreover, Presto lacks a “big picture” view of how workloads perform. It is extremely challenging to carry a resolution found for one query over to other similar queries, or to identify workload optimization opportunities.
Most challenges center around increasing data availability, improving the delicate cost / performance balance and speeding up time-to-market.
For example, you’ll discover where to apply JOIN reordering and where to change the JOIN distribution type. Deeply understanding the selectivity of queries will also enable you to apply acceleration technologies.
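Once you know which queries would benefit, both JOIN settings can be changed per session. The sketch below builds the corresponding `SET SESSION` statements. It assumes the `join_reordering_strategy` and `join_distribution_type` session properties and the values listed in the code, which may differ between Presto versions; the helper `session_statement` is a hypothetical name used only for illustration.

```python
# Assumption: property names and allowed values as found in recent Presto
# releases; check the documentation of your own version before relying on them.
ALLOWED_VALUES = {
    "join_reordering_strategy": {"AUTOMATIC", "ELIMINATE_CROSS_JOINS", "NONE"},
    "join_distribution_type": {"AUTOMATIC", "BROADCAST", "PARTITIONED"},
}


def session_statement(name, value):
    """Return a SET SESSION statement for one of the JOIN-related properties."""
    value = value.upper()
    if name not in ALLOWED_VALUES:
        raise ValueError(f"unsupported session property: {name}")
    if value not in ALLOWED_VALUES[name]:
        raise ValueError(f"unsupported value for {name}: {value}")
    return f"SET SESSION {name} = '{value}'"


if __name__ == "__main__":
    # Run these statements in the same session as the query being tuned.
    print(session_statement("join_reordering_strategy", "automatic"))
    print(session_statement("join_distribution_type", "broadcast"))
```

Validating the value before building the statement keeps a typo from silently reaching the cluster as a failed query.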
Fetch JSONs from the Presto coordinator for the analyzed / recorded period.
The script generates a detailed customized report based on collected JSONs.
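To make the two steps above concrete, here is a minimal sketch of the idea, not the actual analyzer script. It assumes the coordinator lists queries at `/v1/query` and serves per-query details at `/v1/query/{queryId}`, that it accepts an `X-Presto-User` header, and that each query JSON carries `state` and `queryStats.elapsedTime` with durations written like `"4.56s"`. The function names are hypothetical.

```python
import json
import re
import urllib.request
from collections import Counter

# Assumption: durations are serialized as strings such as "4.56s" or "120.00ms".
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3,
                "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def parse_duration(text):
    """Convert a duration string like '1.50m' to seconds."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([a-z]+)\s*", text)
    if not match or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def fetch_json(url, user="analyzer"):
    """GET a coordinator endpoint and decode the JSON body."""
    request = urllib.request.Request(url, headers={"X-Presto-User": user})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def collect_queries(coordinator, fetch=fetch_json):
    """Step 1: fetch the query list, then the detailed JSON of each query."""
    listing = fetch(f"{coordinator}/v1/query")
    return [fetch(f"{coordinator}/v1/query/{item['queryId']}") for item in listing]


def summarize(queries, top_n=3):
    """Step 2: count queries per state and list the slowest by elapsed time."""
    states = Counter(q.get("state", "UNKNOWN") for q in queries)
    timed = [
        (parse_duration(q["queryStats"]["elapsedTime"]), q["queryId"])
        for q in queries
        if "elapsedTime" in q.get("queryStats", {})
    ]
    return {"states": dict(states), "slowest": sorted(timed, reverse=True)[:top_n]}
```

A typical call would be `summarize(collect_queries("http://coordinator:8080"))`, where the host and port are placeholders for your own coordinator. Passing `fetch` as a parameter lets you replay JSONs saved to disk instead of hitting a live cluster.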
Varada delivers a significant performance uplift and cost optimization for latency-sensitive applications. Indeed, Presto offers unprecedented agility so that data-driven applications can run any query, anytime. But cost / performance is a real challenge that has limited Presto usage in these use cases.
Varada embeds Presto, so you enjoy all its benefits, including the ability to run any SQL-based application out of the box. But we also add a hot tier of indexed data that is responsible for the uplift. Varada indexes data on all dimensions, so you don’t lose the granularity of data.
Varada operates directly on top of existing sources of truth, just like Presto, and automatically synchronizes changes and indexes new data rapidly. Queries can be served less than 60 seconds after data arrives, which accommodates operational SLAs. By eliminating and modeling, Varada enables any query to run while supporting various SLAs on response time, concurrency for users and analytics API calls, and data freshness.