We are big fans of both PrestoDB and Trino! You can easily add Varada to get 10x-100x faster data lake queries and eliminate time-consuming dataops.
Originally built by Facebook in 2013, PrestoDB is a distributed query engine built over ANSI SQL that works with many BI tools and is capable of querying petabytes of data. Presto was built to solve for data access at a massive scale on the data lake.
A few years later, Trino (formerly known as PrestoSQL) was forked from PrestoDB and expanded to accommodate a much broader variety of customers and analytics use cases. Both are user-friendly options with good performance, high interoperability, and a strong community. Both can combine data from multiple sources within a single query, support many data stores and data formats, and offer many connectors, including Hive, Phoenix (MR), Postgres, MySQL, and Kafka.
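To make "multiple sources within a single query" concrete, here is a minimal sketch. It does not use Presto or Trino at all: SQLite's `ATTACH DATABASE` stands in for two independent catalogs, and the names `orders_src`, `users_src`, and the sample rows are made up for illustration. In Presto and Trino the same idea is expressed by addressing tables as `catalog.schema.table`.

```python
import sqlite3

# Two separate in-memory databases stand in for two independent data sources
# (think of a Hive catalog and a MySQL catalog). All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orders_src")
conn.execute("ATTACH DATABASE ':memory:' AS users_src")

conn.execute("CREATE TABLE orders_src.orders (user_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE users_src.users (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO orders_src.orders VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)
conn.executemany(
    "INSERT INTO users_src.users VALUES (?, ?)",
    [(1, "US"), (2, "DE")],
)

# One SQL statement joins tables that live in different sources.
FEDERATED_QUERY = """
SELECT u.country, SUM(o.amount) AS revenue
FROM orders_src.orders AS o
JOIN users_src.users AS u ON u.id = o.user_id
GROUP BY u.country
ORDER BY u.country
"""
rows = conn.execute(FEDERATED_QUERY).fetchall()
print(rows)
```

The point of the sketch is only the shape of the query: the join does not care that the two tables come from different places, which is what a connector-based engine provides at much larger scale.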
Presto was designed to be adaptive, flexible, and extensible. It supports a wide variety of use cases with diverse characteristics. Presto is responsible for supporting much of the SQL analytic workload at Facebook, including interactive/BI queries and long-running batch extract-transform-load (ETL) jobs. In addition, Presto powers several end-user facing analytics tools, serves high performance dashboards, provides an SQL interface to multiple internal NoSQL systems, and supports Facebook’s A/B testing infrastructure. In aggregate, Presto processes hundreds of petabytes of data and quadrillions of rows per day at Facebook.
Presto and Trino have been widely acknowledged as the fastest-growing SQL platforms for big data analytics, and both have been adopted and proven at scale in a variety of use cases at Airbnb, Comcast, Netflix, Twitter, Uber and many more.
But why do we have multiple versions of Presto that are nearly identical in terms of functionality and feature set?
Today there are four main established solutions based on the original PrestoDB project: the two main open source projects (Presto and Trino) and the two commercial offerings (Starburst Data and Ahana). The commercial offerings provide managed services with simplified implementations, the additional support that enterprises need, robust implementations of the full PrestoDB/Trino feature sets, and dedicated expertise beyond the community channels.
2017 was a big year for Presto. AWS had just launched Amazon Athena (announced in late 2016), a serverless query service based on PrestoDB that makes it easy to analyze data directly in S3 using standard SQL. Though Athena is based on Presto, there are significant infrastructure differences, which means some use cases may benefit from one platform over the other.
In 2017, Starburst was spun out of Teradata to develop and support an enterprise-grade commercial distribution of the open source Presto project — offering improved performance and security while making it easy to deploy, connect and manage a Presto environment. This is also the year Varada was founded to enable data architects to seamlessly accelerate and optimize Presto workloads, using dynamic analysis and adaptive indexing, resulting in optimal control over performance and cost.
In 2019, key members of Facebook’s Presto team, who had left the company a year earlier over disagreements regarding the governance of the project, decided to continue development on a community version of Presto. They forked the original PrestoDB tree into PrestoSQL and created the Presto Software Foundation to govern the development of the software and its community as a meritocracy. Later that year, Starburst backed PrestoSQL and hired the three original creators of Presto as CTOs. Much of the Presto community hitched its wagons to the new foundation and the code in its GitHub repository. Companies like LinkedIn, Lyft, Netflix, GrubHub, Slack, Comcast, FINRA, Condé Nast, Nordstrom and thousands of others use this version of Presto today.
On the other side we have PrestoDB, which is the original version of Presto, as Facebook originally designed it. It is governed by the Presto Foundation, which was launched in September 2019 by Facebook, Uber, Twitter and Alibaba under the auspices of The Linux Foundation. PrestoDB is a proven workhorse and runs in production at Facebook, Uber, Twitter and thousands of other leading companies.
In 2020, Ahana was launched as a commercially managed SaaS distribution of Presto on AWS with the vision to simplify open data lake analytics. Ahana is based on the original PrestoDB fork, the open source project created by Facebook. Ahana also offers professional technical support for PrestoDB and develops the PrestoDB ecosystem with support from the Presto Foundation.
Finally, at the end of 2020, following trademark enforcement efforts by The Linux Foundation, the PrestoSQL fork was rebranded as Trino, while the PrestoDB fork retained full rights over the Presto trademark.
It’s no secret that we are huge Presto and Trino fans: all flavors work great.
A while back we had to place a bet on which data lake query engine would win the data lake analytics race. Presto stood out immediately. It’s extremely flexible, and coupled with its vibrant community, it was a perfect fit for Varada’s indexing-based acceleration technology.
Varada’s proprietary indexing logic automatically analyzes the data lake and introduces indexes for filtering, joins and aggregates, continuously evaluating query performance on the fly. Varada’s engine automatically prioritizes the data to index or cache based on a smart observability layer that continuously monitors demand. Varada indexes data directly from the data lake across any column. This means that every query is optimized automatically.
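Varada's actual indexing logic is proprietary and is not described here in enough detail to reproduce. Purely to illustrate the general idea of prioritizing what to index based on observed demand, the following sketch ranks columns by how often they appear in query filters and picks the top few within a budget. The function `pick_columns_to_index`, the `budget` parameter, and the sample query log are all hypothetical.

```python
from collections import Counter


def pick_columns_to_index(query_log, budget):
    """Return up to `budget` column names, most frequently filtered first.

    `query_log` is a list of queries, each given as the list of columns
    that query filtered on. This is an illustrative heuristic only.
    """
    demand = Counter()
    for filtered_columns in query_log:
        # Count each column once per query, even if it appears twice.
        demand.update(set(filtered_columns))
    # Highest demand first; ties broken by name so the result is deterministic.
    ranked = sorted(demand.items(), key=lambda item: (-item[1], item[0]))
    return [column for column, _ in ranked[:budget]]


# Hypothetical observed workload: the filter columns of five queries.
query_log = [
    ["country", "device"],
    ["country", "ts"],
    ["country"],
    ["device", "ts"],
    ["campaign"],
]
chosen = pick_columns_to_index(query_log, budget=2)
print(chosen)
```

A real system would presumably weigh more than frequency (data size, selectivity, cost of building the index), and would re-evaluate the ranking continuously as the workload changes.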
The platform is already available for use on AWS and will soon be available on Google Cloud Platform as well as on any other platform via a partnership with Starburst Data.
No matter which version of Presto or Trino you’re using, you can add Varada to your cluster and leverage the advantages of autonomous indexing:
Data teams using Varada don’t need to compromise on performance to achieve agility and optimal resource utilization on the data lake: they can leverage the power of autonomous indexing, caching, intermediate results, and optimized dynamic filtering implementation to accelerate Presto queries by 10x-100x on their existing cluster — at a 40%-60% cost reduction.
Multi-dimensional queries, which use hundreds and often thousands of columns, make partitioning simply ineffective in reducing data reads, because a table can only be partitioned on a handful of columns chosen up front. Indexing is multi-dimensional by nature and is extremely effective for selective and highly selective queries that filter on many columns.
Varada’s workload-level observability component enables data teams to seamlessly monitor, optimize and accelerate workloads to meet dynamic business requirements. Data teams can easily set priorities, performance requirements and budget caps.
By autonomously accelerating Presto & Trino queries, Varada brings down dataops to the bare minimum, enabling data teams to focus on business priorities, deliver on analytics demands faster, and maintain control over data lake analytics cost and performance.
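The earlier point about partitioning versus indexing can be shown with a toy example. This is a simplified sketch, not Varada's implementation: the table is partitioned on a single hypothetical key (`day`), while a plain value-to-row-ids map is built for every column. A filter that does not touch the partition key forces a full scan under partitioning, whereas intersecting per-column indexes narrows the read to the matching rows.

```python
from collections import defaultdict

# Hypothetical sample table.
rows = [
    {"day": "mon", "country": "US", "device": "ios"},
    {"day": "mon", "country": "DE", "device": "android"},
    {"day": "tue", "country": "US", "device": "android"},
    {"day": "tue", "country": "DE", "device": "ios"},
    {"day": "wed", "country": "US", "device": "ios"},
    {"day": "wed", "country": "FR", "device": "web"},
]

# Partitioning: rows are grouped by one key that was chosen up front.
partitions = defaultdict(list)
for row_id, row in enumerate(rows):
    partitions[row["day"]].append(row_id)

# Indexing: every column gets its own value -> row ids map.
index = defaultdict(lambda: defaultdict(set))
for row_id, row in enumerate(rows):
    for column, value in row.items():
        index[column][value].add(row_id)


def scan_with_partitions(filters):
    """Row ids that must be read when only the partition key can prune."""
    if "day" in filters:
        return set(partitions[filters["day"]])
    return set(range(len(rows)))  # filter misses the partition key: full scan


def scan_with_index(filters):
    """Row ids left after intersecting the per-column indexes."""
    candidates = set(range(len(rows)))
    for column, value in filters.items():
        candidates &= index[column].get(value, set())
    return candidates


filters = {"country": "US", "device": "ios"}
read_by_partitioning = scan_with_partitions(filters)
read_by_index = scan_with_index(filters)
print(len(read_by_partitioning), len(read_by_index))
```

With only six rows the difference is small, but the same pattern holds as the number of filterable columns grows: a partitioning scheme cannot anticipate every combination, while per-column indexes can be intersected for any of them.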
Varada is committed to both the Trino and PrestoDB communities and is an active contributor. Varada recently released a new open source tool that enables data platform teams to easily analyze, monitor and optimize Presto clusters, both PrestoDB and Trino. The Presto Analyzer offers workload-level insights into how clusters perform, and enables teams to instantly identify heavy users and bottlenecks and to effectively optimize performance and concurrency.