Data is flowing into the data lake, users are up and running with analytics, and you picked a managed service so you have zero DevOps costs.
Yet users aren’t happy and your DataOps team keeps complaining about being overworked. What’s going on?
It could be that by going with a managed service for your analytics you’ve accidentally traded off DevOps costs for increased DataOps load. Outsourced services are great to get started but they come with hidden costs. The hardest to identify and easiest to address is the increase in DataOps workloads. More users means more queries but not necessarily proportionally higher budgets.
To fill that gap, the DataOps team works tirelessly to optimize queries and restructure data sets. Yet the teams are hamstrung by insufficient visibility into the analytics system. Managed analytics providers give you a Black Box. You pour in money and queries go faster. In some cases there are basic query statistics that tell you how much data was scanned or how much time was spent in a given operation. When production queries are slow, you throw more budget at the problem. When users complain, you throw DataOps at the problem.
As the DataOps team spends time poring over queries, looking for where they can pre-join or create smaller data subsets with just the information that users need to query, the queries start changing. With more users working through more data every day, the backlog of optimizations only grows. Meanwhile, DataOps isn’t going back and cleaning up the old optimizations, so the system costs that you’ve been managing down slowly start to creep back up until everything comes to a halt and the team needs to do a big reset on the indexes, caches, and materializations. This endless cycle quickly leads to user frustration and DataOps team burnout. There’s a better way to handle the onslaught of user demand. You need to get the right level of visibility and control, with enough automation to handle the basic needs of your entire user base.
You may not have heard your DataOps team complaining about visibility. After all, depending on your analytics vendor they may have access to detailed query statistics. Starting with the slowest queries, on a query-by-query basis, they can dive in and throw their bag of tricks at optimizing each query by hand. Changing the join order, manually adding indexes, creating materialized tables to speed up common predicates…there’s a well-known playbook for making queries go faster. The issue isn’t that DataOps can’t make individual queries run faster. It’s that they don’t see how each query factors into a business workload. The right level of visibility is at the workload level. By grouping queries into the business actions they support, DataOps can start to identify patterns among which workloads need priority and which are slow relative to the business needs, not the individual user request.
Workload-level visibility gives DataOps a Glass Box through which they can look at how the system is performing as a whole and in the parts most relevant to business needs. A workload might be the set of production queries that access a specific data set, or all the queries run by the development team. The DataOps team now has a broader field of view into how analytics are being used across the organization. Instead of tackling queries from slowest to next slowest, DataOps can make decisions about where to focus resources based on business priorities. Rather than constantly chasing optimizations on a query-by-query basis, DataOps can use workload-level reporting to stay on top of rapidly changing queries by working on the common business needs.
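To make the idea of a workload concrete, here is a minimal Python sketch that rolls per-query statistics up into workload-level totals. It is only an illustration under stated assumptions: the `QueryStat` fields, the `workload_of` rule, and the workload names are all hypothetical and do not come from Varada or any particular query engine. The grouping rule mirrors the two examples above: all development-team queries form one workload, and production queries are grouped by the data set they access.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryStat:
    """Hypothetical per-query record; field names are assumptions."""
    user_group: str      # e.g. "production" or "development"
    dataset: str         # data set the query reads
    runtime_s: float     # wall-clock runtime in seconds
    bytes_scanned: int   # data scanned by the query


def workload_of(q: QueryStat) -> str:
    """Map a query to a workload name (illustrative rule only)."""
    if q.user_group == "development":
        return "development"
    return f"{q.user_group}:{q.dataset}"


def summarize(queries):
    """Aggregate query count, runtime, and bytes scanned per workload."""
    totals = defaultdict(
        lambda: {"queries": 0, "runtime_s": 0.0, "bytes_scanned": 0}
    )
    for q in queries:
        w = totals[workload_of(q)]
        w["queries"] += 1
        w["runtime_s"] += q.runtime_s
        w["bytes_scanned"] += q.bytes_scanned
    return dict(totals)
```

With a roll-up like this, the question shifts from "which query is slowest?" to "which workload is consuming the most relative to its business priority?", which is the view the rest of this post argues for.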
By surfacing visibility across resources at the query level and providing workload-level reporting, DataOps can prioritize which workloads to focus on. With some automation, they can also reduce the overall cost of managing an analytics system. By telling the system which workloads are more important, DataOps can let the system dynamically create appropriate indexes, refine which queries to cache, and even materialize tables with the right column sets, including pre-joining dimensions. Without good automation, DataOps is still better off with a Glass Box system than a Black Box, but costs inevitably grow with usage.
Varada has recently introduced a Visibility Center, for surfacing critical query information such as data accessed, and CPU and memory used. The Visibility Center gives DataOps the ability to see usage at both the query and workload level. This means that when prioritizing where to optimize, DataOps teams have a full view of the system and can understand where their efforts will make the biggest impact on the business. Varada’s Workload Management gives DataOps the tools to direct the underlying optimization system based on workload priority.
From there, Varada automatically optimizes query performance and resource utilization.
Varada’s built-in indexing, query caching, and materialization extend the basic flexible scaling strategies available in other query engines. Altogether, Varada helps teams avoid the all-too-common trade-off of DevOps savings for DataOps costs.
It’s easy to see how outsourcing analytics to a managed provider saves on DevOps costs and keeps users happy. Even if you’re able to manage the direct spend by tasking DataOps with ongoing optimizations, the hidden costs and burnout that come from having DataOps constantly chase the slowest query end up eating into your savings. You have to make sure DataOps teams have the right level of visibility and the tools to know where to optimize, ideally with sophisticated automation. Look for solutions that give you a Glass Box view into your analytics system and workload-level reporting and controls. You’ll be able to save both on DevOps and on DataOps.
To see Varada in action, schedule a short demo!