How to Get Optimal ROI on Your Data

By Roman Vainbrand
March 18, 2021

A Cautionary Tale About the So-Called ‘Cost Savings’ of Managed Data Analytics Services

It’s 2021, and every enterprise is now either “data-driven” or being left behind. The good news is that in 2021 it is not very hard for companies to launch a data analytics infrastructure. The highly competitive Big Data arena offers many cloud-based data analytics providers that can quickly deliver data-driven insights and accelerate the speed of business. In theory, by outsourcing data-crunching workloads to managed data analytics service providers (e.g., Snowflake or AWS Redshift), organizations can save themselves the cost of supporting such demanding operations with their internal infrastructure. In other words, outsourcing the data analytics to third parties offers the oh-so-appealing promise of lower DevOps costs.

Buyer Beware the Hidden Costs

In reality, outsourced services are a great option for getting started, but they come with hidden costs that escalate over time, particularly as the number of analytics projects within the organization increases. Here’s why: As you expand the use of data analytics across the organization (which is a good and desirable thing), more and more business units request queries for their own purposes. As your use of managed solutions scales out, the costs scale up accordingly.

So, most organizations put a cap on the spending. That puts the onus on internal teams to manage the organization’s use of the managed analytics provider. For the sake of simplicity, let’s refer to this internal managing entity as the DataOps team.


DataOps as a Cost Center

The DataOps team now has responsibility for managing the overall data analytics budget, prioritizing query requests, and figuring out ways to make the data analytics budget stretch further. The DataOps team faces several quandaries:

  • Optimization takes time. The DataOps team spends time poring over queries, looking for ways to speed them up and make them more resource-efficient. Starting with the slowest or most budget-consuming queries, the DataOps team attempts to optimize each query by hand using a well-known bag of tricks for making queries go faster. This manual optimization takes up an enormous amount of staff time. Moreover, this manipulation for efficiency’s sake can force compromises on what insights are delivered; for example, your queries may be run on a subset of data instead of all of it.
  • Backlogs are costly. As more users work through more data every day, the backlog of optimizations grows, creating a vicious cycle. The DataOps team has even less time to do the requisite “house cleaning” on the old optimizations, and “clutter” clogs the system, making it impossible for the data team to assess the impact of actions.
  • Insufficient visibility hobbles the DataOps team. At best, most analytics providers supply query statistics that tell you how much data was scanned or how much time was spent for a given query. What your team really needs is visibility at the workload level. For example, a workload might be the set of production queries that access a specific data set, or all the queries run by the product development team. By grouping queries into the business actions they support, DataOps teams can identify which workloads need priority based on business needs rather than on the needs of an individual user or query.  
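To make the workload idea concrete, here is a minimal Python sketch of rolling per-query statistics up to the workload level. The statistics and workload tags are invented for illustration; real providers expose query metrics in their own formats and names.

```python
from collections import defaultdict

# Hypothetical per-query statistics, of the kind a managed provider might
# expose: bytes scanned, wall-clock time, and a tag naming the business
# workload the query belongs to.
query_stats = [
    {"workload": "prod-dashboards", "bytes_scanned": 4.2e9, "seconds": 31},
    {"workload": "prod-dashboards", "bytes_scanned": 1.1e9, "seconds": 9},
    {"workload": "product-dev",     "bytes_scanned": 9.8e9, "seconds": 122},
    {"workload": "ad-hoc",          "bytes_scanned": 0.6e9, "seconds": 4},
]

def summarize_by_workload(stats):
    """Roll per-query stats up to the workload level."""
    totals = defaultdict(lambda: {"bytes_scanned": 0.0, "seconds": 0.0, "queries": 0})
    for q in stats:
        w = totals[q["workload"]]
        w["bytes_scanned"] += q["bytes_scanned"]
        w["seconds"] += q["seconds"]
        w["queries"] += 1
    return dict(totals)

summary = summarize_by_workload(query_stats)

# Rank workloads by total scan volume to decide where DataOps effort
# (and budget) pays off most.
for name, w in sorted(summary.items(), key=lambda kv: -kv[1]["bytes_scanned"]):
    print(f"{name}: {w['queries']} queries, "
          f"{w['bytes_scanned'] / 1e9:.1f} GB scanned, {w['seconds']:.0f}s")
```

Once queries carry a workload tag, prioritization becomes a business decision rather than a per-query triage exercise.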

Burnout burns up your ROI. The frustration of your data users and the burnout experienced by your DataOps team can stymie your best-laid plans to capitalize on Big Data and build a data-driven culture.

All of these DataOps dilemmas create rising operational expenses, so much so, in fact, that the cost savings of “zero DevOps” is ultimately negated by the rising cost of DataOps. 

Don’t Let It Happen: Focus on What Matters and Automate the Rest

Fortunately, there’s a better way to handle the onslaught of user demand and control costs as your organization transforms into a data-driven business. Your DataOps teams need the right level of visibility and control, with enough automation to handle the basic needs of your entire user base. Seek these features in your data management solution:

  • Workload-level visibility gives DataOps teams an open view of how data is being used across the entire organization, so they can focus DataOps resources on business priorities.
  • Automation is essential to reducing the overall cost of managing an analytics system. For instance, DataOps teams should be able to tell their query management system which workloads are more important. Based on this information, the query management system should automatically and dynamically create appropriate indexes, refine which queries to cache, and even materialize tables with the right column sets, including pre-joining dimensions.
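The automation described above can be sketched as a simple rule-based loop: given DataOps-assigned workload priorities and a few usage signals, recommend which acceleration to apply. This is a toy illustration, not Varada’s actual engine; all field names, thresholds, and actions are invented.

```python
# Hypothetical signals per workload: how often its queries repeat, how
# selective its filters are, and whether it joins dimension tables.
def recommend_accelerations(workloads):
    """Return (workload, action) recommendations for high-priority workloads."""
    actions = []
    for w in workloads:
        if w["priority"] == "low":
            continue  # spend the acceleration budget on what the business values
        if w["repeat_rate"] > 0.8:
            actions.append((w["name"], "cache result sets"))
        if w["filter_selectivity"] < 0.05:
            actions.append((w["name"], "create index on filter columns"))
        if w["joins_dimensions"]:
            actions.append((w["name"], "materialize pre-joined table"))
    return actions

workloads = [
    {"name": "prod-dashboards", "priority": "high", "repeat_rate": 0.9,
     "filter_selectivity": 0.02, "joins_dimensions": True},
    {"name": "ad-hoc", "priority": "low", "repeat_rate": 0.1,
     "filter_selectivity": 0.5, "joins_dimensions": False},
]

for name, action in recommend_accelerations(workloads):
    print(f"{name}: {action}")
```

The point of the sketch is the direction of the information flow: humans declare priorities once, and the system keeps re-deriving the indexing, caching, and materialization decisions as usage changes.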

If you want to avoid trading DevOps savings for DataOps costs as you transform your organization into a data-driven business, make sure your DataOps team is equipped with a data management solution that offers workload-level visibility, automation, and control over performance and cost.

See how Varada’s big data indexing dramatically accelerates queries vs. AWS Athena.

To see Varada in action on your data set, schedule a short demo!
