A data lake without any users is more likely to be a money pit than a valuable asset. Yet once you give users access to the company data lake, you quickly discover that it can be a financial liability as well. Giving users unrestrained access to analytics is a virtual guarantee that you'll end up with massive cost overruns.
All too often, spend visibility and control levers are extremely limited. In particular, analytics solutions designed for single users or small teams fail to provide the enterprise-scale controls required to oversee large deployments. To avoid unexpected cost overages, pay close attention to three critical capabilities that give you financial control over your analytics solution, especially as usage expands across the organization: deep visibility into usage and resources, workload-level reporting, and workload-level controls over resources and budgets.
Growing analytics adoption is a double-edged sword. You realize organizational benefits from your data lake, which helps extract value from years of investment. Unfortunately, the easiest way to give users access, via an externally managed service, is also the least cost-effective as adoption grows. When users start blowing their budgets, clamping down stifles innovation, the opposite of what you've been trying to achieve. The best solution is to bring a query engine in house, which gives users access to data with the scale and automation you would get from an external provider, plus the right tools for good financial governance.
The foundation of good financial governance is deep visibility into how organization-wide analytics use your data lake resources. This includes the standard query and cost metrics you get from external providers as well as the higher-level, workload-level reporting that is only available using internal tools. Most query engines offer only account-level or user-level reporting, in some cases showing just data scanned, not CPU time or cost. Simple query-level reporting doesn't provide a view into what business-related workloads cost to run.
To make sense of the deep visibility you get from a modern query engine, you need to report at the workload level. Consider, for example, a simple scenario where business-critical customer reports on clickstream data are generated using the same analytics system as the research workloads. This gives the analytics teams tremendous flexibility to iterate and deploy changes based on customer demand. But unless the analytics system can differentiate between queries in development and production workloads, your DataOps team ends up focusing on just the slowest queries. You end up with every workload partially optimized, still overspending in order to speed up all your workloads.
In small groups, the DataOps team may be able to segment production and development workloads and even manually fine-tune resource allocation. As usage grows, it's easy to see how that team quickly becomes overwhelmed. When your bill doubles and the financial reporting from your analytics provider only breaks down costs by data store, data accessed, and CPU time, there's little to work from when trying to optimize. Any savings in DevOps achieved by outsourcing to a managed analytics provider get eaten up by manually tweaking and tuning queries for each user and segmenting workloads to manage costs.
The final capability you need for good financial governance is the ability to dedicate resources and manage budgets based on workload priorities and the relevant business requirements. Just as important as workload-level visibility into resource utilization are the tools to allocate resources and budget to queries based on which workload they support. By standardizing on an analytics engine that provides deep visibility into resource utilization, workload-level reporting, and workload-level controls, you can direct resources and acceleration engines to business-critical workloads. You get a detailed breakdown of resource usage and cost on a per-query basis, as well as an aggregation based on which queries make up a given workload. With workload-level reporting, you can focus the DataOps team and allocate resources and budgets to the appropriate workloads.
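To make the idea of workload-level reporting concrete, here is a minimal sketch (the metric fields and workload names are hypothetical, not any particular engine's API) of rolling per-query metrics up into per-workload totals:

```python
from collections import defaultdict

# Hypothetical per-query metrics, as a generic query engine might expose them.
# Each record: (query_id, workload, cpu_seconds, bytes_scanned, cost_usd)
query_metrics = [
    ("q1", "customer-reports", 120.0, 5_000_000_000, 0.42),
    ("q2", "customer-reports", 95.0, 3_200_000_000, 0.31),
    ("q3", "development", 300.0, 9_800_000_000, 1.10),
]

def workload_report(metrics):
    """Aggregate per-query metrics into per-workload totals."""
    report = defaultdict(lambda: {"queries": 0, "cpu_seconds": 0.0,
                                  "bytes_scanned": 0, "cost_usd": 0.0})
    for _, workload, cpu, scanned, cost in metrics:
        agg = report[workload]
        agg["queries"] += 1
        agg["cpu_seconds"] += cpu
        agg["bytes_scanned"] += scanned
        agg["cost_usd"] += cost
    return dict(report)

print(workload_report(query_metrics))
```

The same per-query records that a query-level report would show individually are grouped here by the workload they support, which is what lets a DataOps team see cost against business value rather than against isolated queries.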
Varada introduces game-changing visibility that makes financial governance at scale not just possible, but cost effective. Instead of pouring any cloud savings into DataOps teams that manually turn basic resource utilization into workload-level reports, Varada provides both workload-level visibility and resource management out of the box. The Varada Visibility Center starts with usage-based reporting, such as which users are querying which tables, down to the column level. Workload Management lets you define workloads as collections of queries.
For example, all queries from the production user that touch clickstream data can be grouped into the Customer Reports workload, and all queries by users in the development group can be grouped into the Development workload. With actionable visibility into the performance, resource utilization, and cost of both workloads, you can direct Varada to prioritize Customer Reports and put a budget cap on the Development workload.
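As an illustration only (the rule format, names, and budget figures below are hypothetical, not Varada's actual configuration), workload definitions like these can be thought of as classification rules over query metadata, with a budget cap checked per workload:

```python
# Hypothetical workload rules: classify a query by its user, group, and
# tables, then enforce a per-workload budget cap. Illustrative only.
WORKLOAD_RULES = [
    # (workload name, predicate over query metadata)
    ("Customer Reports", lambda q: q["user"] == "production"
        and any(t.startswith("clickstream") for t in q["tables"])),
    ("Development", lambda q: q["group"] == "development"),
]

BUDGET_CAPS_USD = {"Development": 500.0}  # no cap on Customer Reports

def classify(query):
    """Return the first workload whose rule matches this query."""
    for name, rule in WORKLOAD_RULES:
        if rule(query):
            return name
    return "Unassigned"

def within_budget(workload, spend_so_far, query_cost):
    """Check whether running this query would stay under the workload cap."""
    cap = BUDGET_CAPS_USD.get(workload)
    return cap is None or spend_so_far + query_cost <= cap

q = {"user": "production", "group": "analytics", "tables": ["clickstream_events"]}
print(classify(q))  # → Customer Reports
```

Rules are evaluated in priority order, so a query matching both definitions lands in the higher-priority workload; the budget check is what lets the capped Development workload be throttled without touching production reports.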
Varada’s unique index-based acceleration engines leverage this deep workload-level visibility to accelerate high-priority workloads to meet their business requirements without breaking budget caps.
Unless you’re going to declare your data lake a failure and go back to the data stone ages, you need an analytics solution that delivers deep visibility into how users consume resources; a view into not just query performance but whole-workload performance that maps costs to business value; and fine-grained automation and controls to balance SLAs and user experience within a fixed budget. Outsourced analytics solutions, while quick to get started with, don’t offer these three critical tools: visibility, workload-level reporting, and workload-based resource control. As you scale to enterprise-wide deployment, make sure you factor in the capabilities you need to manage costs before they overtake your savings.
To see Varada in action, schedule a short demo!