The #1, Yet Ignored, Reason that Data Lake Analytics Projects Fail

By Ori Reshef
June 7, 2021

The engineering aspects of large data projects are often challenging and complex. But it is the highly charged human reaction to data innovation that will often determine its failure — or success.

In a large, modern organization, in the marketing analytics domain, a massive data lake initiative was under way. The project team had already gotten management very excited about the potential of this new (and expensive) initiative. The promise was to turn the entire organization into a data-driven one, enabling innovation through data across the organization’s functions and thus creating a much-desired competitive advantage.

This was the org’s data landscape before the data lake project:

  • A central data warehouse serving both the customer success teams and the end customer application itself
  • The data model serving everyone was a reduced and aggregated version of the original streamed raw data
    • To compare, each raw event sent to the company’s servers (original data) contains 300 data points (dimensions); the data model in the data warehouse contains 70. 
  • Each VP in the management team had their own siloed data pipeline:
    • Marketing asked R&D for periodic ETLs into its BI tools for research and ongoing reporting. Its data came partially from the data warehouse and partially from a customized ETL from Salesforce and Taboola.
    • Sales were managing their data in Salesforce alone.
    • Customer success was looking only at the DWH data (aggregated and reduced). 
    • R&D was busy doing data warehouse maintenance (customer facing apps etc.).
    • Data science was reactive to questions raised by marketing and BI on their data.

The project objective was to build a data lake that would hold the org’s raw events data and the additional business-generated data (marketing, sales, HR etc.), and that could be queried by all parties according to their role and responsibility. Sounds like a win-win for all? Well, spoiler alert: after a significant investment of resources the project failed, people left the company, and no central data lake was established.

Data Analytics’ Magical Bias

I have been in and around data since my days on Access and Excel circa 1998, and was fortunate to witness the phenomenal growth of data centricity. I’m not referring to the global growth in the volume of data (that phenomenon has been amply discussed) but to the societal importance of data and its impact on the data practitioner, dealing day in and day out with the growing human hunger for her labor’s fruit.

In recent years I participated in and led several large-scale data lake creation projects, heading the data platform teams. These projects always originated from the executive team’s deep strategic understanding that the organization’s data has the potential to increase revenue, make us smarter and more efficient, keep us in the current trend, and, with some effort, maybe even bring world peace…

Some of the projects I was involved in, like the one outlined above, failed miserably. A few succeeded, and indeed gave the organization the boost everyone expected. In both cases I learned, the hard way, that an enterprise data platform or data access project is much more than the set of technological decisions we make: which cloud, what ETL mechanism, file formats, access control etc. These are all engineering problems that have viable solutions. What I did not expect to deal with was the magical and mysterious power of data over all people at ALL levels in the organization. When it comes to the org’s data, everyone becomes acutely fascinated and highly emotional.

Within the big picture of developing the project and bringing it to production, these intense emotions, rather than the non-trivial engineering work, put enormous personal pressure on the data practitioners, to the point that they weren’t able to make the right decisions. Who gets to decide on the architecture? Who will have query privileges? Who holds the domain expertise, and what will become of those who don’t? How will the new data paradigm, and the data insights, change the org, its goals and workflows, and how will all this affect individual stakeholders?

These are the questions and undercurrents circulating around every data project in the making. This “magical bias” of data shocked me. What shocked me even more was that it persisted not only throughout development, but often prevented adoption even after a successful engineering solution was produced.

Why does this happen, and how can we keep this mental state from adversely affecting data platform projects? First, let’s understand what makes the concept of data so magical.

The Data Analytics Fear Factor: Oil vs. Gold

We’ve all heard the phrase “data is the new oil” too many times. This phrase is credited to Clive Humby, a British mathematician and architect of Tesco’s Clubcard, who said back in 2006: “Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so data must be broken down, analyzed, for it to have value.” In the same vein, Piero Scaruffi, cognitive scientist and author of “History of Silicon Valley”, said in 2016: “The difference between oil and data is that the product of oil does not generate more oil (unfortunately), whereas the product of data (self-driving cars, drones, wearables, etc) will generate more data (where do you normally drive, how fast/well you drive, who is with you, etc). So data is even better than oil because it generates more raw treasures.”

Follow the metaphor: at the root of it all, the comparison between data and natural treasures creates the automatic association DATA = GOLD. Everyone is mining it and getting rich, just like during an earlier era of a West Coast rush. And that association immediately creates the Fear of Missing Out (FOMO).

The fear of missing out is defined as “a pervasive apprehension that others might be having rewarding experiences from which one is absent.” It’s the fear of missing out on the main event that is happening now and will define the future of the organization, on the coolest project to be involved in, on that thing that will make everyone rich, on that thing that will get you a promotion if you succeed.

Not being involved means that you are out of the fast lane to promotion, you’re not cool, and your opinion matters less. Or at least, that’s how it is in the mind of the org’s employees. And it’s no wonder: the industry and the popular press are pushing the data-to-riches narrative hard. At every company, resources are given to the data team, and data-savvy employees (BI, SQL, Infra, Stats…) get the lion’s share of bonuses and promotions. Each small “data win” is celebrated as the new growth engine of the organization, and data-driven companies’ IPOs are surging, with Snowflake, for example, recently marking the biggest SaaS IPO ever.

Mitigating the Human Reaction

This human reaction to the “new oil” or the “new gold” is at the heart of a successful (or unsuccessful) data lake architecture. Transformation and change are always a locus of apprehension and uncertainty. Most people, at heart, prefer the known to the unknown and stability to innovation, even at the price of the gifts that are the inherent promise of progress. As I demonstrated above, this is doubly true for AI and data projects, which at this point in time hold a mystical grasp on the human imagination. The key to a new data project’s success is how aware the executive layer is of this phenomenon and its importance, and whether it manages the deployment of the project according to this understanding.

About Ori Reshef

Ori Reshef, VP Products | Varada

Ori Reshef brings 15+ years of experience and deep expertise in the data space, specializing in speech, text and big data analytics, and creating business solutions from cutting edge technologies. As a product expert, Ori helped shape various solutions that focus on improving customer lifetime value (LTV) by mining three primary types of data:

  • Customer-generated data, to reveal the intent of the customer
  • Brand-generated data, to improve sales and retention
  • Operational data, to increase efficiency

Prior to joining Varada, Ori was the VP Data Products and Head of Data Science for Clicktale (acquired by Contentsquare), led the Analytics & Intelligence product for LivePerson (NASDAQ: LPSN), and held several other senior product positions. Ori transitioned to executive product roles after serving as a Solution Delivery Manager and Senior Sales Consultant for Nice Systems (NASDAQ: NICE) and in several other technology consulting positions.
