Your Data Warehouse thinking is killing your AI ambitions

Stop trying to build a better data mousetrap

Steve Jones
7 min read · Jan 29, 2024

One of the single biggest issues with companies’ transformations towards becoming data-driven, AI-embedded businesses is that “data thinking” focuses on building a “better data warehouse”. The focus is on “gold” data that presents some form of “canonical view”.

This is going to doom your company

This mindset assumes that reporting actually matters and that building data for reporting is what data architectures should be about. Because no data warehouse in the history of ever has managed to achieve the goal of having “all the information for the business in a way that everyone can use in a single consistent way”, the cycle of building better mousetraps continues. The trouble is, reports might be mice, but AI is a tiger.

A person kneels setting a small mousetrap in a sunlit jungle, unaware of the large tiger looming behind them in the shadows, observing quietly.

There is no “Medal Table” for data

The first thing to rid your mind of is the idea that there is some mythical medal table for data. Let’s be blunt: this stems from the reality that people who build applications didn’t use to care about data, so the data that came out of a system was riddled with issues, primarily because applications didn’t need to do anything “smart” with the data; they just needed to service the transactions.

This image shows a data processing workflow with stages labeled “System,” “RAW,” “Bronze,” “Silver,” and “Gold.” Multiple “System” stages funnel into “RAW,” which then progresses through “Conform” steps to “Bronze,” “Silver,” and finally “Gold.” Each stage is color-coded, with arrows indicating the flow of data

This leads to the idea that the initial data is “dirty”, not of actual use, and needs an expensive and time-consuming process to turn it into something useful. So the “raw” data is viewed as effectively hidden from any usage. It then goes through an initial curation that turns it into “Bronze”, and then more and more curation until it becomes “Gold” data that everyone can rely upon.
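As a rough illustration of the staged curation described above, the sketch below pushes a handful of records through bronze, silver and gold steps. The function names, the fields and the clean-up rules are all assumptions made up for illustration, not anything a particular platform prescribes. What it shows is how each round of curation quietly discards records that were real operational events.

```python
# Hypothetical medallion-style pipeline: every name and rule here is an
# illustrative assumption, not a real platform's API.

RAW = [
    {"order_id": "A1", "amount": "100.50", "region": "emea"},
    {"order_id": "A2", "amount": "n/a", "region": "EMEA"},
    {"order_id": "A1", "amount": "100.50", "region": "emea"},  # duplicate
    {"order_id": "A3", "amount": "20", "region": None},
]


def to_bronze(rows):
    """Initial curation: drop exact duplicates, keep everything else."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            out.append(dict(row))
    return out


def to_silver(rows):
    """More curation: enforce types, discard rows whose amount won't parse."""
    out = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        out.append({**row, "amount": amount})
    return out


def to_gold(rows):
    """Final curation: apply the agreed rule that a region must be present."""
    return [
        {**row, "region": row["region"].upper()}
        for row in rows
        if row["region"]
    ]


bronze = to_bronze(RAW)
silver = to_silver(bronze)
gold = to_gold(silver)
print(len(RAW), len(bronze), len(silver), len(gold))
```

In this toy run only one of the four raw records survives to “Gold”; the others were still things that happened in the business.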

Your “Gold” is Pyrite

This mindset assumes that you don’t have operational control of your digital reality. It makes you comfortable having that lack of control and punts the control into the mythical data quality pipeline.

If you don’t have operational control of your digital reality then your AI visions are doomed

That “Gold” view is something people sit around in meetings and agree on: everyone agrees on “quality” and clean-up rules, which are then translated into transformation pipelines that deliver a view your meeting thought was correct.

You then find that the business downloads that data into Excel, or a local reporting solution, adds in other data, often “unclean” operational data, and then makes business decisions.

Blame Finance, but they’ve got a reason

One of the primary drivers of this mentality is the finance team. Because they report to the market and have scheduled audits, they have specific points in time where the data has to be “right”. It doesn’t need to be right all the time, it just needs to be right at the right time. The finance team takes all of this crappy data, cleans it up, restates it, applies market rules to it, and then publishes it, literally in a book: the corporate results.

So this pipeline mentality makes sense when you publish the gold standard into a book, because it isn’t a moving “thing” and it isn’t operational. It’s a high-level view of the business, created at specific points, for regulatory purposes.

Sadly, rather than keeping this approach just for finance, data people embrace the idea that publishing into a book is somehow a gold standard for information, and that therefore everyone should be held to the publishing-in-a-book standard.

You aren’t writing a data book

This banner depicts a solemn scene: people in suits kneeling around a raised dais. On the dais, a book titled “What Happened Last Week” glows mystically. The lighting is soft, focusing on the book, enhancing its revered status. The mood is one of deep reverence, evoking a ritualistic, cult-like gathering.

Data is your digital reality, which means operations is data reality

This Data Warehousing mindset is crushing the simple fact that:

For AI to have true impact, it needs to react based on an accurate view of reality

This reality comes in only two parts:

  1. What is now
  2. What is everything that has ever been

If you can’t create an accurate Decision Context for an AI, then it will not make a great decision. If your training of an AI is based on a sparse and inaccurate view of your history, then that AI will be fundamentally flawed. Google’s DeepMind has shown the ability to brute force stochastic challenges like weather. That learning and training is based on accurate history, and the ability to forecast better than traditional models is based on accurate digital reality. Google are not pushing weather data through a committee with a view to publishing it in a book; they’re funneling the current accurate reality to drive a more accurate outcome.
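To make the two parts of reality a little more concrete, here is a minimal sketch of assembling a decision context from “what is now” plus “everything that has ever been”. The `DecisionContext` class, the `build_context` function and the inventory-style fields are hypothetical names chosen purely for illustration. The only point is that the context is built directly from operational state and full history, not from a curated report.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DecisionContext:
    """Hypothetical container for what an AI would see when deciding."""
    current_stock: int        # what is now
    avg_daily_demand: float   # derived from everything that has ever been
    days_of_cover: float      # combines the two views


def build_context(current_stock, demand_history):
    """Combine the current operational state with the full history."""
    if not demand_history:
        # With no history there is nothing accurate to decide against.
        raise ValueError("no history: the context would be a guess")
    avg = mean(demand_history)
    cover = current_stock / avg if avg > 0 else float("inf")
    return DecisionContext(current_stock, avg, cover)


# Illustrative numbers only.
ctx = build_context(current_stock=120, demand_history=[30, 50, 40])
print(ctx.days_of_cover)
```

If either input is stale or has been “cleaned” into something that no longer matches operations, the derived figure is wrong however good the model consuming it is.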

That is the challenge: Reality

Business accountability of reality, or just build for legacy

The mentality of data in an AI world is that:

Data is primarily to be used and consumed by AI

And that most of the impact of AI will be done at operational speed. Or to go back to our medal table of data…

So all of that pretty architecture designed to create the book of data isn’t actually useful for AI. You need to change your architecture to be for an AI-driven world, not a process-driven legacy where data is just for reporting.

You need to look at governance that builds your digital reality, not governance aimed at writing a data book. You need to think about control, not quality.

We are not living in the world of reporting and data books. We are living in a world where AI drives and where data is your digital reality. This is the data inversion, where data leads, not follows.

It means changing the way we architect systems to make them data-driven, and architecting that data in the way that best enables AI to be engaged in driving decisions.

Data Participation not participation trophies

The reality is that traditional data warehousing thinking, and its idea of some mythical “perfect” post-transactional data set, only exists because historically data hasn’t been important, except for finance regulations. The important parts of an IT estate were the transactional systems, and the most important parts of data are in business-commissioned solutions, a lot of the time meaning Excel or ad-hoc business reports. The “Gold” data, outside of closing the books, is seen simply as another source to factor into those business views.

The future is different. AI is critical to that future: AI that works at operational speed and can be engaged in operational decisions. This is the participation of data in business, in the day-to-day, minute-to-minute running of the business, not simply reporting after the fact on what went on.

At its heart is a simple statement:

Business success relies on the ability of AI to rely on your digital reality

Your approach to data, therefore, is not about reporting; it is about that digital reality, and ensuring that the business is accountable. More and more business line effectiveness will be driven by AI, and more and more business leaders will need to understand and control AI within their business areas. If your data architecture isn’t addressing this problem, then all you are doing is dumping technical debt into your data swamp.

Reporting on a digital reality is easy

Now there is a positive point in this switch for those legacy thinkers, and it’s something they’ve always complained about, even if their architectures shy away from addressing it. If you have an accurate digital reality, then reporting on it is simple. Your “raw” data is operationally accurate, and that operational control also requires the business to produce the collaborative data products needed to drive AI collaboration across, and beyond, the business. All of this means that reporting is based on an accurate view of reality. For almost all of the business this is sufficient and simple. Over in finance they’ll probably still need to manipulate some of the information before publishing their books: tax rules, hierarchies, licensing, etc. might need some tweaking. But for sales, supply chain, manufacturing, HR, etc., being able to report on reality is all they’ve ever wanted.

This banner shows a modern boardroom scene from a near-future. Half the participants are people in suits, and half are realistic tigers, sitting around a large table. They interact amicably in a stylish room with large windows revealing a futuristic cityscape. One wall features digital displays showing charts and numbers, adding to the room’s high-tech ambiance. The atmosphere is lively and cooperative, blending business with a whimsical touch.

If you are interested in what I think you should do instead, then the Data Mullet is a good place to start.

My job is to make exciting technology dull, because dull means it works. All opinions my own.