‘Reprocrastiporting’ — the act of delaying data until nobody cares

Do you have a procrastination strategy for data?

Steve Jones
7 min read · Apr 25, 2024

One of the common themes that I’ve dealt with over the years is a challenge from people who’ve built data platforms:

Why won’t the business use the platform?

The answer to this is absolutely never “because the business doesn’t want data”; it is almost always because the barriers to entry for the business are too high. The barrier almost always comes down to the idea that the business should only access data once it has been ‘blessed’ as correct, and a belief that reporting must sit at the pinnacle of data quality.

I’m going to call this Reprocrastiporting: a way of procrastinating about data availability by focusing on ‘high-quality’ reporting. Let me explain why it is exactly why the business isn’t using the platform, and also why they are blaming you for the issues.

The bad data is their fault

First off, let’s start with a statement of fact:

The data in the business is garbage because the business doesn’t make sure it is accurate on entry in the applications.

So everything that is done after this fact is making up for that challenge.

You ignore their biggest use case

Do your business users download poor-quality data from operational systems like the ERP and CRM and then put it into dashboards or Excel sheets? Are there a few, or even large numbers of, people whose actual job is to subvert your data platform by pulling data straight from the sources?

If you think they don’t, then that really just means you aren’t aware of it, because if the business isn’t using your platform for operational reporting, then they are using something, even if it is just a crappy embedded dashboard in an application.

This means that the first and most common place they look at data, within or close to the application, is defined as out of scope, because you know it’s rubbish data.

They hate your central data model(s)

Next up we have “why don’t the business get involved in data governance?”, and the curt answer is “because you aren’t doing data governance, you’re doing data quality”. If you think data quality is data governance, then you need to sit down and have a think: does your data governance strategy control your data, or just try to make it better quality?

Part of this antipathy comes from the ‘central model’ concept. If you are still trying to define a single canonical central data model, then please stop: it has never worked before, and it will never work in the future. Sure, you might in the past have had a finance data warehouse where you thought it worked, for the reports in that warehouse, but that doesn’t mean it worked for data across the business.

The point here is that you have set up a barrier to getting data: the business has to engage with a governance model in which the central team gets blamed for the delay, which makes it super easy for the business team to blame-farm their responsibility onto that team. The weeks, months, or even years it takes to agree on this mythical model become the excuse to do things differently.

And I’m including in this any data-mesh approach that attempts to govern every attribute of every data product “at an enterprise level”.

The world isn’t left to right

Underpinning all of this thinking is the idea that on the left hand side we have sources, and on the right hand side we have the ‘blessed’ data that everyone wants.

And indeed the business will lie directly to your face that they will only accept massively processed high-quality data, before turning round and downloading raw data into a spreadsheet and making strategic decisions for the next year.

The idea of this is that firstly, source systems are sacrosanct and cannot be considered at fault, and secondly that data is a sausage that needs to go through several stages of grinding and refining before it is fit for human consumption.

This gives a wonderful opportunity for procrastination and it tends to happen in two ways.

Firstly, we draw a line from left to right for a given set of reports, then we go through the sausage factory for that set of reports; sometimes we might even go so far (and by sometimes I mean: always, in the old data warehouse days) as extracting from the source systems only the information we need for those reports. We then build a wonderful thread for those reports, each stage taking time and being carefully curated, but ultimately aiming for a single target.

Then we get asked to create a new set of reports, which are sort of 80% the same but 20% different, and thanks to our Reprocrastiporting we can now treat this new set of reports as a totally different thread: we have to update our first set of transformations and potentially our extracts, and draw yet another sausage factory through.

If someone asks for “just the entire history of customer transactions, but I want to use the Customer from the Customer Master” and makes the mistake of asking for that in reports or a dashboard, then we bring all of these things through, conforming them to within an inch of their lives. Then, when two weeks later the business stops using that report, well, the sausage line is laid down, so we might as well keep operating it. However, if a data scientist comes along and asks for exactly the same thing, they’ll probably get access to the raw data without a sausage factory.

The second way we do Reprocrastiporting through this architecture is the wonderful word beloved of Enterprise Architects: “Holistic”. This means that when we build our sausage factory, instead of doing so based on a single set of reports, we set out to create the most flexible architecture possible, one that represents all things to all people and can be consumed in any way imaginable. This is a task of Sisyphus, a never-ending series of meetings where ‘one more idea’ spins out yet another stream of work. Platforms that engage in Reprocrastiporting in this way tend never to fully go live: they have a few trial users, but constantly require refactoring and change, meaning they are never considered stable.

Governing the tools so nobody wants to engage

Another part of Reprocrastiporting is denying the business the ability to consume the data using the tools, particularly front-end and analytic tools, that they want. So you can’t use that new BI tool; no, you can’t load it into that AI model; what do you mean you want to put some of it into a vector database that hasn’t been approved? This is a great way to delay adoption and have pointless battles. The problem with this mentality is that it tends to happen after you’ve done all the hard work: you’ve done the quality, you’ve spun the processor cycles cleaning it all up, transforming it, aligning it, and then you say “no, you must use it in the way we want”. Then starts an often pointless battle of feature and function, rather than either just enabling the business to pay for the tool themselves or, better yet, automating the provisioning of the tools you do want so the choice for the business is easy versus hard work. And if they want to do the hard work… why are you wasting time stopping them?

If data has value then access is everything

So am I saying give the business access to bad data?

Yes, yes I am

Because, back to that first fact, it is their fault it’s bad. Giving the business fast access to bad data, and telling them how bad it is, is one of the greatest ways to shift blame from the data platform team to where it belongs.

Am I saying that if someone wants to combine the raw transactional data with a clean customer MDM, they should be able to use lazy conformance to do that?

Again yes

Telling them that the raw data is rubbish, but that you’ve now linked it to high-quality customer information, is liable to be a very powerful thing. So why wouldn’t you want to do that?
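As a rough sketch of what that lazy conformance could look like, here is a minimal pandas example. The file paths, column names, and the customer_id join key are all hypothetical; the point is simply that the raw extract is joined to the governed customer master at read time, and the quality of the link is reported rather than enforced.

```python
import pandas as pd

# Hypothetical inputs -- file paths and column names are illustrative only.
raw_txns = pd.read_parquet("raw/erp_transactions.parquet")            # untouched ERP extract
customer_master = pd.read_parquet("curated/customer_master.parquet")  # governed MDM output

# "Lazy conformance": join the raw transactions to the clean customer master
# at read time instead of waiting for a curated pipeline. A left join keeps
# every raw transaction, including the ones that fail to match.
conformed = raw_txns.merge(
    customer_master[["customer_id", "golden_name", "segment"]],
    on="customer_id",
    how="left",
    indicator="mdm_match",
)

# Surface how good (or bad) the link is, rather than hiding the misses.
match_rate = (conformed["mdm_match"] == "both").mean()
print(f"{match_rate:.1%} of raw transactions matched the customer master")
```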

Does this mean you don’t do quality? Nope, but it means you always focus on surfacing quality rather than enforcing it as a binary. It means getting people access to data as quickly as possible and then working together to resolve the data issues that exist, whether those fixes sit upstream in the sources or require data patching to bring things up to standard.
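To make “surfacing quality” concrete, one approach is to attach the result of each quality check to the rows as flag columns, so consumers get the data immediately along with its known problems. A small sketch, with made-up columns and rules:

```python
import pandas as pd

# Made-up raw CRM orders -- the columns and rules are purely illustrative.
orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "customer_id": ["C001", None, "C042"],
    "amount": [120.0, -5.0, 300.0],
})

# Each quality rule becomes a flag column; nothing is filtered out or blocked.
orders["dq_missing_customer"] = orders["customer_id"].isna()
orders["dq_negative_amount"] = orders["amount"] < 0
orders["dq_issue_count"] = (
    orders[["dq_missing_customer", "dq_negative_amount"]].sum(axis=1)
)

# Consumers see the data straight away, with its known issues attached,
# and the flags give the platform team a backlog to fix upstream or patch.
print(orders)
```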

With data, perfect isn’t just the enemy of good, perfect is actually worse than bad.

Are there risks with people having managed access to data of known quality? Yes there are, but they are known risks, unlike the unmanaged risks you have today.

As AI becomes the primary consumer of data, this approach to visible risk will become essential, while the Reprocrastiporting approach will help ensure that AI solutions get built on invisible risk, because the business doesn’t care about “golden” reports when it is looking at an AI that works directly with the CRM or ERP.

[Image: a mountain of reports leading up to a golden statue at the top, symbolizing the golden reports and Reprocrastiporting challenge.]


My job is to make exciting technology dull, because dull means it works. All opinions my own.