Why data governance is an operational problem

Steve Jones
3 min read · Oct 12, 2021

Today most of your company's operational reporting is done against a schema that looks almost identical to the one inside the operational system. It might be an extract into Excel from that system, but it's a system-derived schema.

Over in the Data world we’ve convinced ourselves this is a bad thing. “Oh no” we wail “that means it might not be using the corporate naming standards”, or “Will nobody think of the Master Data?”. We bemoan that the schema from that source system isn’t useful for another part of the business and so we go all Frozen and sing:

“Do you wanna build a pipeline?”

And the part of the business that works with the operational system acts like Elsa and ignores us, going on to freeze out IT by developing its own, basically unmanaged, operational data store (ODS). And here is the thing: they tend to like the ODS even if it's a total mess, because it gives them fast data that they can make decisions on. There is also a dirty little secret about an ODS that IT never seems to learn:

If the data matches the source system… any problems are the system’s fault

You rarely get data quality complaints to IT about the ODS, because everyone accepts that the issue is upstream.

Rather than learning this lesson though, what we’ve done in IT is make the issue our problem, because we claim that the magic pipeline will fix quality issues.

Narrator: It won’t

I’ve said before that Operational Reporting needs to be the foundation of governance because this is where the business has both the most acceptance of issues and gains the most benefits from fixing them. It’s what I mean when I say that you need to look at culture over technology when addressing governance.

I used to say one of the great things about a data lake was that you can apply the quality after landing so if the quality rules, or source system, changes you can regenerate from that landing area. I don’t think this was wrong, but I think that the rise of CDC and real-time replication into that landing area has altered what this means.
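That "apply quality after landing" pattern can be sketched in a few lines. This is a minimal, hypothetical illustration, not any particular product's API: raw records are landed untouched, and the curated view is regenerated by replaying the quality rules over the immutable landing copy, so a rule change just means a replay.

```python
# Sketch of the landing-area pattern: land raw data unmodified,
# derive the curated view by replaying quality rules over it.
# Record shape and rules here are purely illustrative.

from dataclasses import dataclass, field
from typing import Callable

Record = dict
Rule = Callable[[Record], bool]

@dataclass
class LandingArea:
    raw: list = field(default_factory=list)

    def land(self, record: Record) -> None:
        # The landing copy is append-only and never mutated,
        # so it always matches the source system.
        self.raw.append(record)

    def regenerate(self, rules: list) -> list:
        # Re-derive the curated view from raw; changing a rule
        # (or reloading the source) just means calling this again.
        return [r for r in self.raw if all(rule(r) for rule in rules)]

landing = LandingArea()
landing.land({"customer_id": "C1", "country": "GB"})
landing.land({"customer_id": "", "country": "GB"})  # fails quality

rules = [lambda r: bool(r["customer_id"])]  # example rule: id must be set
curated = landing.regenerate(rules)
```

The point of the pattern is that the raw landing copy survives regeneration: if the rules change tomorrow, you replay them over the same raw data rather than re-extracting from source.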

The future is that real-time operational reporting will be the foundation of governance. We used to laugh at the idea of 'fixing at source', because in our pipeline-driven world it was just too disconnected from the reality of the business for us to have that sort of pressure to get things fixed. In the new data-driven world it is the business that faces the challenge:

Fix it at source, or add it into a DQ pipeline before you share with the rest of the organization

And given that most data issues stem from operational process challenges, it becomes pretty clear which is actually the easier solution within the business.

When I talk about modelling data from a business perspective, this is part of what I mean. The business wants that fast Operational Reporting view, and at that level speed is critical and accuracy means how closely it matches the source system.

Embracing Real-time Operational Reporting is the first stage of providing a data-driven foundation.


My job is to make exciting technology dull, because dull means it works. All opinions my own.