Over the past month an interesting modeling challenge surfaced on a large project at my workplace. Actually, it’s more of a let’s-not-model challenge. Our Business Intelligence team asked that we load a legacy flat file to a UDB database. This request was definitely out of the norm at our shop. Legacy data is regularly modeled, transformed and loaded into our data warehouse. An important step in this process is modeling the data with an enterprise view.
As you can imagine, eyebrows were raised in the Database Management and Data Architecture camps. The battle lines were drawn. I know that I have seen similar maneuvers over my 20+ years of data management experience. Let’s see… Here is what you usually hear, “We need this data quick and have no time to model in the project plan.” or “Only a few people will use this, and it’s only temporary.”
What I discovered over the past few weeks of meetings was that sometimes something that is wrong can indeed be right. People that know me will say that I am not the typical data cop. My approach is more like that of an arbitrator or counselor. I would rather hear out the pro’s and con’s and make my design decision based on the evidence at-hand. I must admit I went into this process with the preconceived notion that “no way in hell would we be doing this” attitude.
So, what was the fate of this piece of legacy data? We did model it as a single entity mirroring the flat file structure. We did apply several of our standards to the design. The data was profiled. Domains were applied when known, class words assigned, and if a column existed in our databases, the modeled column was given the same name. Unknown data was given a “text” class word. Metadata was enhanced through UDPs that allowed us to link the database column to the legacy file, field and datatype. Descriptions prominently reflected that the data was sequestered with limited use within the enterprise.
Perhaps the most important decision made during this process was establishing the data governance of the table. It will exist for a limited time after which it will be deleted. Columns can not be exposed on production reports and displays. Columns can not be used in production computations and formulas. ETL and end user reporting standards reflect the limited, non-production use of the data and treating it as sequestered.
The primary business users of this data perform analytics for the corporation. They made a sound case that they needed to spend a few months analyzing and mining the data in the file to determine its value in an analytical environment. They are using analytical tools and processes; something that would have been nearly impossible with a flat file structure. I think this design effort is a good example of aligning IT with the business. The team came up with a viable solution to facilitate this analysis while maintaining the integrity of the enterprise’s data design.
Looking ahead, the agreed upon approach is to model data elements “discovered” to be of good quality and having analytical value to the enterprise into appropriate entities within the data warehouse. The moral of this story is that sometimes the wrong thing to do becomes the right thing to do. As data professionals, we need to “stick to our guns” to safeguard the soundness of our database designs. At the same time, we need to open ourselves up to exceptional conditions that benefit the enterprise as a whole. For this to be successful, we need to implement data governance that can flex some at the same time it governs with sound principles and standards.