Today’s blog post definitely shows my rural Midwest upbringing. Whether it is in the barn or an office, I have heard the expression, “You know what you get when you put lipstick on a pig? A pig with lipstick and nothing more.” The message it delivers is so applicable to a variety of situations.
As humorous as the quote is, it applies to the data world and to the IT community as a whole. A favorite trick in our technical world is to take a less than glamorous piece of data or application and wrap it in a shiny new envelope. The logic is that if it looks pretty, people will love it, buy it, and believe it.
As a data modeler, I have been prodded over the years in countless data design sessions to just take the flat file and turn it into a table. Over the years, flat files were replaced with MS Access databases and MS Excel spreadsheets. The thought process is if it works today, why mess with it? Many developers favor the pre-relational model flat file design in a relational database.
Over forty years have passed since the concept of relational databases was introduced. DB2, Oracle and SQL Server have become the standard for the storage and management of corporate data. It is possible to shoehorn these flat file designs into these RDBMSs. Add to this the fact that increased processing power and cheaper storage allow designs that mimic flat file structures to function acceptably.
Massively parallel computing appliances and databases are opening a whole new world for these legacy flat file structures. It is possible to take a vast collection of legacy files and load them into a database environment such as Teradata or Netezza. Corporations with 30, 40 or more years of legacy data can now analyze the data while forgoing costly database design and ETL processes.
The data designer in me always questions the wisdom of any major shift in design or investment in a data technology. It is part of every data professional’s job to be a steward for the enterprise’s data and a watchdog to assure its integrity, stability, reusability, and other data principles. Here are my primary concerns with the proliferation and use of these database-deployed legacy file structures.
- Data is not what it appears to be. Many legacy applications are 30+ years old. Over that period of time, many programs were written containing complex business rules built on complex business rules. Fields often contain multiple sets of domains. You can’t just take the field at face value and you must invest considerable time to replicate the legacy business rules to make it usable outside of the legacy application.
- Understanding the real cost and accessibility. Defining and populating the database is simple. Getting data out of the database may not be that simple. Shifting the focus from a well-designed database to a largely unstructured database may limit its usability. Data retrieval and usage can be costly. Not all tools and applications can easily access the data in this state.
- Garbage in, garbage out. Taking the data at face value is not the right thing to do. There is a reason that sound data analysis is the foundation of large data warehouses and enterprise applications. It is a good thing to validate the business usage and applicability of data before deploying it to a large audience. Unlocking this 40-year-old legacy data without that validation may not be the right thing to do.
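To make the first concern concrete, here is a minimal sketch in Python of what "multiple sets of domains" in one field can look like. Everything in it is hypothetical and invented for illustration: the column names `ACCT_ID` and `CLOSE_DT`, the sentinel values, and the rule that the same field sometimes holds a date and sometimes holds a status code. Your legacy application will have its own rules, and they are usually buried in program logic rather than written down.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sentinel values that a legacy program might use to mean
# "no close date" (an assumption for this example, not a standard).
NO_DATE_SENTINELS = {"00000000", "99999999"}


@dataclass
class DecodedAccount:
    account_id: str
    close_date: Optional[str]   # ISO date, when the field really holds a date
    status_code: Optional[str]  # status flag, when the field holds a code


def decode_legacy_row(row: dict) -> DecodedAccount:
    """Split one overloaded legacy field into two properly typed attributes."""
    raw = row["CLOSE_DT"].strip()
    account_id = row["ACCT_ID"].strip()

    # Blank or sentinel: the account has no close date and no status.
    if raw == "" or raw in NO_DATE_SENTINELS:
        return DecodedAccount(account_id, None, None)

    # Eight digits: treat as a YYYYMMDD date.
    if raw.isdigit() and len(raw) == 8:
        iso = f"{raw[:4]}-{raw[4:6]}-{raw[6:]}"
        return DecodedAccount(account_id, iso, None)

    # Anything else: the field was reused to carry a status code.
    return DecodedAccount(account_id, None, raw)


if __name__ == "__main__":
    rows = [
        {"ACCT_ID": "A1", "CLOSE_DT": "19870315"},
        {"ACCT_ID": "A2", "CLOSE_DT": "SUSP    "},
        {"ACCT_ID": "A3", "CLOSE_DT": "99999999"},
    ]
    for r in rows:
        print(decode_legacy_row(r))
```

Loading `CLOSE_DT` straight into a date column would either fail or quietly turn "SUSP" and "99999999" into nonsense. The decoding step is the part of the legacy business rules that has to be rediscovered and replicated before the data is usable outside the original application.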
I do see a lot of value in using this legacy data. In this age of big data, this is our opportunity to open vaults of data that have been largely inaccessible for years. I strongly believe that this must be done with thought and planning. Sound data principles apply to this effort just as with any data design initiative. Just remember that a pig with lipstick is only that. You need to teach that pig new tricks and make it live by the rules. You will then have a pig wearing lipstick that performs and makes you money.
Modeling Global User Community President