Earlier this week, a tweet on the topic of “dark data” caught my attention. I have to confess I had never seen that term before and was uncertain of what it meant. I followed the link to an article by Chloe Green on Information Age. She was reporting on the AIIM Roadshow in London, where analysts were delving into the topic of dark data.
Alan Pelz-Sharpe, research director at Group 451, put the term into context for me. He describes dark data as the large quantities of information that an organization accumulates over years but that ends up being redundant or unusable, largely because there is too much of it, it is being produced too quickly, and it lacks structure.
There is no doubt that there is an explosion of data in our databases. The low cost of servers, storage and related technologies makes this easy. We live in an age where competitive advantage and profitability are driven by the amount of information we can hoard on our customers, prospects, products, income, expenses and so forth. Organizations proudly proclaim the terabytes of data they have accumulated in their data asset stores.
Analysts at the roadshow made a very astute observation that people do not want to own up to the issue of dark data. As data professionals, we own a good piece of the solution. I came up with four good practices we should remember in our daily chores as data architects, modelers and DBAs.
- Knowing the value of the data you manage. We should not blindly stuff data into our databases just because it is easy and cheap to do. Our role is to design and manage the corporate data asset. We have the responsibility of engaging our business partners in a dialog about the value of the data and the benefit our enterprise will obtain from storing that data. Never forget you are a gatekeeper of data for your enterprise.
- Knowing the use of the data you manage. A key part of designing and managing data is stepping outside our technical realm to get some understanding of the tools and techniques employed to analyze the data we manage. This business-centric knowledge enables efficient design of data structures. It also gives us insight into when it is time to say good-bye to data.
- Recognizing, minimizing and eliminating redundancy. We live in a world where redundant data is present. We surely have inherited a legacy of stove-piped applications. Mergers and acquisitions do not necessarily result in merged data. ERPs and packaged applications often do not integrate well in our world. Data architects need to recognize, own and manage redundancy. We are the voice that can help curb and eliminate the proliferation of redundant data. The problems associated with poorly managed data only get worse as the volume of this data increases over time.
- Managing the lifecycle of data. Every piece of data has a lifecycle. Data professionals must identify this lifecycle at the onset of the data design process. Data and the demands for data are growing exponentially. We cannot sit passively on the sidelines and allow this growth to occur unmanaged. As data architects, data modelers or DBAs, we need to understand data from its birth to its death in the enterprise.
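To make the lifecycle practice a little more concrete, here is a minimal Python sketch of how a data team might flag datasets for archiving or purging. Everything in it is an assumption for illustration: the `DatasetRecord` fields, the sample inventory, and the rule of "archive after one retention period of inactivity, purge after two" are hypothetical and would need to be agreed with the business owners of the data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    # Hypothetical inventory entry; field names are illustrative only.
    name: str
    last_accessed: date
    retention_days: int  # assumed retention period agreed with the business


def lifecycle_stage(record: DatasetRecord, today: date) -> str:
    """Classify a dataset as 'active', 'archive' or 'purge' by idle time."""
    idle_days = (today - record.last_accessed).days
    if idle_days <= record.retention_days:
        return "active"
    # Assumed rule: idle for up to twice the retention period means
    # the dataset is a candidate for cheaper archival storage.
    if idle_days <= 2 * record.retention_days:
        return "archive"
    # Anything idle longer than that is a candidate for removal.
    return "purge"


# Made-up sample inventory for demonstration.
inventory = [
    DatasetRecord("orders_2023", date(2024, 5, 1), 365),
    DatasetRecord("campaign_clicks_2019", date(2022, 9, 1), 365),
    DatasetRecord("legacy_crm_export", date(2019, 1, 15), 365),
]

today = date(2024, 6, 1)
report = {r.name: lifecycle_stage(r, today) for r in inventory}
print(report)
```

The point of the sketch is not the specific thresholds but the habit: if every dataset carries a retention period and a last-use date from the day it is designed, the decision to archive or retire it becomes a routine report rather than an archaeology project.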
Data professionals cannot solve the issue of dark data alone. It is a complex problem that touches many departments within the enterprise. We can own the issues that surround dark data that are under our control. Now, more than at any time in the past, we need to embrace sound data management practices. Data is one of the most valuable assets of an enterprise. It is our responsibility to be custodians and not hoarders of this asset.
Modeling Global User Community President