Over thirty years ago, my business users asked for two years of customer data. IT said “Too big. Too costly.” Twenty years ago, my business users asked for five years of data in the data warehouse. IT said, “Too big. Will you use it?” Today, our business users want to keep some data indefinitely. IT says, “It’s big. Let’s see how we can do it.”
These memories surfaced earlier this week when Twitter announced that it is indexing every single tweet since 2006. The facts in the blog post were staggering: roughly half a trillion tweets, an index 100 times larger than the existing one, growth of several billion tweets a week, and indexed queries returning in under 100ms.
The perception of big has changed over the years. New technologies have emerged to manage large volumes of data. Cost-effective, high-performance database management hardware and software are readily available. Today's software lets business users answer analytical questions by querying large volumes of data, free of IT limitations.
Many members of the data architect community question their involvement in this shift to the new big. I am one of those individuals. For most of my career, I worked under the traditional IT project model of “here is the question I need answered; please give me the data I need”. The new big turns the proposition around to “please give me the data and let me think of the questions I can ask”.
Let’s take another look at that Twitter blog post. Five factors were listed as the most important considerations in the new indexing effort: modularity, scalability, cost effectiveness, a simple interface, and incremental development. Sounds familiar, doesn’t it? These are factors that data architects face every day. They are attributes that any data initiative must attend to in today’s big data world.
Data architects still have a role in the new big. The value of big data starts when data first comes to life. It must be captured accurately, treated as an enterprise asset, and managed properly through its lifecycle. No changes here. It’s what I have done and still do. The new big is all about good data management.
I am new to the new big; I have yet to get my feet wet with a big data effort. Still, I can share my thoughts on how data architects can contribute to the success of a big data initiative through time-tested data management practices.
- Keep on doing what you are doing, even better. Yep, it’s that simple. Data needs to be properly managed to provide value. The data architect lays the foundation of quality data with integrity in the data models.
- Share and expose the data freely. Big data can create a big headache. You can ease the pain. The data architect’s data models, metadata, data dictionaries, and glossaries give data meaning and integrity. Provide every means possible to share that knowledge with workers in the new big.
- Listen carefully to the business and the marketplace. Most of us work heads down while the world changes around us. I work with traditional databases and applications, but I keep myself informed of big data solutions that can be applied to business problems.
- Involve yourself in big data discussions and projects. New technology generally surfaces as a prototype or test-bed initiative. Data architects need to watch for these efforts and sell the value of data design and administration to each foray into big data.
- Become a social animal. Data architects need to be active participants in LinkedIn, Twitter, webinars, and discussion boards. It’s free education and consulting from the experts of the new big. It’s where the conversation about big data is centered, and that conversation is active and current.