Common data architect conundrums

I have lived with a myriad of database conventions over the years. Some of my employers’ standards have been dictatorial and unyielding in the enforcement of datatype and naming conventions. Others have been freer and more tolerant of variations. One thing that is consistent is that they have all had standards and strived to have consistent database objects.

Where do I stand on the day-to-day use and enforcement of standards? I believe they must exist. I am a firm believer that consistency across the enterprise makes understanding data easier and breaks down wall of confusion over a data element’s purpose and meaning. I am an enforcer although an enforcer with a heart.

In the data architect world, it is hard to make absolute and unyielding rules and standards for most of our work. The world is complex, and the database that captures that world must be just as complex. That translates into being flexible. There must be some discretion allowed to grant a pardon from the standards when the need arises.

Here are five of my favorite datatype and naming conundrums. These are the common challenges I face in explaining a data element’s characteristics. Interesting enough, these five problems have remained consistent over time regardless of my employer.

  1. Just call it city, state and country.
    Geographic entities are common in database designs. Developers often want to use city and state without a class word (code, number, abbreviation, and name). This happens to be a standard where I stand my ground. These columns appear across the enterprise. State name, state code and state abbreviation accurately reflect the data they contain. Developers need to understand the world beyond their schema and the integration with internal and extremal data.
  2. If it’s a number, it must be numeric.
    What exactly does a social security number, account number or customer number look like? These identifiers are increasingly complex containing alphanumerics and symbols. Many shops have a naming standard that an element suffixed by number must be numeric. If the business knows the element as account number or part number, that is what I name it even if it is not purely numeric. It would only make life more confusing to suffix it with another class word. I always explain the syntax of the attribute in the definition.
  3. Is it text or description?
    This question frequently appears in my work. A description describes the subject element. Text is a block of freeform text. What complicates this discussion is a standard that a description is smaller and limited in size while text is large or very large. If a long description is warranted, the initial response is to call it text. I use description when it describes an object regardless how long the description is. The maximum data element size should not drive its naming.
  4. What is a code?
    In the early days of database design, a code was typically a single byte that was the identifier for a reference data table row containing the coded element’s name and description. Space was costly and at a premium. Technology advanced and disk space became affordable and large amounts of data became easier to manage. The length of the code element grew with this advance in technology. Today, coded data elements should be evaluated its contents. My rule of thumb is to not call it a code if it contains words, names or abbreviations. I also do not advocate a standard where a code must be a set size.
  5. The world of dates and times
    Dates and times are workhorses in the database. There is hardly a SQL statement that is not qualified by date and/or time. Misuse of date and timestamp appears more often than I would like in business requirements. I have been called out when a query is not returning the expected results only to find out that the “where” clause was using timestamps rather than dates. My current team has spent considerable effort educating folks on when they need a date or time as opposed to a timestamp. Those writing business requirements must know the right questions to ask when it comes to selecting the correct date and time datatype.

Tom Bilcze

    • Joe, Interestingly enough, I had that one in this post initially. It gets rehashed every now and then. I tend to lean on saying they are the same. A team member here has a good take against that though. She believes an indicator is a column defined boolean. If a column is y/n, she says that makes it a flag. The idea is that beyond boolean give it some more intelligence and makes it a flag. I can sort of see that but still prefer to use indicator. Thanks for commenting.

Leave a Reply

Your email address will not be published. Required fields are marked *