Think about the person who comes in before the dentist. They look around, take x-rays, and get everything clean and set up for the dentist.

In some ways a Data Hygienist is similar: they come in before the Analyst, the Accountant, or the Database Architect; they come in before the intern who’s going to make calls or do a mass mailing. They set the stage and clear up any issues that would distract those people from doing what you hired them to do.

A Data Hygienist can support the Project Manager who needs to report the real-time truth to her bosses.

One thing all of them loathe is the Hygienist’s specialty: the initial gathering, cleaning, and validating of all the data. Examples:

  • Collect and merge spreadsheets, Word docs, and PDF files into a single document
  • Develop consistency between sources that are formatted horizontally, vertically and address block style
  • Develop consistency across field names
  • Deal with sources that are ALLCAPS
  • Install alerts to ensure that yesterday’s trustworthy data is trustworthy today
  • Flag and fish out duplicates
  • Peel out data that are combined in a single cell
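To make a few of these tasks concrete, here is a minimal Python sketch of consistent field names, ALL CAPS handling, splitting a combined cell, and flagging duplicates. It is an illustration only, not the tooling I actually use: the alias table, the function names, and the sample rows are all made up for the example, and it assumes every cell value is plain text.

```python
import re

# Hypothetical mapping from source-specific headers to one canonical field name.
FIELD_ALIASES = {
    "e-mail": "email",
    "email address": "email",
    "full name": "name",
    "customer name": "name",
}

def normalize_field(header):
    """Lower-case a header, collapse whitespace, and apply known aliases."""
    key = re.sub(r"\s+", " ", header.strip().lower())
    return FIELD_ALIASES.get(key, key)

def clean_record(raw):
    """Return a copy of one row with consistent field names and casing."""
    record = {normalize_field(k): v.strip() for k, v in raw.items()}
    name = record.get("name", "")
    if name.isupper():  # only touch values that arrive in ALL CAPS
        record["name"] = name.title()
    if "email" in record:
        record["email"] = record["email"].lower()
    return record

def split_city_state(cell):
    """Peel 'City, ST' out of a single cell into two separate values."""
    city, _, state = cell.partition(",")
    return city.strip(), state.strip()

def flag_duplicates(records, key="email"):
    """Return the indexes of rows whose key value already appeared earlier."""
    seen = set()
    duplicates = []
    for i, record in enumerate(records):
        value = record.get(key)
        if value in seen:
            duplicates.append(i)
        elif value:
            seen.add(value)
    return duplicates

# Made-up sample rows from two differently formatted sources.
rows = [
    {"Full Name": "JANE DOE", "E-mail": "Jane@Example.com "},
    {"customer name": "Jane Doe", "Email Address": "jane@example.com"},
]
cleaned = [clean_record(r) for r in rows]
print(flag_duplicates(cleaned))             # [1]
print(split_city_state("Springfield, IL"))  # ('Springfield', 'IL')
```

Flagging by index instead of deleting keeps a human in the loop for the final call. Note that a blanket title-case is a blunt tool: it turns a name like MCDONALD into Mcdonald, so real name data usually needs a short exceptions list.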



Data Hygienists Make Life Easier

What are some of the headaches business owners face?

  • With data in several different formats and multiple sources, it’s time-consuming to find anything and detect inconsistencies.
  • Duplicates result in the expense of mailing one person several copies of the same thing … or NOT mailing anything, because the datasets aren’t clean and the business owner doesn’t want to annoy customers or waste money.
  • Inability to see what they need to do. Maybe a report provides an excellent breakdown of overall monthly sales. But the way the data is provided, it’s not easy to see a side-by-side monthly comparison by product line. Thus, it’s not easy to see where to shift resources.
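The last headache, the missing side-by-side view, is essentially a pivot. Below is a rough Python sketch of that reshaping, assuming the sales data can be exported as (month, product line, amount) rows. The function names and the sample figures are invented for illustration and do not come from any real report.

```python
from collections import defaultdict

def pivot_sales(rows):
    """Turn (month, product_line, amount) rows into {product_line: {month: total}}."""
    table = defaultdict(lambda: defaultdict(float))
    for month, product_line, amount in rows:
        table[product_line][month] += amount
    return {line: dict(months) for line, months in table.items()}

def format_table(table, months):
    """Render one row per product line with one column per month."""
    lines = ["product line".ljust(14) + "".join(m.rjust(10) for m in months)]
    for line in sorted(table):
        cells = "".join(f"{table[line].get(m, 0.0):10.2f}" for m in months)
        lines.append(line.ljust(14) + cells)
    return "\n".join(lines)

# Made-up sample data: one row per sale.
sales = [
    ("2012-07", "Widgets", 100.0),
    ("2012-07", "Gadgets", 50.0),
    ("2012-08", "Widgets", 120.0),
    ("2012-07", "Widgets", 25.0),
]
by_line = pivot_sales(sales)
print(format_table(by_line, ["2012-07", "2012-08"]))
```

With the months laid out as columns, a product line that is slipping from one month to the next stands out in a single row, which is the comparison the original report made hard to see.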

All of these are areas where a Data Hygienist can be a solution. The information is present. A Data Hygienist cleans it up and makes it useful. Oftentimes projects like these are viewed as tedious pains in the you-know-what. And important projects are set aside until the pain is excruciating. Maybe someone is given 2 days to eyeball and COPY-PASTE a dataset into behaving itself. During those 2 days that person isn’t doing their regular job, and one thing a Data Hygienist knows: increased human intervention = increased errors. Not a good idea!

In some cases, I’ve reduced a 2-day manual effort down to a 1-hour formula-driven clean-up.

Think about whatever grimy, unruly datasets you’re dealing with and consider hiring a Data Hygienist to take them, handle them, and provide some clarity for your business decisions.


UPDATE (3SEP12): Data Marshall is a more fitting term. Messy data is more like the Old West experience, and less like a dentist’s office.