Think about the person who comes in before the dentist: they look around, take X-rays, and get everything clean and set up for the dentist.
A Data Hygienist plays a similar role, coming in before the Analyst, before the Accountant, before the Database Architect, and before the intern who's going to make calls or do a mass mailing. The Hygienist sets the stage and clears up any issues that would distract them from doing what you hired them to do.
A Data Hygienist can support the Project Manager who needs to report the real-time truth to her bosses.
One thing all of them loathe is the Hygienist's specialty: the initial gathering, cleaning, and validating of all the data. For example:
- Collect and merge spreadsheets, Word documents, and PDF files into a single document
- Make sources consistent whether they are laid out horizontally, vertically, or in address-block style
- Standardize field names across sources
- Deal with sources written in ALLCAPS
- Install alerts to ensure that yesterday’s trustworthy data is trustworthy today
- Flag and fish out duplicates
- Peel out data that are combined in a single cell
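The post mentions a formula-driven clean-up but not a specific tool, so the following is only a minimal Python sketch of a few of the tasks above, assuming records arrive as dictionaries: standardizing field names, taming ALLCAPS, peeling a combined City/State/ZIP cell apart, and flagging duplicates. The field aliases, function names, US address format, and the name-plus-ZIP duplicate rule are all hypothetical choices for illustration, not the author's method.

```python
import re

# Hypothetical mapping of header spellings seen across different sources.
FIELD_ALIASES = {
    "first name": "first_name", "fname": "first_name",
    "last name": "last_name", "surname": "last_name",
    "city/state/zip": "city_state_zip",
}

def normalize_field(name: str) -> str:
    """Map a source header onto one consistent snake_case field name."""
    key = name.strip().lower()
    return FIELD_ALIASES.get(key, re.sub(r"\W+", "_", key).strip("_"))

def fix_allcaps(value: str) -> str:
    """Title-case text that is entirely upper case; leave mixed case alone.
    Caveat: str.title() turns 'MCDONALD' into 'Mcdonald', so names still need a human check."""
    return value.title() if value.isupper() else value

def split_city_state_zip(cell: str) -> dict:
    """Peel 'Springfield, IL 62704' into separate fields (US format assumed)."""
    m = re.match(r"\s*(.+?),\s*([A-Za-z]{2})\s+(\d{5}(?:-\d{4})?)\s*$", cell)
    if not m:
        # Unrecognized layout: keep the raw text so it can be reviewed by hand.
        return {"city": cell, "state": "", "zip": ""}
    return {"city": m.group(1), "state": m.group(2).upper(), "zip": m.group(3)}

def clean(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (unique records, flagged duplicates).
    Duplicates are matched on a hypothetical key: first name + last name + ZIP."""
    seen, unique, dupes = set(), [], []
    for raw in records:
        rec = {normalize_field(k): fix_allcaps(v.strip()) for k, v in raw.items()}
        if "city_state_zip" in rec:
            rec.update(split_city_state_zip(rec.pop("city_state_zip")))
        key = (rec.get("first_name", "").lower(),
               rec.get("last_name", "").lower(),
               rec.get("zip", ""))
        (dupes if key in seen else unique).append(rec)
        seen.add(key)
    return unique, dupes

# Hypothetical sample: the same person exported from two differently formatted sources.
sample = [
    {"First Name": "JANE", "Last Name": "DOE", "City/State/Zip": "SPRINGFIELD, IL 62704"},
    {"fname": "Jane", "surname": "Doe", "City/State/Zip": "Springfield, IL 62704"},
]
unique, dupes = clean(sample)
print(unique)
print(dupes)
```

Flagged duplicates are set aside rather than silently deleted, so a person can confirm them before the mailing goes out.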
IT CAN BE MADNESS!!
BUT IT’S WHAT I SPECIALIZE IN
What are some of the headaches business owners face?
- When data lives in several formats across multiple sources, it's time-consuming to find anything or spot inconsistencies.
- Duplicates mean paying to mail one person several copies of the same thing … or NOT mailing anything at all, because the datasets aren't clean and the business owner doesn't want to annoy customers or waste money.
- Inability to see what needs to be done. A report may give an excellent breakdown of overall monthly sales, but the way the data is laid out makes a side-by-side monthly comparison by product line hard to see, so it's not clear where to shift resources.
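The side-by-side view described in the last point is essentially a pivot: one row per product line, one column per month. Here is a minimal Python sketch, assuming sales arrive as hypothetical (month, product line, amount) rows with months written as 'YYYY-MM'; the function name and sample figures are made up for illustration.

```python
from collections import defaultdict

def monthly_by_product(sales):
    """Pivot (month, product_line, amount) rows into one row per product line.

    Returns (months, table) where table maps each product line to a list of
    totals aligned with `months`. Months are assumed to be 'YYYY-MM' strings,
    so sorting them as text also sorts them by date.
    """
    totals = defaultdict(lambda: defaultdict(float))
    months = set()
    for month, product, amount in sales:
        totals[product][month] += amount
        months.add(month)
    cols = sorted(months)
    # A product with no sales in a given month shows 0.0 instead of a gap.
    table = {p: [totals[p].get(m, 0.0) for m in cols] for p in sorted(totals)}
    return cols, table

# Hypothetical sample data, for illustration only.
sales = [
    ("2012-07", "Widgets", 1200.0),
    ("2012-07", "Gadgets", 800.0),
    ("2012-08", "Widgets", 950.0),
    ("2012-08", "Gadgets", 1100.0),
    ("2012-08", "Widgets", 100.0),
]
cols, table = monthly_by_product(sales)
print("Product".ljust(10) + "".join(c.rjust(10) for c in cols))
for product, row in table.items():
    print(product.ljust(10) + "".join(f"{t:10.2f}" for t in row))
```

A spreadsheet pivot table does the same reshaping; the totals were already in the data, and only the layout changes.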
A Data Hygienist can solve all of these. The information is already there; a Data Hygienist cleans it up and makes it useful. Projects like these are often viewed as tedious pains in the you-know-what, so important work gets set aside until the pain is excruciating. Then someone is given two days to eyeball and COPY-PASTE a dataset into behaving itself. For those two days that person isn't doing their regular job, and a Data Hygienist knows one thing for sure: increased human intervention = increased errors. Not a good idea!
In some cases, I've cut a 2-day manual effort down to a 1-hour, formula-driven clean-up.
Think about whatever grimy, unruly datasets you're dealing with, and consider hiring a Data Hygienist to take them, handle them, and provide some clarity for your business decisions.
UPDATE (3SEP12): Data Marshall is a more fitting term. Messy data is more like the Old West than a dentist's office.