Data can be wily and you never know what’s inside a data set until you begin to work with it to scrub it or extract information from it. You might need to do the same procedure from different angles to locate particular oddities. So, forgive yourself for not having fixed everything with a single formula or function. Take a deep breath and keep pushing forward.

As you continue to work with data it just gives you more information about its strengths and weaknesses. This will help you plan your strategy in cleaning the data, and help you develop automatic actions when you see certain issues.

Sample problems that complicate data clean-up:
  • 100 NW 3rd St   vs   100 N.W. 3rd Street (won’t show up as a duplicate).
  • Spaces lurking as blank cells that your Excel formula counts as being populated.
  • Hidden columns, rows and worksheets.
  • Inconsistent formats. (Ex. one-third of the phone number fields are saved in text format.)
  • 1 spreadsheet has the First and Last Names in a single cell, and this is to be merged with a 2nd spreadsheet were the First and Last Names are in separate cells (and maybe a Middle Initial column).
  • Zip codes with the leading zeros clipped off.
  • Typos.
  • Complex formulae that someone else created long ago in a galaxy far far away.
  • Columns aren’t positioned to allow VLOOKUPS.
  • Mixture of 5-digit zipcodes and +4 zipcodes.
  • Leading or trailing spaces that your formula won’t count. Ex:
    • Your formula didn’t consider “Montreal     “ so, it’s just counting “Montreal” and skipping the 50 that have trailing spaces.

You get the picture. The point is that Data Hygiene takes patience and a zen-like trance in order to flow with all of the things that might pop up when you open the lid on filthy data. Go ‘head! Stick your hand in and get to know it. That attitude is going to get you further than a 12-variable nested IF statement.