WHY SO MUCH ATTENTION ON VALIDATION?
Embarrassment is what got me thinking about Excel data validation and general data quality. Let me describe 2 situations from years ago when I first started working with data:
#1 My boss regularly had me run reports, and he’d catch things that didn’t add up. I’d run back to my desk, confirm that my formulae and strategy were perfect. It became predictable, however, that there was crap data that I didn’t think to formally uncover and account for; e.g.,
- Transactions not assigned to the accounts they belonged to
- Typos in the account info that caused transactions to be omitted by database queries
Enough of that eventually lead to my own inquiries about how the doors to the database were flung open for the crap data to float right in.
#2 An accountant kept calling to ask how I knew my calculations were right. The phone would ring, I’d answer and she’d say, “Me again, I found another one … ”
Oh Lord! We’d go through my process again (but about another account), confirm the result, and she’d get down to the same concern: lots of right answers but insufficient visible effort to address the weird elements that are always in any dataset—ours and any other dataset across the land.
Fortunately, disaster never unfolded, there was just a lot of embarrassment before I got the point and made habitual effort to integrate data quality and data validation into my processes.
CAN YOU IDENTIFY?
Surely, if you’re an analyst or a data scrubber or a business owner who would like to analyze your data, you can identify with the mystery of “Can I trust this?” And then the next question, “How do datasets get so messy? One answer: poor input controls.
This has been on my mind because I’m preparing to teach an Excel Data Validation & Forms workshop. And to switch things up, here’s a video blogpost discussing why data validation deserves to be treated as its own topic.
FINAL WORD & ADDITIONAL READING
Perfect Data and Other Data Quality Myths is an excellent article that describes how “perfect data” shouldn’t be the focus. Information is too dynamic to expect perfection. A better goal is to have policies around “is the dataset clean enough for our purposes?” Remember: Excel is only the tool. Data Management is the task. Data validation helps avoid disaster.