Please forgive me because I’m blogging while angry. I’ve read a few articles that compare R and Excel and I’m steamed.
But first: why does this matter? Why respond?
Shiny New Thing Syndrome is easy to fuel with a comparison of options. Rather than just getting to work with existing tools, it feels wise to go read another article that suggests a better tool, spend money on the new tool, fiddle around, waste time, then read another article … rinse … repeat.
Dishonest comparisons enable such stalling and profligate use of resources. I want to call “shenanigans” and stop the madness.
Another reason for responding. When we’re legitimately looking for the right tool or solution, zealous fans and sales people aren’t so forthcoming with the details of what you’re getting into with their pet. Their role isn’t to help you determine if their pet is good for your lifestyle, needs and level of patience. To be responsible, we’d say “X is better than Y when ________ .”
Finally. So many of these comparisons are just an excuse to bash Excel. They don’t do much to support their own tool of preference, but they try extra hard to get readers to agree that Excel is the Devil. For someone who knows Excel, it’s easy to smell the poorly concealed fish in these blogposts. These authors either:
- Don’t really know Excel,
- Are stretching the truth, or
- Are painting contrived scenarios
The ultimate reason I reply to these things, and the reason I’ve stuck with this blog for 4 years, is that I’m passionate about data and hate the things I see that result from bad data. So, data quality and solid processes are far more important to me than fetishizing an analytical tool.
An Example of Someone Fetishizing R and Bashing Excel
Isaac Petersen, a PhD student in Clinical Psychology, wrote this article: Why R Is Better Than Excel For Fantasy Football (and most other) Data Analysis. Now look. Here’s one thing that makes me suspect he never tried very hard to represent Excel fairly.
Throw us a bone? Ok. Thank you, Your Majesty.
In the article, Petersen lists 15 reasons why R is supposedly better than Excel. Here’s what’s ahead in this blogpost:
- I’m not going to tackle all 15 of Petersen’s claims.
- I list the reason that Petersen offers, then I respond.
- I admit right away that I don’t know R and will not pretend that I do.
- My experience with R is seeing other people use it, and it looks like code.
- I’m not going to bash R.
- I have no interest in proving that Excel is better.
Ultimately, I think these comparisons are silly when they aren’t in any kind of context. So, let’s get on with this.
RESPONSES TO ISAAC PETERSEN’S LOVE LETTER TO R
Reason #1: Data Manipulation
RESPONSE: Assuming that “data manipulation” is what Petersen calls data cleansing and data shaping, I can only say that Excel works. I’ve been cleaning and shaping data for about 15 years. When someone has manually tried to manipulate some data and it’s taken them 2 days to get through one quarter of the job, and I can use Excel to turn the entire job around in 45 minutes, what can we conclude? Excel works.
With the addition of Power Query … WOW! Excel can do some amazing data cleansing and shaping. That’s in addition to having functions like: LEFT, RIGHT, MID, SUBSTITUTE, INDEX, OFFSET, INDIRECT, MATCH, ISNUMBER.
Reason #2: Easier Automation
RESPONSE: Automation is a process. It’s not a tool feature. A person has to set things up in the right way if they are going to repeat a task. Excel can be set up for re-use in a process, and I imagine that R could be set up in such a way that it’s full of spaghetti code and barely good enough for just one use.
Petersen also says that R’s scripting language is better than Excel’s GUI. Well … ok. He’s not including VBA. So, he’s comparing R to native Excel which isn’t entirely fair. But let’s go with that.
Some people do prefer coding. They’re ok looking through lines of code, troubleshooting and finding the comma that should have been a semi-colon. For them–the masochistic–sure. But masochism is not appealing to me. That takes a special breed of human being that doesn’t include most of us. Writing code is great when code needs to be written, but not when there’s a perfectly fine GUI that can do most of our dirty work.
Reason #6: Larger datasets
RESPONSE: Power Pivot.
See Rob Collie’s excellent blog PowerPivotPro. It covers Power Pivot A-Z, how it’s used for enterprise level solutions, and how Excel with Power Pivot can work in harmony with other analytical tools. We really don’t need all this feuding, sniping and rabble-rousing.
Reason #10: Free
RESPONSE: Big deal. $99 for stand-alone Excel or roughly $10/month for Office 365.
$99 to get a GUI and skip the coding sounds like a bargain. Nowhere in Petersen’s blogpost does he say that something is impossible to do in Excel. That’s actually one helluva case in favor of spending the $99 and avoiding the granular level of attention that a scripting language requires.
Reason #11: Open Source
RESPONSE: I plan a future blogpost about this so I’ll be brief. There’s a lot of good about the world of open source. The downside is that open source can be the wild west with all these regular folks writing and sharing code. Some of these regular folks write bad code that only other developers can see as being bad or bloated.
With something proprietary like Excel, there’s good, too:
- Functions and features like SUMIFS and conditional formatting work a certain predictable way, as dictated by the central body.
- Centralized control over how these things work. We don’t have 50 different quasi-pivot table scripts, and confusion over which one to use.
Is open source better than proprietary? It’d be asinine to answer that question on its own, without context.
Reason #15: Anyone (Including You) Can Contribute Packages to the Community to Improve its Functionality
RESPONSE: Yeah, right. Who is included in “you”? Clearly, Petersen’s article was meant for developers.
CAN R DO ANY WRONG?
Petersen’s blogpost closes with a section: When to use Excel
Petersen offers 4 reasons for choosing Excel over R. His first reason is Data Entry. Ok. So, here we go again with someone minimizing the world’s #1 BI tool as little more than a grocery list keeper. But even worse is the fact that Petersen offers 4 reasons, and ends each one with some form of “yeah but R is still better.”
WHY ARE THESE TYPES OF COMPARISONS SILLY?
Let’s ask: which is better, a motorcycle or a helicopter?
Anyone who answers that question is painting their own context around it. Neither is objectively better. Therefore, the question itself needs to provide the context. Is a free helicopter great for someone who has no license, has nowhere to put it and can’t afford the maintenance? Is a motorcycle a good deal for someone who was hoping for a treadmill to help them lose 50 pounds (3.57 stone for my British friends)?
The world is not limited to motorcycles and helicopters. The world isn’t limited to Excel and R. The world is definitely full of people who need to get shit done … and that’s the ONLY thing that matters.
I have a friend who’s a journalist and runs a fantasy football league. Should he take Petersen’s advice and learn R? Hell no. He barely wants to use Excel and he’s been successful however he does his analysis. For someone else, learning something new is an excuse to avoid putting their head down and getting to work. For other people, R is like a mouthful of wonderfully spicy Massaman Curry.
And it’s all ok. It’s all truly ok.