Please forgive me because I’m blogging while angry. I’ve read a few articles that compare R and Excel and I’m steamed.
But first: why does this matter? Why respond?
Shiny New Thing Syndrome is easy to fuel with a comparison of options. Rather than just get to work with existing tools, it feels wise to go read another article that suggests a better tool, spend money on the new tool, fiddle around, waste time, then read other article … rinse … repeat.
Dishonest comparisons enable such stalling and profligate use of resources. I want to call “shenanigans” and stop the madness.
Another reason for responding. When we’re legitimately looking for the right tool or solution, zealous fans and sales people aren’t so forthcoming with the details of what you’re getting into with their pet. Their role isn’t to help you determine if their pet is good for your lifestyle, needs and level of patience. To be responsible, we’d say “X is better than Y when ________ .”
Finally. So many of these comparisons are just an excuse to bash Excel. They don’t do much to support their own tool of preference, but they try extra hard to get readers to agree that Excel is the Devil. For someone who knows Excel, it’s easy to smell the poorly concealed fish in these blogposts. These authors either:
- Don’t really know Excel
- They’re stretching the truth or
- Painting contrived scenarios
The ultimate reason why I reply to these things, and why I’ve stuck with this blog for 4 years is because I’m passionate about data, and hate the things that I see that are the result of bad data. So, data quality and solid processes are far more important to me than fetishizing an analytical tool.
An Example of Someone Comparing Fetishizing R and Bashing Excel
Isaac Petersen, PhD student in Clinical Psychology wrote this article: Why R Is Better Than Excel For Fantasy Football (and most other) Data Analysis. Now look. Here’s one thing that has me suspect that he never tried very hard to fairly represent Excel.
Throw us a bone? Ok. Thank you, Your Majesty.
In the article, Petersen lists 15 reasons why R is supposedly better than Excel. Here’s what’s ahead in this blogpost:
- I’m not going to tackle all 15 of Petersen’s claims.
- I list the reason that Peterson offers, then I respond afterward.
- I admit right away that I don’t know R and will not pretend that I do.
- My experience with R is seeing other people use it, and it looks like code.
- I’m not going to bash R.
- I have no interest in proving that Excel is better.
Ultimately, I think these comparisons are silly when they aren’t in any kind of context. So, let’s get on with this.
RESPONSES TO ISAAC PETERSEN’S LOVE LETTER TO R
Reason #1: Data Manipulation
RESPONSE: Assuming that “data manipulation” is what Petersen calls data cleansing and data shaping, I can only say that Excel works. I’ve been cleaning and shaping data for about 15 years. When someone has manually tried to manipulate some data and it’s taken them 2 days to get through one quarter of the job, and I can use Excel to turn the entire job around in 45 minutes, what can we conclude? Excel works.
With the addition of Power Query … WOW! Excel can do some amazing data cleansing and shaping. That’s in addition to having functions like: LEFT, RIGHT, MID, SUBSTITUTE, INDEX, OFFSET, INDIRECT, MATCH, ISNUMBER.
Reason #2: Easier Automation
RESPONSE: Automation is a process. It’s not a tool feature. A person has to set things up in the right way if they are going to repeat a task. Excel can be set up for re-use in a process, and I imagine that R could be set up in such a way that it’s full of spaghetti code and barely good enough for just one use.
Petersen also says that R’s scripting language is better than Excel’s GUI. Well … ok. He’s not including VBA. So, he’s comparing R to native Excel which isn’t entirely fair. But let’s go with that.
Some people do prefer coding. They’re ok looking through lines of code, troubleshooting and finding the comma that should have been a semi-colon. For them–the masochistic–sure. But masochism is not appealing to me. That takes a special breed of human being that doesn’t include most of us. Writing code is great when code needs to be written, but not when there’s a perfectly fine GUI that can do most of our dirty work.
Reason #6: Larger datasets
RESPONSE: Power Pivot.
See Rob Collie’s excellent blog PowerPivotPro. It covers Power Pivot A-Z, how it’s used for enterprise level solutions, and how Excel with Power Pivot can work in harmony with other analytical tools. We really don’t need all this feuding, sniping and rabble-rousing.
Reason #10: Free
RESPONSE: Big deal. $99 for stand-alone Excel or roughly $10/month for Office 365.
$99 to get a GUI and skip the coding sounds like a bargain. In Petersen’s blogpost never does he say that something is impossible to do in Excel. That’s actually one helluva case in favor of spending the $99 and avoid the granular level of attention that’s required in a scripting language.
Reason #11: Open Source
RESPONSE: I plan a future blogpost about this so I’ll be brief. There’s a lot of good about the world of open source. The downside is that open source can be the wild west with all these regular folks writing and sharing code. Some of these regular folks write bad code that only other developers can see as being bad or bloated.
With something proprietary like Excel, there’s good.
- Functions and features like SUMIFS and conditional formatting work a certain predictable way, as dictated by the central body.
- Centralized control over how these things work. We don’t have 50 different quasi-pivot table scripts, and confusion over which one to use.
Is open source better than proprietary? It’d be asinine to answer that question on its own, without context.
Reason #15: Anyone (Including You) Can Contribute Packages to the Community to Improve its Functionality
RESPONSE: Yeah, right. Who is included in you? Clearly, Petersen’s article was meant for developers.
CAN R DO ANY WRONG?
Peterson’s blogpost closes with a section: When to use Excel
**Insert facepalm**
Petersen offers 4 reasons for choosing Excel over R. His first reason is Data Entry. Ok. So, here we go again with someone minimizing the world’s #1 BI tool as little more than a grocery list keeper. But even worse is the fact that Petersen offers 4 reasons, and ends each one with some form of “yeah but R is still better.”
WHY ARE THESE TYPES OF COMPARISONS SILLY?
Let’s ask: which is better, a motorcycle or a helicopter?
Anyone who answers that question is painting their own context around it. Neither is objectively better. Therefore, the question itself needs to provide the context. Is a free helicopter great for someone with no license, nowhere to put it and can’t afford the maintenance? Is a motorcycle a good deal for someone who was hoping for a treadmill to help them lose 50 pounds (3.57 stone for my British friends)?
The world is not limited to motorcycles and helicopters. The world isn’t limited to Excel and R. The world is definitely full of people who need to get shit done … and that’s the ONLY thing that matters.
I have a friend who’s a journalist and runs a fantasy football league. Should he take Petersen’s advice and learn R? Hell no. He barely wants to use Excel and he’s been successful however he does his analysis. For someone else, learning something new is an excuse to avoid putting their head down and getting to work. For other people, R is like a mouthful of wonderfully spicy Massaman Curry.
And it’s all ok. It’s all truly ok.
Couldn’t agree with you more, Oz. I’m one of the young people that Isaac talks about and my goal for 2014 was learning R and M – I still preffer Excel for more than 90% of my scenarios..
Comparing Excel to R is not something that I’d recommend to do. Is not a question that can have a generic answer. People will most likely be biased either way but I can tell you this, R on its own is an AMAZING tool for data science related scenarios.
I love Excel, don’t get me wrong, and I’m even writting a book about Power Query but overall, if I needed R it would be for something extremely specific like Machine Learning scenarios or some crazy stats that I need. Most of the stuff that you can do in R can be done in Excel with Power BI (the add-ins) but not that many people know that they even exist.
Anyway, I do think that the internet has a lot of people trying to lure other people into thinking the same way that they think 🙂 – so you always have to be careful with what you read since it might not be completely true or it’ll be a point of view from a really weird angle.
Keep up the R & Excel post! you should try to compare the lapply with some DAX functions like SUMX 😉
EscobarMiguel90
EscobarMiguel90 thanks for jumping in. I’d love to see you do a blogpost offering something balanced because you’re familiar with the tools, and sound like you’re discerning in when to use what. I welcome a lesson on that.
You’re right that the internet is itself part of the problem. I don’t think that Isaac and @FFAnalyticsNet were purposefully malicious. I do think that because the internet is so wide open, we should be careful in how we communicate.
1. I’ve been looking for a tool or trying to figure out what coding language would suit my needs best. Research gets difficult when a person has to weed through all the fanboi articles articles and sales pitches.
2. We can be encouraging but also should be responsible. I would rather see someone get their work done with a “good enough” tool than have them distracted with the false possibility that their existing tools are insufficient.
When is your M book coming out? I’m getting more into Power Query and M is a language I want to know. Also, you should write a blogpost that addresses DAX. You’re more qualified than I. 🙂
OzData Ken and I are writting the book and should be coming out sometime during June. http://www.amazon.com/Data-Monkey-Guide-Language-Excel/dp/1615470344/
Excel can be used for pretty much anything nowadays. People are clever and innovative but Microsoft’s new bet to address the data scientists needs is AML (Azure Machine Learning) which is not just Machine Learning but a whole framework for you to create data science related solutions with high availability. You might want to try it out and see it for yourself – R is actually integrated with that solution.
THATS where Microsoft wants us to be when it comes to data science but that doesn’t mean that you can’t do magic with Excel. Excel can be extremely powerful for data mining (to a certain degree).
I’ve been thinking about blogging about R but I’m trying to focus on Power Query until we release the book. I find Power Query to be extremely powerful and, of course, more user-friendly than R studio and perhaps even more robust to a certain extent when it comes to scrapping data.
Here’s a case on why you’d go with Power Query and not R when it comes to shaping data. You have no idea how long it’d take me to reproduce this M code in R http://www.powerpivotpro.com/2015/01/power-query-for-excel-combine-multiple-files-of-different-file-types/
EscobarMiguel90 Ok! Your book is advertised in the back of my book! COOL! It’s on my list of books to grab when it’s released.
Power Query is wonderful. I’ve used it to append documents, and set up a system where it’s pointed to a folder full of timesheets. When new timesheets are added, Excel/PowerQuery just needs to be refreshed to integrate the new data. No need to even open the new documents.
I haven’t done much with Power Pivot. There just doesn’t seem to be a need for it in the work I do. But when there’s 500 rows of complicated data, Power Query and data models have become my buddies.
Thank you for writing this. While I’m a lifetime coder, I appreciate the powers of Excel for serious data analysis because I’ve used Excel for that purpose for years.
The biggest snobs are often the biggest posers. There are a lot of poor coders faking it out there. In fact, there’s an entire hidden sub-industry of people pretending to be comfortable coding. Someone who is methodical, fastidious, and quantitative-minded can work wonders in Excel. Anyone claiming that R should replace Excel probably doesn’t understand either product well.
There can actually be a lot of coding in Excel. Every time you construct an IF-THEN formula in Excel, you’re coding. A lot of the coding in R isn’t any more complicated than an Excel formula. And of course Excel macros in VBA are even more coding.
And as you say, coding isn’t always the most efficient approach to solve a problem. For standard tasks and simple data sets, Excel is likely to be quicker than coding. In the business world, a lot of data comes in simple forms and efficiency usually matters. A lot. Businesses need to react quickly and sometimes a rough, simple answer is a lot more useful than an elaborate, nuanced one.
Excel is wonderful because of its inclusiveness and collaboration. You can do some serious data analysis in a spreadsheet, send it to almost any layperson, he or she can add data or clean up data, and send it back to you. Excel’s near-universal format allows exchanging tricks among users; they can organically learn from one another and improve their Excel skills. Laypersons can verify the work done by a data analyst, demystify it, even holding it accountable, building greater trust and involvement. Excel is a great gateway to bring out the inner nerd in a lot of non-technical people and good organizations thrive on this kind of synergy.
The business world will always need good data analysts that work in Excel or other visual-based tools. Coding specialists are valuable, too. The two groups will rarely be in direct competition with one another. Anyone who tries to pit them against each other is missing the bigger picture and doing everyone a disservice.
Matthew, I’m sorry it took so long to see your response. 🙁
Thanks for the comments. You summarize the situation very nicely.
The business world will always need good data analysts that work in Excel or other visual-based tools. Coding specialists are valuable, too. The two groups will rarely be in direct competition with one another. Anyone who tries to pit them against each other is missing the bigger picture and doing everyone a disservice.
To say that Excel and R are competitors is indeed so wrong. They’re just tools. They’re just tools.
Don’t know how to include a screenshot, but I will refer directly to your Top 5 Excel functions blogpost. You laudate vlookup. Besides that index match is better from any point of view, you state yourself that it would take you five minutes to accomplish the task you state. Well, the exact same tasks takes me in in R just under 30 seconds and this includes reading in the data set (I.e. typing the command necessary to read in these lists, actually reading in these lists and writing the command implementing the query.
While I perfectly understand any excitement over excels capabilities when your previously known alternative was eyeballing and/or pen and paper – you are wrong in any regard when you come the other way around. Things that are perfectly clear in R (and any programming language for that matter ) are only a waste of time in excel. In particular anything that involves data manipulation, i.e. cleansing, tidying and and any other part of preprocessing.
Ofc, you’ll have to code. But applying excel formulas is nothing else than coding, too. If you are perfectly fine getting your job done in excel, good for you, I am quite proficient in both and there really is no point I ever think R would not be superior to excel, except for one thing only: my colleagues know only excel and want to replicate my analyses so I have to implement them excel.
Just another problem with excel: if I call a function called index() I expect it to return a (multi)index, not a value. For the latter task, I’d expect a function called value(). It is this kind of inconsistency that makes excel so inconvenient if you have any mathematical or algorithmic training worth its name. And this what Petersen alludes to.
Here’s the deal.
R and Excel are completely different tools. Some people use both. Some people are just ok using one or the other. The tools have strengths and weaknesses. I just think that saying one is better than the other is nonsense. It’s more helpful to suggest to someone:
“For X purpose, this tool would be better than the other.”
Even then, we’d have to determine how well the person would be able to handle the “better” solution. I have friends who are serious coders and they’ve told me they know Excel has something better for cleansing a set of data, but they were under pressure and had to just continue down their familiar path with their familiar tools.
In that light, is Excel better or worse? I don’t know. Ultimately, we need to get stuff done.
Again, what I disagree with is the comparison of tools and then to prove that one is better than the other we resort to contrived contexts. No. The contexts should come first.
I concur that the project requirements, or more importantly the use case, should dictate the selection of tool[s] used for analysis. Seems like such a simple concept, Im not sure why anyone would say ‘yeah but.’
I’m not sure, either.