Given the recent debacle in the UK track and trace programme, which has widely been blamed on Microsoft Excel, we thought it would be interesting to share some opposing viewpoints from within Ebiquity on this ubiquitous software.
Nick Rhodes (Senior Data Scientist) goes head to head with Shane Alexander (Analytics Director).
Nick Rhodes – Senior Data Scientist
Excel is the fossil fuel engine of the analytic world. We all have to use it, but everyone knows that we should be doing better by now. Instead we continue with old habits no matter the consequences.
Let me list a few good reasons to hate Excel.
It is unpredictable and crashes frequently, leaving you with near-useless automatic backup files. There is a total lack of traceability (excuse the pun) and very little reproducibility. People edit cells like the Wild West – haphazardly and without fear of consequence.
Then there are the size limitations. Yes, this “bug” where the tracing service apparently ran out of rows or columns is unforgivable, but even when Excel is used properly, the 1 million row limit is insufficient for modern data analysis. Excel is particularly bad when used as a makeshift database.
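As a rough illustration of the size point, here is a minimal Python sketch that checks whether a table would fit on a single worksheet before anyone tries to open it in Excel. The limits are the documented worksheet maximums for the .xlsx and legacy .xls formats; the function name `fits_in_worksheet` is just an illustrative choice, not part of any library.

```python
# Documented worksheet limits for the two common Excel file formats.
XLSX_MAX_ROWS = 1_048_576
XLSX_MAX_COLS = 16_384
XLS_MAX_ROWS = 65_536   # the old "65k rows" format
XLS_MAX_COLS = 256


def fits_in_worksheet(n_rows, n_cols, legacy_xls=False):
    """Return True if a table of this shape fits on one worksheet.

    Hypothetical helper for illustration: n_rows should include any
    header row, and legacy_xls selects the older .xls limits.
    """
    max_rows = XLS_MAX_ROWS if legacy_xls else XLSX_MAX_ROWS
    max_cols = XLS_MAX_COLS if legacy_xls else XLSX_MAX_COLS
    return n_rows <= max_rows and n_cols <= max_cols


if __name__ == "__main__":
    print(fits_in_worksheet(1_000_000, 10))                   # fits in .xlsx
    print(fits_in_worksheet(2_000_000, 10))                   # too many rows
    print(fits_in_worksheet(70_000, 10, legacy_xls=True))     # too many for .xls
```

A check like this could sit in front of any export step, so that an oversized file raises a loud error instead of being quietly cut short.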
It is slow and recursive: the whole workbook is constantly being recalculated and people write inefficient formulas. Excel randomly formats cells into dates even if you just want them to be left alone (e.g. the SEPT1 gene). The agencies we work with merge cells and colour code workbooks, effectively making them non-machine readable and creating a chain of inefficiency.
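To make the date problem concrete, here is a small Python sketch that flags identifiers which look like a month abbreviation followed by a day number, such as SEPT1. The pattern is a deliberately narrow heuristic assumed for illustration. It does not reproduce Excel's actual parsing rules, and the name `date_like_identifiers` is hypothetical.

```python
import re

# Assumed heuristic: a month abbreviation, an optional hyphen, then 1-2 digits.
# This is NOT Excel's real date-detection logic, only a rough screen.
_DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEPT?|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)


def date_like_identifiers(values):
    """Return the values that match the date-like pattern, in input order."""
    return [v for v in values if _DATE_LIKE.match(v.strip())]


if __name__ == "__main__":
    genes = ["SEPT1", "BRCA1", "DEC1", "TP53"]
    print(date_like_identifiers(genes))  # ['SEPT1', 'DEC1']
```

Running a screen like this before a file goes anywhere near a spreadsheet at least tells you which values are at risk, so they can be stored or imported explicitly as text.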
As for VBA... where to begin? VBA is consistently rated as the most hated language by programmers and data scientists. According to Stack Overflow, 80% of developers who work in VBA want out, compared to only 33% for Python.
Its ancestor Visual Basic has been deservedly deprecated. VBA itself is slow, not well vectorised, and overly reliant on loops. Thanks to the record macro feature, it is usually also badly written.
The sooner government, business and academia move to modern tools such as R and Python the better.
Shane Alexander – Analytics Director
It is very fashionable to hate Microsoft Excel in data science and analytics these days. I’m old enough to remember when 65k rows was the norm and since then Excel has only gotten better.
It isn’t hard to mount a solid defence of Excel. It is the common man’s data analysis and manipulation tool. The learning curve is shallow and the barriers to entry are low. This is the tool for the analytics professionals who are in the trenches. People on deadlines who just need to get stuff done. In our business, it isn’t a coincidence that the most productive analysts generally have fantastic Excel skills.
Excel is democratic, it is the lingua franca of business analytics. Sure, there are all sorts of advantages to moving to ‘modern data science tools’ for the right application. But sometimes this disempowers clients and colleagues who can no longer engage directly with the numbers, for the simple reason that they can’t read Python. You could say well then maybe they should learn… but this misses the point. Our clients are smart, data literate people, but their main job is investing their marketing budget to achieve the best return on investment. They shouldn’t need to learn Python.
The idea that mistakes happen because people are using Excel is another fallacy. Mistakes can be buried deep in Python, R or SQL logic. Changes to open source libraries can cause silent errors. If you deal with complexity you just have to be vigilant. There is no such thing as a mistake-free coding language.
There are also advantages when it comes to security. We have cutting-edge web-based platforms that run on secure Virtual Private Networks in the AWS cloud, but our developers sometimes spend as much time writing documentation for client-side infosec teams as they do building and improving the tools. Then you have to worry about being blocked by weird IT policies or out-of-date browsers. We were still dealing with Internet Explorer 6 bugs in 2018.
Our desktop tools built in Excel, for the most part, can be emailed and just work. The numbers that they produce are exactly the same as the numbers produced on any other platform, because if you’re doing it right that is how maths works.