Welcome to the Wheelhouse, a popular series of blogs from Ebiquity’s Marketing Effectiveness team.
In this latest edition, Group Director Nic Pietersma considers the issue of overfitting – building statistical models that contain too many variables. Spoiler alert: while this article is a little more technical and in-depth than usual, it is also practical and straightforward.
What is overfitting?
In the simplest terms, overfitting means that you have too many variables in your statistical model. The signal-to-noise ratio drops. Minor changes to the inputs cause wild swings in the outputs. The model becomes worse for prediction and worse for inference.
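To see what that looks like in practice, here is a deliberately toy sketch in Python (made-up data, not a real model): we regress 60 weeks of sales on 55 media variables, only one of which genuinely drives sales, and watch how readily the noise-only variables pick up credit.

```python
# Toy illustration of overfitting: 55 media variables, 60 weeks of data,
# and only one variable that genuinely drives sales. All numbers are made up.
import numpy as np

rng = np.random.default_rng(7)
n_weeks, n_media = 60, 55
X = rng.normal(size=(n_weeks, n_media))
true_effect = np.zeros(n_media)
true_effect[0] = 2.0                                   # the one genuine driver
sales = X @ true_effect + rng.normal(size=n_weeks)     # weekly noise

# Fit by least squares, then refit after a tiny nudge to the sales series.
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
coef_nudged, *_ = np.linalg.lstsq(
    X, sales + rng.normal(scale=0.05, size=n_weeks), rcond=None
)

print("credit given to the one real driver:", round(coef[0], 2))
print("largest credit given to a noise-only variable:", round(np.abs(coef[1:]).max(), 2))
print("biggest coefficient swing after the tiny nudge:",
      round(np.abs(coef - coef_nudged).max(), 2))
```

With so few degrees of freedom left, noise-only variables can attract credit of the same order as the genuine driver, and a barely perceptible change to the data moves the coefficients by many times the size of that change.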
An academic might explain why overfitting is bad by talking about degrees of freedom, standard errors, confidence intervals and so on. But for people who use econometrics or marketing mix modelling (MMM) to support investment decisions in media lines or marketing channels, overfitting is a practical problem. It can make your model harder to interpret, unstable, and misleading.
It can lead to misplaced confidence in your investments and returns.
Why is overfitting so common in MMM?
There are three main reasons overfitting is a big problem in marketing mix modelling:
- Modern media plans are complex
- Providers are not candid with clients
- Complexity accumulates over time
Let’s unpack each of these reasons, starting with the complexity of modern media plans.
Consider YouTube. It can be bought in skippable, non-skippable, bumper, and masthead formats. Large screen or small screen. It can be bought against all adults, in-market audiences, or specific interest groups. You can buy Select (for the best content creators) or plain vanilla. All with vastly different CPMs. You might want to split YouTube by sub-type for greater insight and actionability, but this is just one media line.
Other channels can also be split in their own byzantine ways. With halo effects and creative splits to consider, you may end up with hundreds of media parameters in your model. Put simply, marketing mix models veer towards overfitting because modern media plans are inherently so complex.
But the complexity of today’s advertising ecosystem is not the only factor leading to overfitting. There are cultural issues at play here as well. Marketing is a sales-led industry, with a strong client service culture. When the client brief asks for more granularity, the instinct of most account handlers is to affirm that the brief is understood and that best efforts will be made, rather than alienate the client by going down a big technical rabbit hole about the dangers of overfitting. Front-line analysts are not always empowered to challenge the brief.
The third reason that overfitting creeps up on us is that complexity accumulates over time. If a model is extended and additional ROI splits are requested, the change will probably be accommodated with mild caveats. Since there is no objective cut-off point that cannot be crossed, overfitting is less a line you cross than a slippery slope you slide down, one update at a time.
The consequences of overfitting in MMM
So, now that we know why overfitting is endemic to marketing mix models, let's turn our attention to the next question: what's the harm?
To answer this, we need a little statistical intuition and a dose of commercial common sense. It's important to weigh the level of noise in the model against a plausible sales effect size. Here is a rough worked example.
A reasonable MMM econometric model might have ~3% standard error. For a single week of activity to be statistically significant at the 10% level, you would need to shift sales by roughly 5%. If you have weekly revenue of £70m at a profit margin of 30% and you spend – say – £50K on one week of a new media line, you would need to expect a 21:1 return on investment to see a significant result. We should rule out reporting on this activity unless we have much more data to go on – ideally at least 20 to 30 weeks of data, preferably across a few clean bursts over a year or two.
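For readers who want to check the arithmetic, here it is as a few lines of Python. The figures are the illustrative ones above, not client data.

```python
# Back-of-envelope: what ROI would a £50K one-week test need to deliver
# before the model could call it statistically significant?
standard_error = 0.03          # model noise, as a share of weekly sales
z_critical = 1.645             # two-sided critical value at the 10% level
weekly_revenue = 70_000_000    # £70m weekly revenue
profit_margin = 0.30
spend = 50_000                 # £50K on one week of a new media line

required_uplift = z_critical * standard_error                   # roughly 5% of sales
required_profit = required_uplift * weekly_revenue * profit_margin
implied_roi = required_profit / spend

print(f"sales uplift needed for significance: {required_uplift:.1%}")
print(f"implied profit ROI just to clear the bar: {implied_roi:.0f}:1")
```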
What happens when we throw caution to the wind? What happens if we add a few hundred media variables to the model and start running out of degrees of freedom? In a nutshell, we have effectively built a random number generator.
When you extend an overfitted model by adding six months of new data, you will almost certainly see inconsistencies with the previously reported ROIs. Besides overfitting, you will also run into related issues such as multicollinearity – the undesirable situation where several independent (or predictor) variables in your model are correlated – and this can do further damage to the analysis.
When this happens – in order to keep the model consistent – your front-line analyst will need to strong-arm the model, either by adding much tighter priors or by putting hard constraints on how much the ROI estimate can move from one update to the next. If you want the data to do the talking, this is a bad outcome.
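To make the multicollinearity point above concrete, here is a small hypothetical sketch: two media lines that are almost always on air together. The combined effect is estimated reliably; how the credit is split between the two is anything but.

```python
# Hypothetical example of multicollinearity: YouTube spend that nearly always
# tracks TV spend. Two noise realisations of the same world will typically
# give very different splits of credit, even though the combined effect is stable.
import numpy as np

rng = np.random.default_rng(3)
for run in range(2):
    tv = rng.normal(size=104)                           # two years of weekly data
    youtube = tv + rng.normal(scale=0.1, size=104)      # almost always bought alongside TV
    sales = 1.0 * tv + 1.0 * youtube + rng.normal(size=104)

    X = np.column_stack([tv, youtube])
    coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
    print(f"run {run + 1}: TV = {coef[0]:.2f}, YouTube = {coef[1]:.2f}, "
          f"combined = {coef.sum():.2f}")
```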
Three cures for overfitting
We’ve painted a picture of the problem in uncomfortable detail. It seems only right to spend some time talking about the cures. Here are three:
1. The Bayesian MMM cure and other fancy maths
Bayesian MMM allows you to anchor your estimates on a 'prior' and impose some stability on the model. For more information on Bayesian MMM, see this previous article from The Wheelhouse, published last summer.
The issue is: on what basis should you choose to set your prior? You can set priors based on industry benchmarks, but this is quite hard to do in practice, as you need to control for the scale of your client brand and have access to other information that might not be in the public domain, like profitability.
Another option is to take your prior from an initial round of modelling that is more parsimonious than the final version of the model. So, for example, we might anchor on the average performance of paid social as our prior but split the final report at the level of sub-channel (Instagram, Facebook, X, Reddit, Pinterest etc.) and perhaps split again by format or message.
This is not really a fix at a fundamental level. Using Bayesian MMM does not make the impact of a tiny budget pot any more measurable. Your model is still a random number generator at heart, but you can think of your new estimates as having an elastic band wrapped around them that keeps them close to the prior. It will take a stronger signal in the data to pull an estimate away from the reference point, and the further away the data wants to take you, the tighter that elastic band will get. It is not a cure for overfitting per se, but it will help stabilise your model and limit some of the outlier ROI estimates.
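One way to picture that elastic band is the stripped-down case of a single channel ROI with a Gaussian prior and a Gaussian data estimate; this is a deliberately simplified sketch, not a full Bayesian MMM. The posterior is a precision-weighted blend of the two, so the tighter the prior, the stronger the data signal needed to drag the estimate away from it.

```python
# Simplified 'elastic band': the posterior mean for a single channel ROI is a
# precision-weighted average of the prior and the data estimate.
# Illustrative numbers only.
def posterior_roi(data_roi, data_se, prior_roi, prior_sd):
    data_precision = 1.0 / data_se ** 2
    prior_precision = 1.0 / prior_sd ** 2
    return (data_roi * data_precision + prior_roi * prior_precision) / (
        data_precision + prior_precision
    )

# Prior anchored on a parsimonious 'all paid social' estimate of 1.5:1.
# A noisy sub-channel reading of 6:1 barely moves the needle...
print(posterior_roi(data_roi=6.0, data_se=3.0, prior_roi=1.5, prior_sd=0.5))  # ~1.6
# ...while a much more precise reading of 6:1 pulls the estimate well away.
print(posterior_roi(data_roi=6.0, data_se=0.5, prior_roi=1.5, prior_sd=0.5))  # ~3.8
```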
Another method that claims to be better suited to granular measurement is elastic net regression, which powers Meta's Robyn package. Once again, it is not a cure per se. Elastic net – or more specifically its ridge regression component – is a 'regularised regression' approach that biases the model towards giving a little credit to every variable while reining in the credit given to the biggest drivers. In an overfitted model, this ensures that the long tail of media lines all get their due. But it can feel like cheating, as we are ever so slightly resting a thumb on the scale.
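As a hedged illustration of what the ridge component does, here is a scikit-learn sketch on toy data; Robyn's actual pipeline involves a great deal more than this. Ridge keeps every variable in the model but pulls the unstable least-squares coefficients back towards zero, trading a small bias on the genuine driver for far less noise in the long tail.

```python
# Toy comparison of plain least squares and ridge regression on an overfitted
# design: 55 media variables, 60 weeks, one genuine driver. Illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 55))
sales = 2.0 * X[:, 0] + rng.normal(size=60)    # one real driver, 54 noise lines

ols = LinearRegression(fit_intercept=False).fit(X, sales)
ridge = Ridge(alpha=10.0, fit_intercept=False).fit(X, sales)

print("OLS:   driver =", round(ols.coef_[0], 2),
      "| biggest noise coefficient =", round(np.abs(ols.coef_[1:]).max(), 2))
print("Ridge: driver =", round(ridge.coef_[0], 2),
      "| biggest noise coefficient =", round(np.abs(ridge.coef_[1:]).max(), 2))
```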
Fancy maths can’t really fix overfitting, but as long as the end user truly understands the nature of the fix, it should be fine to proceed.
2. The Triangulation cure
Another potential cure for the granularity problem is to let MMM play to its strength of estimating overall incrementality and defer to some other, external technique to inform reporting at a more granular, sub-channel level.
Three examples to bring this cure to life:
a) If your business is reasonably confident in its online tracking framework, it may be sufficient to model at the topline channel level in MMM and rely on attribution to inform the relative efficiencies within a media channel. This is not such a great solution if a high share of transactions happens offline, however.
b) If you believe effectiveness can be linked to a single golden metric – for example visual attentive seconds – then it is possible to score sub-channels on this basis and simplify your econometrics. The job of the MMM then becomes "to estimate the uplift per 100K attentive seconds", with another source doing the heavy lifting for within-channel estimates; a simple sketch follows this list. There's more detail here.
c) Geotesting is the gold standard if you want to answer one exam question really well. It’s a robust way to drill down to one sub-channel and it has no dependency on cookies or back data. Unfortunately, well-run tests can be time-consuming and no advertiser has infinite resources, so this solution does not scale well past a few tests each year.
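To bring option (b) to life, here is a hypothetical sketch of what the division of labour might look like once the MMM has estimated a single uplift per 100K attentive seconds. The channel names, attention volumes and uplift figure are all invented for illustration.

```python
# Hypothetical split of credit: the MMM estimates one number (uplift per 100K
# attentive seconds); an external attention source scores each sub-channel.
# Within-channel reporting then becomes arithmetic rather than extra model variables.
uplift_per_100k_attentive_seconds = 4_000      # £ incremental profit, estimated by the MMM

attentive_seconds_by_subchannel = {            # from an external attention data source
    "YouTube skippable": 1_800_000,
    "YouTube bumper": 600_000,
    "Instagram Reels": 950_000,
}

for subchannel, seconds in attentive_seconds_by_subchannel.items():
    profit = uplift_per_100k_attentive_seconds * seconds / 100_000
    print(f"{subchannel}: estimated incremental profit of about £{profit:,.0f}")
```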
3. The Culture cure
Of the three cures, this one is a must. Marketing effectiveness professionals need to get better at speaking to our clients candidly about the limitations of the methodology. We need to get better at having awkward conversations where feasibility and high expectations collide. Analysts – those who are closest to the data – need to be empowered to do their job as technical advisors on the solution design; the job is bigger than cranking a handle on a machine that reports consistent ROIs update-to-update.
Summing up
A spirit of openness is essential whether you work in an internal marketing effectiveness team, in an agency, or at an independent shop such as Ebiquity. If clients are taken on the journey and given a deeper understanding of the trade-offs between a simpler model and a more granular (and more easily overfitted) one, they go from being passive users of MMM to co-authors of the solution, which is how it should be.
Find out more: Ben Lambert: Overfitting in Econometrics; Richard McElreath: Fitting Over & Under.
Up next: Our next Wheelhouse is by Ebiquity Director Tom Loughnan. It's a companion piece to this article, dealing with diminishing returns and saturation effects in MMM.
Want to gain a deeper understanding of how modelling can impact your outcomes?