Data is a ubiquitous facet of today’s business landscape. It flows from digital assets, to physical purchase data, to a host of other company-specific sources. But, what is the most effective way to glean actionable insights from this abundance of data? I have two words for you: data modeling.

Data modeling – also known as econometric data modeling – sounds scary at first, but it doesn’t have to be. In the context of the business decision-making process, data modeling attempts to predict an unknown variable (for example, is it probable that a customer will respond to a deal offer?) by examining known variables (has that customer bought anything from us before?). While the mathematical underpinnings of the data modeling process are indeed complex, there are a number of fundamentals that any business user should be familiar with before digging into this type of analysis.

  1. Does this relationship make sense?

One of the most common and egregious mistakes a data scientist or business analyst can make is to rely solely on what a mathematical model spits out, without understanding the underlying business logic. Even a highly significant correlation between two events does not imply that any meaningful relationship exists between them, nor that one event causes the other.

When someone applies undue significance to a mathematical relationship, it’s called spurious correlation. Here’s an example: [1]

[Figure: a humorous example of a spurious correlation between two unrelated variables]

Although this model is humorous, it’s easy to see how analysts could misinterpret a statistically significant correlation between two variables and come to the wrong conclusion. However, when that conclusion ultimately leads to a business decision, it can influence the course of the company — this is no longer a joking matter. Therefore, when performing a modeling project yourself or reviewing the results of another analyst’s work, the most important question to keep in mind is: does this relationship make sense?
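To see how easily this trap arises, here is a minimal sketch in Python. The two series and their names are entirely made up: each simply trends upward over time, with no relationship to the other, yet their correlation coefficient comes out very high.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two completely unrelated series that both happen to trend upward over time
# (hypothetical examples: annual cheese consumption vs. annual software sales).
years = np.arange(20)
cheese = 10 + 0.5 * years + rng.normal(0, 0.5, 20)
software = 100 + 3.0 * years + rng.normal(0, 3.0, 20)

# Pearson correlation between the two series
r = np.corrcoef(cheese, software)[0, 1]
print(round(r, 2))  # very high correlation despite no causal link
```

Any two quantities that grow over time will correlate this way, which is exactly why "does this relationship make sense?" has to come before the math.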

  2. If it looks too good to be true, it probably is

The second basic tenet comes from the age-old saying, “if it’s too good to be true, it probably is.” For instance, suppose an analyst on your team claims to have produced a consumer behavior model that correctly predicts purchase behavior in 100% of cases. This is a red flag: you should immediately be suspicious of these results and begin asking questions. To understand why, let’s take a step back.

The fundamental basis of modeling is to take a number of previously seen cases and extrapolate general principles from them. For example, one could look at the consumer profiles of thousands of customers who have purchased a particular pair of shoes and attempt to make generalizations about what characteristics these customers share:

  • Are they of a specific age, gender, geographic region?
  • Are their Facebook interests similar?
  • Do they have kids?

Through the data modeling process, we seek to find the variables that are actually indicative of purchase behavior. We then find consumers who share these characteristics, in the hope that they too will be interested in purchasing our particular pair of shoes.

Back to my original point. The difficulty with a model that is too good at making such predictions stems from how we create models in the first place: by looking at past cases, we can come up with a mathematical model to try to predict future cases.

In the model-building process, it is possible to describe each consumer in our test-case universe exactly, thereby exactly predicting their decision to purchase or not. This is called overfitting: instead of creating a model that is applicable outside of our test cases, we have tailored a model that exactly predicts our test cases and has no applicability beyond them. Because a model is a prediction mechanism for human behavior, it is highly unlikely to be correct in all (or even close to all) cases.
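Overfitting can be demonstrated in a few lines of Python with invented data. A degree-9 polynomial has enough flexibility to pass through all ten "past cases" exactly — a perfect score on the data it has already seen — while a simple straight line captures the real trend. The difference shows up on new, unseen cases.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ten "past cases": a simple linear relationship plus noise
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.1, 10)

# Overfit model: a degree-9 polynomial can hit all ten points exactly
overfit = np.polyfit(x_train, y_train, 9)
# Simpler model: a straight line captures the underlying trend
simple = np.polyfit(x_train, y_train, 1)

# New, unseen cases drawn from the same underlying relationship
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2 * x_test + rng.normal(0, 0.1, 10)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on data (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

print(mse(overfit, x_train, y_train))  # essentially zero: "100% correct"
print(mse(overfit, x_test, y_test))    # noticeably worse on unseen cases
print(mse(simple, x_test, y_test))     # compare: the line's error stays small
```

The perfect training score is exactly the "too good to be true" warning sign: the flexible model has memorized its ten cases rather than learned the trend behind them.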

Bottom line: if the model looks too good to be true, it probably is.

  3. Are we using the right model?

Just as there are many physical tools for different tasks (a hammer for nailing nails, a screwdriver for screwing screws — you get the point), there are also many mathematical models for different functions. The most important distinction to keep in mind when looking at models is between supervised and unsupervised models.

In data modeling, a supervised model asks a specific question of the data. For example, “can we find groups of customers that are particularly likely to respond to Mother’s Day ads?” In contrast, an unsupervised modeling process seeks to expose interesting revelations from data, but has no specific guiding question. Such a process might search for natural groupings in a customer database without any prior outcome specification.
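The contrast can be sketched in Python with made-up customer data (the "basket size" feature and the group means are invented for illustration). The supervised approach uses known purchase labels to learn a decision rule; the unsupervised approach — here, a tiny 1-D k-means — is handed the same numbers with no labels at all and simply looks for natural groupings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy customer data: two hidden groups described by one feature,
# e.g. average basket size (all numbers are invented)
group_a = rng.normal(20, 2, size=50)   # smaller baskets
group_b = rng.normal(40, 2, size=50)   # larger baskets
baskets = np.concatenate([group_a, group_b])

# --- Supervised: we HAVE labels (did they buy?) and learn a rule from them
labels = np.concatenate([np.zeros(50), np.ones(50)])
mean_no = baskets[labels == 0].mean()
mean_yes = baskets[labels == 1].mean()
threshold = (mean_no + mean_yes) / 2          # midpoint decision rule
predictions = (baskets > threshold).astype(float)
accuracy = (predictions == labels).mean()

# --- Unsupervised: no labels; 1-D k-means just looks for natural groupings
centers = np.array([baskets.min(), baskets.max()])
for _ in range(10):
    # assign each customer to the nearest center, then recompute centers
    assign = np.abs(baskets[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([baskets[assign == k].mean() for k in range(2)])

print(accuracy)            # the supervised rule recovers the known labels
print(np.sort(centers))    # k-means finds centers near the two group means
```

The supervised model answers the specific question it was given; the unsupervised one discovers the two customer segments without ever being told they exist.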

Within these two broadly defined groups, there are a plethora of mathematical models that will have strengths and weaknesses depending on the specific situation. However, the point here is not to bog the reader down in a quagmire of technicality, but to instead highlight a principle: do you (or your data science team) know why you are using the model you are using? If not, return to square one and begin with that question in mind.

Drawing the right insights

When using data modeling in your business, I encourage you to look at the process with three fundamentals in mind:

  1. Does this relationship make sense?
  2. If it looks too good to be true, it probably is
  3. Are we using the right model?

Approached this way, data modeling can be a highly effective method for drawing insights from the vast and ever-expanding pool of data available to businesses across the world. Never before has there been such an opportunity to bring massive computing power, consumer data, and mathematical knowledge together.