“Torture the data, and it will confess to anything.”
– Ronald Coase, Economics Guru, Nobel Prize Laureate
One of the most overlooked foundational analytic tools is normalization: adjusting values measured on different scales to a common scale. Normalization is by no means an extraordinary tool, but it is critically important. If you don’t normalize data, you can draw incorrect conclusions that lead you down the wrong path.
People love to look at absolute numbers, but absolute numbers are often meaningless until you normalize them so that you are comparing apples to apples. I’ve seen it over and over again: people say, “Man, our web traffic and sales are coming from California, New York, Texas, and Florida.” That makes sense, because those four states make up roughly a third of the US population. But they may be underperforming once you normalize the traffic or sales by population.
I once did a simple analysis for a fast-growing product company, normalizing their web traffic by population per state. We found that 1 out of 100 people in Utah (the location of the headquarters) had visited the website vs. 1 out of 1,000 in the US overall. It showed us the potential of the brand. If we could replicate Utah’s level of awareness across the rest of the country, we could increase brand awareness and traffic by roughly 10x. But we would never have found that insight if we hadn’t normalized the data.
An example will help illustrate normalizing a data set.
Imagine you had 10 workers that produced widgets for a week. Below is the output per worker for the week. It looks like worker 1 was the most productive, and most people would read the table that way.
The issue is that you don’t know how many hours each worker put in. Once you have each worker’s hours for the week, you can normalize his or her output to the common scale of output per hour.
Now that the above data is normalized, it tells a much different story, where worker 10 is the most productive, and worker 1 is just about average. Non-normalized data is probably the most common issue I deal with relating to poor analysis.
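To make the worker example concrete, here is a minimal Python sketch. The numbers are hypothetical and are not the figures from the original tables; they are chosen only so that the same pattern appears, with worker 1 leading on raw output while worker 10 leads on output per hour. The class name WorkerWeek and its field names are also assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkerWeek:
    worker: int
    widgets: int   # total widgets produced in the week (absolute number)
    hours: float   # hours worked in the week (normalization variable)

    @property
    def widgets_per_hour(self) -> float:
        # Normalize output to the common scale of "per hour".
        return self.widgets / self.hours


# Hypothetical data for illustration only.
WEEK = [
    WorkerWeek(1, 500, 50),
    WorkerWeek(2, 440, 40),
    WorkerWeek(3, 420, 42),
    WorkerWeek(4, 400, 50),
    WorkerWeek(5, 380, 40),
    WorkerWeek(6, 360, 30),
    WorkerWeek(7, 350, 35),
    WorkerWeek(8, 320, 40),
    WorkerWeek(9, 300, 30),
    WorkerWeek(10, 280, 20),
]

top_by_output = max(WEEK, key=lambda w: w.widgets)
top_by_rate = max(WEEK, key=lambda w: w.widgets_per_hour)
average_rate = sum(w.widgets_per_hour for w in WEEK) / len(WEEK)

print(f"Most widgets: worker {top_by_output.worker}")
print(f"Most widgets per hour: worker {top_by_rate.worker}")
print(f"Average widgets per hour: {average_rate:.2f}")
```

With these made-up numbers, worker 1 tops the raw list but sits near the average rate once hours are taken into account, which is the same reversal described above.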
Normalization is a critical component whenever you are trying to compare the performance or potential of similar entities with different magnitudes. Below are the common data sets you should normalize before comparing:
Any time you are dealing with efficiency measurements across individuals, teams, stores, etc., make sure you normalize the data with a common denominator. The common denominator is often “per hour,” “per person,” or “per customer.”
Much like efficiency, any time you are trying to measure the effectiveness of something, such as advertising, quality, investments, or performance, make sure you normalize the data, so you are comparing apples to apples.
Whenever you are comparing the performance or potential of markets, you should normalize the analysis by the population of each market. I’ve done this numerous times, including comparing web traffic per state by expressing it as “1 out of X people visited the website” in each state, and evaluating the performance of different markets for a retailer by normalizing the data to dollars spent per person in each market.
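The “1 out of X people” framing is simply population divided by visitors. The sketch below assumes two hypothetical markets with made-up visitor and population counts; the function name one_in_x is an illustrative choice, not an established term.

```python
def one_in_x(visitors: int, population: int) -> float:
    """Return X such that roughly 1 out of X people visited the website."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return population / visitors


# Hypothetical markets for illustration only.
markets = {
    "Market A": {"visitors": 40_000, "population": 40_000_000},
    "Market B": {"visitors": 30_000, "population": 3_000_000},
}

penetration = {
    name: one_in_x(m["visitors"], m["population"]) for name, m in markets.items()
}

for name, x in penetration.items():
    print(f"{name}: 1 out of {x:,.0f} people visited")
```

In this invented example, Market A has more visitors in absolute terms, yet Market B reaches ten times the share of its population.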
Comparing financials over different time periods
Recalculating the line items of a P&L as percentages is so much more powerful than looking at the absolute numbers, especially when you are examining trends in the P&L. You typically normalize a P&L using revenues as the denominator.
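As a rough sketch of normalizing a P&L by revenue, the Python below converts each line item to a percentage of revenue for two periods. The line items and amounts are hypothetical, and the function name common_size is an assumption for illustration.

```python
def common_size(pnl: dict[str, float]) -> dict[str, float]:
    """Express every P&L line item as a percentage of revenue."""
    revenue = pnl["revenue"]
    return {item: round(100 * amount / revenue, 1) for item, amount in pnl.items()}


# Hypothetical P&Ls for two periods, for illustration only.
year_1 = {"revenue": 1000.0, "cogs": 600.0, "opex": 300.0, "operating_income": 100.0}
year_2 = {"revenue": 1500.0, "cogs": 975.0, "opex": 420.0, "operating_income": 105.0}

pct_1 = common_size(year_1)
pct_2 = common_size(year_2)

for item in year_1:
    print(f"{item:>17}: {pct_1[item]:5.1f}% -> {pct_2[item]:5.1f}%")
```

In these made-up figures, operating income rises in absolute terms from 100 to 105, but as a share of revenue it falls from 10% to 7%, a trend the absolute numbers alone would hide.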
How do you normalize data?
Any time you are trying to compare data, performance, and productivity from different entities, make sure you are comparing apples to apples. Here are some of the best practices for normalizing data:
Create percentages and ratios
Any time you create a percentage or ratio, you are normalizing data, with the denominator being the common scale you are adjusting different scales to.
Watch out for absolutes
Any time you see a list of absolute numbers used for comparison, question whether they need to be normalized before you start deriving conclusions or insights from the data.
Figure out the right normalization variable
What are some of the common threads that would make comparing things apples to apples? Some of the common normalization variables include hours worked, population, sales, costs, customers, locations, days, or people.
Rank order the normalized data
Once you normalize data, rank-ordering the data set will open your eyes to performance or potential. And, if you have trend data, look into the trends of the normalized data to see how much performance or potential is improving or declining.
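A minimal sketch of rank-ordering normalized data and checking its trend might look like the following. The stores, sales, and customer counts are hypothetical, and sales per customer is just one possible normalization variable.

```python
def per_customer(sales: list[float], customers: list[int]) -> list[float]:
    """Normalize sales in each period by the number of customers in that period."""
    return [s / c for s, c in zip(sales, customers)]


# Hypothetical two-period data for illustration only.
stores = {
    "Store A": {"sales": [120_000, 150_000], "customers": [4_000, 6_000]},
    "Store B": {"sales": [45_000, 48_000], "customers": [1_000, 1_000]},
    "Store C": {"sales": [80_000, 84_000], "customers": [2_500, 2_400]},
}

normalized = {
    name: per_customer(d["sales"], d["customers"]) for name, d in stores.items()
}

# Rank stores by the latest normalized value, highest first.
ranking = sorted(normalized, key=lambda name: normalized[name][-1], reverse=True)

# Trend: change in sales per customer from the first period to the last.
change = {name: series[-1] - series[0] for name, series in normalized.items()}

for name in ranking:
    print(f"{name}: {normalized[name][-1]:.2f} per customer ({change[name]:+.2f})")
```

With these invented numbers, Store A has the largest and fastest-growing absolute sales, yet it ranks last on sales per customer and is the only store whose normalized figure is declining.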