Averages in reporting are frequently misleading. In many cases, numbers have a wide range but cluster towards one end or the other.
A mean can be pulled in the direction of an outlier. A median, which sorts every value and then picks the middle one, can be a better indicator of what is typical. A mode tells you which value is most frequent.
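As a minimal sketch of the difference, the following Python snippet computes all three averages on a small, made-up list of values (the numbers are hypothetical and chosen only so that one outlier sits far from the rest):

```python
from statistics import mean, median, mode

# Hypothetical values: most cluster at the low end, one outlier is far away.
values = [10, 12, 12, 13, 15, 16, 200]

mean_value = mean(values)      # pulled upward by the single outlier
median_value = median(values)  # middle value after sorting
mode_value = mode(values)      # most frequent value

print(f"mean={mean_value:.1f} median={median_value} mode={mode_value}")
```

Here the mean lands well above every value except the outlier, while the median and mode stay inside the cluster.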
Here is an example of how each average type can answer a different question:
A data sharing user is interested in optimizing the number of sequence steps in their sequences.
Mode: They calculate how frequently prospects unsubscribe at each email step in a sequence. The most frequent step (the actual mode) is the 1st email, so they look for the 2nd most frequent, and find that it is the 4th email. They review the content of the 4th email to make it more engaging.
Median: They calculate the median number of steps a prospect gets through before responding. They find that prospects who respond typically do so around step 6 of 11, so they shorten the sequence.
Mean: They calculate the median steps a prospect gets through before responding, and then calculate the mean of that median across all their sequences. They find that most sequences get the most responses around step 4, and set a best practice for their org to build new sequences no more than 6 steps long.
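The three calculations in this example could be sketched roughly as follows. The data, the variable names, and the sequence names are all hypothetical; real data sharing tables will look different.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical: the email step at which each unsubscribe happened.
unsubscribe_steps = [1, 1, 1, 1, 4, 4, 4, 2, 3, 5]

# Rank steps by how often they occur; the first entry is the mode.
ranked = Counter(unsubscribe_steps).most_common()
top_step = ranked[0][0]
second_step = ranked[1][0]

# Hypothetical: for each sequence, the step at which each responder replied.
response_steps_by_sequence = {
    "sequence_a": [2, 4, 4, 5, 9],
    "sequence_b": [3, 3, 4, 6, 7],
    "sequence_c": [1, 5, 5, 5, 8],
}

# Median response step per sequence, then the mean of those medians.
median_by_sequence = {
    name: median(steps) for name, steps in response_steps_by_sequence.items()
}
mean_of_medians = mean(list(median_by_sequence.values()))

print(top_step, second_step, median_by_sequence, round(mean_of_medians, 2))
```

Taking the median within each sequence first keeps one unusually long or short sequence from dominating the org-wide figure.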
The distribution of your data, or the shape it takes when you chart it, can also make your numbers misleading. For example, if two different personas of prospects tend to cluster at two different points (a bimodal distribution), any average that covers both might end up somewhere in the middle, where no one actually is.
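To illustrate the bimodal case, here is a small sketch with hypothetical response steps for two personas, one clustering around steps 2 to 3 and the other around steps 9 to 10. It prints a rough text chart so the two clusters are visible:

```python
from collections import Counter
from statistics import mean, median

# Hypothetical response steps: two personas clustered at opposite ends.
steps = [2, 2, 2, 3, 3, 9, 9, 10, 10, 10]

mean_step = mean(steps)
median_step = median(steps)
counts = Counter(steps)

print(f"mean={mean_step} median={median_step}")

# Rough text chart: one row per step, one '#' per prospect at that step.
for step in range(1, 12):
    print(f"{step:>2} {'#' * counts[step]}")
```

With this data, both the mean and the median come out at step 6, a step where no prospect in the list actually responded.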
Some methods to help you understand distributions are:
- When you are analyzing a number for the first time, draw a line chart and examine that first.
- When you present a median, also present the minimum and maximum values.
- Rather than calculate one number, calculate percentiles.
- Calculating the 90th or 95th percentile shows where the bulk of the data falls. It can support statements like “90% of prospects who responded did so by sequence step N.”
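One way to produce these summary numbers is sketched below. The response steps are hypothetical, and `percentile_nearest_rank` is an illustrative helper (not a built-in function) that uses the nearest-rank definition: the smallest value such that at least the given percentage of the data is at or below it. Other percentile definitions interpolate between values and can give slightly different results.

```python
import math
from statistics import median

def percentile_nearest_rank(values, pct):
    """Smallest value with at least pct percent of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical: the step at which each responding prospect replied.
response_steps = [1, 2, 2, 3, 3, 3, 4, 4, 4, 5,
                  5, 5, 6, 6, 6, 7, 7, 8, 9, 11]

summary = {
    "min": min(response_steps),
    "median": median(response_steps),
    "p90": percentile_nearest_rank(response_steps, 90),
    "p95": percentile_nearest_rank(response_steps, 95),
    "max": max(response_steps),
}
print(summary)
```

With the nearest-rank definition, the 90th percentile reads directly as "90% of prospects who responded did so by this step."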
In certain cases, data should be converted to absolute values. For example, when comparing performance to an expected number or baseline, actual performance may be above or below the line. If these numbers are taken as an average, negative and positive values may cancel out.
Here is an example of when this might matter:
A data sharing user is interested in how closely their reps hit a quota on a regular basis. If quota is X and actual performance is Y, they might calculate Y-X for each rep, and take the average.
However, if X is 100%, and in a list of 10 reps Y is: 73%, 82%, 71%, 77%, 75%, 127%, 118%, 129%, 123%, and 125%, then the average across reps of Y-X is 0. In reality, reps either significantly overshoot or undershoot quota, with hardly anyone hitting it exactly.
Taking the absolute value of Y-X, which turns negative numbers into positive ones, gives the real typical deviation: 24.4 percentage points. Paired with the knowledge that the team hits quota on average, this tells you that performance is highly inconsistent from rep to rep, with the high performers compensating for the low performers.
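The quota example can be checked with a short sketch that uses the ten attainment figures above. The variable names are illustrative only.

```python
from statistics import mean

quota = 100  # X, as a percentage
attainment = [73, 82, 71, 77, 75, 127, 118, 129, 123, 125]  # Y per rep

# Signed deviation from quota for each rep (Y - X).
deviations = [y - quota for y in attainment]

# Positive and negative deviations cancel out in a plain average.
mean_signed = mean(deviations)

# Absolute values keep the size of each miss and drop its direction.
mean_absolute = mean([abs(d) for d in deviations])

print(f"mean of Y-X: {mean_signed}")
print(f"mean of |Y-X|: {mean_absolute}")
```

The signed average hides the spread entirely, while the average of absolute deviations shows how far a typical rep lands from quota.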