I am confident I know what a Likert scale is.
- 5 = Strongly Agree
- 4 = Agree
- 3 = Neutral
- 2 = Disagree
- 1 = Strongly Disagree
Chances are you’ve encountered a question like the one above many times over the years. This is what is called a Likert scale, named after its inventor, Rensis Likert, an American psychologist who first used the scale in 1932. Since then, countless variations of this type of scale have been used on surveys and questionnaires around the world.
According to Wikipedia, a Likert scale is “a psychometric scale commonly involved in research that employs questionnaires… When responding to a Likert item, respondents specify their level of agreement or disagreement on a symmetric agree-disagree scale for a series of statements. Thus, the range captures the intensity of their feelings for a given item.”
The five-point scale shown above is the most common form of a Likert scale, but there are others. Some surveys have four-point scales that leave out the “Neutral” option – these are referred to as “forced-choice” questions because they don’t give the respondent the option to remain neutral. Some surveys include more options, some fewer.
We know these scales are used widely, but are they used correctly?
The answer, unfortunately, is that it is very common for people using items like this to report results in a way that is not statistically appropriate.
Let’s say I included the question above on a survey, and I wanted to report the results. I could present information on how frequently each option was chosen, as in the following graph:
Graphs like the one above show a frequency distribution. For response categories like these they are usually drawn as bar charts, though they are often loosely called histograms.
Still using frequencies, I could also say that:
- 30% of respondents chose “Strongly Agree”
- 13% chose “Agree”
- 23% chose “Neutral”
- 3% chose “Disagree”
- 30% chose “Strongly Disagree”
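If you have the raw responses, the percentages above can be computed directly from the counts. The Python sketch below assumes 30 hypothetical respondents, with 9, 4, 7, 1 and 9 of them choosing options 5 through 1; these counts are an illustration chosen to match the percentages above, not actual survey data, and `frequency_table` is just an illustrative helper name.

```python
from collections import Counter

# Hypothetical raw responses: 30 respondents coded 1-5 (5 = "Strongly Agree").
# These counts are an assumption chosen to reproduce the percentages above.
LABELS = {
    5: "Strongly Agree",
    4: "Agree",
    3: "Neutral",
    2: "Disagree",
    1: "Strongly Disagree",
}
responses = [5] * 9 + [4] * 4 + [3] * 7 + [2] * 1 + [1] * 9


def frequency_table(responses):
    """Return {label: percentage of respondents}, rounded to whole percents."""
    counts = Counter(responses)  # missing codes count as 0
    total = len(responses)
    return {
        LABELS[code]: round(100 * counts[code] / total)
        for code in sorted(LABELS, reverse=True)
    }


for label, pct in frequency_table(responses).items():
    print(f"{pct}% chose “{label}”")
```

Because each percentage is rounded on its own, the five figures add up to 99% rather than 100%, which is normal for a rounded frequency table.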
Or in simpler terms:
- 43% chose “Agree” or “Strongly Agree”
- 23% chose “Neutral”
- 33% chose “Disagree” or “Strongly Disagree”
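When combining categories, it is safest to add up the raw counts and round once, rather than adding percentages that have already been rounded. A minimal Python sketch, reusing the same hypothetical 30 responses; the `GROUPS` mapping and `collapsed_percentages` helper are illustrative names, not part of any library.

```python
from collections import Counter

# Same hypothetical 30 responses as before (an assumption, not real survey data).
responses = [5] * 9 + [4] * 4 + [3] * 7 + [2] * 1 + [1] * 9

# Map each 1-5 code to a collapsed three-way category.
GROUPS = {
    5: "Agree or Strongly Agree",
    4: "Agree or Strongly Agree",
    3: "Neutral",
    2: "Disagree or Strongly Disagree",
    1: "Disagree or Strongly Disagree",
}


def collapsed_percentages(responses):
    """Percentage of respondents in each collapsed category, computed from raw counts."""
    counts = Counter(GROUPS[code] for code in responses)
    total = len(responses)
    return {group: round(100 * n / total) for group, n in counts.items()}


print(collapsed_percentages(responses))
```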
Up to this point, we are using the data correctly and everyone is happy.
However, you will often see the results of such a question reported like this:
- On a scale of 1 to 5, with 1 = “Strongly Disagree” and 5 = “Strongly Agree,” the average score for the question was 3.1 with a standard deviation of 1.6.
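For comparison, here is how the average and standard deviation in that sentence would be computed, using Python's standard `statistics` module and the same hypothetical 30 responses. Note that it simply treats the codes 1 to 5 as ordinary numbers, which is exactly the equal-spacing assumption questioned in the following paragraphs.

```python
import statistics

# Same hypothetical 30 responses, coded 1 = "Strongly Disagree" ... 5 = "Strongly Agree".
responses = [5] * 9 + [4] * 4 + [3] * 7 + [2] * 1 + [1] * 9

# Averaging the codes treats them as equally spaced numbers (interval data).
mean = statistics.mean(responses)
sd = statistics.stdev(responses)  # sample standard deviation

print(f"mean = {mean:.1f}, standard deviation = {sd:.1f}")
```

With these counts the mean is 93 / 30 = 3.1, and the standard deviation rounds to 1.6 whether you use the sample formula (`statistics.stdev`) or the population formula (`statistics.pstdev`).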
No doubt this sounds impressive, and if you had to pick, you might think it would be the most statistically appropriate of the group. But it is not.
The problem is that responses to a Likert item with every point labeled are what statisticians call “ordinal data.” This means that the response options have an order or sequence to them, but the distance between adjacent options is not necessarily the same across the scale. Data whose values are equally spaced are called “interval data.”
Before you stop reading for fear of things getting too heavy, think about this question:
- Is the distance between “Neutral” and “Agree” the same as the distance between “Agree” and “Strongly Agree”?
As it turns out, research suggests the answer to that question is no. Conceptually, the distance between “Agree” and “Strongly Agree” is much shorter than the distance between “Neutral” and “Agree.”
I am a visual thinker, so it helps me to draw pictures. To illustrate what I’ve just said: when you use averages and standard deviations, you are assuming the scale looks like this:
When in reality, it looks more like this:
I could tell you about all the studies that have been done on this subject, but only very nerdy numbers folks like myself would really care, so I will skip them. Suffice it to say that a lot of research has been done in this area, and the general consensus is that Likert questions with all points labeled produce ordinal data, not interval data, and therefore it is inappropriate to report averages for this data.
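One way to see why the spacing matters is to assign explicit positions to each option and watch what happens to the averages. The Python sketch below is purely illustrative: the `UNEVEN` positions are a hypothetical stand-in for the idea that “Agree” sits closer to “Strongly Agree” than to “Neutral,” and the two groups of respondents are made up.

```python
# Hypothetical positions on an underlying "agreement" continuum.
# EQUAL is what averaging the 1-5 codes assumes; UNEVEN is an illustrative
# assumption in which "Agree" sits closer to "Strongly Agree" than to "Neutral".
EQUAL = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
UNEVEN = {1: 1.0, 2: 1.5, 3: 3.0, 4: 4.5, 5: 5.0}


def mean_position(responses, positions):
    """Average of the responses after mapping each code to an assumed position."""
    return sum(positions[code] for code in responses) / len(responses)


# Two hypothetical groups of 10 respondents each.
group_a = [4] * 10            # everyone chose "Agree"
group_b = [5] * 5 + [3] * 5   # half "Strongly Agree", half "Neutral"

for name, positions in [("equal spacing", EQUAL), ("uneven spacing", UNEVEN)]:
    a = mean_position(group_a, positions)
    b = mean_position(group_b, positions)
    print(f"{name}: group A = {a:.2f}, group B = {b:.2f}")
```

Under equal spacing the two groups tie at 4.0, but under the uneven assumption group A comes out ahead (4.5 vs. 4.0). The frequencies for each group, on the other hand, are the same no matter which spacing you assume.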
You might be asking yourself, “So what?”
The answer is this: if you see a question like the one I showed at the beginning of this article reported with an average (mean), and perhaps even a standard deviation, you will know that the wrong statistic has been used, and that the numerical average provided doesn’t really mean anything.
As I said earlier, this mistake is common, particularly among people without a strong statistical background who create, administer, score, and report on surveys. The mechanics of running a survey are not difficult, but if you have not been trained to ask the questions and use the answers appropriately, it is easy to make mistakes.
If you believe me that it is inappropriate to report averages on these Likert-type questions, then what can you do about it?
There are two ways to fix this problem. The first is simply to report frequencies for Likert-type questions. However, many people are not satisfied with this because they feel frequency reporting is too cumbersome, and they would really like to be able to report a single number for each question.
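If the concern is that frequency tables take up too much space, one option is to condense each question’s frequencies onto a single line. A minimal Python sketch; the question IDs, the abbreviations, the `summary_line` helper and the Q2 responses are all hypothetical.

```python
from collections import Counter

# Short labels for the five response options, listed from 5 down to 1.
SHORT = {5: "SA", 4: "A", 3: "N", 2: "D", 1: "SD"}


def summary_line(question, responses):
    """One-line frequency summary, e.g. 'Q1: SA 30% | A 13% | ...'."""
    counts = Counter(responses)
    total = len(responses)
    parts = [f"{SHORT[c]} {round(100 * counts[c] / total)}%" for c in SHORT]
    return f"{question}: " + " | ".join(parts)


# Hypothetical survey: question IDs and responses are illustrative only.
survey = {
    "Q1": [5] * 9 + [4] * 4 + [3] * 7 + [2] * 1 + [1] * 9,
    "Q2": [5] * 2 + [4] * 10 + [3] * 5 + [2] * 2 + [1] * 1,
}

for question, responses in survey.items():
    print(summary_line(question, responses))
```

This keeps the full distribution visible for every question while taking up no more room than a list of averages.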
The second solution, which would allow you to report the average numerical value for each question, is to revise the items by removing all the labels but the two used for the end points. Using this method, our original question would look like this:
I am confident I know what a Likert scale is.
- 5 = Strongly Agree
- 4
- 3
- 2
- 1 = Strongly Disagree
Or you can simply present the statement with the instruction “Rate your level of agreement with the following statement, with 5 = ‘Strongly Agree’ and 1 = ‘Strongly Disagree.’”
Because the points between the ends are not labeled, we conceptually recognize that they are meant to be equally spaced. We will still see the middle point as “Neutral,” but we will see the other points as “halfway between Neutral and Strongly Agree/Disagree,” which is not quite the same thing as “Agree/Disagree.”
The next time you see a survey, look at how the questions asking for level of agreement are worded, and think about how you would expect to see the results reported. Likewise, the next time you read about the results of a survey, look at how the data is presented and, if available, look at how the questions were presented. You will be surprised at how often data for these Likert-type questions are reported incorrectly.