Explaining My PhD: What is Likert-type Item data?

To kick off explaining my general research area, I’m going to explain my data type of interest: Likert-type item data.

Most of us are familiar with Likert item data (or things masquerading as Likert items). Each item elicits a response from participants and usually looks something like this:

I’m awesome.

| Strongly Agree   |   Agree   |   Neither Agree nor Disagree   |   Disagree   |   Strongly Disagree |

So, you have some sort of statement that you can agree or disagree with, followed by a set of ordinal responses. Ordinal data is a special form of categorical data. Categorical data is non-numerical: it’s like recording someone’s height as “tall” rather than 6’6”. If the categories have a natural order, the data is called ordinal. So “short”, “medium-height” and “tall” would make up an ordinal variable “height”. This is distinct from “no pet”, “cat”, “dog” and “other”, which make up the variable “pet” but have no inherent order. Level of agreement does have an inherent order.
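To make the ordinal/nominal distinction concrete, here is a minimal Python sketch. The `Ordinal` class and the `HEIGHT` and `PET` names are hypothetical illustrations of my own, not a standard Python type. An ordinal value can be compared with another value on the same scale, but it deliberately has no arithmetic, because the labels carry an order and nothing more.

```python
from functools import total_ordering

# Hypothetical example scales taken from the text above.
HEIGHT = ("short", "medium-height", "tall")   # ordinal: listed in order
PET = {"no pet", "cat", "dog", "other"}       # nominal: a set, so no order at all


@total_ordering
class Ordinal:
    """A label on an ordered scale: it supports <, >, == but not + or *."""

    def __init__(self, label, levels):
        if label not in levels:
            raise ValueError(f"{label!r} is not one of {levels}")
        self.label = label
        self.levels = tuple(levels)

    def _same_scale(self, other):
        return isinstance(other, Ordinal) and other.levels == self.levels

    def __eq__(self, other):
        if not self._same_scale(other):
            return NotImplemented
        return self.label == other.label

    def __lt__(self, other):
        if not self._same_scale(other):
            return NotImplemented
        # Only the position in the list matters, not any distance between levels.
        return self.levels.index(self.label) < other.levels.index(other.label)

    def __repr__(self):
        return f"Ordinal({self.label!r})"


people = [Ordinal("tall", HEIGHT), Ordinal("short", HEIGHT), Ordinal("medium-height", HEIGHT)]
print(sorted(people))  # ordering is meaningful
print(max(people))     # so is "the tallest category"
```

Trying `Ordinal("tall", HEIGHT) + Ordinal("short", HEIGHT)` raises a `TypeError`, which is the point: there is no defined way to add two heights that we have only recorded as categories.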

The set of response options is referred to as the response format. Some researchers use only three responses, dropping the two strongest options; others remove the neutral middle option to force a response in one direction or the other. Sometimes you will see the phrase “n-point Likert scale”. This means the response format has n options. If n is odd there is a neutral middle response; if n is even there isn’t.
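The relationship between n and the neutral option can be sketched in a few lines of Python. This assumes a symmetric format like the five-point one above, where a neutral option can only sit exactly in the middle; the helper names `has_neutral`, `drop_neutral` and `drop_extremes` are hypothetical.

```python
FIVE_POINT = [
    "Strongly Agree",
    "Agree",
    "Neither Agree nor Disagree",
    "Disagree",
    "Strongly Disagree",
]


def has_neutral(n):
    """A symmetric n-point format has a single middle option only when n is odd."""
    return n % 2 == 1


def drop_neutral(fmt):
    """Remove the middle option to force a response in one direction (odd n only)."""
    if not has_neutral(len(fmt)):
        raise ValueError("an even-length format has no single middle option")
    mid = len(fmt) // 2
    return fmt[:mid] + fmt[mid + 1:]


def drop_extremes(fmt):
    """Remove the strongest option at each end, e.g. five points down to three."""
    return fmt[1:-1]


print(drop_neutral(FIVE_POINT))   # four-point format, no neutral option
print(drop_extremes(FIVE_POINT))  # three-point format, neutral option kept
```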

If we have a set of items all relating to the same thing, we may be dealing with a Likert scale, which I will explain later; otherwise, what we have are Likert-type items. This just tells us that the items and response format are based on true Likert items (i.e. items intended to be combined in a scale), but the items were not designed to function together as a scale. True Likert items always use level of agreement as the response, but similarly structured Likert-type items may use another kind of ordinal response such as frequency (“never”, “sometimes”, “always”). Likert-type item responses can be analysed using methods designed for ordinal data. In this case every item is a response variable, so if we have more than one item we’re dealing with a multivariate model. A multivariate model simply has more than one response variable; this is quite a bit more complicated to model, so I will save the details for another post. For now, we just need to know that methods for analysing ordinal data can cope with Likert-type items.
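As a very small taste of what an ordinal-only treatment can look like, the sketch below summarises each Likert-type item separately using nothing but the order of the categories: the (lower) median category and the cumulative proportion of responses at or below each level. This is my own illustrative choice of descriptive summary, not the multivariate modelling approach mentioned above, and the item names and responses are made up.

```python
from collections import Counter
from statistics import median_low

LEVELS = ("never", "sometimes", "always")  # frequency response format from the text


def summarise_item(responses, levels=LEVELS):
    """Order-only summary of one item: median category and cumulative proportions."""
    positions = [levels.index(r) for r in responses]
    counts = Counter(responses)
    n = len(responses)
    cumulative, running = [], 0
    for level in levels:
        running += counts[level]
        cumulative.append((level, running / n))
    # median_low always returns an observed position, so we never
    # invent a category "halfway" between two labels.
    return {"median": levels[median_low(positions)], "cumulative": cumulative}


# Hypothetical data: each item is its own response variable.
survey = {
    "item_1": ["never", "sometimes", "sometimes", "always"],
    "item_2": ["always", "always", "sometimes", "never"],
}
for item, responses in survey.items():
    print(item, summarise_item(responses))
```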

But why not just use Likert scale data? A Likert scale is a set of eight or more Likert items designed to work together and be combined into a single measure. The responses are scored and then summed to generate a score on the Likert scale. So, if we score our responses 1-5, with 1 being “strongly disagree”, and we have ten items, then our Likert scale ranges from 10 to 50. Now here’s the thing about Likert scales: they are constructed very carefully. Researchers spend a lot of time and effort checking them for validity and reliability. Validity refers to:

  1. Content: does it look like these items will measure what I think they should? *
  2. Criterion: does it relate well to similar measures? Can I predict an outcome, which relates to the attribute I’m trying to measure, using scores on my test?

Reliability is broken down into:

  1. Internal: do the items agree with each other? Do they elicit similar responses? That is, do people who agree with one item tend to agree with the others?
  2. External: if I gave the same person this test twice, would they get a similar measurement? **
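Both kinds of reliability come down to asking whether two sets of ordinal responses move together: two items answered by the same people, or the same test taken twice. One way to quantify that without treating the labels as quantities is a rank correlation such as Spearman’s rho, which depends only on the ordering of the responses. This is my own illustration, not a procedure taken from [1] or [2]. Responses are coded by their position in the response format (0 = lowest) purely so they can be ranked; any order-preserving coding gives the same result. The example data is made up.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two sets of ranks.

    Undefined (division by zero) if everyone gave the same response to an item.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical responses from six people, coded 0 = Strongly Disagree ... 4 = Strongly Agree.
item_a = [4, 3, 3, 1, 0, 2]
item_b = [4, 4, 3, 1, 1, 2]
print(round(spearman(item_a, item_b), 3))  # internal: do the two items agree?
```

For test-retest (external) reliability, the same function would be applied to one person-by-person list of responses from the first sitting and one from the second.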

These descriptions of validity and reliability are a very concise summary of information available at simplypsychology.org; references [1] and [2] are the specific pages. Items are generated and refined to meet these criteria. However, there is one massive, glaring error in the process: at no point does it make sense to treat the scores as numbers and add them together. The responses are ordinal, which means we have no numerical information about them: we know their order, but not the distances between them. In the “height” example, we could define “short” as “under 6 feet” or “under 4 feet”, and because we live in a world with no way to directly measure height (analogous to the situation of trying to numerically measure depression or another psychological attribute), we wouldn’t know which was true. This means that we don’t actually know how much taller “tall” people are than “short” people. Consequently, summing the responses (or doing any sort of maths with them) makes no sense.
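The problem with summing can be shown with a small, made-up example. Below are two scorings that both respect the order of the five responses: the conventional 1-5, and a hypothetical one in which “strongly agree” is scored 9 (the real gap between “agree” and “strongly agree” might be large, and with ordinal data we have no way of knowing). Because ordinal data only tells us the order, there is no basis for preferring one scoring over the other, yet they rank two respondents’ totals in opposite orders.

```python
LEVELS = (
    "Strongly Disagree",
    "Disagree",
    "Neither Agree nor Disagree",
    "Agree",
    "Strongly Agree",
)

# Two scorings that both respect the order of LEVELS.
conventional = dict(zip(LEVELS, [1, 2, 3, 4, 5]))
alternative = dict(zip(LEVELS, [1, 2, 3, 4, 9]))  # hypothetical: a big jump at the top


def respects_order(scoring):
    """True if higher levels always get strictly higher scores."""
    values = [scoring[level] for level in LEVELS]
    return all(a < b for a, b in zip(values, values[1:]))


def total(responses, scoring):
    """The summed score a respondent would get under a given scoring."""
    return sum(scoring[r] for r in responses)


person_1 = ["Disagree", "Strongly Agree"]
person_2 = ["Agree", "Agree"]

for name, scoring in [("conventional", conventional), ("alternative", alternative)]:
    assert respects_order(scoring)
    print(name, total(person_1, scoring), total(person_2, scoring))
```

Under the conventional scoring person 2 has the higher total (8 vs 7); under the alternative scoring person 1 does (11 vs 8). Nothing in the responses themselves tells us which answer is right.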

Calling something “23” does not make it a number. You are simply labelling it with a number.

I have never seen a single piece of literature try to defend the summation of Likert items, so I can only assume that researchers are using an “it works okay” justification. However, I do not think it is sensible to do something totally illogical just because it works okay. It is preferable to use a method that actually applies to the data you have.

*There’s also the more complex construct validity, which relates to the theoretical basis of the attribute of interest; refer to [2] for more information.

**This is assuming the person doesn’t remember their previous answers and is generally in the same state as they were the first time. When testing this the researcher tries to make sure this is a reasonable assumption.


[1] McLeod, S. A. (2007). What is reliability? Retrieved from www.simplypsychology.org/reliability.html

[2] McLeod, S. A. (2013). What is validity? Retrieved from www.simplypsychology.org/validity.html

