Reading the Mind in the Eyes



In 1997, Baron-Cohen and his colleagues conducted a study using the Eyes test. After reviewing their results, they decided that a “revised” version of the test needed to be made so that the study could be improved and conducted again.

The problems with the original test are listed below.

  1. Each item was a forced choice between only two responses, so the range of scores above chance was too narrow to produce accurate results.
  2. The range of scores was too narrow to differentiate between people with actual autism and “lesser variant” or “broader phenotype” groups (e.g. first-degree relatives of a person with autism).
  3. Ceiling effects are likely to occur due to the narrow score range; this means individual differences are difficult to detect.
  4. The emotions described in the test were of two types: Basic and Complex. Basic emotions (e.g. happy and sad) were too easy and made ceiling effects more likely to occur.
  5. Some items were linked to gaze direction and perception (e.g. ignoring and noticing). This factor was considered to be a clue that made the test easier to complete for some people.
  6. There were more female faces than male faces, so a possible bias could have occurred.
  7. The two response options were always semantic opposites of each other (e.g. sympathetic and unsympathetic), which made the test far too easy. It’s like asking: is this black or white?
  8. Since the test involved linking pictures and words, there may have been issues with participants not correctly understanding the meanings of certain words.

To fix the first three limitations, the following steps were taken:

  1. The number of items (photographs) was increased from 25 to 36.
  2. The number of forced-choice response options per item was increased from 2 to 4.
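
The arithmetic behind these two changes can be sketched in a few lines of Python. This is an illustrative calculation, not something taken from the original paper: the function names and the 5% cut-off are assumptions. It works out the score expected from pure guessing and the lowest score that a one-tailed binomial test would treat as better than guessing.

```python
from math import comb


def chance_score(n_items: int, n_options: int) -> float:
    """Score expected from guessing on every item."""
    return n_items / n_options


def tail_probability(score: int, n_items: int, n_options: int) -> float:
    """Probability of getting `score` or more items right by guessing alone."""
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(score, n_items + 1)
    )


def min_score_above_chance(n_items: int, n_options: int, alpha: float = 0.05) -> int:
    """Lowest score whose guessing probability is at or below `alpha` (assumed cut-off)."""
    for score in range(n_items + 1):
        if tail_probability(score, n_items, n_options) <= alpha:
            return score
    return n_items


# Compare the original format (25 items, 2 options) with the revised one (36 items, 4 options).
for label, n_items, n_options in [("original", 25, 2), ("revised", 36, 4)]:
    cutoff = min_score_above_chance(n_items, n_options)
    usable_scores = n_items - cutoff + 1  # how many distinct scores lie above chance
    print(label, chance_score(n_items, n_options), cutoff, usable_scores)
```

Guessing gives an expected 12.5 out of 25 on the original test but only 9 out of 36 on the revised one, so much more of the scale is left over to separate one participant from another.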

To fix the fourth and fifth limitations, the following steps were taken:

  1. The test items only involved Complex mental states; Basic states were eliminated.
  2. All items linked to gaze perception were removed from the test.

To fix the last three limitations, the following steps were taken:

  1. An equal number of female faces and male faces were used.
  2. The correct answer was accompanied by three similar words, as opposed to opposites.
  3. A glossary was given to participants; they could consult it at any time during the study.

Simon Baron-Cohen is a Professor of Developmental Psychopathology at the University of Cambridge. His best-known studies are on autism.

Simon is the cousin of Sacha Baron Cohen, the famous comedian with super fab facial hair.


Aims

  • To test a group of adults with Asperger syndrome (AS) or high-functioning autism (HFA) with the revised Eyes test.
  • To test a sample of normal adults with the revised Eyes test and investigate if an inverse correlation between scores on the AQ and Eyes test could be found.
  • To test if previous sex differences could be replicated with the revised Eyes test.


Hypotheses

  • Adults with AS/HFA will score significantly lower on mental state judgements on the revised Eyes test. However, they will not be impaired on the control task of gender judgements.
  • Adults with AS/HFA will score significantly higher in the AQ test.
  • Normal adult females will score higher than males on the revised Eyes test.
  • Scores on the AQ and Eyes test will be inversely correlated.

The participants were sorted into four groups.

Group 1 – A group of 15 males diagnosed with AS or HFA, who were recruited through support groups and the U.K. National Autistic Society magazine. They were from different educational levels and socioeconomic classes.

Group 2 – A large group of 122 “normal” adults from adult community/educational classes in Exeter and public libraries in Cambridge. They came from a variety of educational and professional backgrounds.

Group 3 – A group of 103 “normal” male (53) and female (50) undergraduate students at Cambridge University. As this university has relatively high eligibility criteria, it can be assumed that members of this group have high IQ.

Group 4 – A group of 14 randomly selected individuals who were similar to Group 1 in their age and IQ.

Table: Characteristics of the four groups

Two main tests were used in this study: the Autism Spectrum Quotient (AQ) and the revised Eyes test.

  1. Autism Spectrum Quotient (AQ): This is a self-report questionnaire that measures the degree to which an adult of normal IQ possesses traits that are linked to autism. It is scored from 0 to 50, with higher scores indicating more autistic traits.
  2. Eyes test: This test measures a person’s “theory of mind”. It consists of 36 items, each a photograph of the eye region of a face expressing a mental state. The participant chooses the word that best describes the expression from four pre-written options (multiple-choice format). In this study, a glossary of the words was also provided at the end of the test.
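
Scoring the Eyes test comes down to counting how many chosen words match the answer key. Here is a minimal sketch of that idea. The function name, the data layout and the three example items are all assumptions made for illustration; the words are placeholders and are not taken from the real test.

```python
def score_eyes_test(responses: list[str], answer_key: list[str]) -> int:
    """Return the number of items where the chosen word matches the key."""
    if len(responses) != len(answer_key):
        raise ValueError("one response is needed for every item")
    return sum(chosen == correct for chosen, correct in zip(responses, answer_key))


# Placeholder items for illustration only; these are NOT the real test content.
items = [
    {"options": ["calm", "tense", "amused", "puzzled"], "correct": "tense"},
    {"options": ["bored", "hopeful", "amused", "wary"], "correct": "wary"},
    {"options": ["doubtful", "eager", "gloomy", "relieved"], "correct": "doubtful"},
]

answer_key = [item["correct"] for item in items]
responses = ["tense", "amused", "doubtful"]  # a hypothetical participant's choices

score = score_eyes_test(responses, answer_key)
print(f"{score} out of {len(items)}")
```

On the real test the same count would run over all 36 items, giving a score between 0 and 36.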

All participants were tested with the revised Eyes test in a quiet room at Cambridge or in Exeter. The test was administered to them individually.

Group 1 also completed a gender recognition task alongside the Eyes test: they had to judge the gender of the person whose eyes were shown. Normal adults were not given this control task because previous studies showed that they perform at ceiling on it, so, to save time and effort, the researchers gave it only to Group 1.

Groups 1, 3 and 4 also completed the AQ test.

All participants were also asked to read the glossary (found at the end of the Eyes test). They could use this glossary during the test to find meanings of any difficult or unfamiliar words.

The following results were collected from the study:

  • There was no difference in glossary usage between groups (no one checked more than two words).
  • Group 1 performed significantly worse than the other groups on the Eyes test.
  • Group 1 scored 33 or above (out of 36) on the gender recognition control task.
  • Groups 2, 3 and 4 did not differ significantly from one another in their performance on the Eyes test.
  • Females scored higher than males on the Eyes test.
  • Group 1 scored higher on the AQ test than Groups 3 and 4.
  • In Group 3, males scored higher on the AQ than females.
  • There was no correlation between the Eyes test and IQ.
  • There was no correlation between AQ and IQ.
  • In Group 3, Eyes test scores were inversely correlated with the social skills and communication categories of the AQ.
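
An “inverse correlation” means that as one score goes up, the other tends to go down, which shows up as a negative correlation coefficient. The sketch below computes Pearson’s r by hand to show the idea. The scores are made up purely for illustration and are not data from the study, and the function name is an assumption.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least two scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Made-up scores for six hypothetical participants; NOT data from the study.
aq_scores = [12, 15, 18, 22, 30, 35]
eyes_scores = [31, 29, 30, 26, 22, 20]

r = pearson_r(aq_scores, eyes_scores)
print(f"r = {r:.2f}")  # a negative value indicates an inverse correlation
```

A value of r close to -1 would mean a strong inverse relationship, while a value close to 0 would mean no linear relationship at all.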

Type of research method
This was a quasi (natural) experiment with self-report.

Type of data collected
Quantitative data (numerical scores on the test).

Independent variable
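Whether or not the participant had a diagnosis of AS/HFA (a naturally occurring variable that the researchers could not manipulate).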

Dependent variable
Scores on the revised Eyes test and the AQ.


Strengths

  1. Collection of quantitative data: Quantitative data (i.e. scores on tests) was collected in this study. This data is all numerical, which means it can be analysed easily while comparisons can also be made (such as the comparison of group scores). It also means that the study’s results are objective, which makes them free from personal bias.
  2. Replicability and reliability: Because the psychometric tests all have a fixed format with close-ended questions, they can be taken again and again. This makes the study replicable, so other researchers can confirm the consistency of the findings. As the study’s procedure and tasks are replicable, the results are more likely to be reliable.
  3. Control: The researchers controlled variables like age, sex and IQ. This means that they could be more confident that it was only the factor of autism that was affecting the scores, as opposed to a factor like age confounding the results.


Weaknesses

  1. Validity: Psychometric tests do not always test what they claim to test. For example, was the Eyes test measuring theory of mind or was it just measuring the participants’ capability of completing an Eyes test? How do we know for sure that theory of mind alone was tested? However, in the original paper, the researchers do attempt to justify the validity of the test.
  2. Ecological validity and mundane realism: The stimuli were just static images of eyes. In real-life social situations, we interpret the emotions of real people whose expressions keep changing, so the situation is very different. This lowers the ecological validity of the study and creates an issue of mundane realism with regard to the task. The lab setting used for some participants also lacks ecological validity.

Ethical issues

  1. Informed consent: This was obtained from all participants.
  2. Confidentiality: Confidentiality was respected.
  3. Emotional or physical harm: There is no evidence of the study or tests causing any type of harm.
  4. The right to withdraw: Participants could withdraw from the study.
  5. Debriefing: There was no mention of debriefing.

Reference: Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y. and Plumb, I. (2001). The ‘Reading the Mind in the Eyes’ Test Revised Version: A Study with Normal Adults, and Adults with Asperger Syndrome or High-functioning Autism. Journal of Child Psychology and Psychiatry. 42(2): 241-251.

26 thoughts on “Reading the Mind in the Eyes”

  1. Hey Maryam!

    I’m sitting for A Level Psychology this May/June 2017

    I’m looking for a simplified page on Psychology and Education A2. Will you be posting anytime soon? Or please can you suggest further reads in the form of recommended books by Cambridge?

    I found your page on Abnormal Psychology more comprehensive and easier to understand. However, I am aware the textbook only serves as a brief guide for understanding the specialist choices at A2 level. So please can you kindly suggest further material (by CIE or your own suggestion) that could aid my understanding and deepen my knowledge in Abnormality?

    Finally I find designing studies to be pretty challenging and want to ask you for efficient ways of tackling the questions (i.e) Paper 3 “Specialist Choices” in Psychology and Education, and Models of Abnormality.

    Thank you,
    Eagerly awaiting your reply 🙂


    • Hi Yusra,

      This is actually a very commonly-used term – you could Google it and find out all there is to know.

      To put it simply: in psychology, “ceiling effects” are when the DV’s measurement always results in very high or the highest possible scores, which makes it difficult to detect the effect of the IV. For example, in Baron-Cohen, the gender recognition task always produces ceiling effects in normal adults, most probably because it is too easy for them.


  2. Hey, could you put the results and the strengths, weaknesses and ethics for the study in this answer? Please answer this question; I need it for last-minute revision for tomorrow’s paper. Thank you!

  3. Is this study really a questionnaire? Someone suggested to me that it’s an “experiment with matched participants based on IQ”. o_e I don’t believe they’re right, but want to make sure jic.

    • Also, if you’re planning to update this study can you add ‘results’ too? The table is a bit difficult to decipher. Thanks in advance!

      • Apologies, I will do that too! Honestly every study needs a major update as these were written almost two years ago and there’s a lot of information I want to add. I’ll start with Baron-Cohen.

        Maryam X

    • Hi Ayesha!

      “Questionnaire” isn’t really a research method; it’s more of a technique used in studies and experiments. Baron-Cohen’s study can be considered a *quasi (natural) experiment with self-report methodology* because the IV was not manipulated – so the person’s suggestion is partly right, although you MUST specify it as a natural experiment.

      And yes, as for the second part of the suggestion, matched pairs was used for the AS/HFA group and Group 4. However, it was not only matched on IQ – they were also of a very similar age range.

      So yes, that suggestion was not far off the mark! 🙂 I hope this helps to clear it up.

      Maryam X

    • Hi Assawer,

      I’m glad the blog has been helpful. I’m assuming you mean the evaluation (strengths, weaknesses, ethics) part of the study, right? If you give me a few hours, I’ll make sure this page is updated and completed by tonight. Hope that’s alright!


    • Hi there,

      Which table? If you’re referring to the table of participants’ characteristics then this is actually from the 2001 study, which involves the revised version. You can also find it in the original research paper! 🙂

      Thank you,

  4. I still cannot see any content, I do not know if something is wrong with my laptop or the blog. please help!

    • Hi mimik!

      The content on this page is being uploaded – can you see it? My laptop is being a huge nuisance today (poor thing is being overworked and needs a reboot) but this will hopefully be resolved tomorrow, which means the content will be uploaded much faster.

      Maryam XXX

    • As I mentioned on a previous comment, I’ll be updating incomplete pages like this over the next few days! Really sorry for the delay, but thank you for your patience!

      Maryam XXX

  5. Hey, there seems to be something wrong with this particular experiment, there’s absolutely no content that’s showing up.

    • Hi Kritika,

      Yes, some pages had incomplete content – or none at all, in this case! I’ll be updating the blog over the next few days so all the information should become available soon.

      Maryam XXX

    • Hi Usama,

      I apologise for such a late response! I’ve been caught up with life in general; didn’t even realise people were commenting! I’ll keep my eyes open next time.

      As for your question, Pablo answered it well. You could use video clips or real-time actors as an alternative. If you want a more specific answer, send me the question you’ve been looking at.

      Thanks for commenting,
