Testing Theory of Mind in Large Language Models and Humans


At the core of what defines us as humans is the concept of Theory of Mind: the ability to track other people’s mental states. The recent development of large language models (LLMs) such as ChatGPT has led to intense debate about the possibility that these models might have developed an artificial Theory of Mind. Here, we present results from a comprehensive battery of measurements spanning different Theory of Mind abilities, from understanding false beliefs to interpreting indirect requests and recognizing irony and faux pas. We tested two families of LLMs (GPT and LLaMA2) repeatedly against these measures and compared their performance with that of a large sample of human participants. Across the battery of Theory of Mind tests, we found that GPT-4 models performed at, or sometimes even above, human levels at identifying indirect requests, false beliefs, and misdirection, but struggled with detecting faux pas. Faux pas was also the only test where LLaMA2 outperformed humans. Follow-up manipulations of the belief likelihood revealed that LLaMA2’s superiority was illusory, possibly reflecting a bias towards attributing ignorance. In contrast, GPT’s poor performance originated from a hyperconservative approach towards committing to conclusions rather than from a genuine failure of inference. These findings not only demonstrate that LLMs exhibit behaviour that is consistent with the outputs of mentalistic inference in humans, but also highlight the importance of systematic testing to ensure a non-superficial comparison between human and artificial intelligences.

Under revision at Nature Human Behaviour