Testing Theory of Mind in Large Language Models and Humans


Interacting with other people involves reasoning about and predicting others’ mental states, a capacity known as Theory of Mind. This capacity is a distinguishing feature of human cognition, but recent advances in Large Language Models (LLMs) such as ChatGPT suggest that they may possess some emergent capacity for human-like Theory of Mind. Such claims merit a systematic approach that explores the limits of GPT models’ emergent Theory of Mind capacity and compares it against that of humans. We show that while GPT models exhibit impressive Theory of Mind-like capacity in controlled tests, there are key deviations from human performance that call into question how human-like this capacity is. Specifically, across a battery of Theory of Mind tests, we found that GPT models performed at human levels when recognising indirect requests, false beliefs, and higher-order mental states like misdirection, but were specifically impaired at recognising faux pas. Follow-up studies revealed that this impairment was due to GPT’s conservatism in drawing conclusions that humans took to be self-evident. Our results suggest that while GPT may demonstrate the competence for sophisticated mentalistic inference, its lack of embodiment within an action-oriented environment makes this capacity qualitatively different from human cognition.

Mar 17, 2024 10:30 AM
University of Regensburg, Regensburg, Germany
James W.A. Strachan
Humboldt Fellow
he/him 🏳️‍🌈