When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models

Abstract

Modern Large Language Models (LLMs) have shown human-like abilities in many language tasks, sparking interest in comparing LLMs’ and humans’ language processing. In this paper, we conduct a detailed comparison of the two on a sentence comprehension task using garden-path constructions, which are notoriously challenging for humans. Based on psycholinguistic research, we formulate hypotheses on why garden-path sentences are hard, and test these hypotheses on human participants and a large suite of LLMs using comprehension questions. Our findings reveal that both LLMs and humans struggle with specific syntactic complexities, with some models showing high correlation with human comprehension. To complement our findings, we test LLM comprehension of garden-path constructions with paraphrasing and text-to-image generation tasks, and find that the results mirror the sentence comprehension question results, further validating our findings on LLM understanding of these constructions.

Samuel Joseph Amouyal
PhD candidate @ TAU, Research scientist @ Blinq.io

I am a PhD candidate in computer science interested in the intersection of Natural Language Processing (NLP) and other fields (psycholinguistics, economics, game theory, literature, …).