Samuel Amouyal
Publications
When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models
Modern Large Language Models (LLMs) have shown human-like abilities in many language tasks, sparking interest in comparing LLMs’ …
Samuel Joseph Amouyal, Aya Meltzer-Asscher, Jonathan Berant
Cite · Link
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments
Large Language Models (LLMs) show significant potential in economic and strategic interactions, where communication via natural …
Eilam Shapira, Omer Madmon, Itamar Reinman, Samuel Joseph Amouyal, Roi Reichart, Moshe Tennenholtz
Cite · Link · Code
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
Language agents, built on top of language models (LMs), are systems that can interact with complex environments, such as the open web. …
Ori Yoran, Samuel Joseph Amouyal, Chaitanya Malaviya, Ben Bogin, Ofir Press, Jonathan Berant
Cite · Link · Code
Large Language Models for Psycholinguistic Plausibility Pretesting
In psycholinguistics, the creation of controlled materials is crucial to ensure that research outcomes are solely attributed to the …
Samuel Joseph Amouyal, Aya Meltzer-Asscher, Jonathan Berant
Cite · Link · Code
Rationality Report Cards: Assessing the Economic Rationality of Large Language Models
There is increasing interest in using LLMs as decision-making “agents.” Doing so includes many degrees of freedom which …
Narun Raman, Taylor Lundy, Samuel Joseph Amouyal, Yoav Levine, Kevin Leyton-Brown, Moshe Tennenholtz
Cite · Link
QAMPARI: A Benchmark for Open-domain Questions with Many Answers
Existing benchmarks for open-domain question answering (ODQA) typically focus on questions whose answers can be extracted from a single …
Samuel Joseph Amouyal, Tomer Wolfson, Ohad Rubin, Ori Yoran, Jonathan Herzig, Jonathan Berant
Cite · Link · Code