Papers
arxiv:2511.00261

Spot The Ball: A Benchmark for Visual Social Inference

Published on Nov 19, 2025
Authors:
,
,
,
,
,

Abstract

Vision-language models struggle with visual social inference tasks compared to human performance, relying on surface-level spatial cues rather than nuanced behavioral indicators.

Humans excel at visual social inference, the ability to infer hidden elements of a scene from subtle behavioral cues such as other people's gaze, pose, and orientation. This ability drives everyday social reasoning in humans and is critical for developing more human-like AI agents. We introduce Spot The Ball, a challenging benchmark for evaluating visual social inference in vision-language models (VLMs) using sports as a test domain. The task is to localize a removed sports ball from soccer, basketball, and volleyball images. We present a curated evaluation set with human baselines and a scalable pipeline for generating additional test items. We evaluate four state-of-the-art VLMs (Gemini, GPT, LLaMA, Qwen) using three prompting strategies, finding that humans are consistently two to three times more accurate (20-34%) than models (leq 17%) across all sports. Our analyses show that models rely on superficial spatial heuristics--such as guessing near the image center or nearby players--while humans leverage social cues like gaze direction and body pose. These findings reveal a persistent human-model gap in visual social reasoning and underscore the need for architectures that explicitly encode structured behavioral cues to achieve robust, human-like inference.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2511.00261
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2511.00261 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.00261 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.00261 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.