arxiv:2606.21083

Coherence Under Commitment: Probing Generalization and Vacuous Memorization in LLM Logical Reasoning

Published on Jun 19

Authors:

Abstract

Coherence Under Commitment (CUC) evaluates large language models' logical reasoning by measuring both consistency and decisiveness, revealing that models can achieve apparent coherence through systematic abstention rather than genuine reasoning.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Large language models (LLMs) deployed for logical reasoning in knowledge-intensive domains exhibit a subtle but critical failure: coherence can be vacuously achieved through systematic abstention. A model that withholds commitment to either entailment or refutation satisfies negation consistency while providing no utility. We introduce Coherence Under Commitment (CUC), a dual-query evaluation paradigm that jointly measures consistency and decisiveness. CUC contributes three innovations: (1) a commitment score c(φ) = p(φ) + p(¬φ) quantifying probability mass allocated to decisive outcomes; (2) a deterministic elicitation protocol via normalized YES/NO log probabilities, eliminating sampling variance; and (3) a 3-way decision framework (True/False/Uncertain) operationalizing the coherence-commitment trade-off into metrics. Experiments on four open-weight LLMs (1B-3B) across 204 FOLIO examples expose a sharp frontier. Qwen2.5-3B achieves near-zero contradiction (E[v_neg] = 0.025) but only 7.4% coverage, while TinyLlama-1.1B reaches 79.4% coverage with violations on every example. Coherence-only evaluation would rank the abstaining model first; CUC exposes this as vacuous, and the frontier generalizes to LogiQA v2 (ρ = 0.97). We argue that evaluation must report both coherence and non-vacuous commitment and release a toolkit for standardized assessment.
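The quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released toolkit. Only the commitment score c(φ) = p(φ) + p(¬φ) and the idea of normalizing YES/NO log probabilities come from the abstract. Everything else is an assumption made for illustration: the function names, the violation formula max(0, c(φ) − 1), the threshold tau = 0.5, and the exact True/False/Uncertain rule.

```python
import math


def normalized_yes_prob(logp_yes: float, logp_no: float) -> float:
    """Renormalize the YES/NO log probabilities of one query over the two answers.

    Deterministic: it reads log probabilities directly instead of sampling.
    """
    m = max(logp_yes, logp_no)  # subtract the max for numerical stability
    e_yes = math.exp(logp_yes - m)
    e_no = math.exp(logp_no - m)
    return e_yes / (e_yes + e_no)


def commitment(p_phi: float, p_not_phi: float) -> float:
    """Commitment score c(phi) = p(phi) + p(not phi), as defined in the abstract."""
    return p_phi + p_not_phi


def negation_violation(p_phi: float, p_not_phi: float) -> float:
    """ASSUMED violation measure: excess mass above 1 when both phi and not-phi are affirmed."""
    return max(0.0, p_phi + p_not_phi - 1.0)


def decide(p_phi: float, p_not_phi: float, tau: float = 0.5) -> str:
    """ASSUMED 3-way rule: commit only when exactly one of the two queries clears tau."""
    if p_phi > tau and p_not_phi <= tau:
        return "True"
    if p_not_phi > tau and p_phi <= tau:
        return "False"
    return "Uncertain"


def coverage(decisions: list[str]) -> float:
    """Fraction of examples with a committed (non-Uncertain) decision."""
    return sum(d != "Uncertain" for d in decisions) / len(decisions)
```

Under these assumptions, a model that answers NO to both the φ query and the ¬φ query gets zero violation but also a low commitment score and an Uncertain decision. That is the vacuous coherence the abstract describes: ranking by violation alone would favor such a model, while reporting coverage alongside it would not.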


