MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection Paper • 2605.30288 • Published 8 days ago • 21
ClawBench — Browser Agent Benchmark Suite Collection Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 25 days ago • 1
Dr. Bench: A Multidimensional Evaluation for Deep Research Agents, from Answers to Reports Paper • 2510.02190 • Published Jan 29 • 20
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published May 3 • 122
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling Paper • 2604.28185 • Published Apr 30 • 90
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published Apr 27 • 71
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published Mar 13, 2025 • 25
Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published Apr 6 • 36
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published Apr 6 • 236