Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application Paper • 2606.12191 • Published 4 days ago
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios Paper • 2604.25914 • Published Apr 28
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents Paper • 2604.23781 • Published Apr 26
From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space Paper • 2604.14142 • Published Apr 15