MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination • Paper • 2603.24579 • Published Mar 25 • 1
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR • Paper • 2603.10101 • Published Mar 10 • 6
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill • Paper • 2606.03980 • Published 8 days ago • 10
Search Self-play: Pushing the Frontier of Agent Capability without Supervision • Paper • 2510.18821 • Published Oct 21, 2025 • 19
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills • Paper • 2603.25158 • Published Mar 26 • 54
HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application • Paper • 2510.19631 • Published Oct 22, 2025 • 28
DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking • Paper • 2510.20168 • Published Oct 23, 2025 • 28