MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination Paper • 2603.24579 • Published Mar 25 • 1
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR Paper • 2603.10101 • Published Mar 10 • 6
On Diversified Preferences of Large Language Model Alignment Paper • 2312.07401 • Published Dec 12, 2023
Self-playing Adversarial Language Game Enhances LLM Reasoning Paper • 2404.10642 • Published Apr 16, 2024
Search Self-play: Pushing the Frontier of Agent Capability without Supervision Paper • 2510.18821 • Published Oct 21, 2025 • 19
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills Paper • 2603.25158 • Published Mar 26 • 54
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill Paper • 2606.03980 • Published 8 days ago • 13
HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application Paper • 2510.19631 • Published Oct 22, 2025 • 28
DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking Paper • 2510.20168 • Published Oct 23, 2025 • 28