SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models Paper • 2511.05459 • Published Nov 7, 2025 • 5
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 5 days ago • 95
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Paper • 2604.18224 • Published Apr 20 • 22