view article Article AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org PeterKruger • Aug 20, 2025 • 6
view article Article AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model PeterKruger • Apr 29, 2025 • 6