Benchmark data in "Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions".
Xuan Yang
TorresYang
AI & ML interests
LLM reasoning, agent
Recent Activity
updated a collection about 3 hours ago
RUT-Bench
new activity about 4 hours ago
Miaow-Lab/RUT-Bench: Add task categories and link to paper
updated a collection about 23 hours ago
RUT-Bench