SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks Paper • 2402.01980 • Published Feb 3, 2024
CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans Paper • 2406.15823 • Published Jun 22, 2024
SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction Paper • 2606.02540 • Published 13 days ago • 10
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26, 2024 • 35