- DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
  Paper • 2508.14460 • Published • 86
- MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement
  Paper • 2508.09670 • Published
- URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
  Paper • 2507.17515 • Published • 2
Emmanuel Sugutt
Sugutt
AI & ML interests
Reinforcement learning
Transformer models
Recent Activity
updated a Space about 12 hours ago: Sugutt/kln_whisper_v3_turbo
published a Space about 12 hours ago: Sugutt/kln_whisper_v3_turbo
liked a model about 12 hours ago: Sugutt/whisper-v3-kalenjin-turbo