OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
DUEL: Adversarial Self-Play for Multimodal Reasoning