One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications • Paper • 2606.25621 • Published 9 days ago • 18
OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains • Paper • 2606.14702 • Published 21 days ago • 31
Redesign Mixture-of-Experts Routers with Manifold Power Iteration • Paper • 2606.12397 • Published 23 days ago • 89
SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer • Paper • 2605.30409 • Published May 28 • 42
SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue • Paper • 2605.30993 • Published May 29 • 62