VideoMDM: Towards 3D Human Motion Generation From 2D Supervision Paper • 2606.13364 • Published 1 day ago
Representation Forcing for Bottleneck-Free Unified Multimodal Models Paper • 2605.31604 • Published 14 days ago
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published 16 days ago
Self-Improving Language Models with Bidirectional Evolutionary Search Paper • 2605.28814 • Published 16 days ago
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence Paper • 2605.25979 • Published 18 days ago
Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini Paper • 2605.27295 • Published 17 days ago
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 17 days ago
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion Paper • 2605.23902 • Published 21 days ago
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published about 1 month ago
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization Paper • 2605.10780 • Published May 12
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published May 12