LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 10 days ago • 138
Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms Paper • 2505.17190 • Published May 22, 2025 • 3
PPT: Token Pruning and Pooling for Efficient Vision Transformers Paper • 2310.01812 • Published Oct 3, 2023 • 1
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 516
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation Paper • 2310.16656 • Published Oct 25, 2023 • 54
MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model Paper • 2509.20706 • Published Sep 25, 2025 • 3
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets Paper • 2509.21245 • Published Sep 25, 2025 • 40
Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published Sep 24, 2025 • 84
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25, 2025 • 104
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines Paper • 2509.21320 • Published Sep 25, 2025 • 101
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30, 2025 • 282
Sadeed: Advancing Arabic Diacritization Through Small Language Model Paper • 2504.21635 • Published Apr 30, 2025 • 59
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12, 2025 • 77
Agent models: Internalizing Chain-of-Action Generation into Reasoning models Paper • 2503.06580 • Published Mar 9, 2025 • 20