OpenGVLab/InternVL3_5-241B-A28B-Pretrained
Image-Text-to-Text • 241B
Computer Vision
Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction
RIVER: A Real-Time Interaction Benchmark for Video LLMs