OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B
Video-Text-to-Text • 9B
Computer Vision
Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction
RIVER: A Real-Time Interaction Benchmark for Video LLMs