Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions
Paper: arXiv:2606.09150
Ultra Flash is a cascaded streaming framework capable of real-time high-resolution video generation. It achieves ~30 FPS at 1K resolution and ~18 FPS at 2K resolution on a single GPU.
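To put the throughput figures in perspective, the short sketch below converts a frame rate into the average time available per generated frame. It is plain arithmetic on the numbers quoted above. The function name `frame_budget_ms` is made up for illustration, and the result is an average throughput budget, not a measured per-frame latency (pipeline stages may overlap).

```python
def frame_budget_ms(fps: float) -> float:
    """Return the average time available per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates quoted in the description above (approximate values).
for label, fps in [("1K", 30.0), ("2K", 18.0)]:
    print(f"{label}: {fps:.0f} FPS -> {frame_budget_ms(fps):.1f} ms per frame")
# 1K: 30 FPS -> 33.3 ms per frame
# 2K: 18 FPS -> 55.6 ms per frame
```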
Recent autoregressive video diffusion models achieve remarkable streaming quality, but they remain confined to low resolutions. Ultra Flash bridges this gap with a cascaded pipeline: three key components run after a low-resolution streaming generator.
# Create and activate the conda environment
conda create -n ultraflash python=3.10 -y
conda activate ultraflash
# Install the Python dependencies
cd inference
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
# Block Sparse Attention (CUDA kernel, required for SR DiT)
git clone https://github.com/mit-han-lab/Block-Sparse-Attention.git
cd Block-Sparse-Attention
pip install -e .
# Return to the directory you started from, so a later `cd inference` works
cd ../..
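After installation, a quick check that the compiled packages are visible to Python can save a failed first run. The following is a minimal sketch, not part of the repository. The import names are assumptions: `flash_attn` for flash-attn, `block_sparse_attn` for Block-Sparse-Attention, and `torch` on the assumption that `requirements.txt` installs PyTorch. The helper `missing_modules` is hypothetical.

```python
import importlib.util

# Assumed import names (not confirmed by this README):
#   torch             -> PyTorch, assumed to come from requirements.txt
#   flash_attn        -> the flash-attn package
#   block_sparse_attn -> the Block-Sparse-Attention package
REQUIRED_MODULES = ("torch", "flash_attn", "block_sparse_attn")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only confirms that the modules can be located; it does not load the CUDA kernels or verify that they match the installed GPU driver.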
For custom inference at high resolution:
cd inference
python inference.py \
    --config_path configs/self_forcing_dmd_4step.yaml \
    --checkpoint_path checkpoints/self_forcing_dmd.pt \
    --data_path prompts/examples.txt \
    --output_folder outputs/ \
    --use_ema \
    --tiny_decoder \
    --torch_compile \
    --compile_sr_dit
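When running several prompt files, it can help to assemble the same command programmatically. The sketch below builds the argument list from the flags shown above and prints it. The function `build_inference_command` and its `compile_models` switch are hypothetical, and the sketch assumes (without confirmation from the repository) that `--torch_compile` and `--compile_sr_dit` are optional and can be dropped for a quicker start-up.

```python
import shlex
import sys


def build_inference_command(prompts_path, output_folder,
                            config_path="configs/self_forcing_dmd_4step.yaml",
                            checkpoint_path="checkpoints/self_forcing_dmd.pt",
                            compile_models=True):
    """Build the inference.py argument list using the flags from this README."""
    cmd = [
        sys.executable, "inference.py",
        "--config_path", config_path,
        "--checkpoint_path", checkpoint_path,
        "--data_path", prompts_path,
        "--output_folder", output_folder,
        "--use_ema",
        "--tiny_decoder",
    ]
    if compile_models:
        # Assumption: both compile flags are optional switches.
        cmd += ["--torch_compile", "--compile_sr_dit"]
    return cmd


if __name__ == "__main__":
    cmd = build_inference_command("prompts/examples.txt", "outputs/")
    # Print the command; to launch it, pass `cmd` to
    # subprocess.run(cmd, check=True) from inside the inference/ directory.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.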
@article{luxury2026ultraflash,
title={Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions},
author={Luxury and Huang, Jie and Fan, Zihao and Ma, Xiaoxiao and Li, Yuming and Zhuang, Jun-hao and Xue, Zeyue and Fu, Siming and Li, Haoran and Zhong, Mingchen and Zhang, Guohui and Ma, Shichen and Liu, Yijun and Shi, Jiaqi and Ma, Yanwen and Su, Yaofeng and Wang, Haoyu and Li, Yaowei and Zhang, Songchun and Jin, Weiyang and Bian, Yuxuan and Zhang, Shiyi and Xu, Haojun and Lu, Shuai and Han, Xin and Tang, Wei and Huang, Haoyang and Duan, Nan},
journal={arXiv preprint arXiv:2606.09150},
year={2026}
}