Text-to-Video

Ultra Flash ⚡

Ultra Flash is a cascaded streaming framework capable of real-time high-resolution video generation. It achieves ~30 FPS at 1K resolution and ~18 FPS at 2K resolution on a single GPU.
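
To read these throughput figures as a latency budget, the minimal sketch below converts frames per second into average milliseconds per frame. The function name is illustrative and not part of the released code; only the two FPS figures come from the description above. Because generation is streamed in chunks, this is an amortized budget rather than a per-frame latency guarantee.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the average time available per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Approximate throughput figures quoted in the description above.
for label, fps in [("1K", 30.0), ("2K", 18.0)]:
    print(f"{label}: ~{frame_budget_ms(fps):.1f} ms per frame")
# 1K: ~33.3 ms per frame
# 2K: ~55.6 ms per frame
```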

Paper | Project Page | Code (GitHub)

Overview

While recent autoregressive video diffusion models achieve remarkable streaming quality, they remain confined to low resolutions. Ultra Flash bridges this gap by cascading three key components after a low-resolution streaming generator:

  1. Architecture-Preserving T2V-to-TV2V Super-Resolution (SR) Training with AIGC-oriented degradation.
  2. Causal Streaming Latent Upsampler (~2M parameters, <5% overhead) for spatiotemporal coherence.
  3. Cascaded Streaming Optimization (sparse distillation, Direct Preference Optimization (DPO), and dynamic cache management).

Architecture

The framework follows a cascaded pipeline:

  • Self-Forcing Generator: Based on Wan2.1-1.3B, producing 480P streaming latents.
  • Causal Latent Upsampler: Performs 2x or 3x spatial upsampling in the latent space.
  • Sparse SR DiT: Refines high-resolution latents using single-step denoising and block-sparse attention.
  • Tiny Decoder: A causal memory network for efficient latent-to-pixel decoding at 1K/2K.
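
The cascade above can be pictured as a chain of streaming stages, each consuming chunks as soon as the previous stage emits them. The following is a shape-only sketch, not the project's implementation: every class and function name is hypothetical, the 60x104 latent grid assumes an 832x480 frame with an 8x spatial VAE stride, and the chunk length of 3 latent frames is likewise an assumption.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class LatentChunk:
    """Shape-only stand-in for one chunk of video latents (no tensor data)."""
    index: int
    frames: int
    height: int
    width: int
    stage: str


def generate(num_chunks: int, frames: int = 3,
             height: int = 60, width: int = 104) -> Iterator[LatentChunk]:
    """Stand-in for the low-resolution streaming generator (assumed sizes)."""
    for i in range(num_chunks):
        yield LatentChunk(i, frames, height, width, "generated")


def upsample(chunks: Iterable[LatentChunk], scale: int) -> Iterator[LatentChunk]:
    """Stand-in for the latent upsampler: 2x or 3x spatial scaling."""
    if scale not in (2, 3):
        raise ValueError("scale must be 2 or 3")
    for c in chunks:
        yield replace(c, height=c.height * scale, width=c.width * scale,
                      stage="upsampled")


def refine(chunks: Iterable[LatentChunk]) -> Iterator[LatentChunk]:
    """Stand-in for the SR DiT: one refinement pass, shape unchanged."""
    for c in chunks:
        yield replace(c, stage="refined")


def decode(chunks: Iterable[LatentChunk],
           vae_stride: int = 8) -> Iterator[Tuple[int, int, int]]:
    """Stand-in for the decoder: report (chunk index, pixel height, pixel width)."""
    for c in chunks:
        yield (c.index, c.height * vae_stride, c.width * vae_stride)


def run_cascade(num_chunks: int, scale: int) -> List[Tuple[int, int, int]]:
    """Chain the stages lazily; each chunk flows through all stages in turn."""
    return list(decode(refine(upsample(generate(num_chunks), scale))))


print(run_cascade(2, 2))  # [(0, 960, 1664), (1, 960, 1664)]
print(run_cascade(1, 3))  # [(0, 1440, 2496)]
```

Because every stage is a generator, the first chunk reaches the decoder before the second chunk is generated, which is the property that makes a cascade compatible with streaming output.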

Quick Start

Installation

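# Create and activate an isolated environment
conda create -n ultraflash python=3.10 -y
conda activate ultraflash

# Install Python dependencies (run from the repository root)
cd inference
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

# Block Sparse Attention (CUDA kernel, required for SR DiT)
git clone https://github.com/mit-han-lab/Block-Sparse-Attention.git
cd Block-Sparse-Attention
pip install -e .
cd ../..  # return to the repository root before running inference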

Inference

For custom inference at high resolution:

cd inference
python inference.py \
    --config_path configs/self_forcing_dmd_4step.yaml \
    --checkpoint_path checkpoints/self_forcing_dmd.pt \
    --data_path prompts/examples.txt \
    --output_folder outputs/ \
    --use_ema \
    --tiny_decoder \
    --torch_compile \
    --compile_sr_dit
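
To run the same command over several prompt files, a thin Python wrapper can assemble the argument list. This is a convenience sketch, not part of the repository: `build_command` and `run_batch` are hypothetical helpers, and the only flags and paths used are the ones shown in the command above. It assumes it is launched from the repository root with the `ultraflash` environment active.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_command(prompt_file: str, output_folder: str) -> List[str]:
    """Assemble the inference command using only the flags shown above."""
    return [
        "python", "inference.py",
        "--config_path", "configs/self_forcing_dmd_4step.yaml",
        "--checkpoint_path", "checkpoints/self_forcing_dmd.pt",
        "--data_path", prompt_file,
        "--output_folder", output_folder,
        "--use_ema",
        "--tiny_decoder",
        "--torch_compile",
        "--compile_sr_dit",
    ]


def run_batch(prompt_files: Sequence[str], output_root: str = "outputs") -> None:
    """Run inference once per prompt file, one output subfolder per file."""
    for prompt_file in prompt_files:
        # Paths are relative to the inference/ directory, matching the command above.
        out_dir = str(Path(output_root) / Path(prompt_file).stem)
        subprocess.run(build_command(prompt_file, out_dir),
                       cwd="inference", check=True)


# Example call (not executed here):
# run_batch(["prompts/examples.txt"])
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` stops the batch at the first failing run.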

Citation

@inproceedings{luxury2026ultraflash,
  title={Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions},
  author={Luxury and Huang, Jie and Fan, Zihao and Ma, Xiaoxiao and Li, Yuming and Zhuang, Jun-hao and Xue, Zeyue and Fu, Siming and Li, Haoran and Zhong, Mingchen and Zhang, Guohui and Ma, Shichen and Liu, Yijun and Shi, Jiaqi and Ma, Yanwen and Su, Yaofeng and Wang, Haoyu and Li, Yaowei and Zhang, Songchun and Jin, Weiyang and Bian, Yuxuan and Zhang, Shiyi and Xu, Haojun and Lu, Shuai and Han, Xin and Tang, Wei and Huang, Haoyang and Duan, Nan},
  booktitle={arXiv preprint},
  year={2026}
}