PiD — Pixel Diffusion Decoder


Yifan Lu, Qi Wu, Jay Zhangjie Wu, Zian Wang, Huan Ling, Sanja Fidler, Xuanchi Ren

News

[July 2026] PiD v1.5 checkpoints for FLUX, FLUX.2, and Qwen-Image are released. See the comparison page for the improvements:
- Improved decoding color fidelity
- Removed grid artifacts in image corners
- Improved anime and facial details

PiD reformulates the latent-to-pixel decoder as a conditional pixel-space diffusion model, unifying decoding and upsampling into a single generative module. It denoises directly in high-resolution pixel space and produces a super-resolved image in one pass. This repository hosts the released decoder checkpoints, plus the encoder/decoder ("VAE") weights they depend on.

PiD_* checkpoint directories whose names end in _distill_4step are distilled to run in 4 sampling steps; those ending in _undistilled are not distilled. The non-PiD_* entries (ae.safetensors, flux2_ae.safetensors, sdxl_vae.safetensors, QwenImage_VAE_2d.pth, sd3_vae/, rae/, scale_rae/) are the corresponding encoder/decoder VAE weights that PiD plugs into; they are not PiD checkpoints themselves.

License/Terms of Use

This model is released under the NSCLv1 License. The work and any derivative works may only be used for non-commercial (research or evaluation) purposes.

Deployment Geography:

Global

PiD checkpoints

PiD checkpoint variants:

2k - trained at 2048 px and used as a 4× decoder (512 px LDM → 2048 px), or as an 8× decoder for the Scale-RAE backbone (256 px → 2048 px).
2kto4k_v1pt5 - the recommended up-to-4K decoder for FLUX, FLUX.2, and Qwen-Image latent spaces.
2kto4k - the legacy up-to-4K decoder still used for SD3 and SDXL. The previous FLUX / FLUX.2 / Qwen-Image 2kto4k checkpoints are deprecated and have been moved to checkpoints_deprecated/.

Each checkpoint directory contains a single file, model_ema_bf16.pth, which holds the EMA weights cast to bfloat16. This is the format the inference scripts load by default.
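Once a snapshot has been downloaded, the layout described above can be sanity-checked with a few lines of Python. This is a minimal sketch and not part of the PiD codebase: the helper name `find_incomplete_checkpoints` is hypothetical, and it assumes that decoder directories are exactly those under `checkpoints/` whose names start with `PiD_`.

```python
from pathlib import Path

WEIGHTS_FILE = "model_ema_bf16.pth"  # file name stated in this README


def find_incomplete_checkpoints(root="checkpoints"):
    """Return the names of PiD_* directories under `root` that lack the weights file.

    Assumption: decoder directories are the ones whose names start with "PiD_".
    VAE entries such as ae.safetensors or sd3_vae/ are skipped.
    """
    missing = []
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir() and entry.name.startswith("PiD_"):
            if not (entry / WEIGHTS_FILE).is_file():
                missing.append(entry.name)
    return missing


if __name__ == "__main__":
    if Path("checkpoints").is_dir():
        incomplete = find_incomplete_checkpoints("checkpoints")
        print(incomplete or "all PiD_* directories contain " + WEIGHTS_FILE)
```

An empty list means every `PiD_*` directory found has its weights file. The check does not validate the file contents.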

Distilled checkpoints

Backbone	2k decoding only	2k-to-4k decoding
flux	`checkpoints/PiD_res2k_sr4x_official_flux_distill_4step`	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_flux_distill_4step`
flux2	`checkpoints/PiD_res2k_sr4x_official_flux2_distill_4step`	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_flux2_distill_4step`
flux2-klein-4b	`checkpoints/PiD_res2k_sr4x_official_flux2_distill_4step`	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_flux2_distill_4step`
flux2-klein-9b	`checkpoints/PiD_res2k_sr4x_official_flux2_distill_4step`	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_flux2_distill_4step`
zimage	`checkpoints/PiD_res2k_sr4x_official_flux_distill_4step`	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_flux_distill_4step`
zimage-turbo	`checkpoints/PiD_res2k_sr4x_official_flux_distill_4step`	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_flux_distill_4step`
qwenimage	-	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_qwenimage_distill_4step`
qwenimage-2512	-	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_qwenimage_distill_4step`
sd3	`checkpoints/PiD_res2k_sr4x_official_sd3_distill_4step`	`checkpoints/PiD_res2kto4k_sr4x_official_sd3_distill_4step`
sdxl	-	`checkpoints/PiD_res2kto4k_sr4x_official_sdxl_distill_4step`
dinov2	`checkpoints/PiD_res2k_sr4x_official_dinov2_distill_4step`	-
siglip	`checkpoints/PiD_res2k_sr8x_official_siglip_distill_4step`	-
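The directory names in this README follow a regular pattern: an optional version tag, the training resolution, the upsampling factor, the latent space, and a distillation suffix. The sketch below parses that pattern. It is an illustration inferred from the names listed here, not the registry code. The class `PiDCheckpointName` and the function `parse_checkpoint_name` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern inferred from the directory names listed in this README (an assumption).
_NAME = re.compile(
    r"^PiD(?:_(?P<version>v\d+pt\d+))?"
    r"_res(?P<res>2kto4k|2k)"
    r"_sr(?P<sr>\d+)x_official_"
    r"(?P<latent>[a-z0-9]+)_"
    r"(?:distill_(?P<steps>\d+)step|undistilled)$"
)


@dataclass(frozen=True)
class PiDCheckpointName:
    version: Optional[str]        # e.g. "v1pt5", or None for the original release
    resolution: str               # "2k" or "2kto4k"
    sr_factor: int                # 4 or 8 in the names listed here
    latent_space: str             # e.g. "flux", "flux2", "siglip"
    distill_steps: Optional[int]  # None for undistilled checkpoints

    def output_side(self, ldm_side_px: int) -> int:
        """Side length after decoding, e.g. 512 px * 4 = 2048 px."""
        return ldm_side_px * self.sr_factor


def parse_checkpoint_name(path: str) -> PiDCheckpointName:
    name = path.rstrip("/").split("/")[-1]
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognised PiD checkpoint name: {name!r}")
    steps = m.group("steps")
    return PiDCheckpointName(
        version=m.group("version"),
        resolution=m.group("res"),
        sr_factor=int(m.group("sr")),
        latent_space=m.group("latent"),
        distill_steps=int(steps) if steps is not None else None,
    )


if __name__ == "__main__":
    ckpt = parse_checkpoint_name("checkpoints/PiD_res2k_sr8x_official_siglip_distill_4step")
    print(ckpt, ckpt.output_side(256))
```

Parsing the `siglip` entry gives an upsampling factor of 8, which matches the 256 px → 2048 px Scale-RAE case described earlier.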

Undistilled checkpoints

VAE	2k-to-4k decoding
flux	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_flux_undistilled`
flux2	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_flux2_undistilled`
qwenimage (wan2.1)	`checkpoints/PiD_v1pt5_res2kto4k_sr4x_official_qwenimage_undistilled`

PixelDiT 2k-to-4k checkpoint

Model	Checkpoint path
PixelDiT	`checkpoints/PixelDiT_finetune_2kto4k`

Latent space → compatible LDMs

A PiD decoder is tied to a latent space, not to a single generative model. Any LDM that produces latents in that space can reuse the same checkpoint. The --backbone aliases below select the right LDM pipeline; each alias decodes through the checkpoint listed above for its latent space.

Latent space	VAE / vision encoder weights	compatible `--backbone`	Corresponding LDM Links
Flux1-dev	`checkpoints/ae.safetensors`	`flux`, `zimage`, `zimage-turbo`	FLUX.1-dev, Z-Image, Z-Image-Turbo
Flux2-dev	`checkpoints/flux2_ae.safetensors`	`flux2`, `flux2-klein-4b`, `flux2-klein-9b`	FLUX.2-dev, FLUX.2-klein-4B, FLUX.2-klein-9B
SD3 medium	`checkpoints/sd3_vae/`	`sd3`	SD3-medium
SDXL	`checkpoints/sdxl_vae.safetensors`	`sdxl`	SDXL-base-1.0
Qwen-Image	`checkpoints/QwenImage_VAE_2d.pth`	`qwenimage`, `qwenimage-2512`	Qwen-Image, Qwen-Image-2512
DINOv2-B	`checkpoints/rae/`	`dinov2`	RAE (class-conditional; DINOv2-B)
SigLIP-2	`checkpoints/scale_rae/`	`siglip`	Scale-RAE (text-conditional; nyu-visionx/Scale-RAE-Qwen1.5B_DiT2.4B)

For example, Z-Image and Z-Image-Turbo share Flux1-dev's VAE, so they reuse the flux checkpoints (both 2k and 2kto4k_v1pt5) — no separate zimage checkpoint is shipped. Likewise qwenimage-2512 reuses the qwenimage decoder (same VAE, different transformer).
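The sharing rule can be made concrete with a small lookup. The sketch below transcribes the table above into two dictionaries. It is illustrative only: the real mapping lives in the PiD codebase, and the names `BACKBONE_TO_LATENT_SPACE`, `LATENT_SPACE_TO_WEIGHTS`, `weights_for` and `backbones_sharing_decoder` are hypothetical.

```python
# Transcribed from the "Latent space → compatible LDMs" table in this README.
BACKBONE_TO_LATENT_SPACE = {
    "flux": "Flux1-dev",
    "zimage": "Flux1-dev",
    "zimage-turbo": "Flux1-dev",
    "flux2": "Flux2-dev",
    "flux2-klein-4b": "Flux2-dev",
    "flux2-klein-9b": "Flux2-dev",
    "sd3": "SD3 medium",
    "sdxl": "SDXL",
    "qwenimage": "Qwen-Image",
    "qwenimage-2512": "Qwen-Image",
    "dinov2": "DINOv2-B",
    "siglip": "SigLIP-2",
}

LATENT_SPACE_TO_WEIGHTS = {
    "Flux1-dev": "checkpoints/ae.safetensors",
    "Flux2-dev": "checkpoints/flux2_ae.safetensors",
    "SD3 medium": "checkpoints/sd3_vae/",
    "SDXL": "checkpoints/sdxl_vae.safetensors",
    "Qwen-Image": "checkpoints/QwenImage_VAE_2d.pth",
    "DINOv2-B": "checkpoints/rae/",
    "SigLIP-2": "checkpoints/scale_rae/",
}


def weights_for(backbone: str) -> str:
    """VAE / vision encoder weights used by a --backbone alias."""
    return LATENT_SPACE_TO_WEIGHTS[BACKBONE_TO_LATENT_SPACE[backbone]]


def backbones_sharing_decoder(backbone: str) -> list:
    """All aliases that live in the same latent space, and so share a PiD decoder."""
    space = BACKBONE_TO_LATENT_SPACE[backbone]
    return sorted(b for b, s in BACKBONE_TO_LATENT_SPACE.items() if s == space)


if __name__ == "__main__":
    print(backbones_sharing_decoder("zimage"))
    print(weights_for("qwenimage-2512"))
```

Keying the decoder on the latent space rather than on the backbone is what lets a new LDM reuse an existing checkpoint, provided it shares the VAE.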

Usage

The decoder checkpoints are loaded by the inference scripts in the PiD codebase. The single source of truth for the exact (backbone, ckpt_type) → path mapping is pid/_src/inference/checkpoint_registry.py. Clone the repo, download this snapshot into the repo root, and the demos pick the right file automatically:

# Pull just the checkpoints/ tree into the repo root (skips this README and
# the teaser figure so they don't clobber the files in the source repo).
hf download nvidia/PiD --local-dir . --include "checkpoints/*"

# Then run any of the demos, e.g.:
PYTHONPATH=. python -m pid._src.inference.from_ldm --backbone flux \
    --prompt "A photorealistic half-body portrait of a brown tabby cat with bold stripes sitting attentively on a rustic wooden kitchen table, soft morning light streaming sideways through a large window, fine fur detail and stripe patterns sharply visible, intense amber-green eyes in razor-sharp focus, warm farmhouse kitchen softly out of focus, cinematic shallow depth of field, ultra-detailed fur texture, photorealistic" \
    --ldm_inference_steps 28 --save_xt_steps 24 \
    --output_dir ./results/official_demo/flux \
    --pid_inference_steps 4

Pick --pid_ckpt_type 2kto4k_v1pt5 for FLUX / FLUX.2 / Qwen-Image 4K decoding. Use --pid_ckpt_type 2kto4k for SD3 / SDXL 4K decoding.
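When scripting several backbones, the rule above can be encoded once instead of remembered per run. The following is a sketch under stated assumptions: the helper names are hypothetical, and the backbone sets are read off the tables in this README rather than from checkpoint_registry.py. It uses only flags that appear in the demo command, plus `--pid_ckpt_type`. Flags such as `--ldm_inference_steps` are left out and would need to be added per backbone.

```python
import shlex

# Backbones whose 4K decoder is the v1.5 checkpoint, per the tables in this README.
V1PT5_BACKBONES = {
    "flux", "zimage", "zimage-turbo",
    "flux2", "flux2-klein-4b", "flux2-klein-9b",
    "qwenimage", "qwenimage-2512",
}
# Backbones that still use the legacy up-to-4K checkpoint.
LEGACY_4K_BACKBONES = {"sd3", "sdxl"}


def pid_ckpt_type_for_4k(backbone: str) -> str:
    """Return the --pid_ckpt_type value for up-to-4K decoding."""
    if backbone in V1PT5_BACKBONES:
        return "2kto4k_v1pt5"
    if backbone in LEGACY_4K_BACKBONES:
        return "2kto4k"
    raise ValueError(f"no up-to-4K PiD checkpoint is listed for backbone {backbone!r}")


def build_demo_command(backbone: str, prompt: str, output_dir: str, pid_steps: int = 4) -> list:
    """Build an argv list for the from_ldm demo (run from the repo root with PYTHONPATH=.)."""
    return [
        "python", "-m", "pid._src.inference.from_ldm",
        "--backbone", backbone,
        "--prompt", prompt,
        "--output_dir", output_dir,
        "--pid_inference_steps", str(pid_steps),
        "--pid_ckpt_type", pid_ckpt_type_for_4k(backbone),
    ]


if __name__ == "__main__":
    argv = build_demo_command("sdxl", "a lighthouse at dusk", "./results/sdxl")
    print(shlex.join(argv))  # shell-quoted form, ready to paste into a terminal
```

Backbones with only a 2k checkpoint (`dinov2`, `siglip`) raise a `ValueError` here rather than silently falling back to a different decoder.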

Citation

@article{lu2026pid,
    title={PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion},
    author={Lu, Yifan and Wu, Qi and Wu, Jay Zhangjie and Wang, Zian and Ling, Huan and Fidler, Sanja and Ren, Xuanchi},
    journal={arXiv preprint arXiv:2605.23902},
    year={2026}
}


Paper: PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion (arXiv:2605.23902, published May 22)