- Fashion-VDM: Video Diffusion Model for Virtual Try-On
  Paper • 2411.00225 • Published • 11
- HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
  Paper • 2410.22901 • Published • 8
- Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
  Paper • 2506.18898 • Published • 35
Zhongwei Zhang
zzwustc
AI & ML interests
AIGC
Recent Activity
upvoted a paper about 4 hours ago
Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models
upvoted a paper about 8 hours ago
MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation
upvoted a paper about 11 hours ago
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models