AI for Content Creation Workshop

@ CVPR 2026

3rd June 2026 — 8:25am MDT

Location: Room 610/612, Colorado Convention Center, Denver, CO, USA

Remote (Zoom): TBD



Summary

Content creation plays a crucial role in domains such as photography, videography, virtual reality, gaming, art, design, fashion, and advertising design. Recent progress in machine learning and AI has transformed hours of manual, painstaking content creation work into minutes or seconds of automated or interactive work. For instance, generative modeling approaches can produce photorealistic images of 2D and 3D items such as humans, landscapes, interior scenes, virtual environments, clothing, or even industrial designs. New large text, image, and video models that share latent spaces let us imaginatively describe scenes and have them realized automatically, and new multi-modal approaches can generate consistent video and audio across long timeframes. Such approaches can also super-resolve and super-slomo videos, interpolate and extrapolate with novel views, decompose scene objects and appearance, and transfer styles to convincingly render and reinterpret content. Learned priors of images, videos, and 3D data can also be combined with explicit appearance and geometric constraints, perceptual understanding, or even functional and semantic constraints of objects. While often creating awe-inspiring artistic images, such techniques also offer unique opportunities for generating diverse synthetic training data for downstream computer vision tasks across 2D, video, and 3D domains.

The AI for Content Creation workshop explores this exciting and fast-moving research area. We bring together invited speakers of world-class expertise in content creation, up-and-coming researchers, and authors of submitted workshop papers, to engage in a day filled with learning, discussion, and network building.

Welcome!
James Tompkin (Brown University)
Krishna Kumar Singh (Adobe)
Jun-Yan Zhu (Carnegie Mellon University)
Yuheng Li (Adobe)
Deqing Sun (Google)
Lingjie Liu (University of Pennsylvania)
Lu Jiang (ByteDance)
Yiqing Liang (Luma AI)
Thao Nguyen (UW-Madison)



Firefly Video (Adobe, 2025), Genie 2 (DeepMind, 2024), Sora (OpenAI, 2024).

Topics

We seek contributions across content creation, including but not limited to techniques for content creation:

We also seek contributions in domains and applications for content creation:


2026 Schedule

All times in MDT (Mountain Daylight Time, UTC-6) — Room 610/612, Colorado Convention Center, Denver, CO, USA

Time
8:25 Welcome and introductions 👋
8:30 Rana Hanocka (University of Chicago)
9:00 Alan Yuille (Johns Hopkins University)
9:30 Christian Theobalt (Max Planck Institute for Informatics)
10:00 Coffee Break / Poster Session
11:00 Orals
11:30 Taesung Park (Reve)
12:00 Saining Xie (New York University, AMI Labs)
12:30 Closing Remarks 👋


Cat4D (Google, 2024), AssetGen (Meta, 2024), DreamFusion (Google, 2022).

2026 Accepted Papers

Congratulations to all accepted authors!

Oral presentations
  1. RefDecoder: Enhancing Visual Generation with Conditional Video Decoding
    Xiang Fan, Yuheng Wang, Bohan Fang, Jason Ren, Ranjay Krishna [https://refdecoder.github.io/]
  2. Style-Instructed Mask-Free Virtual Try On
    Mengqi Zhang, Qi Li, Mehmet Saygin Seyfioglu, Karim Bouyarmane [https://smf-vto.github.io/]
  3. Pixels to Layers: Turning Generated Infographics into Editable Assets
    Abhay Bhandarkar, Raghav Kaushik Ravi, Vineeth Balasubramanian
  4. Magnitude-preserving Layers for GANs Can Produce Small High-quality Models
    Nick Huang, Jackson Woodleigh, Aaron Gokaslan, Xinjie Yi, James Tompkin
  5. Teaching an Agent to Sketch One Part at a Time
    Xiaodan Du [https://xiaodan.io/teaching-an-agent-to-sketch/]
  6. FlowStyle: Flow-Guided Diffusion for Image-Guided Shot-Level Video Stylization
    Boyuan Zhu, Ruiqi Liu, Rui Ma
Poster presentations
  1. DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers
    Dahye Kim, Deepti Ghadiyaram, Raghudeep Gadde [https://ddit-fast.github.io/ddit/]
  2. Copy-Transform-Paste: Zero-Shot Object-Object Alignment Guided by Vision-Language and Geometric Constraints
    Rotem Gatenyo, Ohad Fried [https://rotemgat.github.io/CopyTransformPaste/] — CVPR 2026
  3. SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation
    YU Yuan, Tharindu Wickremasinghe, Zeeshan Nadir, Xijun Wang, Yiheng Chi, Stanley Chan [https://yuyuanspace.com/SeeU/]
  4. Learning to Place Objects with Programs and Iterative Self Training
    Adrian Chang, Kai Wang, Yuanbo Li, Manolis Savva, Angel Chang, Daniel Ritchie
  5. Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing
    Rishubh Parihar, Or Patashnik, Daniil Ostashev, R. Venkatesh Babu, Daniel Cohen-Or, Jackson Wang [https://snap-research.github.io/kontinuouskontext/]
  6. Padding Tokens as Cross-Modal Registers in Multimodal Diffusion Transformers
    Jiafeng Mao, Qianru Qiu, Xueting Wang
  7. Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
    Hyunsoo Cha, Byungjun Kim, Hanbyul Joo [https://hyunsoocha.github.io/durian/] — ICLR 2026
  8. SGSoft: Learning Fused Semantic-Geometric Features for 3D Shape Correspondence via Template-Guided Soft Signals
    Soyeon Yoon — CVPR 2026
  9. Flash-BoN: Instant Drafts for Inference-Time Scaling in Diffusion Models
    Ruchit Rawal, Reza Shirkavand, Sayak Paul, Yuxin Wen, Heng Huang, Yizheng Chen, Tom Goldstein, Gowthami Somepalli
  10. Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards
    Seungwook Kim, Minsu Cho [https://wookiekim.github.io/SOLACE/]
  11. Latent Scaffolding: Training-Free Image Variations via Vision-Language Weight Splicing
    Gowthami Somepalli, Sravani Somepalli
  12. SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation
    Jiongze Yu, Xiangbo Gao, Pooja Verlani, Akshay Gadde, Yilin Wang, Balu Adsumilli, Zhengzhong Tu
  13. VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
    Kim Sung-Bin [https://voicecraft-dub.github.io/] — ICCV 2025
  14. It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models
    Anne Harrington, A. Sophia Koepke, Shyamgopal Karthik, Trevor Darrell, Alexei A. Efros [https://akoepke.github.io/divgen/index.html]
  15. PAVAS: Physics-Aware Video-to-Audio Synthesis
    Oh Hyun-Bin, Yuhta Takida, Toshimitsu Uesaka, Tae-Hyun Oh, Yuki Mitsufuji (Sony Group Corporation, SonyAI) [https://physics-aware-video-to-audio-synthesis.github.io/]
  16. Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
    Hyunsoo Cha, Wonjung Woo, Byungjun Kim, Hanbyul Joo [https://hyunsoocha.github.io/vanast/]
  17. MIST: Debiasing Pre-trained T2I Models with Cross-Attention Steering via Global-Token Alignment
    Hidir Yesiltepe, Kiymet Akdemir, Pinar Yanardag [https://mist-diffusion.github.io/]
  18. Cognitive Canvas: Cognitive Enhancement through Text-to-Image Diffusion Models
    Kiymet Akdemir, Matthew Zheng, Pinar Yanardag
  19. Relightful Video Portrait Harmonization
    Jun Myeong Choi, Jae Shin Yoon, Luchao Qi, Roni Sengupta, Joon-Young Lee

Please email Thao (thao.nguyen@wisc.edu) to add or edit a paper's arXiv link or project page. Thank you!



DALL-E 2 (OpenAI, 2022), Imagen (Google, 2022), GauGAN2 (NVIDIA, 2021).

Previous Workshops (including session videos)