Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation

1UIUC, 2Georgia Tech
* denotes equal contribution

TL;DR: Immune2V is an image immunization framework that protects images against unauthorized dual-stream image-to-video generation, preventing illicit video synthesis from protected content.

💭 Abstract

Image-to-video (I2V) generation can cause societal harm because it enables the unauthorized animation of static images into realistic deepfakes. While existing defenses effectively protect against static image manipulation, extending them to I2V generation remains underexplored and non-trivial. In this paper, we systematically analyze why modern I2V models are highly robust against naive image-level adversarial attacks (i.e., immunization). We observe that the video encoding process rapidly dilutes the adversarial noise across future frames, and that the continuous text-conditioned guidance actively overrides the intended disruptive effect of the immunization. Building on these findings, we propose the Immune2V framework, which enforces temporally balanced latent divergence at the encoder level to prevent signal dilution and aligns intermediate generative representations with a precomputed collapse-inducing trajectory to counteract the text-guidance override. Extensive experiments demonstrate that Immune2V produces substantially stronger and more persistent degradation than adapted image-level baselines under the same imperceptibility budget.

🧠 Method

Pipeline

Immune2V Framework. Our method simultaneously targets the spatial-temporal and semantic streams to ensure persistent disruption. The Spatial-Temporal Attack employs a balanced encoder loss and dense targets to recover optimization signals that would otherwise vanish across temporal segments. The Semantic Attack hijacks the guidance of the diffusion transformer (DiT) by forcing intermediate representations to mimic a precomputed collapse trajectory, neutralizing the model's iterative semantic correction. Together, these joint perturbations induce severe structural breakdown across the entire generated video.
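To make the idea of a balanced encoder loss under an imperceptibility budget concrete, here is a minimal, self-contained sketch in plain Python. It is not the Immune2V implementation. The toy encoder (`encode_segments`, where segment t sees the image scaled by `decay**t`), the balancing rule (weighting each segment by the inverse of its clean latent energy), and all parameter values are assumptions made purely for illustration. The sketch only shows two things: how a plain sum of per-segment divergences is dominated by the first segment, and how a projected sign-gradient loop keeps the perturbation inside an L-infinity budget `eps`.

```python
import random


def encode_segments(image, num_segments, decay):
    """Hypothetical toy encoder: temporal segment t sees the image scaled by decay**t."""
    return [[(decay ** t) * p for p in image] for t in range(num_segments)]


def segment_divergences(clean, perturbed, num_segments, decay):
    """Squared L2 distance between clean and perturbed latents, one value per segment."""
    z_clean = encode_segments(clean, num_segments, decay)
    z_pert = encode_segments(perturbed, num_segments, decay)
    return [sum((a - b) ** 2 for a, b in zip(zc, zp))
            for zc, zp in zip(z_clean, z_pert)]


def balance_weights(clean, num_segments, decay):
    """Assumed balancing rule: weight each segment by 1 / (clean latent energy)."""
    z_clean = encode_segments(clean, num_segments, decay)
    return [1.0 / (sum(v * v for v in z) + 1e-12) for z in z_clean]


def shares(values):
    """Fraction of the total contributed by each entry."""
    total = sum(values)
    return [v / total for v in values]


def immunize(image, eps=8 / 255, step=2 / 255, iters=10,
             num_segments=4, decay=0.5, seed=0):
    """Projected sign-gradient ascent on the weighted divergence, within |delta| <= eps."""
    rng = random.Random(seed)
    # Random start: the gradient of a squared distance is zero at delta = 0.
    delta = [rng.uniform(-eps, eps) for _ in image]
    weights = balance_weights(image, num_segments, decay)
    # For this linear toy encoder the loss is sum_t w_t * decay**(2t) * ||delta||^2,
    # so its gradient with respect to delta is 2 * scale * delta.
    scale = sum(w * decay ** (2 * t) for t, w in enumerate(weights))
    for _ in range(iters):
        for i, d in enumerate(delta):
            grad = 2.0 * scale * d
            sign = (grad > 0) - (grad < 0)
            d = d + step * sign
            d = max(-eps, min(eps, d))                      # project onto the L-inf ball
            d = max(-image[i], min(1.0 - image[i], d))      # keep the pixel in [0, 1]
            delta[i] = d
    return [p + d for p, d in zip(image, delta)]


if __name__ == "__main__":
    image = [0.1, 0.4, 0.6, 0.9, 0.3, 0.7]
    protected = immunize(image)
    div = segment_divergences(image, protected, 4, 0.5)
    w = balance_weights(image, 4, 0.5)
    print("naive shares:   ", shares(div))
    print("balanced shares:", shares([wi * di for wi, di in zip(w, div)]))
```

In this toy the weights do not change the ascent direction, because every segment sees the same image up to a scale factor. They only change how much each segment contributes to the loss, which is the dilution effect being illustrated. A real video encoder is nonlinear, so the gradient would come from automatic differentiation rather than the closed form used here.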

📹 Results

Immune2V on Wan 2.1

Clean images yield realistic motion, while immunized images disrupt generation and yield structurally implausible outputs.

Geoffrey Hinton bursts forward with fast, angry strikes.
Clean Image
Immunized Image
Yoshua Bengio slams the table with sudden anger.
Clean Image
Immunized Image
A man furiously shaking a mounted orchid plant
Clean Image
Immunized Image
A dancer twirling during a street performance
Clean Image
Immunized Image
A car driving through a roundabout
Clean Image
Immunized Image
A fishing boat moving across the water
Clean Image
Immunized Image
A miniature train circling along a curved model railway
Clean Image
Immunized Image
A car is turning on the road
Clean Image
Immunized Image

Qualitative Comparisons

A BMX rider riding and jumping on a dirt track
Clean
Random Gaussian
PhotoGuard-E
PhotoGuard-D
Ours
A horse rider jumping over a hurdle
Clean
Random Gaussian
PhotoGuard-E
PhotoGuard-D
Ours

Immune2V on Other Architectures

DynamiCrafter [2]
I2VGenXL [3]
Lucia Clean
Lucia Immunized
Parkour Clean
Parkour Immunized
Dog Clean
Dog Immunized
Swing Clean
Swing Immunized
Violin Clean
Violin Immunized
Tandem Clean
Tandem Immunized

BibTeX

@article{long2026immune2v,
  title={Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation},
  author={Long, Zeqian and Kara, Ozgur and Xue, Haotian and Chen, Yongxin and Rehg, James M},
  journal={arXiv preprint arXiv:2604.10837},
  year={2026}}

References

  1. Hadi Salman, Alaa Khaddaj, Guillaume Leclerc, Andrew Ilyas, and Aleksander Madry. Raising the Cost of Malicious AI-Powered Image Editing. Proceedings of the International Conference on Machine Learning (ICML), 2023.
  2. Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Gongye Liu, Xintao Wang, Ying Shan, and Tien-Tsin Wong. DynamiCrafter: Animating Open-Domain Images with Video Diffusion Priors. Proceedings of the European Conference on Computer Vision (ECCV), 2024.
  3. Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, and Jingren Zhou. I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. arXiv preprint arXiv:2311.04145, 2023.