Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation

💭 Abstract

Image-to-video (I2V) generation has the potential for societal harm because it enables the unauthorized animation of static images to create realistic deepfakes. While existing defenses effectively protect against static image manipulation, extending them to I2V generation remains underexplored and non-trivial. In this paper, we systematically analyze why modern I2V models are highly robust against naive image-level adversarial attacks (i.e., immunization). We observe that the video encoding process rapidly dilutes the adversarial noise across future frames, and that the continuous text-conditioned guidance actively overrides the intended disruptive effect of the immunization. Building on these findings, we propose the Immune2V framework, which enforces temporally balanced latent divergence at the encoder level to prevent signal dilution, and aligns intermediate generative representations with a precomputed collapse-inducing trajectory to counteract the text-guidance override. Extensive experiments demonstrate that Immune2V produces substantially stronger and more persistent degradation than adapted image-level baselines under the same imperceptibility budget.
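To make the "imperceptibility budget" concrete, the sketch below shows one common way such a budget is enforced in image-level adversarial perturbation work: a signed-gradient ascent step followed by projection back into a norm ball around the clean image. This page does not state which norm, budget, or optimizer Immune2V uses, so the L-infinity bound, the values `eps = 8/255` and `step_size = 2/255`, and the function name `pgd_step` are all assumptions made for illustration. The gradient here is a fixed toy vector standing in for the gradient of whatever disruption loss is being maximized.

```python
def pgd_step(x_adv, x_clean, grad, step_size, eps):
    """One signed-gradient ascent step, then projection (assumed L-infinity budget).

    x_adv, x_clean, grad: flat lists of floats, pixels scaled to [0, 1].
    """
    out = []
    for xa, xc, g in zip(x_adv, x_clean, grad):
        sign = (g > 0) - (g < 0)                 # sign of the gradient, 0 if g == 0
        xa = xa + step_size * sign               # ascend the (toy) disruption loss
        xa = min(max(xa, xc - eps), xc + eps)    # stay within eps of the clean pixel
        xa = min(max(xa, 0.0), 1.0)              # stay a valid pixel value
        out.append(xa)
    return out


# Toy usage: three pixels and a fixed stand-in gradient (hypothetical values).
eps = 8 / 255
x_clean = [0.2, 0.5, 0.99]
grad = [1.0, -2.0, 0.5]

x_adv = list(x_clean)
for _ in range(10):
    x_adv = pgd_step(x_adv, x_clean, grad, step_size=2 / 255, eps=eps)

max_change = max(abs(a - c) for a, c in zip(x_adv, x_clean))
```

However many steps are taken, `max_change` never exceeds `eps`, which is what keeps the immunized image visually close to the original. Comparing methods "under the same budget" then means giving each of them the same `eps`.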

🧠 Method

Immune2V Framework. Our method simultaneously targets the spatial-temporal and semantic streams to ensure persistent disruption. The Spatial-Temporal Attack employs a balanced encoder loss and dense targets to recover vanishing optimization signals across temporal segments. The Semantic Attack hijacks DiT guidance by forcing intermediate representations to mimic a precomputed collapse trajectory, neutralizing the model's iterative semantic correction. Together, these joint perturbations induce severe structural breakdown across the entire generated video.
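The two loss terms described above can be illustrated with a toy sketch. The page does not give the actual loss definitions, so everything below is a hypothetical reading: the log-based balancing rule, the mean-squared distances, and all function names are assumptions rather than the authors' formulation. The sketch only shows why a plain sum of per-segment latent divergences lets the first temporal segment dominate, how a balanced variant can restore signal to later segments, and what aligning features to a precomputed target trajectory looks like as a loss.

```python
import math


def segment_divergence(clean, adv):
    """Mean squared difference between two equally sized latent vectors."""
    return sum((a - c) ** 2 for c, a in zip(clean, adv)) / len(clean)


def naive_encoder_loss(divs):
    """Plain sum: dominated by whichever segment already diverges most."""
    return sum(divs)


def balanced_encoder_loss(divs, eps=1e-8):
    """Mean of log-divergences (an assumed balancing rule, to be maximized)."""
    return sum(math.log(d + eps) for d in divs) / len(divs)


def balanced_grad_scale(divs, eps=1e-8):
    """Derivative of the balanced loss w.r.t. each divergence d_t."""
    return [1.0 / (len(divs) * (d + eps)) for d in divs]


def trajectory_alignment_loss(features, collapse_targets):
    """Mean squared distance to a precomputed target trajectory (to be minimized)."""
    per_step = [segment_divergence(t, f) for f, t in zip(features, collapse_targets)]
    return sum(per_step) / len(per_step)


# Toy latents: the perturbation's effect shrinks by 10x per temporal segment.
clean = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
adv = [[1.0, 1.0], [0.1, 0.1], [0.01, 0.01]]
divs = [segment_divergence(c, a) for c, a in zip(clean, adv)]

# Share of the naive loss that comes from the first segment alone.
first_share = divs[0] / naive_encoder_loss(divs)

# Under the balanced rule, gradient scale times divergence is equal per segment.
weighted = [w * d for w, d in zip(balanced_grad_scale(divs), divs)]
```

In this toy setting the first segment accounts for about 99% of the naive loss, so later segments receive almost no optimization signal, whereas the entries of `weighted` are nearly identical across segments. In a real attack these quantities would be computed on video-encoder latents and intermediate DiT features with automatic differentiation; the lists here merely stand in for those tensors.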

📹 Results

Immune2V on Wan 2.1

Clean images generate realistic motion, while immunized images disrupt generation and produce structurally implausible outputs.

Geoffrey Hinton bursts forward with fast, angry strikes. [Panels: Clean Image | Immunized Image]

Yoshua Bengio slams the table with sudden anger. [Panels: Clean Image | Immunized Image]

A man furiously shaking a mounted orchid plant [Panels: Clean Image | Immunized Image]

A dancer twirling during a street performance [Panels: Clean Image | Immunized Image]

A car driving through a roundabout [Panels: Clean Image | Immunized Image]

A fishing boat moving across the water [Panels: Clean Image | Immunized Image]

A miniature train circling along a curved model railway [Panels: Clean Image | Immunized Image]

A car is turning on the road [Panels: Clean Image | Immunized Image]

Qualitative Comparisons

A BMX rider riding and jumping on a dirt track [Panels: Clean | Random Gaussian | PhotoGuard-E | PhotoGuard-D | Ours]

A horse rider jumping over a hurdle [Panels: Clean | Random Gaussian | PhotoGuard-E | PhotoGuard-D | Ours]

Immune2V on Other Architectures

DynamiCrafter [1]

I2VGen-XL [2]

Lucia Clean

Lucia Immunized

Parkour Clean

Parkour Immunized

Dog Clean

Dog Immunized

Swing Clean

Swing Immunized

Violin Clean

Violin Immunized

Tandem Clean

Tandem Immunized

BibTeX

@article{long2026immune2v,
  title={Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation},
  author={Long, Zeqian and Kara, Ozgur and Xue, Haotian and Chen, Yongxin and Rehg, James M},
  journal={arXiv preprint arXiv:2604.10837},
  year={2026}
}

References

[1] Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Gongye Liu, Xintao Wang, Ying Shan, and Tien-Tsin Wong. DynamiCrafter: Animating Open-Domain Images with Video Diffusion Priors. Proceedings of the European Conference on Computer Vision (ECCV), 2024.
[2] Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, and Jingren Zhou. I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. arXiv preprint arXiv:2311.04145, 2023.
[3] Hadi Salman, Alaa Khaddaj, Guillaume Leclerc, Andrew Ilyas, and Aleksander Madry. Raising the Cost of Malicious AI-Powered Image Editing. Proceedings of the International Conference on Machine Learning (ICML), 2023.

TL;DR: Immune2V is an image immunization framework that protects images against unauthorized dual-stream image-to-video generation, preventing illicit video synthesis from protected content.
