Research Progress and Prospects of Fragile Watermarking for Model Integrity Protection

Zixin Zhou

doi:10.70267/ic-aimees.20260243251

School of Computer Science, Wuhan University, Wuhan, 430072, China

Keywords

fragile watermarking, model integrity, artificial intelligence security, black-box verification

Abstract

Traditional deep learning models, typified by convolutional neural networks, face severe security threats in open environments, including model tampering and backdoor injection. As an active defense technique, fragile model watermarking is designed to respond sensitively to any unauthorized modification of a model, thereby providing a “digital seal” for model integrity protection. This paper systematically reviews fragile watermarking techniques for model integrity protection and groups them into three mainstream paradigms: black-box verification based on sensitive samples, white-box/gray-box authentication based on parameter hashing and reversible embedding, and self-embedding and recovery mechanisms. A comparative analysis shows that the three paradigms have distinct advantages. Black-box watermarking is convenient to deploy and well suited to model-as-a-service scenarios centered on classification tasks; parameter-level watermarking provides fine-grained authentication with cryptographic strength; and self-embedding mechanisms extend the protection boundary from models to input content, offering a proactive means of countering deepfakes. Finally, the paper discusses open technical challenges and future development trends, as a reference for building a trustworthy AI ecosystem.
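To make the parameter-hashing idea concrete, the following is a minimal sketch of the comparison step only, not a scheme taken from any of the surveyed papers. It assumes a hypothetical model represented as a dictionary of named float lists and a secret key held by the model owner, and it keeps the keyed hash outside the model, whereas the parameter-level schemes discussed in the paper embed the authentication data in the weights themselves.

```python
import hashlib
import hmac
import struct


def model_digest(params, key):
    """Compute an HMAC-SHA256 tag over all parameters.

    `params` is a hypothetical layout: {layer_name: [float, ...]}.
    Values are serialized as little-endian float32 in sorted name order,
    so the tag is deterministic for a given set of weights.
    """
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for name in sorted(params):
        values = params[name]
        mac.update(name.encode("utf-8"))
        mac.update(struct.pack("<I", len(values)))
        mac.update(struct.pack(f"<{len(values)}f", *values))
    return mac.hexdigest()


def verify(params, key, expected_tag):
    """Return True only if the parameters still match the stored tag."""
    return hmac.compare_digest(model_digest(params, key), expected_tag)


# Illustrative values only (assumptions, not data from the paper).
key = b"owner-secret-key"
weights = {"conv1.weight": [0.12, -0.5, 0.33], "fc.bias": [0.0, 1.5]}
tag = model_digest(weights, key)

# A small change to a single parameter should invalidate the tag.
tampered = {"conv1.weight": [0.12, -0.5, 0.33], "fc.bias": [0.0, 1.5001]}

intact_ok = verify(weights, key, tag)
tampered_ok = verify(tampered, key, tag)
```

In this sketch any change to a stored float32 value alters the tag, which mirrors the fragility property described above. Localizing the tampered region or recovering parameters would need per-block tags or self-embedding, which this example does not attempt.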

Published

Jun 10, 2026

Conference Proceedings Volume

Vol. 14 (2026): Proceedings of the 2nd International Conference on Artificial Intelligence, Modern Engineering and Environmental Sustainability (IC-AIMEES 2026)

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Zhou, Zixin. “Research Progress and Prospects of Fragile Watermarking for Model Integrity Protection”. Exploring Science Academic Conference Series, vol. 14, June 2026, pp. 243-51, https://doi.org/10.70267/ic-aimees.20260243251.
