Research Progress and Prospects of Fragile Watermarking for Model Integrity Protection
Keywords
fragile watermarking, model integrity, artificial intelligence security, black-box verification
Abstract
Traditional deep learning models, typified by convolutional neural networks, face severe security threats in open environments, including model tampering and backdoor injection. As an active defense technique, fragile model watermarking is designed to respond sensitively to any unauthorized modification of a model, thereby providing a “digital seal” for model integrity protection. This paper systematically reviews fragile watermarking techniques for model integrity protection and groups them into three mainstream paradigms: black-box verification based on sensitive samples, white-box/gray-box authentication based on parameter hashing and reversible embedding, and self-embedding and recovery mechanisms. Comparative analysis shows that the three paradigms have distinct advantages. Black-box watermarking is convenient to deploy and well suited to model-as-a-service scenarios centered on classification tasks; parameter-level watermarking provides fine-grained authentication with cryptographic strength; self-embedding mechanisms extend the protection boundary from models to input content, offering a proactive solution for countering deepfakes. Finally, the paper discusses open technical challenges and future development trends, providing a reference for building a trustworthy AI ecosystem.
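To make the parameter-hashing idea concrete, the following is a minimal sketch of keyed-hash integrity checking over model weights. It is not the scheme of any specific work in the literature: the key, the toy weight list, and the function names `parameter_digest` and `verify` are hypothetical, and the digest is assumed to be stored outside the model rather than reversibly embedded in the parameters.

```python
import hashlib
import hmac
import struct


def parameter_digest(weights, key):
    """Return an HMAC-SHA256 hex digest over the float32 encoding of the weights."""
    # Serialize the flat weight list as little-endian float32 values.
    payload = struct.pack(f"<{len(weights)}f", *weights)
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(weights, key, reference):
    """Check the current weights against a previously recorded digest."""
    return hmac.compare_digest(parameter_digest(weights, key), reference)


# Hypothetical owner key and toy parameters, for illustration only.
key = b"hypothetical-owner-key"
weights = [0.12, -0.53, 0.98, 0.07]
reference = parameter_digest(weights, key)

# A small perturbation of a single weight changes the digest.
tampered = list(weights)
tampered[2] += 1e-3

intact_ok = verify(weights, key, reference)
tampered_ok = verify(tampered, key, reference)
```

In this sketch `intact_ok` is true and `tampered_ok` is false, which illustrates the fragile property at the parameter level: any change to the serialized weights invalidates the check. Localizing the tampered region or recovering the original parameters would require additional mechanisms, such as block-wise digests or self-embedded recovery data.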
