Mamba Convolutional Hybrid Spatio-Temporal Medical Image Generation Based on Diffusion Probabilistic Models

Mei Zhang
Zhengjie Liang

Keywords

medical image generation, Mamba, 4-dimensional data, long-range dependencies, diffusion model

Abstract

We introduce a novel hybrid deep learning module, the Mamba-Spatial-Temporal Generator (MSTG), which combines Convolutional Neural Networks (CNNs) with the Mamba architecture. Conventional CNNs extract local features effectively within diffusion models, but their limited receptive field restricts their ability to capture long-range dependencies. To address this, MSTG first uses convolutional and pooling layers to extract multi-level local features and then applies Mamba blocks built on State Space Models (SSMs). Because Mamba's computational cost grows linearly with sequence length and it models long sequences well, it can adaptively select and fuse global contextual information. This design lets MSTG retain the local perceptual strengths of CNNs while exploiting Mamba's capacity for global dynamic modeling, improving the modeling of complex spatial and temporal dependencies without sacrificing computational efficiency. The module is structurally simple and scalable, and offers an effective way to improve cardiac image generation from 4D (3D + time) medical data.
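To make the local-then-global design concrete, here is a minimal NumPy sketch of a CNN-then-SSM hybrid in the spirit of MSTG. It is not the authors' implementation. The (T, H, W) single-channel input (a 2D + time simplification of 4D data), the single 3×3 convolution + ReLU + 2×2 average-pooling stage, the time-major token order, the residual fusion, and all names (`local_features`, `selective_scan`, `mstg_like_block`) are hypothetical choices for illustration. The scan follows the general Mamba recipe (input-dependent step size Δ and matrices B and C, Ā = exp(ΔA), simplified B̄ = ΔB) but omits gating, the short depthwise convolution, normalization, and the hardware-aware parallel scan. Each token updates a fixed-size state once, so the cost is linear in the number of tokens.

```python
import numpy as np


def local_features(vol, kernels):
    """Hypothetical CNN-style local stage: per-frame 'same' 2D convolution,
    ReLU, then 2x2 average pooling.

    vol:     (T, H, W) single-channel image sequence; H and W must be even.
    kernels: (D, k, k) filters, one per output feature channel (k odd).
    Returns: (D, T, H // 2, W // 2) feature maps.
    """
    T, H, W = vol.shape
    n_ch, k, _ = kernels.shape
    pad = k // 2
    padded = np.pad(vol.astype(float), ((0, 0), (pad, pad), (pad, pad)))
    feats = np.zeros((n_ch, T, H, W))
    for c in range(n_ch):
        for i in range(k):
            for j in range(k):
                # Cross-correlation, as computed by CNN conv layers.
                feats[c] += kernels[c, i, j] * padded[:, i:i + H, j:j + W]
    feats = np.maximum(feats, 0.0)  # ReLU
    return feats.reshape(n_ch, T, H // 2, 2, W // 2, 2).mean(axis=(3, 5))


def selective_scan(x, A, W_delta, W_B, W_C, D_skip):
    """Mamba-style selective SSM scan over tokens x of shape (L, D).

    A: (D, N) with negative entries (stable decay); W_delta: (D, D);
    W_B, W_C: (D, N); D_skip: (D,). One pass over the L tokens with a
    fixed-size (D, N) state, so the cost is linear in L.
    """
    L, d = x.shape
    h = np.zeros((d, A.shape[1]))
    y = np.empty_like(x)
    for t in range(L):
        xt = x[t]
        delta = np.log1p(np.exp(xt @ W_delta))  # softplus step size, (D,)
        B_t, C_t = xt @ W_B, xt @ W_C           # input-dependent B and C, (N,)
        A_bar = np.exp(delta[:, None] * A)      # zero-order hold for A
        B_bar = delta[:, None] * B_t[None, :]   # simplified (Euler) B discretization
        h = A_bar * h + B_bar * xt[:, None]     # selective state update
        y[t] = h @ C_t + D_skip * xt            # readout plus skip connection
    return y


def mstg_like_block(vol, p):
    """Hypothetical hybrid: local CNN features, then a global selective scan."""
    feats = local_features(vol, p["kernels"])     # (D, T, H/2, W/2)
    tokens = feats.reshape(feats.shape[0], -1).T  # (L, D), time-major order
    glob = selective_scan(tokens, p["A"], p["W_delta"], p["W_B"],
                          p["W_C"], p["D_skip"])
    fused = tokens + glob                         # residual fusion (assumption)
    return fused.T.reshape(feats.shape)


rng = np.random.default_rng(0)
T, H, W, D, N = 4, 8, 8, 6, 4
params = {
    "kernels": 0.3 * rng.standard_normal((D, 3, 3)),
    "A": -np.exp(rng.standard_normal((D, N))),
    "W_delta": 0.1 * rng.standard_normal((D, D)),
    "W_B": 0.1 * rng.standard_normal((D, N)),
    "W_C": 0.1 * rng.standard_normal((D, N)),
    "D_skip": np.ones(D),
}
vol = rng.standard_normal((T, H, W))
out = mstg_like_block(vol, params)
print(out.shape)  # (6, 4, 4, 4)
```

Because this scan is causal over a time-major token order, perturbing only the last frame leaves the outputs for all earlier frames unchanged. That one-directional context is a limitation of the sketch; many vision Mamba variants scan in several directions so that every position receives context from both sides.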
