A Review of Multimodal Affective Computing in the Field of Mental Health Monitoring

Shunyao Zhang

Keywords

multimodal affective computing, mental health, depression monitoring, multimodal data processing

Abstract

Mental health is of great significance for an individual's development. As the global mental health crisis grows more severe, depression and anxiety have become leading causes of disability. Traditional approaches to evaluating mental health rely mainly on patients' self-reports and clinical interviews, which are often limited by strong subjectivity, high latency, and social stigma, resulting in missed diagnoses. To overcome these limitations, multimodal affective computing integrates multi-source data such as text, speech, vision, and physiological signals, providing a new technical means for objectively quantifying and monitoring psychological states in real time. This review systematically summarizes technological advances in this field, covering multimodal data processing, feature extraction, fusion architectures, and emotion recognition models. It also explores application scenarios such as depression monitoring, stress management, and crisis intervention. Nevertheless, applying these systems raises several difficulties, including technical problems, data privacy concerns, and ethical dilemmas. Corresponding strategies involving technological innovation, privacy-preserving computing, and ethical frameworks are discussed. Finally, this review concludes that multimodal affective computing has the potential to transform mental health care, but realizing that potential depends on continued technological refinement and responsible, human-centered implementation grounded in interdisciplinary collaboration.
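Fusion architectures are one of the core topics named above. As a minimal sketch of one common option, decision-level (late) fusion, the snippet below combines per-modality class probabilities with a weighted average and renormalizes the weights when a modality is unavailable, for example when a camera is switched off. The modality names, state labels, weights, and scores are hypothetical illustrations, not values or a method taken from this review.

```python
from typing import Dict, Optional, Sequence

# Hypothetical psychological-state labels, used only for illustration.
LABELS = ("neutral", "low_mood", "high_stress")


def late_fusion(
    modality_probs: Dict[str, Optional[Sequence[float]]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted average of per-modality class probabilities (late fusion).

    Modalities whose output is None are skipped, and the weights of the
    remaining modalities are renormalized so the fused scores sum to 1.
    """
    available = {m: p for m, p in modality_probs.items() if p is not None}
    if not available:
        raise ValueError("no modality produced a prediction")

    total_weight = sum(weights[m] for m in available)
    fused = [0.0] * len(LABELS)
    for modality, probs in available.items():
        if len(probs) != len(LABELS):
            raise ValueError(f"{modality}: expected {len(LABELS)} scores")
        w = weights[modality] / total_weight  # renormalized weight
        for i, p in enumerate(probs):
            fused[i] += w * p
    return dict(zip(LABELS, fused))


# Hypothetical per-modality outputs for one monitoring session.
scores = late_fusion(
    {
        "text": [0.2, 0.5, 0.3],
        "speech": [0.1, 0.2, 0.7],
        "vision": None,  # e.g. camera unavailable in this session
        "physiology": [0.3, 0.4, 0.3],
    },
    weights={"text": 0.4, "speech": 0.3, "vision": 0.2, "physiology": 0.1},
)
print(max(scores, key=scores.get))  # prints "high_stress"
```

Late fusion is shown here only because it is simple and tolerates missing modalities; feature-level and attention-based fusion can model cross-modal interactions that a weighted average of final scores cannot.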
