A BERT-Based Interpretable Deep Learning Model for Medical Diagnosis Assisted by Natural Language Processing

Yutong Chen

Keywords

deep learning, natural language processing, medical diagnosis, electronic health records, interpretability

Abstract

Combining deep learning with natural language processing holds great potential for medicine. This study develops a deep learning-based natural language processing model to assist medical diagnosis. We used a large, de-identified dataset of electronic health records containing patient complaints, medical histories, and final diagnoses. We first preprocessed the unstructured text, for example by tokenizing it and removing stop words, and then built a deep learning model on top of a pre-trained BERT language model. The model automatically extracts key features from clinical text and learns the complex relationships between those features and specific diseases. In our experiments, the proposed model achieved significantly higher accuracy and recall than traditional baseline models on several disease prediction tasks. We also introduced an attention mechanism that gives the model a degree of interpretability: it highlights the keywords or phrases that most influence the diagnostic decision. We conclude that deep learning and natural language processing can improve the accuracy of disease prediction and give clinicians a useful reference, which may in turn improve the quality and efficiency of medical services.
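The two steps the abstract names most concretely, text preprocessing and attention-based highlighting, can be illustrated with a minimal sketch. This is not the study's implementation: the stop-word list, the example complaint, the raw scores, and the function names `preprocess`, `softmax`, and `top_attended` are all assumptions made for illustration. In a BERT-based model the scores would come from the trained network and would apply to the subword tokens produced by the model's own tokenizer.

```python
import math
import re

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "with", "for", "has", "is", "in", "to"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def softmax(scores):
    """Convert raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_attended(tokens, scores, k=3):
    """Return the k (token, weight) pairs with the largest attention weights."""
    weights = softmax(scores)
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up example complaint, not taken from the study's dataset.
complaint = "Patient has a history of chest pain and shortness of breath."
tokens = preprocess(complaint)

# Made-up relevance scores, one per token; a trained model would produce these.
scores = [0.1, 0.3, 2.0, 2.2, 1.5, 1.7]
highlights = top_attended(tokens, scores)

for token, weight in highlights:
    print(f"{token}: {weight:.3f}")
```

Under these assumed scores, the highest-weighted tokens are the ones a reader would be shown as the terms that most influenced the prediction.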
