Design and Evaluation of an Intelligent Agent-Based Home Healthcare System for Clinical Decision Support
Keywords
medical intelligent agent, clinical decision support, retrieval-augmented generation, medication counseling, clinical guideline retrieval, differential diagnosis, medical AI safety
Abstract
The rapid development of large language models (LLMs) has stimulated growing interest in medical intelligent agents for clinical decision support. However, existing systems often suffer from limited grounding in authoritative medical knowledge, potential safety risks, and a tendency to generate definitive diagnostic conclusions without sufficient clinical context. In this work, we present the design of a medical intelligent agent that supports clinical decision-making through evidence-grounded information retrieval and safety-aware interaction. The proposed system has two primary functions: (i) providing drug usage guidance, dosage information, and food–drug interaction warnings based on authoritative medical knowledge sources, and (ii) retrieving relevant clinical guidelines in response to patient-reported symptoms, so that clinicians are assisted with differential diagnostic considerations rather than given definitive diagnoses. To mitigate safety risks, the agent is explicitly constrained to avoid diagnostic claims; it instead emphasizes guideline-based recommendations and, when appropriate, referral suggestions. The agent integrates structured medical knowledge retrieval with natural language interaction, enabling users to obtain context-aware, interpretable, and clinically relevant responses. By grounding outputs in curated medical references and enforcing non-diagnostic constraints, the system aims to reduce hallucinations and enhance reliability in medical consultations. This work highlights the potential of retrieval-augmented medical intelligent agents as supportive tools for clinical decision support, medical education, and patient-facing health information services, while underscoring the importance of safety, transparency, and scope limitation in medical AI deployment.
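The two mechanisms named in the abstract, grounding responses in curated references and enforcing a non-diagnostic constraint, can be illustrated with a minimal sketch. It is not the system's implementation: the knowledge-base entries are placeholders rather than clinical content, the keyword-overlap retriever stands in for whatever retrieval method the system uses, and all names (`Passage`, `retrieve`, `violates_scope`, `respond`, `draft_fn`) are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # identifier of the curated reference (hypothetical)
    text: str    # passage content (placeholder, not clinical advice)


# Assumption: a tiny in-memory stand-in for a curated medical knowledge base.
KNOWLEDGE_BASE = [
    Passage("drug-label:example-drug-a",
            "Example drug A: take with food; avoid grapefruit juice."),
    Passage("guideline:example-cough",
            "Persistent cough guideline: consider several causes; "
            "refer if symptoms last over three weeks."),
]

# Assumption: a crude pattern list standing in for a real scope check.
DIAGNOSTIC_PATTERNS = [
    re.compile(r"\byou (?:have|are suffering from)\b", re.IGNORECASE),
    re.compile(r"\bthe diagnosis is\b", re.IGNORECASE),
]

REFERRAL = "I cannot provide a diagnosis. Please consult a clinician."


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, k=1):
    """Return up to k passages ranked by keyword overlap with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(p.text)), p) for p in KNOWLEDGE_BASE]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def violates_scope(draft):
    """True if the draft contains phrasing that reads as a diagnosis."""
    return any(pattern.search(draft) for pattern in DIAGNOSTIC_PATTERNS)


def respond(query, draft_fn):
    """Answer only from retrieved passages; fall back to a referral otherwise.

    draft_fn is a hypothetical callable (for example, an LLM wrapper) that
    takes the query and the retrieved passages and returns a draft answer.
    """
    passages = retrieve(query)
    if not passages:
        return REFERRAL  # nothing to ground the answer in
    draft = draft_fn(query, passages)
    if violates_scope(draft):
        return REFERRAL  # block diagnostic claims
    citations = "; ".join(p.source for p in passages)
    return f"{draft} [sources: {citations}]"
```

In this sketch, a draft such as "You have bronchitis." is replaced by the referral message, while a draft that only restates the retrieved guideline text is returned with its source identifier appended. A deployed system would presumably need a far more robust scope check than pattern matching.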
