A Review of Automatic Text Summarization Quality Evaluation Based on Attribution Explainability

Yingying Qi

Keywords

natural language processing, automatic text summarization, explainability, attribution analysis, SHAP

Abstract

As automatic text summarization systems become more widely used, there is growing interest in evaluating their outputs in an interpretable, explainable way. This paper reviews existing work on automatic text summarization, explainable evaluation, and attribution-based interpretability methods, with particular attention to how ROUGE and BERTScore differ in evaluating summary quality. It then examines how SHAP can be used to interpret the scores that these two metrics produce. Attribution analysis based on ROUGE mainly reflects lexical overlap between the generated and reference summaries, and can help identify missing or overemphasized information. By comparison, attribution analysis based on BERTScore is more sensitive to semantic similarity and may provide additional insight into paraphrasing behavior, hallucinations, and missing key information. Future research may focus on improving the efficiency and reliability of attribution-based evaluation methods, and on exploring how attribution results can be incorporated into the training and refinement of summarization models.
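To make the idea of attributing a ROUGE score concrete, the following is a minimal sketch and not the method used in the paper. It assumes a simplified ROUGE-1 F1 (clipped unigram overlap, no stemming) as the value function and computes exact Shapley values for each sentence of a generated summary by enumerating all subsets. SHAP approximates such values in practice, because exact enumeration is only feasible for a handful of units. The function names, the toy reference, and the toy summary are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import factorial


def rouge1_f1(candidate_tokens, reference_tokens):
    """Simplified ROUGE-1 F1: clipped unigram overlap, no stemming (an assumption)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def shapley_attributions(units, reference_tokens):
    """Exact Shapley value of each summary unit with respect to ROUGE-1 F1.

    `units` is a list of token lists (here: one per summary sentence).
    The cost grows exponentially with len(units), so this is for toy inputs only.
    """
    n = len(units)

    def value(subset):
        # Score the summary that keeps only the units in `subset`, in original order.
        tokens = [tok for i in sorted(subset) for tok in units[i]]
        return rouge1_f1(tokens, reference_tokens)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude unit i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


# Hypothetical toy data: one on-topic sentence and one off-topic sentence.
reference = "the cat sat on the mat".split()
sentences = ["the cat sat".split(), "dogs bark loudly".split()]

phi = shapley_attributions(sentences, reference)
full_score = rouge1_f1([tok for sent in sentences for tok in sent], reference)

for sent, contribution in zip(sentences, phi):
    print(f"{' '.join(sent)!r}: {contribution:+.4f}")
print(f"ROUGE-1 F1 of full summary: {full_score:.4f}")
```

With these toy inputs the first sentence receives a positive attribution and the second a negative one: the off-topic sentence adds no overlap with the reference and lowers precision, which is the kind of overemphasized content a ROUGE-based attribution can flag. The attributions also sum to the score of the full summary, as Shapley values should. A BERTScore-based variant would replace only the value function, at the cost of one model call per subset.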
