Unveiling the Mechanisms Between Writing Style and Academic Impact: A Multidimensional and Machine Learning Perspective

Ruixuan Yu

Keywords

writing style, scientific impact, academic writing, machine learning, science of science

Abstract

Writing style is a key non-content factor in academic dissemination, yet the mechanisms linking it to scientific impact remain underexplored. Taking an explainable machine learning perspective, this study investigates how multidimensional writing style features affect citation behavior. We analyzed 23,644 paper abstracts from the Web of Science (WoS) in computer science, management, and biomedicine and extracted four categories of features: readability, emotional and subjective expression, lexical richness, and coherence of expression. These features were combined with 17 baseline indicators to construct an analytical framework. Using ablation experiments, feature importance analysis, and multiple regression, we systematically evaluated the independent contribution and directional effect of each style dimension. The results show that readability has the strongest predictive power among the writing style dimensions and plays the most substantial role in citation behavior. Concise and clear expression significantly increases citations. Positive emotional and subjective expression also contributes to better dissemination outcomes, whereas overly complex vocabulary and excessive cohesion may have negative effects. This study provides empirical evidence on the relationship between writing style and academic impact, with practical implications for scholarly writing, journal peer review, and research management.
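The abstract does not state which readability index the study uses. Purely as an illustration of how a readability feature might be computed from an abstract, the sketch below assumes the standard Flesch Reading Ease formula and a crude vowel-group syllable heuristic. The function names `count_syllables` and `flesch_reading_ease` are hypothetical and are not taken from the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, drop a trailing silent 'e'.

    This heuristic is an assumption for illustration; it is not exact.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' as silent, except in '-le' endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    plain = "We test the model. It works well."
    dense = ("Multidimensional characterization necessitates "
             "comprehensive methodological operationalization.")
    # The plainer text should receive the higher (easier) score.
    print(flesch_reading_ease(plain) > flesch_reading_ease(dense))
```

A score like this would be one readability feature among several. How the study combines such features with the 17 baseline indicators is not described in the abstract.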
