Optimization of Machine-Learning-Based Enterprise Loan Default Prediction Models: A Comparative Study with Traditional Scoring Methods
Keywords
enterprise loan default, machine learning, credit scoring, SHAP analysis, dynamic risk
Abstract
This study uses data from public sources such as CSMAR and Wind to compare machine learning models (XGBoost, LightGBM) with traditional credit scoring models (Logistic Regression, Altman Z-score), with the aim of optimizing enterprise loan default prediction. The research addresses three critical challenges: data imbalance, integration of non-financial information, and model interpretability. Applying the SMOTE oversampling technique together with SHAP value analysis significantly enhanced model performance. Experimental results show that the XGBoost model achieved an AUC of 0.85, markedly outperforming the traditional Logistic Regression model (AUC = 0.72). Furthermore, incorporating sentiment data improved the recall rate by 15%. The contributions of this study are threefold. First, it systematically compares the performance of machine learning and traditional models in enterprise loan default prediction. Second, it proposes a dynamic risk assessment framework that integrates financial and non-financial features, enhancing the model’s timeliness and adaptability. Third, it improves model transparency through interpretable AI techniques (SHAP analysis), aligning with the regulatory requirements of Basel III and providing theoretical and practical support for risk management in commercial banks.
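As a rough illustration of two ingredients named in the abstract, the sketch below shows the interpolation step behind SMOTE and a rank-based computation of AUC in plain Python. It is not the study's pipeline: the function names `smote` and `auc`, the neighbour count `k`, and the toy feature rows are assumptions made for illustration, and a real experiment would more likely rely on an established library implementation.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic minority rows (hypothetical helper).

    Each synthetic row lies on the segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def auc(labels, scores):
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy minority class: made-up [leverage, current ratio] rows for defaulters.
defaulters = [[0.82, 0.9], [0.75, 1.1], [0.91, 0.7], [0.88, 0.8]]
extra_rows = smote(defaulters, n_new=6)

print(len(extra_rows))                            # 6
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

Oversampling of this kind is normally applied to the training split only, so that the AUC reported on held-out data still reflects the original class balance.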
References
- Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, vol. 23, no. 4, pp. 589-609.
- Beck, T., Demirguc-Kunt, A. and Levine, R. (2005). SMEs, growth, and poverty: Cross-country evidence. Journal of Economic Growth, vol. 10, no. 3, pp. 199-229.
- BIS (2010). Basel III: International framework for liquidity risk measurement, standards and monitoring, Basel, Switzerland: Bank for International Settlements.
- Breiman, L. (2001). Random forests. Machine Learning, vol. 45, no. 1, pp. 5-32.
- Campello, M., Graham, J. R. and Harvey, C. R. (2010). The real effects of financial constraints: Evidence from a financial crisis. Journal of Financial Economics, vol. 97, no. 3, pp. 470-487.
- Chawla, N. V., Bowyer, K. W., Hall, L. O. and Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, vol. 16, pp. 321-357.
- DeLong, E. R., DeLong, D. M. and Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics, vol. 44, no. 3, pp. 837-845.
- Diamond, D. W. and Rajan, R. G. (2001). Liquidity risk, liquidity creation, and financial fragility: A theory of banking. Journal of Political Economy, vol. 109, no. 2, pp. 287-327.
- Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, vol. 29, no. 5, pp. 1189-1232.
- Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA. Curran Associates, Inc., pp. 3149-3157.
- Lessmann, S., Baesens, B., Seow, H.-V. and Thomas, L. C. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, vol. 247, no. 1, pp. 124-136.
- Loughran, T. and McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, vol. 54, no. 4, pp. 1187-1230.
- Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (NeurIPS 2017), December 4–9, 2017, Long Beach, CA. Curran Associates, Inc., pp. 4765-4774.
- Moscatelli, M., Parlapiano, F., Narizzano, S. and Viggiano, G. (2020). Corporate default forecasting with machine learning. Expert Systems with Applications, vol. 161, p. 113567.
- Ribeiro, M. T., Singh, S. and Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), August 13–17, 2016, San Francisco, CA. Association for Computing Machinery (ACM), pp. 1135-1144.
- Zhou, Z. (2016). Machine Learning, Beijing: Tsinghua University Press.
