An Ensemble Learning-Based Model for Intelligent Classification of Logistics Claims Risk and Prediction of Payout Amounts
Keywords
logistics claims risk identification, stacking ensemble learning, MLP model, classification evaluation
Abstract
In recent years, the rapid development of the express delivery and logistics industry has driven a continuous increase in shipment volume, accompanied by a growing number of claims disputes and payment risks. Intelligent automation of the claims settlement process has therefore become a key means for enterprises to reduce costs and improve efficiency. For Problem 1, this paper proposes a claims risk labeling model based on quantile distribution analysis and monotonicity constraints. The method combines equal-frequency binning, quantile estimation, PAVA (pool adjacent violators algorithm) monotonic regression, and PCHIP (piecewise cubic Hermite interpolating polynomial) interpolation so that the classification boundaries are smooth and consistent with business logic. The resulting proportions of the three risk groups are 85.12% Reasonable Claims, 12.19% Elevated Claims, and 2.70% Severe Overclaims. For Problem 2, this paper proposes a Stacking ensemble learning model for predicting the actual payout amount of logistics claims. After feature selection, enhanced feature engineering, and target variable transformation on historical waybill data, multiple machine learning algorithms (CatBoost, an MLP neural network, XGBoost, and LightGBM) are combined through a Stacking ensemble strategy to optimize prediction performance. For Problem 3, this paper proposes an MLP-based risk classification model that assigns each shipping order to a risk level. To address the severe class imbalance in the dataset, SMOTE oversampling is used to balance the sample distribution. A CatBoost classifier, an MLP neural network, and a Support Vector Machine (SVM) are trained separately, and their performance is compared via cross-validation.
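The Problem 1 labeling pipeline can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the risk measure `ratio` (claimed amount over declared value), the binning variable `driver`, the quantile levels 0.85 and 0.97, the ten bins, the requirement that thresholds be non-decreasing in the driver, and the synthetic data are all hypothetical. Each threshold curve is built by estimating per-bin quantiles, enforcing monotonicity with a bin-size-weighted PAVA fit, and joining the bin centers with a PCHIP interpolant so that the boundary is continuous.

```python
import bisect
import random

def quantile(sorted_vals, q):
    """Linearly interpolated q-quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def pava(y, w):
    """Pool Adjacent Violators: weighted least-squares non-decreasing fit of y."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return [m for m, _, c in blocks for _ in range(c)]

def pchip(xs, ys):
    """Shape-preserving piecewise cubic Hermite interpolant through (xs, ys)."""
    n = len(xs)
    h = [xs[k + 1] - xs[k] for k in range(n - 1)]
    delta = [(ys[k + 1] - ys[k]) / h[k] for k in range(n - 1)]
    # Simplification: one-sided end slopes (SciPy uses a three-point formula)
    d = [delta[0]] + [0.0] * (n - 2) + [delta[-1]]
    for k in range(1, n - 1):
        if delta[k - 1] * delta[k] > 0:  # same sign: weighted harmonic mean
            w1, w2 = 2 * h[k] + h[k - 1], h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])

    def f(x):
        x = min(max(x, xs[0]), xs[-1])  # flat extrapolation beyond the end nodes
        k = min(bisect.bisect_right(xs, x) - 1, n - 2)
        t = (x - xs[k]) / h[k]
        return ((2 * t**3 - 3 * t**2 + 1) * ys[k]
                + (t**3 - 2 * t**2 + t) * h[k] * d[k]
                + (-2 * t**3 + 3 * t**2) * ys[k + 1]
                + (t**3 - t**2) * h[k] * d[k + 1])
    return f

def fit_thresholds(drivers, ratios, n_bins=10, levels=(0.85, 0.97)):
    """Per-bin upper quantiles of the ratio, made monotone by PAVA, smoothed by PCHIP.
    Assumption: thresholds should be non-decreasing in the driver."""
    order = sorted(range(len(drivers)), key=lambda i: drivers[i])
    size = len(order) // n_bins  # equal-frequency bins on the driver
    centers, weights, raw = [], [], {q: [] for q in levels}
    for b in range(n_bins):
        idx = order[b * size:] if b == n_bins - 1 else order[b * size:(b + 1) * size]
        centers.append(quantile(sorted(drivers[i] for i in idx), 0.5))
        weights.append(len(idx))
        vals = sorted(ratios[i] for i in idx)
        for q in levels:
            raw[q].append(quantile(vals, q))
    return [pchip(centers, pava(raw[q], weights)) for q in levels]

def label(driver, ratio, curves):
    lo, hi = (c(driver) for c in curves)
    hi = max(hi, lo)  # keep the two boundaries ordered between nodes
    if ratio <= lo:
        return "Reasonable"
    return "Elevated" if ratio <= hi else "Severe"

# Synthetic demo data (hypothetical): driver = declared value, ratio = claim / value
random.seed(0)
drivers = [random.uniform(10, 1000) for _ in range(2000)]
ratios = [random.lognormvariate(0, 0.5) * (1 + d / 1000) for d in drivers]
curves = fit_thresholds(drivers, ratios)
labels = [label(d, r, curves) for d, r in zip(drivers, ratios)]
print({k: labels.count(k) for k in ("Reasonable", "Elevated", "Severe")})
```

Because the slopes used here keep the cubic pieces within the monotone region, the PAVA-adjusted thresholds stay non-decreasing after interpolation; the `max(hi, lo)` guard only protects against the two curves crossing between nodes.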
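A stripped-down version of the Problem 2 stacking scheme is sketched below. It assumes a `log1p` transform as the target variable transformation (the abstract does not name one), and it substitutes two simple pure-Python base learners (ridge regression and k-nearest neighbours) for CatBoost, MLP, XGBoost, and LightGBM, with a ridge meta-learner on top; all function names, the fold count, and the data are hypothetical. The essential point is that the meta-learner is trained only on out-of-fold predictions, so it never sees a base model's fit to its own training rows.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_ridge(X, z, lam=1e-3):
    """Linear model with intercept, fitted by ridge-regularized least squares."""
    rows = [[1.0] + list(x) for x in X]
    dim = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(dim)] for i in range(dim)]
    b = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(dim)]
    w = solve(a, b)
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def fit_knn(X, z, k=15):
    """k-nearest-neighbour regressor: mean target of the k closest rows."""
    data = list(zip(X, z))
    def predict(x):
        near = sorted(data, key=lambda p: sum((u - v) ** 2 for u, v in zip(p[0], x)))[:k]
        return sum(zi for _, zi in near) / len(near)
    return predict

def fit_stacking(X, y, base_fitters, n_folds=5, seed=0):
    """Stack base learners on log1p(payout) using out-of-fold meta features."""
    z = [math.log1p(v) for v in y]  # assumed target transform
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    oof = [[0.0] * len(base_fitters) for _ in X]
    for f in range(n_folds):
        hold = set(idx[f::n_folds])
        train = [i for i in idx if i not in hold]
        models = [fit([X[i] for i in train], [z[i] for i in train]) for fit in base_fitters]
        for i in hold:
            oof[i] = [m(X[i]) for m in models]
    meta = fit_ridge(oof, z)  # meta-learner sees only out-of-fold predictions
    full = [fit(X, z) for fit in base_fitters]  # base learners refit on all rows
    return lambda x: math.expm1(meta([m(x) for m in full]))  # back to payout scale

# Synthetic waybill-like data (hypothetical): two features, positive payout
rng = random.Random(1)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(400)]
y = [math.expm1(1 + 2 * a + 3 * b ** 2 + rng.gauss(0, 0.1)) for a, b in X]
model = fit_stacking(X, y, [fit_ridge, fit_knn])
print(round(model([0.5, 0.5]), 2))
```

In a full pipeline the entries of `base_fitters` would be the gradient-boosting and MLP regressors named above; the stacking logic itself does not change.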
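SMOTE, used in Problem 3 to counter class imbalance, can be sketched as follows. The function names, the neighbour count `k=5`, and the toy data are illustrative assumptions. Each synthetic sample lies on the segment between a randomly chosen minority sample and one of its k nearest minority-class neighbours, so new points stay within the region the minority class already occupies.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """SMOTE: synthesize points on segments between minority samples and
    one of their k nearest minority-class neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: dist2(p, x))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform position along the segment x -> nb
        synthetic.append([u + gap * (v - u) for u, v in zip(x, nb)])
    return synthetic

def balance(X, y, seed=0):
    """Oversample every class up to the size of the largest class."""
    counts = {c: y.count(c) for c in set(y)}
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for c, n in counts.items():
        if n < target:
            members = [x for x, lab in zip(X, y) if lab == c]
            new = smote(members, target - n, seed=seed)
            X_out += new
            y_out += [c] * len(new)
    return X_out, y_out

# Hypothetical imbalanced toy data: 90 majority rows, 10 minority rows
rng = random.Random(0)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
     + [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)])
y = ["Reasonable"] * 90 + ["Severe"] * 10
Xb, yb = balance(X, y)
print(yb.count("Reasonable"), yb.count("Severe"))  # 90 90
```

When models are compared by cross-validation, oversampling is normally applied inside each training fold only, so that synthetic points derived from validation rows do not leak into the evaluation.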
The results indicate that the MLP model performed best: it achieved a weighted F1-score of 0.9098 and a training accuracy of 95.55%, outperforming both CatBoost and SVM, which demonstrates its strong discriminative capability.
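The weighted F1-score averages per-class F1 values with weights equal to each class's support (its number of true instances). The toy labels below are hypothetical; they give per-class F1 scores of 0.75, 0.5, and 1.0 with supports 4, 2, and 1, so the weighted score is (3 + 1 + 1) / 7 ≈ 0.7143.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    total = 0.0
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * (tp + fn)  # weight = class support in y_true
    return total / len(y_true)

# Hypothetical toy labels: R = Reasonable, E = Elevated, S = Severe
y_true = ["R", "R", "R", "R", "E", "E", "S"]
y_pred = ["R", "R", "R", "E", "E", "R", "S"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7143
```

Because the weights follow class frequency, a class split like 85/12/3 lets the majority class dominate this score; a macro-averaged F1 (0.75 for the toy example) weights the rare Severe class equally.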
