An Interpretability Study of the Air Quality Index in Lanzhou City Based on Random Forest

Guoxuan Zu

Keywords

Lanzhou city, air quality index, random forest, feature importance

Abstract

This study uses monthly air quality data for Lanzhou City from 2014 to 2022 to build a Random Forest model that predicts the Air Quality Index (AQI); feature importance analysis is used to interpret the model. The results show that PM10, NO2, and PM2.5 are the core drivers of AQI variation, with a cumulative importance of 84.7%. PM10 dominates (46.1%), followed by NO2 (21.9%). The annual average PM2.5 concentration has fallen by 42.4% relative to 2014, indicating that pollution control measures have been effective, yet PM2.5 remains a key influencing factor. NO2 concentrations are higher in winter than in summer, reflecting the influence of heating and motor vehicle emissions. The model achieves an R² of 0.900 on the training set but only 0.405 on the test set, which indicates overfitting. The study quantitatively identifies the key pollutants affecting AQI in Lanzhou City and provides a basis for air pollution prevention and control. Future research should incorporate meteorological factors and improve the model's generalization ability.
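The figures reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 84.7% cumulative figure is the simple sum of the three individual importance shares, from which a PM2.5 share can be inferred (the abstract does not state it directly), and it uses the standard definition of the coefficient of determination. The variable and function names are hypothetical and are not taken from the study.

```python
# Importance shares stated in the abstract, in percent.
reported = {"PM10": 46.1, "NO2": 21.9}
cumulative_top3 = 84.7  # PM10 + NO2 + PM2.5, as reported

# Assumption: the cumulative figure is the plain sum of the three shares,
# so the PM2.5 share is whatever remains after PM10 and NO2.
pm25_share = round(cumulative_top3 - sum(reported.values()), 1)

# Share left for all other input features combined.
remaining_share = round(100.0 - cumulative_top3, 1)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Train/test scores reported in the abstract; a large gap between the two
# is the usual symptom of overfitting.
r2_train, r2_test = 0.900, 0.405
r2_gap = round(r2_train - r2_test, 3)

print(f"Implied PM2.5 share: {pm25_share}%")
print(f"Share of all other features: {remaining_share}%")
print(f"Train/test R2 gap: {r2_gap}")
```

Under that assumption, PM2.5 would account for roughly 16.7% of the importance and all remaining features for about 15.3%. The drop of 0.495 in R² between the training and test sets is consistent with the overfitting noted in the abstract.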
