Research on Data Quality Analysis Based on Data Mining
Keywords
data mining, data quality, analysis
Abstract
As digitalization advances, big data plays a pivotal role across industries. The quality of that data, however, directly affects the results of data analysis and the accuracy of decision-making: high-quality data provides a reliable basis for decisions, whereas poor-quality data can lead to erroneous judgments and serious losses. Managing and improving data quality has therefore become a central issue in contemporary data management. Data mining technology not only extracts valuable information from large-scale datasets but also shows significant potential for improving data quality. By mining and analyzing the patterns and characteristics hidden within data, defects can be identified and corrected, which raises the overall quality of the data and the reliability of data-driven decisions. This paper reviews the definition and dimensions of data quality, outlines the fundamental principles of data mining technology, and examines its specific applications in data quality analysis, with the aim of offering new insights and methods for data quality management.