A Comparative Analysis of FCNN and CNN Architectures for Speech Denoising Across Diverse Noise Frequencies
Keywords
speech denoising, FCNN, CNN, noise frequency analysis, SNR, RMSE, common voice dataset
Abstract
Speech denoising remains a critical challenge in audio signal processing, especially under non-stationary noise conditions. While convolutional neural networks (CNNs) have been widely adopted for speech enhancement, the potential of fully connected neural networks (FCNNs) remains underexplored, particularly in frequency-varying noise scenarios. This study presents a systematic comparative analysis of FCNN and CNN architectures for speech denoising across multiple noise frequencies. Using the Common Voice dataset, we introduced diverse noise types at 8 kHz, 16 kHz, and 44 kHz to evaluate the denoising performance of both models. Experimental results reveal a frequency-dependent performance disparity: at 8 kHz, both models perform similarly, with the CNN showing marginally higher Signal-to-Noise Ratio (SNR) and Root Mean Square Error (RMSE). At 16 kHz, the CNN achieves significantly higher SNR, albeit with increased RMSE, indicating a trade-off between noise suppression and spectral fidelity. At 44 kHz, the CNN comprehensively outperforms the FCNN, attaining both superior SNR (4.80, +0.04) and lower RMSE (2.6826, –0.1556). These findings underscore the architectural advantages of CNNs in broad-frequency and complex noise environments, while also demonstrating the FCNN's suitability for narrowband scenarios. This research highlights the necessity of frequency-aware model selection and provides novel insights into the comparative efficacy of FCNNs and CNNs for speech denoising.
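The SNR and RMSE figures reported above are computed from paired clean and denoised waveforms. A minimal NumPy sketch of the two metrics follows; the function names and the synthetic test signal are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def snr_db(clean, denoised):
    # SNR in dB: ratio of clean-signal power to residual-error power.
    residual = clean - denoised
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

def rmse(clean, denoised):
    # Root Mean Square Error between the clean and denoised waveforms.
    return np.sqrt(np.mean((clean - denoised) ** 2))

# Illustrative check on a synthetic 440 Hz tone at an 8 kHz sampling rate.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
denoised = clean + 0.01 * rng.standard_normal(clean.shape)

print(f"SNR:  {snr_db(clean, denoised):.2f} dB")
print(f"RMSE: {rmse(clean, denoised):.4f}")
```

Higher SNR indicates stronger noise suppression, while lower RMSE indicates closer waveform fidelity; the abstract's 16 kHz result shows these two can move in opposite directions.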