A Comparative Analysis of FCNN and CNN Architectures for Speech Denoising Across Diverse Noise Frequencies
Keywords
speech denoising, FCNN, CNN, noise frequency analysis, SNR, RMSE, common voice dataset
Abstract
Speech denoising remains a critical challenge in audio signal processing, especially under non-stationary noise conditions. While convolutional neural networks (CNNs) have been widely adopted for speech enhancement, the potential of fully connected neural networks (FCNNs) remains underexplored, particularly in frequency-varying noise scenarios. This study presents a systematic comparative analysis of FCNN and CNN architectures for speech denoising across multiple noise frequencies. Using the Common Voice dataset, we introduced diverse noise types at 8 kHz, 16 kHz, and 44 kHz to evaluate the denoising performance of both models. Experimental results reveal a frequency-dependent performance disparity: at 8 kHz, both models perform similarly, with the CNN showing marginally higher Signal-to-Noise Ratio (SNR) and Root Mean Square Error (RMSE). At 16 kHz, the CNN achieves significantly higher SNR, albeit with increased RMSE, indicating a trade-off between noise suppression and spectral fidelity. At 44 kHz, the CNN comprehensively outperforms the FCNN, attaining both superior SNR (4.80, +0.04) and lower RMSE (2.6826, –0.1556). These findings underscore the architectural advantages of CNNs in broad-frequency and complex noise environments, while also demonstrating the FCNN's suitability for narrowband scenarios. This research highlights the necessity of frequency-aware model selection and provides novel insights into the comparative efficacy of FCNNs and CNNs for speech denoising.
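The SNR and RMSE figures reported above are computed from paired clean and denoised waveforms. A minimal NumPy sketch of the two metrics follows; the function names and the synthetic test signal are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def snr_db(clean, denoised):
    # SNR in dB: ratio of clean-signal power to residual-error power.
    residual = clean - denoised
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

def rmse(clean, denoised):
    # Root Mean Square Error between the clean and denoised waveforms.
    return np.sqrt(np.mean((clean - denoised) ** 2))

# Illustrative check on a synthetic 440 Hz tone at an 8 kHz sampling rate.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
denoised = clean + 0.01 * rng.standard_normal(clean.shape)

print(f"SNR:  {snr_db(clean, denoised):.2f} dB")
print(f"RMSE: {rmse(clean, denoised):.4f}")
```

Higher SNR indicates stronger noise suppression, while lower RMSE indicates closer waveform fidelity; the abstract's 16 kHz result shows these two can move in opposite directions.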