Refining Speech Clarity with Wavelet Denoising under Different Face Mask Conditions: A Subjective Analysis

Marxim Rahula Bharathi B; Adireddy Ramesh; Balaji N.S; Elumalai P.V; Akhilesh Kumar Singh; Satish Chembuly V.V.M.J; Huaizhi Zhang

doi:10.54392/irjmt2522

Authors

Marxim Rahula Bharathi B, Department of Mechanical Engineering, Aditya University, Surampalem, Andhra Pradesh, India. https://orcid.org/0000-0002-0534-6639
Adireddy Ramesh, Department of Electrical and Electronic Engineering, Aditya University, Surampalem, Andhra Pradesh, India.
Balaji N.S, Department of Mechanical Engineering, SRM Institute of Science and Technology, Tiruchirappalli Campus, Tiruchirappalli, Tamil Nadu, India. https://orcid.org/0000-0002-7178-5613
Elumalai P.V, Department of Mechanical Engineering, Aditya University, Surampalem, Andhra Pradesh, India.
Akhilesh Kumar Singh, Department of Mechanical Engineering, Aditya University, Surampalem, Andhra Pradesh, India.
Satish Chembuly V.V.M.J, Department of Mechanical Engineering, Aditya University, Surampalem, Andhra Pradesh, India.
Huaizhi Zhang, Faculty of Engineering & Technology, Shinawatra University, Bang Toei, Thailand - 12160.

Keywords:

Face masks, Speech Enhancement, Wavelet Transform, Subjective comparison test, Wavelet Denoising, Process Innovation

Abstract

During the COVID-19 pandemic, people adopted various face masks and face shields to protect against infection. Although these measures have helped save many lives, they hinder interpersonal communication, especially where clear verbal interaction is required. In this study, a microphone was used to record speech under different face mask conditions, with and without face shields. Participants read vowels and the Grandfather Passage under ten experimental conditions, including surgical masks, cloth masks, double masks (a surgical and cloth combination), and N95 masks, each with and without a face shield. The recorded speech signals, often distorted by noise and reverberation, were enhanced by wavelet denoising using the discrete wavelet transform with soft thresholding. The quality of the enhanced signals was compared with that of the originally acquired signals in a subjective comparison test in which 30 listeners rated the signals using comparison mean opinion scores (CMOS). The results indicate that the wavelet-denoised signals were consistently rated higher in quality than the original signals, even under challenging conditions such as double masks with face shields. This study highlights the practical efficacy of wavelet denoising for the speech clarity problems caused by protective face coverings, offering a valuable means of improving communication in masked environments.
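
The enhancement step named in the abstract, discrete wavelet transform followed by soft thresholding, can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the abstract does not state the wavelet family, the decomposition depth, or the threshold rule, so the Haar wavelet, the `levels` parameter, and the universal threshold with a median-based noise estimate are all assumptions made here, and the test signal is synthetic.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (assumed wavelet); len(x) must be even."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one Haar DWT level by interleaving the reconstructed sample pairs."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; magnitudes below t become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def wavelet_denoise(signal, levels=3):
    """Decompose, soft-threshold the detail coefficients, and reconstruct."""
    n = len(signal)
    if n == 0 or n % (2 ** levels) != 0:
        raise ValueError("signal length must be a non-zero multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)  # details[0] holds the finest scale
    # Assumed threshold rule: noise level from the median absolute value of the
    # finest detail coefficients, then the universal threshold sigma*sqrt(2 ln n).
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(n))
    details = [soft_threshold(d, t) for d in details]
    for d in reversed(details):  # rebuild from the coarsest scale upward
        approx = haar_idwt(approx, d)
    return approx

def demo(seed=0):
    """Denoise a synthetic noisy sine and return (noisy MSE, denoised MSE)."""
    rng = random.Random(seed)
    n = 1024
    clean = [math.sin(2.0 * math.pi * 5.0 * i / n) for i in range(n)]
    noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
    denoised = wavelet_denoise(noisy, levels=4)

    def mse(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

    return mse(noisy, clean), mse(denoised, clean)

if __name__ == "__main__":
    noisy_mse, denoised_mse = demo()
    print("MSE before denoising:", noisy_mse)
    print("MSE after denoising: ", denoised_mse)
```

Soft thresholding shrinks the surviving coefficients as well as zeroing the small ones, which avoids the abrupt discontinuities of hard thresholding. For recorded speech, a smoother wavelet and a wavelet library would normally replace the hand-written Haar transform used here for self-containment.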
