Abstract
One of the central problems in digital speech-signal processing is distinguishing segments of active speech from segments of background noise or silence in an input acoustic signal. This problem arises in many practical applications, such as speech analysis in voice-command systems, speech transmission over a network, and automated speech recognition. However, most available systems for automated speech analysis cannot solve this problem efficiently when the signal-to-noise ratio is small; in addition, their parameters must be tuned separately for different noise levels, which prevents fully automated segmentation of noisy speech signals. In this work, we design a system for the automated segmentation of speech signals distorted by additive noise of various types and intensities. The developed system is based on three different deep convolutional neural network models and can efficiently detect speech and silence segments in noisy signals over a wide range of signal-to-noise ratios and noise types.
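The abstract contrasts the proposed learned detectors with classical segmentation methods whose thresholds must be retuned per noise level. As a point of reference, the sketch below implements such a classical short-term-energy baseline for the frame-level speech/silence decision. It is not the paper's CNN-based system (whose architecture is not described here); the frame length, hop size, and threshold are illustrative assumptions.

```python
import numpy as np

def frame_energy_vad(signal, frame_len=400, hop=160, threshold_db=-30.0):
    """Classify each frame as active (True) or silence (False) by comparing
    its short-term log energy to a fixed threshold relative to the loudest
    frame. This is the classical baseline that degrades at low SNR, which
    motivates learned (CNN-based) detectors."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies[i] = np.sum(frame ** 2) + 1e-12  # avoid log(0)
    log_e = 10.0 * np.log10(energies / energies.max())
    return log_e > threshold_db

# Synthetic example: 1 s of near-silence followed by 1 s of a noisy tone
# standing in for speech (16 kHz sampling rate assumed).
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
silence = 1e-4 * rng.standard_normal(fs)
active = np.sin(2 * np.pi * 220 * t) + 0.01 * rng.standard_normal(fs)
mask = frame_energy_vad(np.concatenate([silence, active]))
# Frames in the first half are flagged False (silence), the rest True.
```

The fixed `threshold_db` is exactly the parameter that must be retuned for each noise level; a learned classifier replaces this hand-set decision rule with one trained across noise types and SNRs.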
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Translated by L. Kartvelishvili
Protserov, S.D., Shishkin, A.G. Segmentation of Noisy Speech Signals. Sci. Tech. Inf. Proc. 49, 356–363 (2022). https://doi.org/10.3103/S0147688222050100