Post-processing of Translated Speech by Pole Modification and Residual Enhancement to Improve Perceptual Quality

Arya, Lalaram; Prasanna, S. R. Mahadeva

doi:10.1007/978-3-031-48309-7_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14338)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

The perceptual quality of translated speech depends on the quantity of speech data used for training: quality is poor when the system is trained with little data and improves as more training data is added. This work demonstrates that post-processing translated speech with signal processing techniques improves its perceptual quality. First, the residual of the translated speech is replaced with the original residual of the target speech. Next, it is replaced with a weighted residual obtained by speech enhancement. Pole modification of the translated speech is also performed. Finally, the weighted residual and pole modification are combined. All experiments show an improvement in perceptual quality.
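
The steps named in the abstract can be illustrated with a minimal source-filter sketch. It is not the authors' implementation: it assumes the residual is a linear prediction (LP) residual and that pole modification means moving the poles of the all-pole LP filter radially. The abstract gives no parameters, so the LP order, the `scale` and `max_radius` values, the constant `weights` array, and all function names below are hypothetical choices made for illustration only.

```python
import numpy as np

def lpc(frame, order=10):
    """Autocorrelation-method LP coefficients a = [1, a1, ..., ap]."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # tiny ridge for numerical safety
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def inverse_filter(frame, a):
    """LP residual e[n] = sum_k a[k] * s[n-k] (FIR filtering with A(z))."""
    return np.convolve(frame, a)[: len(frame)]

def synthesize(residual, a):
    """All-pole synthesis y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    p = len(a) - 1
    y = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def modify_poles(a, scale=1.02, max_radius=0.98):
    """Scale each pole radius by `scale`, clipped to `max_radius` (assumed rule)."""
    poles = np.roots(a)
    radius = np.minimum(np.abs(poles) * scale, max_radius)
    new_poles = radius * np.exp(1j * np.angle(poles))
    return np.real(np.poly(new_poles))  # conjugate pairs give real coefficients

# Synthetic stand-ins for one frame of target and translated speech.
rng = np.random.default_rng(0)
target = synthesize(rng.standard_normal(400), np.array([1.0, -1.3, 0.8]))
translated = target + 0.3 * rng.standard_normal(400)

a_tr = lpc(translated)                             # vocal-tract filter of translated frame
res_target = inverse_filter(target, lpc(target))   # residual of the target frame
weights = np.full(400, 0.8)                        # placeholder for enhancement-derived weights
a_mod = modify_poles(a_tr)                         # pole-modified filter
output = synthesize(weights * res_target, a_mod)   # weighted residual + modified poles
```

In this sketch the weights are a constant placeholder; in the chapter they come from a speech enhancement step whose details the abstract does not give. Clipping the radius below 1 keeps the modified synthesis filter stable.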

Acknowledgment

We would like to thank the “AnantGanak” high-performance computation (HPC) facility at IIT Dharwad for enabling us to conduct our experiments.

Author information

Authors and Affiliations

Indian Institute of Technology Dharwad, Dharwad, 580011, India
Lalaram Arya & S. R. Mahadeva Prasanna

Corresponding author

Correspondence to Lalaram Arya.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
K. Samudravijaya
Indian Institute of Information Technology Dharwad, Dharwad, India
K. T. Deepak
Indian Institute of Technology Dharwad, Dharwad, India
Rajesh M. Hegde
KIIT Group of Colleges, Gurugram, India
Shyam S. Agrawal
Indian Institute of Technology Dharwad, Dharwad, India
S. R. Mahadeva Prasanna

Cite this paper

Arya, L., Prasanna, S.R.M. (2023). Post-processing of Translated Speech by Pole Modification and Residual Enhancement to Improve Perceptual Quality. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_19

DOI: https://doi.org/10.1007/978-3-031-48309-7_19
Published: 22 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science, Computer Science (R0)
