Post-processing of Translated Speech by Pole Modification and Residual Enhancement to Improve Perceptual Quality

  • Conference paper
Speech and Computer (SPECOM 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14338)

Abstract

The perceptual quality of translated speech depends on the amount of speech data used for training. Quality is poor when the system is trained with little data, and it improves as more speech data is gradually added. This work demonstrates the significance of signal-processing-based post-processing of translated speech for improving perceptual quality. Initially, the translated speech residual is replaced with the original residual of the target speech. It is then replaced with the weighted residual obtained by speech enhancement. Pole modification of the translated speech is also performed. Finally, the weighted residual and pole modification are combined. All the experiments show an improvement in perceptual quality.
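
As a rough illustration of the kind of source-filter post-processing the abstract describes, the following Python sketch performs linear prediction (LP) analysis, extracts the LP residual, applies a weight function to it, modifies the poles, and resynthesizes the signal. It is not the authors' implementation. The function names, the LP order, the sample-wise weight function, and the uniform radial scaling of all poles (the substitution a_k -> a_k * g^k) are assumptions chosen for illustration; the paper's actual weighting and pole-modification rules may differ.

```python
import math

def lp_coefficients(x, order):
    """Autocorrelation-method LP via Levinson-Durbin.
    Returns [1, a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def lp_residual(x, a):
    """Inverse filter: e[n] = x[n] + sum_k a_k x[n-k]."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(p + 1) if n - k >= 0)
            for n in range(len(x))]

def synthesize(e, a):
    """All-pole filter: y[n] = e[n] - sum_k a_k y[n-k]."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[k] * y[n - k]
                            for k in range(1, p + 1) if n - k >= 0))
    return y

def weight_residual(e, w):
    """Assumed form of a 'weighted residual': sample-wise product e[n] * w[n]."""
    return [en * wn for en, wn in zip(e, w)]

def scale_poles(a, g):
    """Assumed pole modification: scale every pole radius by g.
    Replacing A(z) with A(z/g) multiplies a_k by g**k."""
    return [ak * g ** k for k, ak in enumerate(a)]

# Toy stand-in for one frame of translated speech (hypothetical data).
x = [0.95 ** n * math.cos(0.3 * n) for n in range(200)]
a = lp_coefficients(x, order=2)
e = lp_residual(x, a)
w = [1.0] * len(e)   # placeholder; an enhancement stage would supply real weights
y = synthesize(weight_residual(e, w), scale_poles(a, 1.02))
```

With unit weights and g = 1 the chain reduces to plain analysis and resynthesis, which reproduces the input frame up to floating-point error. Choosing g slightly above 1 moves the poles toward the unit circle and narrows the resonance bandwidths; the scaled pole radii must stay below 1 to keep the synthesis filter stable.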

Acknowledgment

We would like to thank the “AnantGanak” high-performance computation (HPC) facility at IIT Dharwad for enabling us to conduct our experiments.

Author information

Corresponding author

Correspondence to Lalaram Arya.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Arya, L., Prasanna, S.R.M. (2023). Post-processing of Translated Speech by Pole Modification and Residual Enhancement to Improve Perceptual Quality. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_19

  • DOI: https://doi.org/10.1007/978-3-031-48309-7_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48308-0

  • Online ISBN: 978-3-031-48309-7

  • eBook Packages: Computer Science, Computer Science (R0)
