Abstract
The perceptual quality of translated speech depends on the amount of speech data used for training. Quality is poor when the system is trained on limited data and improves as more training data is gradually added. This work demonstrates the significance of post-processing translated speech with signal processing to improve its perceptual quality. First, the residual of the translated speech is replaced with the original residual of the target speech. Next, it is replaced with a weighted residual obtained by speech enhancement. Pole modification of the translated speech is also performed. Finally, the weighted-residual and pole modifications are combined. All experiments show an improvement in perceptual quality.
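The post-processing steps named in the abstract can be illustrated with a minimal linear-prediction (LP) sketch in Python. It is not the authors' implementation: the abstract does not give the weighting function, the direction or amount of pole modification, or the frame settings. The helper names (`lp_coefficients`, `lp_residual`, `weight_residual`, `modify_poles`, `synthesize`), the energy-based weighting, and the parameters `radius_scale`, `max_radius`, `smooth_len` and `min_weight` are all assumptions made for illustration.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP via Levinson-Durbin; returns [1, a1, ..., ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lp_residual(frame, a):
    """Inverse-filter the frame with A(z) to obtain the LP residual."""
    return np.convolve(frame, a)[:len(frame)]

def synthesize(residual, a):
    """Drive the all-pole filter 1/A(z) with a residual (zero initial state)."""
    order = len(a) - 1
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for j in range(1, min(order, n) + 1):
            acc -= a[j] * out[n - j]
        out[n] = acc
    return out

def weight_residual(residual, smooth_len=9, min_weight=0.2):
    """Assumed stand-in weighting: emphasise high-energy residual regions."""
    kernel = np.ones(smooth_len) / smooth_len
    energy = np.convolve(residual ** 2, kernel, mode="same")
    weights = min_weight + (1.0 - min_weight) * energy / (energy.max() + 1e-12)
    return weights * residual

def modify_poles(a, radius_scale, max_radius=0.98):
    """Scale pole radii of 1/A(z), keeping angles and clipping for stability."""
    poles = np.roots(a)
    radii = np.minimum(np.abs(poles) * radius_scale, max_radius)
    return np.real(np.poly(radii * np.exp(1j * np.angle(poles))))

# Toy frame standing in for one windowed frame of translated speech.
rng = np.random.default_rng(0)
n = np.arange(400)
frame = (np.sin(2 * np.pi * 0.05 * n) + 0.5 * np.sin(2 * np.pi * 0.12 * n)
         + 0.05 * rng.standard_normal(400)) * np.hamming(400)

a = lp_coefficients(frame, order=10)
residual = lp_residual(frame, a)
# Combined variant: weighted residual driving the pole-modified filter.
processed = synthesize(weight_residual(residual), modify_poles(a, radius_scale=1.05))
```

In this sketch, replacing the translated-speech residual with the target-speech residual corresponds to passing a residual computed from a different signal to `synthesize` while keeping the translated speech's coefficients `a`. A `radius_scale` above 1 sharpens spectral peaks and a value below 1 broadens them; which direction suits translated speech is not stated in the abstract.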
Acknowledgment
We would like to thank the “AnantGanak” high-performance computation (HPC) facility at IIT Dharwad for enabling us to conduct our experiments.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Arya, L., Prasanna, S.R.M. (2023). Post-processing of Translated Speech by Pole Modification and Residual Enhancement to Improve Perceptual Quality. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science(), vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_19
DOI: https://doi.org/10.1007/978-3-031-48309-7_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science, Computer Science (R0)