Development of syllable-based text to speech synthesis system in Bengali

Narendra, N. P.; Rao, K. Sreenivasa; Ghosh, Krishnendu; Vempada, Ramu Reddy; Maity, Sudhamay

doi:10.1007/s10772-011-9094-4

Published: 16 June 2011

Volume 14, pages 167–181 (2011)

International Journal of Speech Technology

N. P. Narendra¹,
K. Sreenivasa Rao¹,
Krishnendu Ghosh¹,
Ramu Reddy Vempada¹ &
Sudhamay Maity¹

Abstract

This paper presents the design and development of an unrestricted text-to-speech (TTS) synthesis system for the Bengali language. An unrestricted TTS system is capable of synthesizing good-quality speech across different domains. In this work, syllables are used as the basic units for synthesis, and the Festival framework is used to build the TTS system. Speech collected from a female artist is used as the speech corpus. Initially, speech is collected from five speakers, and a prototype TTS system is built for each of them. The best of the five speakers is selected through subjective and objective evaluation of natural and synthesized waveforms. The unrestricted TTS system is then developed by addressing the issues involved at each stage to produce a good-quality synthesizer. Evaluation is carried out in four stages by conducting objective and subjective listening tests on the synthesized speech. In the first stage, the TTS system is built with the basic Festival framework. In the following stages, additional features are incorporated into the system and the quality of synthesis is evaluated. The subjective and objective measures indicate that the proposed features and methods improve the quality of the synthesized speech from stage 2 to stage 4.
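
The speaker-selection step can be illustrated with a minimal sketch. The abstract does not say which subjective measure was used, so the example assumes a mean opinion score (MOS) on a 1 to 5 scale, a common choice for listening tests. The function names, speaker labels and ratings are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def mean_opinion_scores(ratings):
    """Return the mean listener rating for each speaker.

    `ratings` maps a speaker label to a list of listener scores
    (assumed here to be on a 1-5 scale).
    """
    return {speaker: mean(scores) for speaker, scores in ratings.items()}


def best_speaker(ratings):
    """Pick the speaker whose prototype voice has the highest mean score."""
    mos = mean_opinion_scores(ratings)
    return max(mos, key=mos.get)


# Hypothetical listening-test data, NOT results from the paper.
ratings = {
    "speaker_1": [3, 4, 3, 4],
    "speaker_2": [4, 5, 4, 4],
    "speaker_3": [2, 3, 3, 3],
}

if __name__ == "__main__":
    for speaker, score in mean_opinion_scores(ratings).items():
        print(f"{speaker}: {score:.2f}")
    print("selected:", best_speaker(ratings))
```

In practice the subjective score would be combined with an objective measure before a speaker is chosen; how the authors weighted the two is not stated in the abstract.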

Author information

Authors and Affiliations

School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, West Bengal, India
N. P. Narendra, K. Sreenivasa Rao, Krishnendu Ghosh, Ramu Reddy Vempada & Sudhamay Maity

Corresponding author

Correspondence to N. P. Narendra.

About this article

Cite this article

Narendra, N.P., Rao, K.S., Ghosh, K. et al. Development of syllable-based text to speech synthesis system in Bengali. Int J Speech Technol 14, 167–181 (2011). https://doi.org/10.1007/s10772-011-9094-4

Received: 06 January 2011
Accepted: 31 May 2011
Published: 16 June 2011
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10772-011-9094-4
