A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition

Yu, Dahai; Ghita, Ovidiu; Sutherland, Alistair; Whelan, Paul F.

doi:10.1007/978-3-540-92957-4_35

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5414)

Included in the following conference series:

Pacific-Rim Symposium on Image and Video Technology

Abstract

This paper presents the development of a novel visual speech recognition (VSR) system based on Hidden Markov Models (HMMs) and a new visual speech representation that extends the standard viseme concept, referred to in this paper as the Visual Speech Unit (VSU). Visemes have been regarded as the smallest speech elements in the visual domain and have been widely applied to model visual speech, but they are problematic when applied to continuous visual speech recognition. To circumvent the problems associated with standard visemes, we propose a new visual speech representation that includes not only the data associated with the articulation of the visemes but also the transitory information between consecutive visemes. To fully evaluate the appropriateness of the proposed representation, an extensive set of experiments has been conducted to compare the performance of the visual speech units with that offered by the standard MPEG-4 visemes. The experimental results indicate that the developed VSR application achieved up to 90% correct recognition when applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only in the range of 62% to 72%.
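The abstract pairs the VSU representation with HMM classification. A common way to use HMMs as a classifier is to build one model per class and assign an unseen sequence to the class whose model gives it the highest likelihood. The sketch below illustrates that decision rule with the scaled forward algorithm. It assumes discrete-observation HMMs whose parameters are already estimated; the two-state left-to-right topology, the two-symbol alphabet, and the labels `unit_A`/`unit_B` are hypothetical and are not taken from the paper, which may use a different observation model, feature set, and training procedure.

```python
import numpy as np


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM.

    obs   : sequence of integer symbol indices
    start : (N,) initial state probabilities
    trans : (N, N) transitions, trans[i, j] = P(state j at t+1 | state i at t)
    emit  : (N, M) emissions, emit[i, k] = P(symbol k | state i)
    """
    # Initialise with the first observation, then normalise to avoid underflow.
    alpha = start * emit[:, obs[0]]
    scale = alpha.sum()
    alpha = alpha / scale
    log_p = np.log(scale)
    for o in obs[1:]:
        # Propagate through the transitions, weight by the emission probability.
        alpha = (alpha @ trans) * emit[:, o]
        scale = alpha.sum()
        alpha = alpha / scale
        # The product of the scale factors equals P(obs | model).
        log_p += np.log(scale)
    return log_p


def classify(obs, models):
    """Return the label whose HMM assigns obs the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Hypothetical two-class example: 2-state left-to-right HMMs over 2 symbols.
start = np.array([1.0, 0.0])
trans = np.array([[0.6, 0.4],
                  [0.0, 1.0]])
emit_a = np.array([[0.9, 0.1],
                   [0.1, 0.9]])
emit_b = emit_a[:, ::-1]  # same topology, symbol meanings swapped
models = {"unit_A": (start, trans, emit_a), "unit_B": (start, trans, emit_b)}

print(classify([0, 0, 1, 1], models))  # -> unit_A
print(classify([1, 1, 0, 0], models))  # -> unit_B
```

Working in log space with per-step scaling keeps the likelihoods of long sequences from underflowing, which matters when many class models are compared on the same input.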

Author information

Authors and Affiliations

Vision Systems Group, School of Electronic Engineering and Computing, Dublin City University, Dublin, Ireland
Dahai Yu, Ovidiu Ghita, Alistair Sutherland & Paul F. Whelan

Editor information

Editors and Affiliations

Department of Computer and Communication Science, Wakayama University, 930 Sakaedani, Wakayama-shi, 640 8510, Wakayama, Japan
Toshikazu Wada
Institute of Computer Science and Information Engineering, National Ilan University, No. 1, Sec. 1, Shen-Lung Rd., 26047, Yi-Lan, Taiwan, ROC
Fay Huang
Microsoft Research Asia, Beijing Sigma Center, 5003, No. 49, Zhichun Road, 100190, Beijing, PR China
Stephen Lin

About this paper

Cite this paper

Yu, D., Ghita, O., Sutherland, A., Whelan, P.F. (2009). A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition. In: Wada, T., Huang, F., Lin, S. (eds) Advances in Image and Video Technology. PSIVT 2009. Lecture Notes in Computer Science, vol 5414. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92957-4_35

DOI: https://doi.org/10.1007/978-3-540-92957-4_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92956-7
Online ISBN: 978-3-540-92957-4
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition