A voice command detection system for aerospace applications

Published in: International Journal of Speech Technology

Abstract

With the ever-increasing volume of audio content, audio processing has become a vital need. In the aerospace field, voice commands can be used instead of data commands to speed up command transmission, to help crew members complete their tasks by allowing hands-free control of supplemental equipment, and as a redundant channel that increases the reliability of command transmission. In this paper, a voice command detection (VCD) framework is proposed for aerospace applications, which decodes voice commands into comprehensible, executable commands at an acceptable speed with a low false alarm rate. The framework is mainly based on a keyword spotting method that extracts pre-defined target keywords from the input voice commands. These keywords are the input arguments of the proposed rule-based language model (LM), which decodes the voice commands based on the keywords and their locations. Two keyword spotters are trained and used in the VCD system. The phone-based keyword spotter is trained on the TIMIT database; speaker adaptation methods are then exploited to modify the parameters of the trained models using non-native speaker utterances. The word-based keyword spotter is trained on a database prepared specifically for aerospace applications. Experimental results show that the word-based VCD system decodes voice commands with a true detection rate of 88% and a false alarm rate of 12%, on average. Additionally, using speaker adaptation methods in the phone-based VCD system improves the true detection and false alarm rates by about 21% each.
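The abstract describes a rule-based LM that maps spotted keywords, together with their positions in the utterance, to an executable command. The following is a minimal illustrative sketch of that idea; the keyword vocabulary, the ordering rule, and all function names are assumptions for illustration, not the rules used in the paper.

```python
# Hypothetical sketch of a rule-based language model that turns spotted
# keywords and their utterance positions into an executable command.
# The vocabulary and the single ordering rule below are illustrative only.

ACTIONS = {"open", "close", "increase", "decrease"}
TARGETS = {"valve", "antenna", "thrust"}

def decode_command(keywords):
    """keywords: list of (word, position) pairs from the keyword spotter."""
    action = next((w for w, _ in keywords if w in ACTIONS), None)
    target = next((w for w, _ in keywords if w in TARGETS), None)
    if action is None or target is None:
        return None  # reject partial matches to keep false alarms low
    # Location rule: the action keyword must precede its target.
    a_pos = next(p for w, p in keywords if w == action)
    t_pos = next(p for w, p in keywords if w == target)
    if a_pos >= t_pos:
        return None
    return f"{action.upper()} {target.upper()}"

print(decode_command([("open", 0), ("valve", 2)]))   # OPEN VALVE
print(decode_command([("valve", 0), ("open", 2)]))   # None (order rule fails)
```

Rejecting utterances that lack a complete action-target pair, or that violate the ordering rule, is one simple way a rule-based LM can trade a slightly lower detection rate for a lower false alarm rate.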


Figures 1–7 are available in the full article.



Author information

Correspondence to Shima Tabibian.


Cite this article

Tabibian, S. A voice command detection system for aerospace applications. Int J Speech Technol 20, 1049–1061 (2017). https://doi.org/10.1007/s10772-017-9467-4
