An Open-Source Voice Command-Based Human-Computer Interaction System Using Speech Recognition Platforms

  • Conference paper
  • First Online:
Proceedings of the 2nd International Conference on Big Data, IoT and Machine Learning (BIM 2023)

Abstract

Voice command-based human-computer interaction (HCI) is becoming more useful and practical every day. Here, we present an open-source voice command-based speech interaction system that provides hands-free mouse and keyboard control without an active internet connection. The usefulness of the application is demonstrated by evaluating it thoroughly, keeping in mind both motor-disabled and able-bodied users. Several participants of different age groups who evaluated the system found that it worked reliably and helped them complete tasks with voice commands alone, without using a mouse or keyboard. In this research, we identify common voice tokens a person would speak to accomplish a human-computer interaction, then program the tokens to work with major speech recognition platforms such as CMU PocketSphinx, DeepSpeech, and VOSK. Each platform was compared on detection rate, accuracy, inference time, CPU usage, system memory usage, and accuracy across user age groups. In the results section, we show that with our proposed system, the VOSK speech recognition platform outperformed the other compared platforms, achieving a 91% successful task completion rate for real-time applications.
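The token-based interaction described above can be sketched as a mapping from recognized utterances to input actions. The command phrases, action tuples, and `dispatch` helper below are hypothetical illustrations, not the paper's implementation; a real system would feed each transcript produced by an offline recognizer (e.g. VOSK) into such a router.

```python
# Hypothetical sketch: route recognized voice tokens to mouse/keyboard
# actions. A real system would pass microphone audio to an offline
# recognizer and hand each transcript to dispatch().

COMMANDS = {
    "click": lambda: ("mouse", "left_click"),
    "double click": lambda: ("mouse", "double_click"),
    "scroll down": lambda: ("mouse", "scroll", -3),
    "press enter": lambda: ("keyboard", "enter"),
}

def dispatch(transcript: str):
    """Match a recognized utterance against the command table.

    Longest phrase is tried first so "double click" wins over "click".
    Returns the action tuple, or None if the utterance is not a command.
    """
    text = transcript.lower().strip()
    for phrase in sorted(COMMANDS, key=len, reverse=True):
        if phrase in text:
            return COMMANDS[phrase]()
    return None
```

For example, `dispatch("please double click")` yields `("mouse", "double_click")`, while an utterance containing no known token yields `None`.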


Notes

  1. https://www.tensorflow.org
  2. http://github.com/zszyellow/WER-in-Python
  3. https://www.w3.org/TR/2000/NOTE-jsgf-20000605/
  4. https://cmusphinx.github.io/wiki/tutoriallm
  5. http://github.com/zszyellow/WER-in-Python
  6. https://github.com/AdnanMahmud1/Speech-Recognition-Applications-PocketSphinx-VOSK.git
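Note 2 links a word error rate (WER) implementation used for evaluation; the metric itself is the word-level edit distance (substitutions + deletions + insertions) divided by the reference length, which can be sketched as:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] holds the edit distance between ref[:i] and hyp[:j].
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = d[0]
        d[0] = i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (or match)
            prev = cur
    return d[-1] / len(ref)
```

For example, `wer("move the cursor left", "move cursor left")` is 0.25: one deleted word out of four reference words.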


Author information

Corresponding author

Correspondence to Adnan Mahmud Fuad.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Fuad, A.M., Ahmed, S.J., Anannya, N.J., Mridha, M.F., Nur, K. (2024). An Open-Source Voice Command-Based Human-Computer Interaction System Using Speech Recognition Platforms. In: Arefin, M.S., Kaiser, M.S., Bhuiyan, T., Dey, N., Mahmud, M. (eds) Proceedings of the 2nd International Conference on Big Data, IoT and Machine Learning. BIM 2023. Lecture Notes in Networks and Systems, vol 867. Springer, Singapore. https://doi.org/10.1007/978-981-99-8937-9_36

  • DOI: https://doi.org/10.1007/978-981-99-8937-9_36

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8936-2

  • Online ISBN: 978-981-99-8937-9

  • eBook Packages: Engineering (R0)
