Abstract
Voice command-based human-computer interaction (HCI) is becoming more useful and practical day by day. Here, we present an open-source voice-command-based speech interaction system that provides hands-free mouse and keyboard interaction without any active internet connection. The usefulness of the application is demonstrated by evaluating it thoroughly, keeping in mind both motor-disabled users and able-bodied users. Several participants from different age groups evaluated the system and found that it worked reliably, allowing them to complete tasks with voice commands alone, without using a mouse or keyboard. In this research, we identify common voice tokens a person would speak to accomplish a human-computer interaction, and then program those tokens to work with major speech recognition platforms such as CMU PocketSphinx, DeepSpeech, and VOSK. Each platform was evaluated on detection rate, accuracy, inference time, CPU usage, system memory usage, and the accuracy achieved by users of different age groups. In the results section, we show that with our proposed system the VOSK speech recognition platform outperformed the other compared platforms, achieving a 91% successful task completion rate in real-time applications.
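The core idea described above — mapping a fixed set of recognized voice tokens to mouse and keyboard actions — can be sketched as follows. This is a minimal illustration only: the token names, handler functions, and returned action strings are assumptions for demonstration, not the paper's actual implementation, and a real system would call into an input-automation library and take its input from an offline recognizer such as VOSK.

```python
# Hypothetical sketch: a registry that maps recognized voice tokens to
# mouse/keyboard actions. Token names and handlers are illustrative only.

ACTIONS = {}

def command(token):
    """Decorator that registers a handler for a recognized voice token."""
    def register(fn):
        ACTIONS[token] = fn
        return fn
    return register

@command("left click")
def left_click():
    # In a real system this would trigger a mouse click via an
    # input-automation library.
    return "mouse: left click"

@command("scroll down")
def scroll_down():
    return "mouse: scroll down"

@command("press enter")
def press_enter():
    return "keyboard: enter"

def dispatch(recognized_text):
    """Normalize the recognizer's transcription and run the matching action.

    Unrecognized phrases are ignored (returns None) rather than acted on,
    so stray speech does not produce spurious input events.
    """
    handler = ACTIONS.get(recognized_text.strip().lower())
    if handler is None:
        return None
    return handler()
```

In such a design, an offline recognizer's transcription loop would simply pass each final hypothesis to `dispatch()`; keeping the grammar restricted to the registered tokens is also what lets keyword-oriented engines like PocketSphinx and VOSK run with small vocabularies offline.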
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Fuad, A.M., Ahmed, S.J., Anannya, N.J., Mridha, M.F., Nur, K. (2024). An Open-Source Voice Command-Based Human-Computer Interaction System Using Speech Recognition Platforms. In: Arefin, M.S., Kaiser, M.S., Bhuiyan, T., Dey, N., Mahmud, M. (eds) Proceedings of the 2nd International Conference on Big Data, IoT and Machine Learning. BIM 2023. Lecture Notes in Networks and Systems, vol 867. Springer, Singapore. https://doi.org/10.1007/978-981-99-8937-9_36
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8936-2
Online ISBN: 978-981-99-8937-9