Recognition of score words in freestyle kayaking using improved DTW matching

Zhang, Qiyuan; Yuan, Xiaochen; Lam, Chan-Tong

doi:10.1007/s11042-024-18383-w

Published: 20 February 2024

Multimedia Tools and Applications (2024)

Abstract

Voice is the most natural information carrier for human beings and is likely to become the main method of human–computer interaction in the future. This article focuses on the recognition of score words in freestyle kayaking and collects speech samples from multiple speakers, each sample being a specific freestyle kayak action word. We present a new method for isolated speech recognition that combines mel-scale frequency cepstral coefficients (MFCC) with an improved dynamic time warping (DTW) algorithm. An endpoint detection method based on short-time energy and zero-crossing rate is proposed and implemented. After preprocessing with endpoint detection, the speech signal is analyzed and converted into speech feature parameters using MFCC. During the training phase, labeled features are generated from the training signals. During the identification phase, the DTW algorithm is improved by applying multiple constraints so that path matching within the constraints is more accurate. Experiments show a high recognition rate for specific score words in freestyle kayaking. In addition, the method provides relatively good results in noisy environments with high signal-to-noise ratios.
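
The matching step described in the abstract can be illustrated with a minimal sketch of constrained DTW template matching. The abstract does not specify which constraints the authors use, so the sketch assumes a single, common one: a Sakoe-Chiba band that limits how far the warping path may stray from the diagonal. The function names, the window parameter, and the template labels are hypothetical, and the feature vectors stand in for per-frame MFCC vectors.

```python
import math


def dtw_distance(a, b, window=None):
    """Accumulated DTW cost between two sequences of feature vectors.

    `window` is the half-width of a Sakoe-Chiba band (an assumed
    constraint, not necessarily the one used in the paper).
    None means the warping path is unconstrained.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band around the diagonal are evaluated.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def recognize(query, templates, window=None):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates,
               key=lambda label: dtw_distance(query, templates[label], window))


# Toy one-dimensional "features"; real input would be MFCC frames.
templates = {
    "loop": [(0.0,), (1.0,), (2.0,)],   # hypothetical label
    "spin": [(5.0,), (5.0,), (5.0,)],   # hypothetical label
}
query = [(0.0,), (0.0,), (1.0,), (2.0,)]
best = recognize(query, templates, window=1)
```

Restricting the path to a band reduces the number of cells evaluated and rules out extreme warpings, which is the general motivation for constraining DTW; the paper's own constraints may differ.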

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work was supported by the Science and Technology Development Fund of Macau SAR (Grant No. 0045/2022/A) and by research projects of the Macao Polytechnic University (Project Nos. RP/FCA-12/2022 and RP/ESCA-03/2021).

Author information

Authors and Affiliations

Faculty of Applied Sciences, Macao Polytechnic University, Macau, 999078, China
Qiyuan Zhang, Xiaochen Yuan & Chan-Tong Lam

Corresponding author

Correspondence to Xiaochen Yuan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 7 Summary of the datasets used in this paper, with the number of participants and the number of audio acquisitions

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Q., Yuan, X. & Lam, CT. Recognition of score words in freestyle kayaking using improved DTW matching. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18383-w

Received: 06 January 2023
Revised: 07 September 2023
Accepted: 19 January 2024
Published: 20 February 2024
DOI: https://doi.org/10.1007/s11042-024-18383-w
