
Amazigh CNN speech recognition system based on Mel spectrogram feature extraction method

Published in: International Journal of Speech Technology

Abstract

Speech recognition makes it easier for humans and machines to interact through speech. Number-oriented communication, such as entering a registration code, mobile number, score, or account number, can benefit from spoken digit recognition. This paper presents our Amazigh automatic speech recognition (ASR) experience based on a deep learning approach. As part of this strategy, Mel spectrograms are computed from the audio samples and classified with a convolutional neural network (CNN). To recognize the Amazigh numerals, we use a database of the digits from zero to nine collected from 42 native speakers, men and women between the ages of 20 and 40. Our experimental results show that spoken Amazigh digits can be identified with a maximum accuracy of 93.62%, a precision of 94%, and a recall of 94%.
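To make the pipeline concrete, the sketch below shows one way to implement the two stages the abstract describes: log-Mel spectrogram extraction from short digit recordings, followed by classification with a small 2-D CNN. It assumes librosa and TensorFlow/Keras; the sampling rate, spectrogram size, and layer choices are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of a Mel-spectrogram + CNN digit classifier (illustrative only).
# Assumptions: 16 kHz mono WAV files, 64 Mel bands, spectrograms padded/trimmed
# to 128 frames, ten digit classes (0-9).
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

SAMPLE_RATE = 16000
N_MELS = 64
FIXED_FRAMES = 128
NUM_CLASSES = 10


def wav_to_log_mel(path):
    """Load one recording and return a fixed-size log-Mel spectrogram."""
    signal, _ = librosa.load(path, sr=SAMPLE_RATE)
    mel = librosa.feature.melspectrogram(y=signal, sr=SAMPLE_RATE, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Pad or trim the time axis so every utterance has the same width.
    if log_mel.shape[1] < FIXED_FRAMES:
        log_mel = np.pad(log_mel, ((0, 0), (0, FIXED_FRAMES - log_mel.shape[1])))
    else:
        log_mel = log_mel[:, :FIXED_FRAMES]
    return log_mel[..., np.newaxis]  # add a channel axis for the CNN


def build_cnn():
    """A small 2-D CNN over (N_MELS, FIXED_FRAMES, 1) inputs."""
    model = models.Sequential([
        layers.Input(shape=(N_MELS, FIXED_FRAMES, 1)),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Usage sketch: wav_paths and integer labels (0-9) come from the digit corpus.
# X = np.stack([wav_to_log_mel(p) for p in wav_paths])
# y = np.array(labels)
# model = build_cnn()
# model.fit(X, y, epochs=20, validation_split=0.2)
```

The fixed-width padding step is one simple way to handle utterances of different lengths; any comparable normalization of the time axis would serve the same purpose before the convolutional layers.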


Data availability

The data supporting this research is not available for public access.


Acknowledgements

The authors confirm that all figures and tables in the manuscript are their own; for any figures or images that are not, permission for re-publication has been obtained and is provided with the manuscript.

Author information

Corresponding author

Correspondence to Mohamed Hamidi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Boulal, H., Hamidi, M., Abarkan, M. et al. Amazigh CNN speech recognition system based on Mel spectrogram feature extraction method. Int J Speech Technol 27, 287–296 (2024). https://doi.org/10.1007/s10772-024-10100-0

