
Amazigh CNN speech recognition system based on Mel spectrogram feature extraction method

Published in: International Journal of Speech Technology

Abstract

Speech recognition makes it easier for humans and machines to interact through speech. Number-oriented communication, such as entering a registration code, mobile number, score, or account number, can benefit from spoken digit recognition. This paper presents our Amazigh automatic speech recognition (ASR) experience based on a deep learning approach. As part of this strategy, Mel spectrograms are computed from the audio samples and classified with a convolutional neural network (CNN). To recognize the Amazigh numerals, we use a database of the digits from zero to nine collected from 42 native speakers, men and women between the ages of 20 and 40. Our experimental results show that spoken Amazigh digits can be identified with a maximum accuracy of 93.62%, a precision of 94%, and a recall of 94%.
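To make the pipeline concrete, the sketch below shows one way to implement the two stages the abstract describes: log-Mel spectrogram extraction from short digit recordings, followed by classification with a small 2-D CNN. It assumes librosa and TensorFlow/Keras; the sampling rate, spectrogram size, and layer choices are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of a Mel-spectrogram + CNN digit classifier (illustrative only).
# Assumptions: 16 kHz mono WAV files, 64 Mel bands, spectrograms padded/trimmed
# to 128 frames, ten digit classes (0-9).
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

SAMPLE_RATE = 16000
N_MELS = 64
FIXED_FRAMES = 128
NUM_CLASSES = 10


def wav_to_log_mel(path):
    """Load one recording and return a fixed-size log-Mel spectrogram."""
    signal, _ = librosa.load(path, sr=SAMPLE_RATE)
    mel = librosa.feature.melspectrogram(y=signal, sr=SAMPLE_RATE, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Pad or trim the time axis so every utterance has the same width.
    if log_mel.shape[1] < FIXED_FRAMES:
        log_mel = np.pad(log_mel, ((0, 0), (0, FIXED_FRAMES - log_mel.shape[1])))
    else:
        log_mel = log_mel[:, :FIXED_FRAMES]
    return log_mel[..., np.newaxis]  # add a channel axis for the CNN


def build_cnn():
    """A small 2-D CNN over (N_MELS, FIXED_FRAMES, 1) inputs."""
    model = models.Sequential([
        layers.Input(shape=(N_MELS, FIXED_FRAMES, 1)),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Usage sketch: wav_paths and integer labels (0-9) come from the digit corpus.
# X = np.stack([wav_to_log_mel(p) for p in wav_paths])
# y = np.array(labels)
# model = build_cnn()
# model.fit(X, y, epochs=20, validation_split=0.2)
```

The fixed-width padding step is one simple way to handle utterances of different lengths; any comparable normalization of the time axis would serve the same purpose before the convolutional layers.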


Data availability

The data supporting this research is not available for public access.


Acknowledgements

The authors confirm that all figures and tables in the manuscript are their own; for any figures or images that are not, permission for re-publication has been obtained and is provided with the manuscript.

Author information

Corresponding author

Correspondence to Mohamed Hamidi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Boulal, H., Hamidi, M., Abarkan, M. et al. Amazigh CNN speech recognition system based on Mel spectrogram feature extraction method. Int J Speech Technol 27, 287–296 (2024). https://doi.org/10.1007/s10772-024-10100-0

