Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1415)

Abstract

Advances in technology have made large amounts of audio data available. Audio plays a decisive role in data analysis tasks such as activity recognition and event detection, and classifying the audio stream helps corroborate results obtained from other media. We trained a CNN model to classify the benchmark data sets ESC-10 and ESC-50, and also evaluated it on a custom data set. The CNN is trained on low-level audio features extracted from the custom and benchmark audio snippets, which come from both constrained and noisy environments. We identified a CNN architecture with a minimal number of layers that works well on both the benchmark and custom data sets. We also ran experiments to find the most influential feature, one that alone is sufficient to classify multiple classes of audio data. Classification accuracy as high as 98% is reported.
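
As a rough illustration of what "low-level audio features" can look like in practice, the sketch below computes two common frame-level descriptors, zero-crossing rate and RMS energy, in plain Python. The abstract does not name the features used, so these two descriptors, the frame and hop lengths, the synthetic test tone, and all function names are assumptions chosen for illustration, not the authors' pipeline.

```python
import math

def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def extract_features(samples, frame_len=400, hop_len=200):
    """Return one [zcr, rms] row per frame (frame sizes are assumed values)."""
    return [[zero_crossing_rate(f), rms_energy(f)]
            for f in frame_signal(samples, frame_len, hop_len)]

# A synthetic one-second 440 Hz tone at 8 kHz stands in for a real audio snippet.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
features = extract_features(tone)
```

Stacking such per-frame rows yields a 2-D array of the kind a CNN could take as input. Which features the chapter actually uses, and which one proved most influential, is not stated in the abstract.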

Corresponding author

Correspondence to T. Prathima.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Prathima, T., Govardhan, A., Palla, S., Sri Yagna, K. (2022). Constrained and Unconstrained Audio Classification. In: Pandian, A.P., Palanisamy, R., Narayanan, M., Senjyu, T. (eds) Proceedings of Third International Conference on Intelligent Computing, Information and Control Systems. Advances in Intelligent Systems and Computing, vol 1415. Springer, Singapore. https://doi.org/10.1007/978-981-16-7330-6_75
