Parallel and streaming wavelet neural networks for classification and regression under apache spark

Cluster Computing

Abstract

Wavelet neural networks (WNN) have been applied in many fields to solve regression as well as classification problems. With the advent of big data, data is generated at a brisk pace, and it must be analyzed as soon as it is generated because its nature may change dramatically within short time intervals. Big data is also all-pervasive and poses computational challenges for data scientists. Therefore, in this paper, we built an efficient Scalable, Parallelized Wavelet Neural Network (SPWNN) that employs the parallel stochastic gradient descent (SGD) algorithm. SPWNN is designed and developed for both static and streaming environments in the horizontal parallelization framework, and is implemented with Morlet and Gaussian functions as activation functions. The study is conducted on big datasets such as gas sensor data, which has more than 4 million samples, and high-dimensional medical research data, which has more than 10,000 features. The experimental analysis indicates that, in the static environment, SPWNN with the Morlet activation function outperformed SPWNN with the Gaussian activation function on the classification datasets, whereas no clear trend was observed for regression. In contrast, in the streaming environment, Gaussian outperformed Morlet on the classification datasets and Morlet outperformed Gaussian on the regression datasets. Overall, the proposed SPWNN architecture achieved a speedup of 1.22 to 1.78.
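As a rough illustration of the ideas named in the abstract, the sketch below shows one way horizontal parallelization with parameter-averaged SGD could look for a toy wavelet network. It is not the authors' implementation. Several things here are assumptions: the activation formulas (forms commonly seen in the WNN literature; the abstract does not state them), the simplification that only output weights are trained while translations and dilation stay fixed, the sequential loop standing in for Spark executors, the definition of speedup as serial time divided by parallel time, and all function names (`morlet`, `gaussian`, `local_sgd`, `parallel_sgd`, `speedup`), which are hypothetical.

```python
import math
import random


def morlet(t):
    # Assumed form: cosine carrier under a Gaussian envelope.
    return math.cos(1.75 * t) * math.exp(-0.5 * t * t)


def gaussian(t):
    # Assumed form: plain Gaussian bump.
    return math.exp(-t * t)


def hidden(x, centers, dilation, act):
    # One wavelet node per translation value in `centers`.
    return [act((x - c) / dilation) for c in centers]


def predict(x, w, centers, dilation, act):
    # Network output: weighted sum of the hidden wavelet nodes.
    return sum(wj * hj for wj, hj in zip(w, hidden(x, centers, dilation, act)))


def local_sgd(rows, w0, centers, dilation, act, lr, epochs, seed):
    # SGD on squared error over one partition; only output weights move.
    rng = random.Random(seed)
    w, rows = list(w0), list(rows)
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            h = hidden(x, centers, dilation, act)
            err = sum(wj * hj for wj, hj in zip(w, h)) - y
            w = [wj - lr * err * hj for wj, hj in zip(w, h)]
    return w


def parallel_sgd(data, n_parts, centers, dilation, act, lr=0.05, epochs=50):
    # Horizontal parallelization: split the rows, train on each partition
    # independently, then average the resulting weight vectors.
    parts = [data[i::n_parts] for i in range(n_parts)]
    w0 = [0.0] * len(centers)
    # Under Spark each partition would be processed by an executor;
    # this sketch simply loops over the partitions one after another.
    local = [local_sgd(p, w0, centers, dilation, act, lr, epochs, seed=i)
             for i, p in enumerate(parts)]
    return [sum(ws) / n_parts for ws in zip(*local)]


def mse(data, w, centers, dilation, act):
    # Mean squared error of the network over a list of (x, y) pairs.
    return sum((predict(x, w, centers, dilation, act) - y) ** 2
               for x, y in data) / len(data)


def speedup(t_serial, t_parallel):
    # Assumed definition: serial run time divided by parallel run time.
    return t_serial / t_parallel


if __name__ == "__main__":
    # Toy regression target (sine curve), not one of the paper's datasets.
    data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-30, 31)]
    centers = [float(c) for c in range(-3, 4)]
    for name, act in (("Morlet", morlet), ("Gaussian", gaussian)):
        w = parallel_sgd(data, 4, centers, 1.0, act)
        print(name, "MSE:", mse(data, w, centers, 1.0, act))
```

Averaging once after local training keeps communication to a single reduce step, which is the usual appeal of this style of parallel SGD; a streaming variant would presumably repeat the same train-then-average step on each incoming micro-batch, starting from the previous averaged weights.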

Data availability

The data analysed here is taken from the public domain. We have provided the URLs for the data sources.

References

  1. Grossmann, A., Morlet, J.: Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM J. Math. Anal. 15, 723–736 (1984)

  2. Chen, Y., Yang, B., Dong, J.: Time-series prediction using a local linear wavelet neural network. Neurocomputing 69, 449–465 (2006)

  3. De Silva, D., Vithanage, H., Fernando, K., Piyatilake, I.T.S.: Multi-path learnable wavelet neural network for image classification. In: Twelfth International Conference on Machine Vision, ICMV 2019, vol. 11433, 114331O (2020)

  4. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18, 602–610 (2005)

  5. Hinton, G., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Magaz. 29, 82–97 (2012)

  6. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR, vol. 37, pp. 2048–2057 (2015)

  7. Walczak, S.: An empirical analysis of data requirements for financial forecasting with neural networks. J. Manag. Inform. Syst. 17, 203–222 (2001)

  8. Pati, Y.C., Krishnaprasad, P.S.: Analysis and synthesis of feedforward neural networks using discrete affine wavelet transformations. IEEE Trans. Neural Netw. 4, 73–85 (1993)

  9. Zhang, Q., Benveniste, A.: Wavelet networks. IEEE Trans. Neural Netw. 3, 889–898 (1992)

  10. Wang, G., Guo, L., Duan, H.: Wavelet neural network using multiple wavelet functions in target threat assessment. The Sci. World J. 2013, 632437 (2013). https://doi.org/10.1155/2013/632437

  11. Ishwarappa, Anuradha, J.: A brief introduction on big data 5Vs characteristics and Hadoop technology. Procedia Comput. Sci. 48, 319–324 (2015)

  12. Holohan, A., Garg, A.: Collaboration online: the example of distributed computing. J. Comput.-Med. Commun. 10, 10415 (2005)

  13. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51, 107–113 (2008). https://doi.org/10.1145/1327452.1327492

  14. Hegde, V., Usmani, S.: Parallel and distributed deep learning. Technical Report. Stanford University, Stanford, CA (2016)

  15. Safaei, A.A.: Real-time processing of streaming big data. Real-Time Syst. 53, 1–44 (2017)

  16. Twomey, J.M., Smith, A.E.: Validation and verification. In: Artificial Neural Networks for Civil Engineers: Fundamentals and Applications, pp. 44–64 (1997)

  17. Ruder, S.: An overview of gradient descent optimization algorithms. CoRR arXiv:abs/1609.04747 (2016)

  18. Zhang, J., Walter, G., Miao, Y., Lee, W.N.W.: Wavelet neural networks for function learning. IEEE Trans. Signal Process. 43, 1485–1497 (1995)

  19. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Lechevallier, Y., Saporta, G. (eds.) Proceedings of COMPSTAT’2010, pp. 177–186. Physica-Verlag HD, Heidelberg (2010)

  20. Zhao, P., Zhang, T.: Accelerating minibatch stochastic gradient descent using stratified sampling. arXiv preprint arXiv:1405.3080 (2014)

  21. Zinkevich, M., Weimer, M., Smola, A.J., Li, L.: Parallelized stochastic gradient descent. NIPS 4, 4 (2010)

  22. Kumar, K.V., Ravi, V., Carr, M., Kiran, N.R.: Software development cost estimation using wavelet neural networks. J. Syst. Softw. 81, 1853–1867 (2008)

  23. Ramana, R.V., Krishna, B., Kumar, S., Pandey, N.: Monthly rainfall prediction using wavelet neural network analysis. Water Res. Manag. 27, 3697–3711 (2013)

  24. Yilmaz, S., Oysal, Y.: Fuzzy wavelet neural network models for prediction and identification of dynamical systems. IEEE Trans. Neural Netw. 21, 1599–1609 (2010)

  25. Sarath, D., Ravi, V.: Wavelet neural network for big data analytics in banking via GPU. In: Handbook of Big Data Analytics: Applications in ICT, Security and Business Analytics, vol. 2, p. 273 (2021)

  26. Zhang, J., De Sa, C., Mitliagkas, I., Ré, C.: Parallel SGD: when does averaging help? arXiv preprint arXiv:1606.07365 (2016)

  27. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)

  28. Kennedy, R.K., Khoshgoftaar, T.M., Villanustre, F., Humphrey, T.: A parallel and distributed stochastic gradient descent implementation using commodity clusters. J. Big Data 6, 1–23 (2019)

  29. Venkatesan, N.J., Nam, C.S., Kim, E., Shin, D.R., et al.: Analysis of real-time data with Spark Streaming. J. Adv. Technol. Eng. Res. 3, 108–116 (2017)

  30. Blamey, B., Hellander, A., Toor, S.: Apache Spark Streaming, Kafka and HarmonicIO: a performance benchmark and architecture comparison for enterprise and scientific computing. In: International Symposium on Benchmarking, Measuring and Optimization, pp. 335–347 (2019)

  31. Apache Spark. https://spark.apache.org/. Retrieved on January 26, 2021

  32. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets. Retrieved on March 27, 2021

  33. OpenML open source datasets. https://www.openml.org/home. Retrieved on March 27, 2021

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Contributions

HVE: Methodology, Software, Validation, Formal Analysis, Investigation, Data curation, Writing-original draft, Visualization. VY: Methodology, Software, Validation, Formal Analysis, Investigation, Data curation, Writing-original draft. VR: Conceptualization, Methodology, Validation, Formal Analysis, Investigation, Writing-original draft, Writing-review and editing, Resources, Visualization, Supervision, Project administration. OSS: Methodology, Software, Investigation, Data curation.

Corresponding author

Correspondence to Vadlamani Ravi.

Ethics declarations

Conflict of interest

The authors have no conflict of interest whatsoever.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

See Figs. 4–23.

Fig. 4
Sensitivity obtained by SPWNN-Gaussian on OVA Uterus Dataset

Fig. 5
Specificity obtained by SPWNN-Gaussian on OVA Uterus Dataset

Fig. 6
AUC obtained by SPWNN-Gaussian on OVA Uterus Dataset

Fig. 7
Sensitivity obtained by SPWNN-Morlet on OVA Uterus Dataset

Fig. 8
Specificity obtained by SPWNN-Morlet on OVA Uterus Dataset

Fig. 9
AUC obtained by SPWNN-Morlet on OVA Uterus Dataset

Fig. 10
Sensitivity obtained by SPWNN-Gaussian on OVA Uterus Dataset

Fig. 11
Specificity obtained by SPWNN-Gaussian on OVA Uterus Dataset

Fig. 12
AUC obtained by SPWNN-Gaussian on OVA Uterus Dataset

Fig. 13
Sensitivity obtained by SPWNN-Morlet on OVA Uterus Dataset

Fig. 14
Specificity obtained by SPWNN-Morlet on OVA Uterus Dataset

Fig. 15
AUC obtained by SPWNN-Morlet on OVA Uterus Dataset

Fig. 16
Error rate obtained for Methane Concentration by SPWNN-Gaussian on Ethylene Methane Dataset

Fig. 17
Error rate obtained for Methane Concentration by SPWNN-Morlet on Ethylene Methane Dataset

Fig. 18
Error rate obtained for Ethylene Concentration by SPWNN-Morlet on Ethylene Methane Dataset

Fig. 19
Error rate obtained for Ethylene Concentration by SPWNN-Morlet on Ethylene Methane Dataset

Fig. 20
Error rate obtained for Methane Concentration by SPWNN-Gaussian on Ethylene CO Dataset

Fig. 21
Error rate obtained for Methane Concentration by SPWNN-Morlet on Ethylene CO Dataset

Fig. 22
Error rate obtained for Ethylene Concentration by SPWNN-Morlet on Ethylene CO Dataset

Fig. 23
Error rate obtained for Ethylene Concentration by SPWNN-Morlet on Ethylene CO Dataset

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Eduru, H.V., Vivek, Y., Ravi, V. et al. Parallel and streaming wavelet neural networks for classification and regression under apache spark. Cluster Comput 27, 3451–3469 (2024). https://doi.org/10.1007/s10586-023-04150-3

  • DOI: https://doi.org/10.1007/s10586-023-04150-3
