Abstract
Wavelet neural networks (WNNs) have been applied in many fields to solve regression as well as classification problems. In the era of big data, data are generated at a brisk pace, and because the nature of the data may change dramatically over short time intervals, it is imperative to analyze them as soon as they are generated. Big data is all-pervasive and poses computational challenges for data scientists. Therefore, in this paper, we built an efficient Scalable, Parallelized Wavelet Neural Network (SPWNN) that employs the parallel stochastic gradient descent (SGD) algorithm. SPWNN is designed and developed for both static and streaming environments in the horizontal parallelization framework, and it is implemented with Morlet and Gaussian functions as activation functions. This study is conducted on big, high-dimensional datasets such as gas sensor data (more than 4 million samples) and medical research data (more than 10,000 features). The experimental analysis indicates that in the static environment, SPWNN with the Morlet activation function outperformed SPWNN with the Gaussian activation function on the classification datasets, whereas no clear trend was observed for regression. In the streaming environment, by contrast, Gaussian outperformed Morlet on the classification datasets and Morlet outperformed Gaussian on the regression datasets. Overall, the proposed SPWNN architecture achieved a speedup of 1.22–1.78.
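The abstract names the building blocks (wavelet activations, parallel SGD, horizontal parallelization) without showing how they fit together. The following is a minimal, single-machine Python sketch under stated assumptions, not the authors' Spark implementation. Each hidden unit (wavelon) computes ψ((w·x − b)/a) with a translation b and a dilation a; the Morlet wavelet is assumed to be cos(1.75t)·exp(−t²/2) and the "Gaussian" wavelet is assumed to be the Gaussian derivative −t·exp(−t²/2), both common choices. "Horizontal parallelization" is interpreted here as row-wise partitioning of the data, with one model copy trained per partition and the parameters averaged once at the end, in the style of Zinkevich et al.'s parallelized SGD. All function names (`sgd_epoch`, `parallel_sgd`, and so on) and hyperparameters are hypothetical, and the per-shard loop only simulates the workers sequentially.

```python
import copy
import math
import random

MORLET_W0 = 1.75  # assumed Morlet frequency constant (a common WNN choice)


def morlet(t):
    return math.cos(MORLET_W0 * t) * math.exp(-0.5 * t * t)


def morlet_grad(t):
    g = math.exp(-0.5 * t * t)
    return -MORLET_W0 * math.sin(MORLET_W0 * t) * g - t * math.cos(MORLET_W0 * t) * g


def gaussian(t):
    # assumed form: first derivative of a Gaussian (up to sign)
    return -t * math.exp(-0.5 * t * t)


def gaussian_grad(t):
    return (t * t - 1.0) * math.exp(-0.5 * t * t)


WAVELETS = {"morlet": (morlet, morlet_grad), "gaussian": (gaussian, gaussian_grad)}


def init_params(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    return {
        "W": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b": [0.0] * n_hidden,  # translations
        "a": [1.0] * n_hidden,  # dilations
        "v": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],  # output weights
        "c": 0.0,  # output bias
    }


def predict(p, x, kind):
    psi, _ = WAVELETS[kind]
    out = p["c"]
    for w_j, b_j, a_j, v_j in zip(p["W"], p["b"], p["a"], p["v"]):
        u = sum(w * xi for w, xi in zip(w_j, x))
        out += v_j * psi((u - b_j) / a_j)
    return out


def sgd_epoch(p, data, kind, lr):
    """One pass of per-sample SGD on the squared error 0.5 * (y_hat - y)**2."""
    psi, dpsi = WAVELETS[kind]
    for x, y in data:
        ts = [(sum(w * xi for w, xi in zip(w_j, x)) - b_j) / a_j
              for w_j, b_j, a_j in zip(p["W"], p["b"], p["a"])]
        zs = [psi(t) for t in ts]
        e = p["c"] + sum(v * z for v, z in zip(p["v"], zs)) - y
        for j, (t, z) in enumerate(zip(ts, zs)):
            g = e * p["v"][j] * dpsi(t)  # dLoss/dt_j, using the pre-update v_j
            a_j = p["a"][j]
            p["v"][j] -= lr * e * z
            for i, xi in enumerate(x):
                p["W"][j][i] -= lr * g * xi / a_j
            p["b"][j] += lr * g / a_j  # dt_j/db_j = -1/a_j
            p["a"][j] = max(a_j + lr * g * t / a_j, 1e-2)  # keep dilation positive
        p["c"] -= lr * e
    return p


def parallel_sgd(data, n_in, n_hidden, kind, n_workers=4, epochs=30, lr=0.02, seed=0):
    """One-shot parameter averaging: train one copy per shard, then average."""
    init = init_params(n_in, n_hidden, seed)
    shards = [data[k::n_workers] for k in range(n_workers)]  # row-wise partitions
    local_models = []
    for shard in shards:  # each iteration stands in for one worker / partition
        p = copy.deepcopy(init)
        for _ in range(epochs):
            sgd_epoch(p, shard, kind, lr)
        local_models.append(p)
    k = len(local_models)
    avg = {
        "W": [[sum(m["W"][j][i] for m in local_models) / k for i in range(n_in)]
              for j in range(n_hidden)],
        "c": sum(m["c"] for m in local_models) / k,
    }
    for name in ("b", "a", "v"):
        avg[name] = [sum(m[name][j] for m in local_models) / k for j in range(n_hidden)]
    return avg


def mse(p, data, kind):
    return sum((predict(p, x, kind) - y) ** 2 for x, y in data) / len(data)


if __name__ == "__main__":
    xs = [-2 + 4 * i / 199 for i in range(200)]
    data = [([x], math.sin(x)) for x in xs]  # toy 1-D regression target
    model = parallel_sgd(data, n_in=1, n_hidden=8, kind="gaussian")
    print("training MSE:", mse(model, data, "gaussian"))
```

In a real Spark deployment, the per-shard training step would plausibly run inside the executors (for example via PySpark's `RDD.mapPartitions`), with the averaging done on the driver; that mapping is an assumption, not a description of SPWNN. One-shot averaging is the simplest variant; averaging periodically during training is an alternative, examined by Zhang, De Sa, Mitliagkas, and Ré (2016) in the reference list.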
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10586-023-04150-3/MediaObjects/10586_2023_4150_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10586-023-04150-3/MediaObjects/10586_2023_4150_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10586-023-04150-3/MediaObjects/10586_2023_4150_Fig3_HTML.png)
Data availability
The data analysed here were taken from the public domain, and the URLs of the data sources are provided.
References
Grossmann, A., Morlet, J.: Decomposition of hardy functions into square integrable wavelets of constant shape. SIAM J. Math. Anal. 15, 723–736 (1984)
Chen, Y., Yang, B., Dong, J.: Time-series prediction using a local linear wavelet neural network. Neurocomputing 69, 449–465 (2006)
De Silva, D., Vithanage, H., Fernando, K., Piyatilake, I.T.S.: Multi-path learnable wavelet neural network for image classification. In: Twelfth International Conference on Machine Vision (ICMV 2019), vol. 11433, 114331O (2020)
Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18, 602–610 (2005)
Hinton, G., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29, 82–97 (2012)
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR, vol. 37, pp. 2048–2057 (2015)
Walczak, S.: An empirical analysis of data requirements for financial forecasting with neural networks. J. Manag. Inform. Syst. 17, 203–222 (2001)
Pati, Y.C., Krishnaprasad, P.S.: Analysis and synthesis of feedforward neural networks using discrete affine wavelet transformations. IEEE Trans. Neural Netw. 4, 73–85 (1993)
Zhang, Q., Benveniste, A.: Wavelet networks. IEEE Trans. Neural Netw. 3, 889–898 (1992)
Wang, G., Guo, L., Duan, H.: Wavelet neural network using multiple wavelet functions in target threat assessment. Sci. World J. 2013, 632437 (2013). https://doi.org/10.1155/2013/632437
Ishwarappa, Anuradha, J.: A brief introduction on big data 5Vs characteristics and Hadoop technology. Procedia Comput. Sci. 48, 319–324 (2015)
Holohan, A., Garg, A.: Collaboration online: the example of distributed computing. J. Comput.-Med. Commun. 10, 10415 (2005)
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51, 107–113 (2008). https://doi.org/10.1145/1327452.1327492
Hegde, V., Usmani, S.: Parallel and distributed deep learning. Technical Report. Stanford University, Stanford, CA (2016)
Safaei, A.A.: Real-time processing of streaming big data. Real-Time Syst. 53, 1–44 (2017)
Twomey, J.M., Smith, A.E.: Validation and verification. In: Artificial Neural Networks for Civil Engineers: Fundamentals and Applications, pp. 44–64 (1997)
Ruder, S.: An overview of gradient descent optimization algorithms. CoRR abs/1609.04747 (2016)
Zhang, J., Walter, G., Miao, Y., Lee, W.N.W.: Wavelet neural networks for function learning. IEEE Trans. Signal Process. 43, 1485–1497 (1995)
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Lechevallier, Y., Saporta, G. (eds.) Proceedings of COMPSTAT’2010, pp. 177–186. Physica-Verlag HD, Heidelberg (2010)
Zhao, P., Zhang, T.: Accelerating minibatch stochastic gradient descent using stratified sampling. arXiv preprint arXiv:1405.3080 (2014)
Zinkevich, M., Weimer, M., Smola, A.J., Li, L.: Parallelized stochastic gradient descent. NIPS 4, 4 (2010)
Kumar, K.V., Ravi, V., Carr, M., Kiran, N.R.: Software development cost estimation using wavelet neural networks. J. Syst. Softw. 81, 1853–1867 (2008)
Ramana, R.V., Krishna, B., Kumar, S., Pandey, N.: Monthly rainfall prediction using wavelet neural network analysis. Water Resour. Manag. 27, 3697–3711 (2013)
Yilmaz, S., Oysal, Y.: Fuzzy wavelet neural network models for prediction and identification of dynamical systems. IEEE Trans. Neural Netw. 21, 1599–1609 (2010)
Sarath, D., Ravi, V.: Wavelet neural network for big data analytics in banking via GPU. In: Handbook of Big Data Analytics: Applications in ICT, Security and Business Analytics, vol. 2, p. 273 (2021)
Zhang, J., De Sa, C., Mitliagkas, I., Ré, C.: Parallel SGD: when does averaging help? arXiv preprint arXiv:1606.07365 (2016)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Kennedy, R.K., Khoshgoftaar, T.M., Villanustre, F., Humphrey, T.: A parallel and distributed stochastic gradient descent implementation using commodity clusters. J. Big Data 6, 1–23 (2019)
Venkatesan, N.J., Nam, C.S., Kim, E., Shin, D.R., et al.: Analysis of real-time data with Spark Streaming. J. Adv. Technol. Eng. Res. 3, 108–116 (2017)
Blamey, B., Hellander, A., Toor, S.: Apache Spark Streaming, Kafka and HarmonicIO: a performance benchmark and architecture comparison for enterprise and scientific computing. In: International Symposium on Benchmarking, Measuring and Optimization, pp. 335–347 (2019)
Apache Spark. https://spark.apache.org/. Retrieved on January 26, 2021
UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets. Retrieved on March 27, 2021
OpenML open source datasets. https://www.openml.org/home. Retrieved on March 27, 2021
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
HVE: Methodology, Software, Validation, Formal Analysis, Investigation, Data curation, Writing-original draft, Visualization. VY: Methodology, Software, Validation, Formal Analysis, Investigation, Data curation, Writing-original draft. VR: Conceptualization, Methodology, Validation, Formal Analysis, Investigation, Writing-original draft, Writing-Review and editing, Resources, Visualization, Supervision, Project administration. OSS: Methodology, Software, Investigation, Data curation.
Ethics declarations
Conflict of interest
The authors have no conflict of interest whatsoever.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Eduru, H.V., Vivek, Y., Ravi, V. et al. Parallel and streaming wavelet neural networks for classification and regression under apache spark. Cluster Comput 27, 3451–3469 (2024). https://doi.org/10.1007/s10586-023-04150-3
DOI: https://doi.org/10.1007/s10586-023-04150-3