A New Multilayer Perceptron Pruning Algorithm for Classification and Regression Applications

Published in: Neural Processing Letters 42, 437–458 (2015)

Abstract

Optimizing the structure of a neural network remains a hard task. If the architecture is too small, the network cannot properly learn from the data, whereas if it is too large, learning leads to the well-known overfitting problem. This paper addresses this issue and proposes a new pruning approach to determine the optimal structure. Our algorithm is based on variance sensitivity analysis and prunes the different types of unit (hidden neurons, inputs, and weights) sequentially. The stopping criterion is based on an evaluation of the network's performance on both the learning and validation datasets. Four variants of this algorithm are proposed, using two different estimators of the variance. They are tested and compared with four classical algorithms on three classification and three regression problems. The results show that the proposed algorithms outperform the classical approaches in terms of both computational time and accuracy.
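As a rough illustration of the variance-sensitivity idea summarized above, the sketch below scores each hidden unit of a one-hidden-layer perceptron by the variance of its contribution to the output over the training data, removes the least-varying unit, and stops once validation error degrades. This is a minimal sketch under stated assumptions, not the authors' algorithm: their method uses two different variance estimators, prunes hidden neurons, inputs, and weights sequentially, and applies a more refined stopping criterion. All names here (`prune_network`, the user-supplied `retrain` routine, etc.) are hypothetical.

```python
# Illustrative sketch of variance-based sensitivity pruning for a
# one-hidden-layer MLP: y = W2 @ tanh(W1 x + b1) + b2.
# Hypothetical code; not the paper's exact method.
import numpy as np

def forward(W1, b1, W2, b2, X):
    """Evaluate the network on each row of X."""
    H = np.tanh(X @ W1.T + b1)          # (n_samples, n_hidden)
    return H @ W2.T + b2                # (n_samples, 1)

def unit_contributions(W1, b1, W2, X):
    """Per-sample contribution of each hidden unit to the output:
    column j holds w2_j * tanh(w1_j . x + b1_j) over all samples."""
    H = np.tanh(X @ W1.T + b1)
    return H * W2.ravel()

def prune_step(W1, b1, W2, X):
    """Remove the hidden unit whose contribution varies least over the
    data; a near-constant contribution acts like a bias term and is
    therefore a natural pruning candidate."""
    variances = unit_contributions(W1, b1, W2, X).var(axis=0)
    j = int(np.argmin(variances))
    keep = np.arange(W1.shape[0]) != j
    return W1[keep], b1[keep], W2[:, keep]

def prune_network(W1, b1, W2, b2, X_train, y_train, X_val, y_val, retrain):
    """Prune one unit at a time, retrain, and stop as soon as validation
    error worsens (a crude stand-in for the paper's stopping criterion).
    `retrain` is a user-supplied routine returning updated weights."""
    best = (W1, b1, W2, b2)
    best_val = np.mean((forward(*best, X_val) - y_val) ** 2)
    while W1.shape[0] > 1:
        W1, b1, W2 = prune_step(W1, b1, W2, X_train)
        W1, b1, W2, b2 = retrain(W1, b1, W2, b2, X_train, y_train)
        val = np.mean((forward(W1, b1, W2, b2, X_val) - y_val) ** 2)
        if val > best_val:
            break                        # validation error degraded: stop
        best, best_val = (W1, b1, W2, b2), val
    return best
```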

Author information

Correspondence to Philippe Thomas.

About this article

Cite this article

Thomas, P., Suhner, MC. A New Multilayer Perceptron Pruning Algorithm for Classification and Regression Applications. Neural Process Lett 42, 437–458 (2015). https://doi.org/10.1007/s11063-014-9366-5
