Abstract
Current deep neural networks suffer from two problems; first, they are hard to interpret, and second, they suffer from overfitting. There have been many attempts to define interpretability in neural networks, but they typically lack causality or generality. A myriad of regularization techniques have been developed to prevent overfitting, and this has driven deep learning to become the hot topic it is today; however, while most regularization techniques are justified empirically and even intuitively, there is not much underlying theory. This paper argues that to extract the features used in neural networks to make decisions, it’s important to look at the paths between clusters existing in the hidden spaces of neural networks. These features are of particular interest because they reflect the true decision-making process of the neural network. This analysis is then furthered to present an ensemble algorithm for arbitrary neural networks which has guarantees for test accuracy. Finally, a discussion detailing the aforementioned guarantees is introduced and the implications to neural networks, including an intuitive explanation for all current regularization methods, are presented. The ensemble algorithm has generated state-of-the-art results for Wide-ResNets on CIFAR-10 (top 5 for all models) and has improved test accuracy for all models it has been applied to.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Cubuk, E.D., Zoph, B., Mané, D., Vasudevan, V., Le, Q.V.: Autoaugment: learning augmentation policies from data. CoRR abs/1805.09501 (2018)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR 2009 (2009)
Geifman, Y.: cifar-vgg (2018). https://github.com/geifmany/cifar-vgg
Gilpin, L.H., Bau, D., Yuan, B.Z., Bajwa, A., Specter, M., Kagal, L.: Explaining explanations: an overview of interpretability of machine learning. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pp. 80–89 (2018)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42, 80–86 (2000)
Ju, C., Bibaut, A., van der Laan, M.J.: The relative performance of ensemble methods with deep convolutional neural networks for image classification. CoRR abs/1704.01664 (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2015)
Krizhevsky, A.: Learning multiple layers of features from tiny images (2009)
van der Laan, M.J., Polley, E.C., Hubbard, A.E.: Super learner. Stat. Appl. Genet. Mol. Biol. 6, Article 25 (2007)
LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010). http://yann.lecun.com/exdb/mnist/
Liu, X., Wang, X., Matwin, S.: Interpretable deep convolutional neural networks via meta-learning. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–9 (2018)
Olah, C., Mordvintsev, A., Schubert, L.: Feature visualization. Distill (2017). https://doi.org/10.23915/distill.00007. https://distill.pub/2017/feature-visualization
Schwenk, H., Bengio, Y.: Boosting neural networks. Neural Comput. 12, 1869–1887 (2000)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2015)
Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
Steinhaus, H.: Sur la division des corps matériels en parties. Bull. Acad. Pol. Sci. Cl. III 4, 801–804 (1957)
Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: AAAI (2016)
Thorndike, R.L.: Who belongs in the family? Psychometrika 18(4), 267–276 (1953). https://doi.org/10.1007/BF02289263
Tibshirani, R.: Regression shrinkage and selection via the lasso (1994)
Vapnik, V.N.: The nature of statistical learning theory. In: Statistics for Engineering and Information Science (2000)
Yamada, Y., Iwamura, M., Akiba, T., Kise, K.: Shakedrop regularization for deep residual learning (2018)
Zagoruyko, S., Komodakis, N.: Wide residual networks. CoRR abs/1605.07146 (2016)
Acknowledgments
I like to call this project the Miracle Project because it’s a true miracle that this project happened...there are too many people to thank. First, thanks to my mom, dad, and brother. Second, thanks to Professors Barnabs Pczos and Majd Sakr. Third, thanks to Theo Yannekis, Michael Tai, Max Mirho, Josh Li, Zach Pan, Sheel Kundu, Shan Wang, Luis Ren Estrella, Eric Mi, Arthur Micha, Rich Zhu, Carter Shaojie Zhang, Eric Hong, Kevin Dougherty, Catherine Zhang, Marisa DelSignore, Elaine Xu, David Skrovanek, Anrey Peng, Bobby Upton, Angelia Wang, and Frank Li. You guys are the real heroes of this project. And, lastly, thanks to you, my dear reader.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tao, S. (2019). Deep Neural Network Ensembles. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science(), vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_1
Download citation
DOI: https://doi.org/10.1007/978-3-030-37599-7_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37598-0
Online ISBN: 978-3-030-37599-7
eBook Packages: Computer ScienceComputer Science (R0)