Abstract
Training of Deep Neural Networks (DNNs) is computationally demanding, and resources are typically spent on training instances that do not provide the greatest benefit to a network’s learning; instead, the most relevant instances should be prioritized during training. Herein we present an improved version of the Adaptive Sampling (AS) method (Gopal, 2016), extended for the training of DNNs. As our main contribution, we formulate a probability distribution over data instances that minimizes the variance of the gradient norms w.r.t. the network’s loss function. This distribution is combined with the optimal distribution over data classes previously derived by Gopal, and the improved AS is used to replace uniform sampling with the objective of accelerating the training of DNNs. Our proposal is evaluated against uniform sampling and against Online Batch Selection (Loshchilov & Hutter, 2015). Results from training a Convolutional Neural Network on the MNIST dataset with the Adadelta and Adam optimizers over different training batch sizes show the effectiveness and superiority of our proposal.
A. Rojas-Domínguez and S. Ivvan Valdez are CONACYT Research Fellows.
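To make the sampling idea concrete, below is a minimal PyTorch-style sketch of importance sampling for mini-batch SGD; it is not the authors’ exact AS method, which additionally combines Gopal’s class-level distribution. Under the standard importance-sampling analysis, the variance-minimizing per-instance probability is proportional to the per-instance gradient norm; since exact gradient norms are expensive, the sketch uses the norm of the loss gradient with respect to the logits as a cheap proxy. The model, toy data, hyperparameters, and the name `scores` are illustrative assumptions.

```python
# Illustrative sketch only: importance sampling with probabilities proportional
# to a per-instance score, plus 1/(N * p_i) re-weighting for unbiasedness.
# This is NOT the paper's exact Adaptive Sampling distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for MNIST-like data (assumption: 784 features, 10 classes).
N, D, C = 1024, 784, 10
X = torch.randn(N, D)
y = torch.randint(0, C, (N,))

model = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, C))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

batch_size = 64
scores = torch.ones(N)  # per-instance scores (proxy for gradient norms)

for step in range(200):
    p = scores / scores.sum()  # current sampling distribution over instances
    idx = torch.multinomial(p, batch_size, replacement=True)

    logits = model(X[idx])
    per_example_loss = F.cross_entropy(logits, y[idx], reduction="none")

    # Importance weights 1/(N * p_i) make the weighted mini-batch loss an
    # unbiased estimate of the mean loss over all N instances.
    weights = 1.0 / (N * p[idx])
    loss = (weights * per_example_loss).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Update scores with a cheap proxy for the per-instance gradient norm:
    # the norm of d(loss_i)/d(logits_i) = softmax(logits_i) - onehot(y_i).
    with torch.no_grad():
        probs = F.softmax(logits, dim=1)
        onehot = F.one_hot(y[idx], num_classes=C).float()
        scores[idx] = (probs - onehot).norm(dim=1) + 1e-6
```

The 1/(N·p_i) re-weighting keeps the gradient estimate unbiased despite non-uniform sampling; the scores concentrate sampling on instances whose gradients are currently large, which is the variance-reduction effect the paper targets.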
References
Alom, M.Z., et al.: The history began from AlexNet: a comprehensive survey on deep learning approaches. arXiv:1803.01164 (2018)
Wang, L., et al.: Superneurons: dynamic GPU memory management for training deep neural networks. In: 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 41–53 (2018)
Katharopoulos, A., Fleuret, F.: Not all samples are created equal: deep learning with importance sampling. In: International Conference on Machine Learning, PMLR, pp. 2525–2534 (2018)
Loshchilov, I., Hutter, F.: Online batch selection for faster training of neural networks. arXiv:1511.06343 (2015)
Gopal, S.: Adaptive sampling for SGD by exploiting side information. In: International Conference on Machine Learning, PMLR, pp. 364–372 (2016)
Fan, Y., Tian, F., Qin, T., Bian, J., Liu, T.Y.: Learning what data to learn. arXiv:1702.08635 (2017)
Alain, G., Lamb, A., Sankar, C., Courville, A., Bengio, Y.: Variance reduction in SGD by distributed importance sampling. arXiv:1511.06481 (2015)
Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, PMLR, pp. 1–9 (2015)
Needell, D., Srebro, N., Ward, R.: Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. arXiv:1310.5715 (2013)
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv:1511.05952 (2015)
Katharopoulos, A., Fleuret, F.: Biased importance sampling for deep neural network training. arXiv:1706.00043 (2017)
Smith, S.L., Kindermans, P.J., Ying, C., Le, Q.V.: Don’t decay the learning rate, increase the batch size. arXiv:1711.00489 (2017)
Joseph, K.J., Singh, K., Balasubramanian, V.N.: Submodular batch selection for training deep neural networks. arXiv:1906.08771 (2019)
Zhao, P., Zhang, T.: Accelerating minibatch stochastic gradient descent using stratified sampling. arXiv:1405.3080 (2014)
Wu, C.Y., Manmatha, R., Smola, A.J., Krahenbuhl, P.: Sampling matters in deep embedding learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2840–2848 (2017)
Bouchard, G., Trouillon, T., Perez, J., Gaidon, A.: Online learning to sample. arXiv:1506.09016 (2015)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Zeiler, M.D.: ADADELTA: an adaptive learning rate method. arXiv:1212.5701 (2012)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)
Acknowledgment
This work was partially supported by the National Council of Science and Technology (CONACYT) of Mexico, through Postgraduate Scholarship: 747189 (J. Ávalos) and Research Grants: cátedras-2598 (A. Rojas) and cátedras-7795 (S.I. Valdez).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Avalos-López, J.I., Rojas-Domínguez, A., Ornelas-Rodríguez, M., Carpio, M., Valdez, S.I. (2021). Efficient Training of Deep Learning Models Through Improved Adaptive Sampling. In: Roman-Rangel, E., Kuri-Morales, Á.F., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Olvera-López, J.A. (eds) Pattern Recognition. MCPR 2021. Lecture Notes in Computer Science, vol 12725. Springer, Cham. https://doi.org/10.1007/978-3-030-77004-4_14
DOI: https://doi.org/10.1007/978-3-030-77004-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77003-7
Online ISBN: 978-3-030-77004-4