Efficient Training of Deep Learning Models Through Improved Adaptive Sampling

  • Conference paper
  • In: Pattern Recognition (MCPR 2021)

Abstract

Training of Deep Neural Networks (DNNs) is very computationally demanding, and resources are typically spent on training instances that do not provide the most benefit to a network’s learning; instead, the most relevant instances should be prioritized during training. Herein we present an improved version of the Adaptive Sampling (AS) method (Gopal, 2016), extended to the training of DNNs. As our main contribution, we formulate a probability distribution over data instances that minimizes the variance of the gradient norms w.r.t. the network’s loss function. This distribution is combined with the optimal distribution over data classes previously derived by Gopal, and the improved AS is used to replace uniform sampling with the objective of accelerating the training of DNNs. Our proposal is comparatively evaluated against uniform sampling and against Online Batch Selection (Loshchilov & Hutter, 2015). Results from training a Convolutional Neural Network on the MNIST dataset with the Adadelta and Adam optimizers over different training batch sizes show the effectiveness and superiority of our proposal.
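To make the idea in the abstract concrete, below is a minimal sketch of gradient-norm-based adaptive sampling: instances are drawn with probability proportional to a surrogate of their per-instance gradient norm, with an importance-weight correction so the stochastic gradient stays unbiased. This is not the authors' implementation; the synthetic data, the small model, and the use of the output-layer gradient (softmax minus one-hot) as a cheap surrogate for the full gradient norm are assumptions made for illustration, and the class-level distribution from Gopal's method is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic stand-in for MNIST (784 features, 10 classes); illustrative only.
N, D, C = 1024, 784, 10
X = torch.randn(N, D)
y = torch.randint(0, C, (N,))

model = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, C))
opt = torch.optim.Adadelta(model.parameters())

def instance_weights(model, X, y):
    """Sampling probabilities proportional to a cheap surrogate of each
    instance's gradient norm: the norm of d(loss)/d(logits) = softmax - one-hot."""
    with torch.no_grad():
        probs = F.softmax(model(X), dim=1)
        g = (probs - F.one_hot(y, C).float()).norm(dim=1)
    return g / g.sum()

batch_size = 64
for step in range(200):
    w = instance_weights(model, X, y)                       # non-uniform sampling distribution
    idx = torch.multinomial(w, batch_size, replacement=True)
    loss_each = F.cross_entropy(model(X[idx]), y[idx], reduction="none")
    # Importance-sampling correction 1 / (N * p_i) keeps the gradient estimate unbiased.
    loss = (loss_each / (N * w[idx])).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this sketch the sampling distribution is recomputed over the full dataset at every step, which is only feasible for toy-sized data; practical schemes typically refresh the weights periodically or from recent mini-batch statistics.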

A. Rojas-Domínguez and S. Ivvan Valdez are CONACYT Research Fellows.

Notes

  1. https://colab.research.google.com/.

References

  1. Alom, M.Z., et al.: The history began from AlexNet: a comprehensive survey on deep learning approaches. arXiv:1803.01164 (2018)

  2. Wang, L., et al.: Superneurons: dynamic GPU memory management for training deep neural networks. In: 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 41–53 (2018)

  3. Katharopoulos, A., Fleuret, F.: Not all samples are created equal: deep learning with importance sampling. In: International Conference on Machine Learning, PMLR, pp. 2525–2534 (2018)

  4. Loshchilov, I., Hutter, F.: Online batch selection for faster training of neural networks. arXiv:1511.06343 (2015)

  5. Gopal, S.: Adaptive sampling for SGD by exploiting side information. In: International Conference on Machine Learning, PMLR, pp. 364–372 (2016)

  6. Fan, Y., Tian, F., Qin, T., Bian, J., Liu, T.Y.: Learning what data to learn. arXiv:1702.08635 (2017)

  7. Alain, G., Lamb, A., Sankar, C., Courville, A., Bengio, Y.: Variance reduction in SGD by distributed importance sampling. arXiv:1511.06481 (2015)

  8. Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning, PMLR, pp. 1–9 (2015)

  9. Needell, D., Srebro, N., Ward, R.: Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. arXiv:1310.5715 (2013)

  10. Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv:1511.05952 (2015)

  11. Katharopoulos, A., Fleuret, F.: Biased importance sampling for deep neural network training. arXiv:1706.00043 (2017)

  12. Smith, S.L., Kindermans, P.J., Ying, C., Le, Q.V.: Don’t decay the learning rate, increase the batch size. arXiv:1711.00489 (2017)

  13. Joseph, K.J., Singh, K., Balasubramanian, V.N.: Submodular batch selection for training deep neural networks. arXiv:1906.08771 (2019)

  14. Zhao, P., Zhang, T.: Accelerating minibatch stochastic gradient descent using stratified sampling. arXiv:1405.3080 (2014)

  15. Wu, C.Y., Manmatha, R., Smola, A.J., Krahenbuhl, P.: Sampling matters in deep embedding learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2840–2848 (2017)

  16. Bouchard, G., Trouillon, T., Perez, J., Gaidon, A.: Online learning to sample. arXiv:1506.09016 (2015)

  17. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  18. Zeiler, M.D.: ADADELTA: an adaptive learning rate method. arXiv:1212.5701 (2012)

  19. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  20. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)

Acknowledgment

This work was partially supported by the National Council of Science and Technology (CONACYT) of Mexico, through Postgraduate Scholarship 747189 (J. Ávalos) and Research Grants Cátedras-2598 (A. Rojas) and Cátedras-7795 (S.I. Valdez).

Author information

Corresponding author

Correspondence to Jorge Ivan Avalos-López.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Avalos-López, J.I., Rojas-Domínguez, A., Ornelas-Rodríguez, M., Carpio, M., Valdez, S.I. (2021). Efficient Training of Deep Learning Models Through Improved Adaptive Sampling. In: Roman-Rangel, E., Kuri-Morales, Á.F., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Olvera-López, J.A. (eds) Pattern Recognition. MCPR 2021. Lecture Notes in Computer Science, vol. 12725. Springer, Cham. https://doi.org/10.1007/978-3-030-77004-4_14

  • DOI: https://doi.org/10.1007/978-3-030-77004-4_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77003-7

  • Online ISBN: 978-3-030-77004-4

  • eBook Packages: Computer Science (R0)
