Improved training of deep convolutional networks via minimum-variance regularized adaptive sampling

Abstract

Fostered by technological and theoretical developments, deep neural networks (DNNs) have achieved great success in many applications, but their training via mini-batch stochastic gradient descent (SGD) can be very costly, owing to the possibly tens of millions of parameters to be optimized and the large number of training examples that must be processed. The computational cost is exacerbated by the inefficiency of the uniform sampling typically used by SGD to form the training mini-batches: since not all training examples are equally relevant for training, sampling them uniformly is far from optimal, which makes the case for studying improved methods to train DNNs. A better strategy is to sample the training instances under a distribution in which the probability of being selected is proportional to the relevance of each individual instance; one way to achieve this is through importance sampling (IS), which minimizes the variance of the gradients with respect to the network parameters and consequently improves convergence. In this paper, an IS-based adaptive sampling method to improve the training of DNNs is introduced. The method exploits side information to construct the optimal sampling distribution and is dubbed regularized adaptive sampling (RAS). An experimental comparison using deep convolutional networks for classification of the MNIST and CIFAR-10 datasets shows that, compared against SGD and against another state-of-the-art sampling method, RAS improves the speed and reduces the variance of the training process without incurring significant overhead or affecting classification accuracy.
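
To make the importance-sampling idea concrete, the following is a minimal sketch in PyTorch (the framework used in the paper, per footnote 4) of generic loss-proportional importance sampling with bias-correcting weights. It is not the RAS algorithm proposed in the paper; the tensors inputs, targets and relevance, as well as the helper names, are illustrative assumptions.

# Minimal sketch (not the paper's RAS method): importance sampling of
# mini-batches for SGD. Each example's selection probability is proportional
# to a cheap "relevance" score (here, its last observed loss), and the
# sampled losses are re-weighted by 1/(N * p_i) so that the mini-batch
# gradient remains an unbiased estimate of the full-data gradient.
import torch
import torch.nn.functional as F

def sample_batch(relevance, batch_size):
    # Selection probabilities proportional to relevance, with a small floor
    # so that no example has zero probability of being drawn.
    probs = relevance + 1e-8
    probs = probs / probs.sum()
    idx = torch.multinomial(probs, batch_size, replacement=True)
    # Importance weights that correct the bias of non-uniform sampling.
    weights = 1.0 / (relevance.numel() * probs[idx])
    return idx, weights

def train_step(model, optimizer, inputs, targets, relevance, batch_size):
    idx, weights = sample_batch(relevance, batch_size)
    x, y = inputs[idx], targets[idx]
    optimizer.zero_grad()
    losses = F.cross_entropy(model(x), y, reduction="none")
    # Weighted mean keeps the gradient estimator unbiased under this sampling.
    (weights * losses).mean().backward()
    optimizer.step()
    # Refresh the relevance scores (side information) of the sampled examples.
    relevance[idx] = losses.detach()
    return losses.mean().item()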

Availability of data and material

Data are available from the authors upon reasonable request.

Notes

  1. The term deep network may refer to small networks with relatively few parameters arranged in more than one hidden layer. What we call full-scale models are DNNs with around a dozen layers and typically millions of parameters.

  2. In this equation and throughout this work, \(\mathbb {E}\) is used to represent mathematical expectation.

  3. This baseline is harder to establish because of the problem's complexity and the hyperparameters and advanced techniques that can be used, but in the literature we encountered similar experiments with results around this value.

  4. https://pytorch.org/.

  5. https://pytorch.org/docs/stable/optim.html#algorithms.

  6. Slow convergence as well as large oscillations of the loss function are considered inconsistent with the desired behavior, even if the final test accuracy is good. In other words, the whole training process is examined, not just the end result.

Funding

This work was partially supported by The National Council of Science and Technology of Mexico (CONACYT) through grants: CÁTEDRAS-2598 (A. Rojas) and CÁTEDRAS-7795 (S.I. Valdez).

Author information

Contributions

A.R.D. conceived the study, participated in the design of the algorithms and implementation of the models; carried out computational experiments, analyzed experimental results, and wrote the paper with the other authors. M.O.R. participated in the design of the study and initial versions of the manuscript. M.C. participated in the design of the algorithms and analyzed the results. S.I.V. participated throughout in the preparation of the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Martín Carpio.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethics approval and consent to participate

Not Applicable.

Consent for publication

Not Applicable.

Additional information

Communicated by Oscar Castillo.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rojas-Domínguez, A., Valdez, S.I., Ornelas-Rodríguez, M. et al. Improved training of deep convolutional networks via minimum-variance regularized adaptive sampling. Soft Comput 27, 13237–13253 (2023). https://doi.org/10.1007/s00500-022-07131-7
