Improved training of deep convolutional networks via minimum-variance regularized adaptive sampling

Abstract

Fostered by technological and theoretical developments, deep neural networks (DNNs) have achieved great success in many applications, but their training via mini-batch stochastic gradient descent (SGD) can be very costly, owing to the possibly tens of millions of parameters to be optimized and the large number of training examples that must be processed. The computational cost is exacerbated by the inefficiency of the uniform sampling typically used by SGD to form the training mini-batches: since not all training examples are equally relevant for training, sampling them uniformly is far from optimal, which makes the case for studying improved methods to train DNNs. A better strategy is to sample the training instances under a distribution in which the probability of being selected is proportional to the relevance of each individual instance; one way to achieve this is through importance sampling (IS), which minimizes the variance of the gradients with respect to the network parameters and consequently improves convergence. In this paper, an IS-based adaptive sampling method to improve the training of DNNs is introduced. The method exploits side information to construct the optimal sampling distribution and is dubbed regularized adaptive sampling (RAS). An experimental comparison using deep convolutional networks for classification of the MNIST and CIFAR-10 datasets shows that, compared against SGD and against another state-of-the-art sampling method, RAS improves the speed and reduces the variance of the training process without incurring significant overhead or affecting classification accuracy.
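
To make the importance-sampling idea concrete, the following is a minimal sketch in PyTorch (the framework used in the paper, per footnote 4) of generic loss-proportional importance sampling with bias-correcting weights. It is not the RAS algorithm proposed in the paper; the tensors inputs, targets and relevance, as well as the helper names, are illustrative assumptions.

# Minimal sketch (not the paper's RAS method): importance sampling of
# mini-batches for SGD. Each example's selection probability is proportional
# to a cheap "relevance" score (here, its last observed loss), and the
# sampled losses are re-weighted by 1/(N * p_i) so that the mini-batch
# gradient remains an unbiased estimate of the full-data gradient.
import torch
import torch.nn.functional as F

def sample_batch(relevance, batch_size):
    # Selection probabilities proportional to relevance, with a small floor
    # so that no example has zero probability of being drawn.
    probs = relevance + 1e-8
    probs = probs / probs.sum()
    idx = torch.multinomial(probs, batch_size, replacement=True)
    # Importance weights that correct the bias of non-uniform sampling.
    weights = 1.0 / (relevance.numel() * probs[idx])
    return idx, weights

def train_step(model, optimizer, inputs, targets, relevance, batch_size):
    idx, weights = sample_batch(relevance, batch_size)
    x, y = inputs[idx], targets[idx]
    optimizer.zero_grad()
    losses = F.cross_entropy(model(x), y, reduction="none")
    # Weighted mean keeps the gradient estimator unbiased under this sampling.
    (weights * losses).mean().backward()
    optimizer.step()
    # Refresh the relevance scores (side information) of the sampled examples.
    relevance[idx] = losses.detach()
    return losses.mean().item()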

Availability of data and material

Data are available from the authors upon reasonable request.

Notes

  1. The term deep network may refer to small networks with relatively few parameters arranged in more than one hidden layer. What we call full-scale models are DNNs with around a dozen layers and typically millions of parameters.

  2. In this equation and throughout this work, \(\mathbb {E}\) is used to represent mathematical expectation.

  3. This baseline is harder to establish because of the problem's complexity and the hyperparameters and advanced techniques that can be used, but in the literature we encountered similar experiments with results around this value.

  4. https://pytorch.org/.

  5. https://pytorch.org/docs/stable/optim.html#algorithms.

  6. Slow convergence as well as large oscillations of the loss function are considered inconsistent with the desired behavior, even if the final test accuracy is good. In other words, the whole training process is examined, not just the end result.

Funding

This work was partially supported by The National Council of Science and Technology of Mexico (CONACYT) through grants: CÁTEDRAS-2598 (A. Rojas) and CÁTEDRAS-7795 (S.I. Valdez).

Author information

Contributions

A.R.D. conceived the study, participated in the design of the algorithms and implementation of the models; carried out computational experiments, analyzed experimental results, and wrote the paper with the other authors. M.O.R. participated in the design of the study and initial versions of the manuscript. M.C. participated in the design of the algorithms and analyzed the results. S.I.V. participated throughout in the preparation of the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Martín Carpio.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethics approval and consent to participate

Not Applicable.

Consent for publication

Not Applicable.

Additional information

Communicated by Oscar Castillo.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rojas-Domínguez, A., Valdez, S.I., Ornelas-Rodríguez, M. et al. Improved training of deep convolutional networks via minimum-variance regularized adaptive sampling. Soft Comput 27, 13237–13253 (2023). https://doi.org/10.1007/s00500-022-07131-7
