Machine Learning and Deep Learning Algorithms

Chapter in Natural Language Processing in Biomedicine, part of the book series Cognitive Informatics in Biomedicine and Healthcare (CIBH).

Abstract

This chapter provides a brief overview of how machine learning and deep learning algorithms are trained for biomedical natural language processing tasks. The contents of this chapter will be familiar to readers who have previously studied machine learning and deep learning methods. It discusses the design of inputs and outputs for machine learning models; training algorithms, including gradient descent; feature-based methods, including logistic regression and decision trees; and deep learning methods, including convolutional, recurrent, and transformer networks.

Author information

Corresponding author

Correspondence to Steven Bethard.

Glossary

Machine learning

algorithms whose predictions are not explicitly programmed but are instead learned by automatically tuning parameters on example data.

Gradient descent

an optimization algorithm that repeatedly updates a model’s parameters based upon the gradient of the model’s cost function.

Parameter

(of a machine learning model): a choice (typically a numeric value) that is determined automatically by the machine learning algorithm from the training data.

Hyperparameter

(of a machine learning model): a choice (typically a numeric value) that is not made by the learning algorithm but must instead be made by the machine learning designer.

Parameter initialization

how initial values are assigned to the parameters of a machine learning model.

Mini-batch

a small sample of the training data.

Stochastic gradient descent

gradient descent using mini-batches, rather than the entire training data, for each gradient step.
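
To make the gradient descent, mini-batch, and stochastic gradient descent entries concrete, below is a minimal sketch (not from the chapter) of mini-batch stochastic gradient descent for a linear model with a squared-error cost; the function name and the choice of cost are illustrative assumptions. The learning rate and batch size are hyperparameters, while the weights and bias are the parameters being tuned.

```python
import numpy as np

def sgd_linear_regression(X, y, learning_rate=0.01, batch_size=32, epochs=10, seed=0):
    """Fit y ~ X @ w + b by mini-batch stochastic gradient descent on squared error (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0                                  # parameter initialization: all zeros
    for _ in range(epochs):
        order = rng.permutation(n)                           # shuffle the training set each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]          # one mini-batch of training examples
            error = X[batch] @ w + b - y[batch]
            grad_w = 2 * X[batch].T @ error / len(batch)     # gradient of the mean squared error w.r.t. w
            grad_b = 2 * error.mean()                        # gradient w.r.t. b
            w -= learning_rate * grad_w                      # gradient step: move parameters against the gradient
            b -= learning_rate * grad_b
    return w, b
```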

Training set

example inputs and outputs on which the parameters of a machine learning model are to be tuned.

Development set

example inputs and outputs on which the hyper-parameters of a machine learning model are to be tuned.

Test set

example inputs and outputs on which the generalization of a machine learning model is to be evaluated.

Learning curve

a plot of model performance on the training and/or development data against varying amounts of training data.

Underfitting

when a model has a high cost on the training set.

Overfitting

when a model has a low cost on the training set but a high cost on the development or test set.

Regularization

including a measure of model complexity, in addition to a measure of error on the training data, in the cost function that is minimized for a machine learning model.
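
As an illustration of the underfitting, overfitting, and regularization entries, here is a small sketch (not taken from the chapter) of an L2-regularized cost: the error on the training data plus a penalty on model complexity.

```python
import numpy as np

def ridge_cost(w, X, y, lam=0.1):
    """Cost = error on the training data + lam * measure of model complexity (L2 penalty). Illustrative only."""
    training_error = np.mean((X @ w - y) ** 2)   # mean squared error on the training set
    complexity = lam * np.sum(w ** 2)            # larger weights count as a more complex model
    return training_error + complexity
```

Larger values of lam shrink the weights toward zero, which tends to reduce overfitting at the risk of underfitting.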

Grid search

a hyper-parameter search algorithm where the designer selects a small number of values for each hyperparameter and then explores all possible combinations of such hyperparameter values.

Random search

a hyper-parameter search algorithm where the designer selects a numeric range of values for each hyperparameter and then samples a fixed number of combinations, each time sampling all hyperparameter values from their specified ranges.
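
The following sketch (illustrative only; the hyperparameter names and ranges are assumptions) contrasts how grid search and random search generate hyperparameter candidates; in practice, each candidate would be used to train a model that is then scored on the development set.

```python
import itertools
import random

# Grid search: pick a few values per hyperparameter, then try every combination.
grid = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [16, 64]}
grid_candidates = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
# -> 3 * 2 = 6 candidates, e.g. {"learning_rate": 0.001, "batch_size": 16}, ...

# Random search: pick a range per hyperparameter, then sample a fixed number of combinations.
def sample_candidate():
    return {"learning_rate": 10 ** random.uniform(-4, -1),
            "batch_size": random.choice([16, 32, 64, 128])}

random_candidates = [sample_candidate() for _ in range(6)]
```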

Feature engineering

designing the inputs that are fed into a machine learning algorithm.

1-hot vector

a vector where a single entry is 1 and all other entries are 0.

Word embedding

a small dense vector used to represent a word.

Bag-of-words

a representation of text as a vector of counts of each of the words in a vocabulary that were found in that text.
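
A minimal sketch (with a toy, made-up vocabulary) of the 1-hot vector and bag-of-words entries:

```python
import numpy as np

vocabulary = ["patient", "reports", "no", "chest", "pain"]   # toy vocabulary, for illustration only
index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """1-hot vector: a single 1 at the word's index, 0 everywhere else."""
    vector = np.zeros(len(vocabulary))
    vector[index[word]] = 1
    return vector

def bag_of_words(text):
    """Bag-of-words: counts of each vocabulary word found in the text."""
    vector = np.zeros(len(vocabulary))
    for word in text.lower().split():
        if word in index:
            vector[index[word]] += 1
    return vector

print(one_hot("pain"))                          # [0. 0. 0. 0. 1.]
print(bag_of_words("Patient reports no pain"))  # [1. 1. 1. 0. 1.]
```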

Linear regression

a machine learning model that assumes that outputs can be predicted as a weighted sum of the inputs.

Logistic regression

a machine learning model that assumes that outputs can be predicted by applying a logistic sigmoid over the weighted sum of the inputs.
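
As a sketch of the linear regression and logistic regression entries (not taken from the chapter), the only difference between the two predictions is the logistic sigmoid applied to the weighted sum:

```python
import numpy as np

def linear_regression_predict(x, w, b):
    """Predict a number as a weighted sum of the inputs."""
    return np.dot(w, x) + b

def logistic_regression_predict(x, w, b):
    """Predict a probability by applying a logistic sigmoid over the weighted sum of the inputs."""
    return 1 / (1 + np.exp(-(np.dot(w, x) + b)))
```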

Support vector machine

a machine learning model that assumes that outputs can be predicted as a weighted sum of the inputs, under a learning process that maximizes the margin between the closest examples of one class and the other.

Decision tree

a machine learning model that assumes that outputs can be predicted by applying a series of tests to different features of the input.

k-nearest neighbors

a machine learning model that assumes that an output can be predicted by taking a new input, finding the most similar inputs in the training examples, and producing the most common output from those training examples.
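
A sketch (not from the chapter) of the k-nearest neighbors entry, using Euclidean distance as the similarity measure (an illustrative assumption):

```python
import numpy as np
from collections import Counter

def knn_predict(x_new, X_train, y_train, k=3):
    """Predict the most common output among the k most similar training inputs."""
    distances = np.linalg.norm(X_train - x_new, axis=1)   # similarity measured by Euclidean distance
    nearest = np.argsort(distances)[:k]                   # indices of the k nearest training examples
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```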

Deep learning

applying complex neural network architectures to learn nonlinear transformations of the input.

Feedforward network

a machine learning model that assumes that outputs can be predicted as non-linear combinations of the inputs.
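
As a sketch of the feedforward network entry (illustrative; the ReLU non-linearity and single hidden layer are assumptions), the forward pass might look like:

```python
import numpy as np

def feedforward(x, W1, b1, W2, b2):
    """One hidden layer: a non-linear combination (ReLU) of the inputs, then a linear output layer."""
    hidden = np.maximum(0, W1 @ x + b1)   # the rectified linear unit supplies the non-linearity
    return W2 @ hidden + b2
```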

Convolutional network

a machine learning model that assumes that outputs can be predicted by aggregating over non-linear combinations of small regions of the input.

Recurrent network

a machine learning model that assumes that an output can be predicted by walking through the input one step at a time, and at each step, non-linearly combining the new step’s input with an aggregation of all of the previous steps’ inputs.

Transformer network

a machine learning model that assumes that an output can be predicted by non-linear combinations over all pairs of time steps in the input.
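
A sketch (not from the chapter) of scaled dot-product self-attention, the operation by which a transformer combines information across all pairs of time steps; the projection matrices Wq, Wk, and Wv are learned parameters, and the single attention head shown here is a simplification:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Each time step attends to every time step via softmax-normalized query-key scores."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                       # queries, keys, and values for each time step
    scores = Q @ K.T / np.sqrt(K.shape[-1])                # one score for every pair of time steps
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over the time-step axis
    return weights @ V                                     # attention-weighted combination of values
```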

Sequence-to-sequence model

a model that takes a sequence of inputs and produces a sequence of outputs, where there is no guaranteed relation between the length of the inputs and outputs.

Pre-training

(a neural network): training a neural network on a task, typically using large unlabeled data, with the assumption that the network will later be fine-tuned.

Fine-tuning

(a neural network): taking the learned parameters of a pre-trained model and using those as the starting point for training on a new task.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bethard, S. (2024). Machine Learning and Deep Learning Algorithms. In: Xu, H., Demner-Fushman, D. (eds) Natural Language Processing in Biomedicine. Cognitive Informatics in Biomedicine and Healthcare. Springer, Cham. https://doi.org/10.1007/978-3-031-55865-8_3

  • DOI: https://doi.org/10.1007/978-3-031-55865-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-55864-1

  • Online ISBN: 978-3-031-55865-8
