Abstract
This chapter reviews statistical and machine learning models, arguably the most popular approaches to tackling data science problems. Both supervised and unsupervised algorithms are described, along with practical considerations for using these methods. Where applicable, empirical results on exemplar datasets illustrate how these methods apply to real-world problems.
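To give a concrete flavor of the two paradigms the abstract distinguishes, the sketch below (an illustration only, not code from the chapter) fits a supervised model, a random forest classifier, and an unsupervised one, k-means clustering, on scikit-learn's bundled iris dataset. The dataset choice and evaluation metrics are assumptions for the example, not taken from the chapter.

```python
# Supervised vs. unsupervised learning on the iris dataset (illustrative sketch).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, y = load_iris(return_X_y=True)

# Supervised: the labels y guide the fit; held-out accuracy measures generalization.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.2f}")

# Unsupervised: no labels are used; clusters are seeded with k-means++ and
# assessed with the average silhouette width (higher = better-separated clusters).
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(f"silhouette: {silhouette_score(X, km.labels_):.2f}")
```

Note the asymmetry in evaluation: the supervised model can be scored against ground-truth labels, while the clustering is judged only by internal criteria such as silhouette width.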
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Venugopal, D., Deng, LY., Garzon, M. (2022). Solutions to Data Science Problems. In: Garzon, M., Yang, CC., Venugopal, D., Kumar, N., Jana, K., Deng, LY. (eds) Dimensionality Reduction in Data Science. Springer, Cham. https://doi.org/10.1007/978-3-031-05371-9_2
DOI: https://doi.org/10.1007/978-3-031-05371-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05370-2
Online ISBN: 978-3-031-05371-9
eBook Packages: Mathematics and Statistics (R0)