Abstract
Having more features for inference may seem better than having just a handful. In machine learning, however, data scientists usually treat having many features as a "curse of dimensionality." The reasoning is that high dimensionality makes data exploration and visualization difficult, and training models on high-dimensional datasets is computationally expensive. After all, not every dimension plays a significant role in machine learning. Thus, reducing the dimensionality is generally an advantage. Several techniques are available for dimensionality reduction, and I cover nearly fourteen of them in this chapter. Some are simple and require manual inspection, while many advanced techniques are fully automated. To list a few, I discuss factor analysis, PCA, ICA, t-SNE, UMAP, SVD, and LDA. I describe each technique with implementation code on an appropriate dataset and perform a series of experiments to show you the effectiveness of each technique. This will help you gain a solid knowledge of dimensionality reduction, which is a critical step in the data science process.
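As a taste of the hands-on approach the chapter takes, the following is a minimal sketch of one of the listed techniques (PCA) using scikit-learn on the Iris dataset. It is illustrative only and is not the author's exact code or dataset.

# A minimal PCA sketch: project 4-dimensional Iris data onto 2 components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize features so each contributes equally to the components.
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA and reduce the data to 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print("Reduced shape:", X_reduced.shape)  # (150, 2)
print("Explained variance ratio:", pca.explained_variance_ratio_)

The explained variance ratio indicates how much of the original variability each component retains, which is the usual criterion for deciding how many dimensions to keep.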
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sarang, P. (2023). Dimensionality Reduction. In: Thinking Data Science. The Springer Series in Applied Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-031-02363-7_2
DOI: https://doi.org/10.1007/978-3-031-02363-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02362-0
Online ISBN: 978-3-031-02363-7
eBook Packages: Mathematics and Statistics (R0)