
Python and R Tips

Learn Data Science with Python and R


10 quick tips for effective dimensionality reduction

June 23, 2019 by cmdlinetips

Dimensionality reduction techniques like PCA, SVD, tSNE, and UMAP are a fantastic toolset for performing exploratory data analysis and unsupervised learning with high-dimensional data. It has become really easy to use the many available dimensionality reduction techniques in both R and Python while doing data science. However, it can often be a little challenging to interpret the low-dimensional representations we get after reducing dimensions and to use the dimensionality reduction techniques effectively.

Lan Huong Nguyen and Susan Holmes, a statistics professor at Stanford and the author of a fantastic new book on modern statistics, have published an article, “Ten quick tips for effective dimensionality reduction”. The article is freely available from PLoS.

There are many gems in the 10 tips that are useful both for those who are new to dimensionality reduction techniques and for those who use them a lot.

I just had a chance to go through the article, and here are the three tips from the article that I liked the most. Not because the others were not good, but because these three are rarely taught, yet extremely useful.

Correct aspect ratio for your visualizations

One of the tips I love the most is tip 6, on using the correct aspect ratio to visualize the reduced dimensions of the data:

Two-dimensional PCA plots with equal height and width are misleading but frequently encountered because popular software programs for analyzing biological data often produce square (2D) or cubical (3D) graphics by default. Instead, the height-to-width ratio of a PCA plot should be consistent with the ratio between the corresponding eigenvalues. Because eigenvalues reflect the variance in coordinates of the associated PCs, you only need to ensure that in the plots, one “unit” in direction of one PC has the same length as one “unit” in direction of another PC. (If you use ggplot2 R package for generating plots, adding + coords_fixed(1) will ensure a correct aspect ratio.)
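The same idea carries over to Python. Here is a minimal matplotlib sketch (with simulated data; the dataset and file names are invented for illustration) where `ax.set_aspect("equal")` plays the role of the ggplot2 call quoted above, so one "unit" along PC1 has the same plotted length as one "unit" along PC2:

```python
# Sketch: a PCA scatter plot with a correct aspect ratio.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# toy data with much more variance along the first variable
X = rng.normal(size=(200, 5)) * np.array([10, 2, 1, 1, 1])

pca = PCA(n_components=2)
pcs = pca.fit_transform(X)

fig, ax = plt.subplots()
ax.scatter(pcs[:, 0], pcs[:, 1], s=10)
# With equal axis units, the plot's height-to-width ratio is
# consistent with the ratio between the eigenvalues.
ax.set_aspect("equal")
ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.0%} variance)")
ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.0%} variance)")
fig.savefig("pca_fixed_aspect.png")
```

Without `set_aspect("equal")`, a default square plot would visually exaggerate the spread along the lower-variance PC2.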

Integrating Multi-Modal Datasets

Another one I like a lot, but which is commonly overlooked for many reasons, is tip 9, dealing with multiple high-dimensional datasets from the same system. Increasingly, we have not one high-dimensional dataset but many high-dimensional datasets from the same set of samples. For example, you may have both audio and video data from the same set of individuals and be interested in capturing the commonalities between the datasets. Techniques like Canonical Correlation Analysis (CCA) can be useful in such scenarios. Tip 9, which runs through a simple example of integrating 5 different high-dimensional datasets from the same system using DiSTATIS, is a must read.

Quantifying Uncertainties in Principal Components

Another one that is very useful is tip 10, which emphasizes quantifying the uncertainty in principal components. One of my pet peeves when people use PCs in their analysis is treating all PCs the same way, ignoring the percentage of variance explained by each component. A scree plot is a great way to visualize the importance of each principal component (see tip 5). As the authors note,

For some datasets, the PCA PCs are ill defined, i.e., two or more successive PCs may have very similar variances, and the corresponding eigenvalues are almost exactly the same


One way to deal with such scenarios and estimate uncertainties is to use bootstrap techniques, i.e., to use

random subsets of the data generated by resampling observations with replacement.

Quantifying uncertainties in PCs is an important topic, but one rarely sees it addressed. It was really great to see it nicely explained with an example.
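Here is a minimal sketch of the bootstrap idea in Python (simulated data; the dataset and all parameters are invented for illustration): resample the observations with replacement, refit PCA each time, and summarize the spread of each component's eigenvalue. Overlapping intervals for successive PCs are a warning sign that those PCs are ill defined:

```python
# Sketch: bootstrap uncertainty intervals for PCA eigenvalues.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n, p = 150, 6
# deliberately make PC2 and PC3 have very similar variances
X = rng.normal(size=(n, p)) * np.array([5.0, 3.0, 2.9, 1.0, 1.0, 1.0])

n_boot = 200
eigvals = np.empty((n_boot, p))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    eigvals[b] = PCA().fit(X[idx]).explained_variance_

lo, hi = np.percentile(eigvals, [2.5, 97.5], axis=0)
for k in range(p):
    print(f"PC{k + 1}: eigenvalue 95% bootstrap interval [{lo[k]:.2f}, {hi[k]:.2f}]")
# expect the PC2 and PC3 intervals to overlap here, flagging ill-defined PCs
```

In this toy example the PC1 interval sits clearly above the rest, while the nearly tied PC2 and PC3 intervals overlap, which is exactly the situation the tip warns about.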

If you are curious about the other tips for effective dimensionality reduction, here they are:

  • Tip 1: Choose an appropriate method
  • Tip 2: Preprocess continuous and count input data
  • Tip 3: Handle categorical input data appropriately
  • Tip 4: Use embedding methods for reducing similarity and dissimilarity input data
  • Tip 5: Consciously decide on the number of dimensions to retain
  • Tip 6: Apply the correct aspect ratio for your visualizations
  • Tip 7: Understand the meaning of the new dimensions
  • Tip 8: Find the hidden signal
  • Tip 9: Take advantage of multidomain data
  • Tip 10: Check the robustness of your results and quantify uncertainties

Interested in dabbling with the examples of these 10 tips? No worries, check out the Rmarkdown illustrating all 10 tips for effective dimensionality reduction, available as a supplement to the article.
