Singular Value Decomposition (SVD) is one of the most widely used matrix decomposition methods for dimensionality reduction. For example, Principal Component Analysis often uses SVD under the hood to compute principal components. In this post, we will work through an example of doing SVD in Python. We will use gapminder data in wide form to […]
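As a rough illustration of the idea in this excerpt, here is a minimal sketch of SVD with NumPy. It uses a small random matrix as a stand-in for the gapminder data, so the data, the variable names, and the choice of `k = 2` are all assumptions rather than the post's actual code.

```python
import numpy as np

# Stand-in data (assumption): 6 observations, 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Center each column, as is usual before an SVD-based PCA.
X_centered = X - X.mean(axis=0)

# Thin SVD: X_centered = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Multiplying the factors back together recovers the centered matrix.
X_rebuilt = U @ np.diag(s) @ Vt

# Keep the first k components for a lower-dimensional representation.
k = 2
X_reduced = U[:, :k] * s[:k]
```

The rows of `Vt` play the role of principal directions here, and `X_reduced` holds the coordinates of each observation along the first `k` of them.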
How To Do PCA in tidyverse Framework?
In an earlier post, we saw a tutorial on how to do PCA in R using the gapminder data set. Another interesting way of doing PCA is to follow the tidyverse framework. In this post, we will see an example of doing PCA using gapminder data in a tidy framework. Being the first attempt to […]
How To Create a Column Using Condition on Another Column in Pandas?
Often while cleaning data, one might want to create a new variable or column based on the values of another column using conditions. In this post we will see two different ways to create a column based on the values of another column using conditional statements. First we will use NumPy’s little-known function where to […]
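A minimal sketch of the first approach mentioned in this excerpt, using `np.where` to derive a column from a condition on another column. The data frame, the column names, and the threshold of 60 are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data (assumption): three countries with a life-expectancy column.
df = pd.DataFrame(
    {"country": ["A", "B", "C"], "lifeExp": [45.0, 72.5, 60.0]}
)

# np.where(condition, value_if_true, value_if_false) works element-wise,
# so it yields one label per row.
df["lifeExp_group"] = np.where(df["lifeExp"] >= 60, "high", "low")
```

After this runs, `lifeExp_group` holds `"low"`, `"high"`, `"high"` for the three rows.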
Empirical cumulative distribution function (ECDF) in Python
Histograms are a great way to visualize a single variable. One of the problems with histograms is that one has to choose the bin size. With a wrong bin size your data distribution might look very different. In addition to bin size, histograms may not be a good option to visualize distributions of multiple variables […]
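To make the contrast with histograms concrete, here is a minimal sketch of computing an ECDF by hand with NumPy. The helper name `ecdf` and the sample values are assumptions for illustration; no bin size is needed because every observation gets its own step.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the fraction of observations <= each one."""
    x = np.sort(np.asarray(values, dtype=float))
    # The i-th smallest value (1-based) has cumulative proportion i / n.
    y = np.arange(1, x.size + 1) / x.size
    return x, y

# Hypothetical sample (assumption).
x, y = ecdf([3.0, 1.0, 2.0, 2.0])
```

The resulting `x` and `y` can be passed to a step or scatter plot, and several variables can be overlaid on the same axes by calling the helper once per variable.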
How To Randomly Add NaN to Pandas Dataframe?
In this post we will see an example of how to introduce missing values, i.e. NaNs, randomly in a data frame using Pandas. Sometimes while testing a method, you might want to create a Pandas dataframe with NaNs randomly distributed. Here we show how to do it. Let us load the packages we need. Let […]
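One possible way to do this, sketched under assumptions: build a random boolean mask with the same shape as the data frame and pass it to `DataFrame.mask`. The data, the column names, the seed, and the 20% missing rate are hypothetical, and the post itself may use a different approach.

```python
import numpy as np
import pandas as pd

# Hypothetical data (assumption): 5 rows, 3 numeric columns.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(5, 3)), columns=["a", "b", "c"])

# Each cell is selected independently with probability 0.2 (assumed rate).
mask = rng.random(df.shape) < 0.2

# DataFrame.mask replaces values where the condition is True with NaN
# and returns a new frame, leaving df unchanged.
df_with_nan = df.mask(mask)
```

Because the mask is drawn cell by cell, the realized fraction of NaNs varies from run to run unless the seed is fixed.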