I just came across a useful little function in tidyr called separate_rows(). Often you may have a data frame with a column that contains several pieces of information concatenated with a delimiter. For example, a data frame might have a column listing the members of a family, separated by a delimiter. Here is a pictorial representation of […]
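separate_rows() itself is an R function. To show the reshaping it performs, here is a minimal Python sketch that works on a list of dictionaries. It is not tidyr's implementation: the helper `separate_rows`, the `families` data and the `members` column are hypothetical, and the delimiter is assumed to be a plain comma.

```python
def separate_rows(rows, column, sep=","):
    """Hypothetical helper: split `column` on `sep` and emit one row per piece."""
    out = []
    for row in rows:
        for piece in row[column].split(sep):
            new_row = dict(row)              # copy the other columns unchanged
            new_row[column] = piece.strip()  # drop whitespace around each piece
            out.append(new_row)
    return out


# Hypothetical example data: one row per family, members packed into one string
families = [
    {"family": "A", "members": "Mom, Dad, Kid"},
    {"family": "B", "members": "Mom, Dad"},
]

result = separate_rows(families, "members")
for row in result:
    print(row)
```

Each input row expands into as many output rows as there are delimited pieces, while the other columns (here `family`) are repeated on every new row.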
Introduction to Sparse Matrices in R
Often you may deal with large matrices that are sparse, with only a few non-zero elements. In such scenarios, storing the data as a full dense matrix and working with it is not efficient. A better way to deal with such sparse matrices is to use special data structures that allow us to store the sparse data […]
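The post works in R. As a language-neutral illustration of why sparse storage pays off, here is a minimal Python sketch of a dictionary-of-keys layout that keeps only the non-zero entries. The `SparseMatrix` class is hypothetical; real libraries typically use more compact formats such as compressed sparse column.

```python
class SparseMatrix:
    """Hypothetical dictionary-of-keys sparse matrix: stores only non-zero entries."""

    def __init__(self, nrow, ncol):
        self.shape = (nrow, ncol)
        self.data = {}  # maps (row, col) -> non-zero value

    def __setitem__(self, key, value):
        if value != 0:
            self.data[key] = value
        else:
            self.data.pop(key, None)  # assigning zero removes the stored entry

    def __getitem__(self, key):
        return self.data.get(key, 0)  # entries that are not stored are zero

    def nnz(self):
        """Number of stored (non-zero) values."""
        return len(self.data)

    def matvec(self, x):
        """Multiply by a dense vector, visiting only the stored non-zeros."""
        y = [0] * self.shape[0]
        for (i, j), v in self.data.items():
            y[i] += v * x[j]
        return y


m = SparseMatrix(1000, 1000)
m[0, 0] = 2
m[999, 1] = 5
print(m.nnz())       # 2 stored values instead of 1,000,000
print(m[500, 500])   # 0
y = m.matvec([1] * 1000)
print(y[0], y[999])  # 2 5
```

Memory use and the cost of the matrix-vector product both scale with the number of non-zero entries rather than with nrow × ncol.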
How To Do PCA in the tidyverse Framework?
In an earlier post, we saw a tutorial on how to do PCA in R using the gapminder data set. Another interesting way of doing PCA is to follow the tidyverse framework. In this post, we will see an example of performing PCA on the gapminder data in a tidy framework. Being the first attempt to […]
Empirical cumulative distribution function (ECDF) in Python
Histograms are a great way to visualize a single variable. One of the problems with histograms is that one has to choose the bin size, and with a poorly chosen bin size your data's distribution might look very different. In addition to the bin size issue, histograms may not be a good option for visualizing the distributions of multiple variables […]
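An ECDF sidesteps the bin-size choice: after sorting the data, each of the n points adds a step of height 1/n. Here is a minimal pure-Python sketch of that computation; the function name `ecdf` and the sample values are illustrative, and the post itself may compute it differently (for example with NumPy).

```python
def ecdf(data):
    """Return sorted x values and ECDF heights y, where y[i] = (i + 1) / n."""
    x = sorted(data)
    n = len(x)
    y = [(i + 1) / n for i in range(n)]  # fraction of points <= x[i]
    return x, y


x, y = ecdf([3.1, 1.2, 2.5, 1.2, 4.0])
print(x)  # [1.2, 1.2, 2.5, 3.1, 4.0]
print(y)  # [0.2, 0.4, 0.6, 0.8, 1.0]
```

Plotting `y` against `x` as points or as a step line gives the ECDF. Several variables can be compared by drawing their ECDFs on the same axes, with no bins to tune.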
How To Highlight Select Data Points with ggplot2 in R?
The power of ggplot2 lies in making it easy to create great plots and to tweak them easily to look the way one wants. Sometimes, one might want to highlight certain data points in a plot in a different color. Here we will see an example of highlighting specific data points in a plot. Let us first load […]