In this tutorial, we will learn how to randomly sample from letters of the alphabet. Python’s random module has a number of functions to generate random numbers from different distributions. We will first randomly sample a single letter using the random module’s choice() function and then randomly sample multiple letters using its choices() function. Let us first load […]
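A minimal sketch of the two calls described above, using only the standard library. It assumes the letters come from `string.ascii_lowercase`, and the seed value is arbitrary; both are illustrative choices, not taken from the post. Note that `choices()` samples *with* replacement, so the same letter can repeat. `random.sample()` is included as the standard-library option for sampling *without* replacement. It is an addition for comparison and does not come from the original post.

```python
import random
import string

# All 26 lowercase English letters (assumed letter set for this sketch)
letters = string.ascii_lowercase

# Seed the generator so repeated runs give the same draws (seed is arbitrary)
random.seed(42)

# choice() returns one element picked uniformly at random
one_letter = random.choice(letters)

# choices() returns a list of k elements drawn WITH replacement,
# so duplicates are possible
many_letters = random.choices(letters, k=5)

# sample() draws k elements WITHOUT replacement, so all are distinct
unique_letters = random.sample(letters, k=5)

print(one_letter)
print(many_letters)
print(unique_letters)
```

Passing `weights=` to `choices()` lets some letters be drawn more often than others. Uniform weights are the default.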
How to Replace Multiple Column Names of a Dataframe with tidyverse
Lately, I have been renaming the columns of a dataframe a lot, in different flavors, in R using tidyverse. And every time I have to google it :). I just came across a really neat trick from Shannon Pileggi on Twitter to replace multiple column names using the deframe() function and the !!! splice operator. Here is […]
How to lump factors in Pandas
Sometimes you would like to collapse the least frequent values of a factor or character variable into a new category, “Other”. In R, the forcats library has a suite of functions for lumping variables. This post contains a Pandas solution that can lump factors or values in three common ways. First, we will see how […]
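The excerpt does not show which three approaches the post uses. As a rough sketch only, here are two common lumping strategies in pandas, similar in spirit to forcats' lumping functions: keep the n most frequent values, or keep values above a minimum relative frequency. The helper names `lump_top_n` and `lump_min_prop` are hypothetical and are not the post's code. The sketch also assumes a plain object/string Series. A categorical Series would need "Other" added to its categories first.

```python
import pandas as pd


def lump_top_n(values: pd.Series, n: int, other: str = "Other") -> pd.Series:
    """Hypothetical helper: keep the n most frequent values, lump the rest.

    value_counts() is sorted by descending frequency, so head(n) picks the
    top n; ties at the cutoff follow value_counts() ordering.
    """
    top = values.value_counts().head(n).index
    # where() keeps entries where the condition is True, replaces the others
    return values.where(values.isin(top), other)


def lump_min_prop(values: pd.Series, prop: float, other: str = "Other") -> pd.Series:
    """Hypothetical helper: lump values whose share of rows is below prop."""
    freq = values.value_counts(normalize=True)  # relative frequencies
    keep = freq[freq >= prop].index
    return values.where(values.isin(keep), other)


# Toy data: a appears 4 times, b 3 times, c 2 times, d once
s = pd.Series(["a", "b", "a", "c", "b", "a", "d", "c", "b", "a"])

print(lump_top_n(s, n=2).value_counts())
print(lump_min_prop(s, prop=0.15).value_counts())
```

Using `Series.where()` keeps the original index and row order, so the lumped column can be assigned straight back to a DataFrame.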
Barplots and Countplots with Seaborn’s catplot
Love it or hate it, barplots are often useful in quick exploratory data analysis to understand the variables in a dataset. In this post, we will see multiple examples of how to make barplots and countplots using Seaborn’s catplot() function. A couple of years ago, Seaborn introduced the catplot() function, which provides a common framework to make […]
How to Replace NAs with Column Means or Row Means with tidyverse
Just a quick #rstats post on a simple imputation approach, here for my future self. SVD/PCA is one of the first things I do when analyzing any new high-dimensional data. Often such data are messy and have some missing values. Depending on the situation, I often resort to removing the rows with missing data […]