13 Tips to Randomly Select Rows with tidyverse

In this post, we will learn how to randomly sample rows from a data frame in ways that are useful in most common scenarios. The tidyverse offers a few options for randomly sampling rows from a data frame. slice_sample() in dplyr is the currently recommended function for randomly selecting rows. The older function in dplyr, sample_n(), for…

dplyr matches(): select columns using regular expression

This quick post has an example of using a neat dplyr function, matches(), to select columns using regular expressions. dplyr has a number of helper functions, such as contains() and starts_with(), for selecting columns based on certain conditions. For example, if you are interested in selecting columns based on how their names start, you can use the starts_with() function. However,…

How to create a list of plot objects and save them as files

In this post, we will learn a really nice trick for creating multiple ggplots from a data frame and saving the plots to files with ggsave(), using purrr’s magic from the tidyverse. We will use purrr’s map() function to create multiple plots from a data frame and another purrr function, pwalk(), to save the plots as files. Learned…

How to Replace Multiple Column Names of a Dataframe with tidyverse

Lately, I have been renaming the columns of a data frame a lot, in different flavors, in R using tidyverse. And every time I have to google it :). I just came across a really neat trick from Shannon Pileggi on Twitter to replace multiple column names using the deframe() function and the !!! splice operator. Here is…

How to Replace NAs with column mean or row means with tidyverse

Just a quick rstats post on a simple imputation approach, for my future self. SVD/PCA is one of the first things I do when analyzing any new high-dimensional data. Often such data are messy and have some missing values. Depending on the situation, I often resort to removing the rows with missing data…