In this post, we will learn how to randomly sample rows from a dataframe, a task that comes up in many common scenarios. Tidyverse has a few options for randomly sampling rows from a dataframe. slice_sample() in dplyr is the currently recommended function for randomly selecting rows. The older function in dplyr, sample_n(), for […]
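As a rough illustration of the functions named in this excerpt, here is a minimal sketch. The toy dataframe `df` and the seed are assumptions made up for the example; they are not from the post.

```r
library(dplyr)

# Toy dataframe (hypothetical data, for illustration only)
df <- tibble(id = 1:10, value = letters[1:10])

set.seed(42)  # make the random draws reproducible

# Sample a fixed number of rows
three_rows <- df %>% slice_sample(n = 3)

# Sample a proportion of the rows
half_rows <- df %>% slice_sample(prop = 0.5)

# The older sample_n() still works for the fixed-count case
three_rows_old <- df %>% sample_n(3)
```

slice_sample() covers both the fixed-count and the proportion case through its `n` and `prop` arguments, which is one reason to prefer it in new code.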
tidyverse 101
dplyr matches(): select columns using regular expression
This quick post has an example using a neat dplyr function, matches(), to select columns using regular expressions. dplyr has a number of helper functions, contains(), starts_with() and others, for selecting columns based on certain conditions. For example, if you are interested in selecting columns based on how their names start, you can use the starts_with() function. However, […]
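To make the difference between the two helpers concrete, here is a small sketch. The dataframe and its column names are hypothetical and chosen only for the example.

```r
library(dplyr)

# Toy dataframe (hypothetical column names, for illustration only)
df <- tibble(
  sample_id = 1:3,
  gene_1 = c(0.1, 0.2, 0.3),
  gene_2 = c(1.1, 1.2, 1.3),
  batch = c("a", "b", "a")
)

# starts_with() matches a literal prefix
prefix_cols <- df %>% select(starts_with("gene"))

# matches() takes a regular expression:
# here, names ending in an underscore followed by digits
regex_cols <- df %>% select(matches("_[0-9]+$"))
```

In this toy case both calls pick `gene_1` and `gene_2`, but the regular expression can describe patterns that a literal prefix or suffix cannot.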
How to create a list of plot objects and save them as files
In this post, we will learn a really nice trick for creating multiple ggplots from a dataframe and saving the plots to files with ggsave(), using the magic of tidyverse’s purrr. We will use purrr’s map() function to create multiple plots from a dataframe and another purrr function, pwalk(), to save the plots as files. Learned […]
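One way the map() plus pwalk() pattern might look is sketched below. This is not necessarily the approach taken in the post: the built-in `mtcars` dataset, the split by `cyl`, the file names, and the use of a temporary directory are all assumptions made for the example.

```r
library(ggplot2)
library(purrr)

# Split a built-in dataset into one dataframe per group (assumed example data)
by_cyl <- split(mtcars, mtcars$cyl)

# map() builds one ggplot object per group and returns them as a list
plots <- map(by_cyl, function(d) {
  ggplot(d, aes(x = wt, y = mpg)) + geom_point()
})

# Hypothetical output location and file names
out_dir <- tempdir()
file_names <- paste0("mpg_vs_wt_cyl_", names(plots), ".png")

# pwalk() walks over the list elements in parallel and passes them
# to ggsave() as the named arguments `filename` and `plot`
pwalk(
  list(filename = file_names, plot = plots),
  ggsave,
  path = out_dir, width = 4, height = 3
)
```

pwalk() is used instead of pmap() because saving a file is a side effect and the return value is not needed.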
How to Replace Multiple Column Names of a Dataframe with tidyverse
Of late, I have been renaming columns of a dataframe a lot, in different flavors, in R using tidyverse. And every time I have to google it :). I just came across a really neat trick from Shannon Pileggi on Twitter to replace multiple column names using the deframe() function and the !!! splice operator. Here is […]
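A minimal sketch of how deframe() and !!! can combine for renaming is shown below. The dataframe and the lookup table (`df`, `lookup`, and their column names) are hypothetical and exist only for illustration.

```r
library(dplyr)
library(tibble)

# Toy dataframe (hypothetical)
df <- tibble(a = 1:2, b = 3:4, c = 5:6)

# Lookup table: new names in the first column, old names in the second
lookup <- tibble(
  new = c("alpha", "beta"),
  old = c("a", "b")
)

# deframe() turns a two-column tibble into a named vector:
# names come from the first column, values from the second
rename_vec <- deframe(lookup)

# !!! splices the named vector into rename() as new = old pairs
renamed <- df %>% rename(!!!rename_vec)
```

Columns that are not listed in the lookup table, such as `c` here, keep their original names.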
How to Replace NAs with column mean or row means with tidyverse
Just a quick rstats post on a simple imputation approach, for my future self. SVD/PCA is one of the first things I do when analyzing any new high-dimensional data. Often such data are messy and have some missing values. Depending on the situation, I often resort to removing the rows with missing data […]
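The excerpt is cut off before the method itself, so the following is only a sketch of what mean imputation by column or by row could look like, going by the post title. The toy data and all variable names are assumptions.

```r
library(dplyr)

# Toy numeric dataframe with missing values (hypothetical)
df <- tibble(
  x = c(1, NA, 3),
  y = c(NA, 4, 6)
)

# Replace each NA with the mean of its column
col_imputed <- df %>%
  mutate(across(everything(), ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))

# Replace each NA with the mean of its row, working on a matrix copy
m <- as.matrix(df)
row_means <- rowMeans(m, na.rm = TRUE)
na_idx <- which(is.na(m), arr.ind = TRUE)   # (row, col) positions of the NAs
m[na_idx] <- row_means[na_idx[, "row"]]
row_imputed <- as_tibble(m)
```

Both variants assume every column is numeric, and that each column (or row) has at least one observed value to average.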