tidyverse 101

13 Tips to Randomly Select Rows with tidyverse

July 5, 2022 by cmdlinetips

In this post, we will learn how to randomly sample rows from a data frame in the most common scenarios. Tidyverse has a few options for randomly sampling rows from a dataframe. slice_sample() in dplyr is the currently recommended function for randomly selecting rows. The older function in dplyr, sample_n(), for […]
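
A minimal sketch of a few common slice_sample() calls, assuming dplyr 1.0.0 or later and using the built-in mtcars data frame as a stand-in dataset. It illustrates the function's n, prop and replace arguments and grouped sampling; it is not a reproduction of the tips in the full post.

```r
library(dplyr)

set.seed(42)  # fix the random seed so the draw is reproducible

# Pick 5 random rows from the built-in mtcars data frame
five_rows <- slice_sample(mtcars, n = 5)

# Pick a fraction of the rows instead of a fixed count
ten_percent <- slice_sample(mtcars, prop = 0.1)

# Sample with replacement, so the same row can appear more than once
# (this is what allows n to exceed the 32 rows in mtcars)
with_replacement <- slice_sample(mtcars, n = 40, replace = TRUE)

# Sample 2 rows within each cylinder group (4, 6 and 8 cylinders)
per_group <- mtcars %>%
  group_by(cyl) %>%
  slice_sample(n = 2) %>%
  ungroup()

five_rows
```

Because slice_sample() respects groups, the grouped call returns 2 rows per cylinder value (6 rows in total) rather than 2 rows overall.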

dplyr matches(): select columns using regular expression

June 20, 2022 by cmdlinetips

This quick post shows an example of using matches(), a neat dplyr helper, to select columns with regular expressions. dplyr has a number of helper functions, such as contains() and starts_with(), for selecting columns that meet a certain condition. For example, if you are interested in selecting columns based on how their names start, we can use the starts_with() function. However, […]
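
A minimal sketch contrasting starts_with() with matches(), using the built-in iris data frame as an example (the dataset and patterns are illustrative choices, not taken from the post). matches() accepts a regular expression, so a single pattern can express conditions such as "ends with" or alternation.

```r
library(dplyr)

# starts_with() covers the simple "name begins with" case
sepal_cols <- select(iris, starts_with("Sepal"))

# matches() takes a regular expression: columns whose names end in ".Width"
width_cols <- select(iris, matches("\\.Width$"))

# Alternation: the Length column of either Sepal or Petal
length_cols <- select(iris, matches("^(Sepal|Petal)\\.Length$"))

names(width_cols)
#> [1] "Sepal.Width" "Petal.Width"
```

Note that these helpers ignore case by default (ignore.case = TRUE), so a pattern like "width$" would select the same two columns.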

How to Replace Multiple Column Names of a Dataframe with tidyverse

March 1, 2022 by cmdlinetips

Of late, I have been renaming the columns of a dataframe a lot, in different flavors, in R using tidyverse. And every time I have to google it :). I just came across a really neat trick from Shannon Pileggi on Twitter to replace multiple column names using the deframe() function and the !!! splice operator. Here is […]
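
A minimal sketch of the deframe() plus !!! idea, assuming a hypothetical lookup table with the new names in its first column and the old names in its second. deframe() (from tibble) turns a two-column data frame into a named vector, and !!! splices that vector into rename() as if you had typed each new_name = "old_name" pair by hand. The exact code in the post may differ.

```r
library(dplyr)
library(tibble)

df <- tibble(a = 1:3, b = 4:6, c = 7:9)

# Hypothetical lookup table: new name first, old name second
lookup <- tibble(
  new_name = c("alpha", "beta"),
  old_name = c("a", "b")
)

# deframe() uses the first column as names and the second as values,
# giving a named character vector: alpha = "a", beta = "b"
name_map <- deframe(lookup)

# !!! splices the named vector into rename(alpha = "a", beta = "b")
renamed <- rename(df, !!!name_map)

names(renamed)
#> [1] "alpha" "beta"  "c"
```

Columns missing from the lookup table (here c) are left unchanged.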

How to Replace NAs with Column Means or Row Means with tidyverse

January 15, 2022 by cmdlinetips

Just a quick rstats post on a simple imputation approach, here for my future self. SVD/PCA is one of the first things I do when analyzing any new high-dimensional data. Often such data are messy and have some missing values. Depending on the situation, I often resort to removing the rows with missing data […]
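
A minimal sketch of both kinds of mean imputation, assuming a small all-numeric (double) tibble made up for illustration; the post's own approach is not shown in the excerpt. Column means use across() with coalesce(), which fills each NA with its column's mean. Row means are less direct in tidyverse, so this sketch pivots to long format, fills within each row, and pivots back.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up example data with one NA per column
df <- tibble(
  x = c(1, NA, 3),
  y = c(4, 5, NA),
  z = c(NA, 8, 9)
)

# Column means: replace each NA with the mean of its own column
col_imputed <- df %>%
  mutate(across(everything(), ~ coalesce(.x, mean(.x, na.rm = TRUE))))

# Row means: go long, fill NAs with the mean of the same row, go wide again
row_imputed <- df %>%
  mutate(row_id = row_number()) %>%
  pivot_longer(-row_id) %>%
  group_by(row_id) %>%
  mutate(value = coalesce(value, mean(value, na.rm = TRUE))) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value) %>%
  select(-row_id)
```

Here the NA in z is filled with 8.5 (the mean of column z) in col_imputed, but with 2.5 (the mean of row 1) in row_imputed. A row or column that is entirely NA would get NaN, since its mean is undefined.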