I just came across a useful little function in tidyr called separate_rows(). Often you may have a data frame with a column containing multiple information concatenated together with a delimiter. For example, we might have data frame with members of a family in a column separated by a delimiter. Here is a pictorial representation of separating a column into multiple rows. .
tidyr’s separate_rows() makes it easier to do it. Let us make a toy dataframe with multiple names in a column and see two examples of separating the column into multiple rows, first using dplyr’s mutate and unnest and then using the separate_rows() function from tidyr.
library(tidyverse)
df <- data.frame(Family_ID = 1:2, name = c("Smith, John", "Walker, Mike"), stringsAsFactors = FALSE) df ## Family_ID name ## 1 1 Smith, John ## 2 2 Walker, Mike
How To Separate Column into Rows?
Let us first use mutate and unnest to split a column into multiple rows. We started with two rows and the name column had two names separated by comma.
We first split the name using strsplit as an argument to mutate function. Then we pass that to unnest to get them as separate rows.
df %>% mutate(name=strsplit(name, ",")) %>% unnest(name) ## Family_ID name ## 1 1 Smith ## 2 1 John ## 3 2 Walker ## 4 2 Mike
How To Separate Column into Rows with separate_rows()
Instead of using the hack of using mutate to create the same name with strsplit, mutate() and unnest(), we can use tidyr’s separate_rows() with column name that we want to split into multiple rows.
df %>% separate_rows(name)
## Family_ID name ## 1 1 Smith ## 2 1 John ## 3 2 Walker ## 4 2 Mike