Of late, I have been renaming the columns of a data frame a lot, in different flavors, in R using tidyverse. And every time I have to google it :). I just came across a really neat trick from Shannon Pileggi on Twitter to replace multiple column names using the deframe() function and the !!! splice operator. Here is a quick post on this more general way of renaming columns, for my future self.
Often the base R way of using colnames() to change the names works fine, as long as the new name vector is in the same order as the existing columns.
colnames(df) <- new_name_vector
If it does, well and good. Sometimes, though, I need a solution in the tidyverse way. One of the ways I have been renaming columns is with the rename_with() function in tidyverse. This approach is a bit limited, as it is useful mainly for some kind of string match and replace.
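When the order does not line up, a base R workaround is to look up the position of each old name with match() before assigning. This is only a minimal sketch: df_base, old and new are made-up names for illustration and are not part of the example that follows.

```r
# Hypothetical toy data frame whose columns are not in the order of `old`
df_base <- data.frame(b = 1:2, a = 3:4)

old <- c("a", "b")
new <- c("alpha", "beta")

# match() finds where each old name sits, so the column order does not matter
names(df_base)[match(old, names(df_base))] <- new

names(df_base)
```

Here "a" becomes "alpha" and "b" becomes "beta" regardless of where they sit in the data frame.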
First, let us create a toy data frame with column names that we would like to change to some new names.
library(tidyverse)
df <- tibble(var_1 = letters[1:5], var_2 = sample(5), var_3 = sample(5), var_4 = sample(5))
The tibble we created has four columns and for the sake of ease, all the names have some common string.
df
## # A tibble: 5 × 4
##   var_1 var_2 var_3 var_4
##   <chr> <int> <int> <int>
## 1 a         4     3     2
## 2 b         1     4     4
## 3 c         5     2     3
## 4 d         2     5     5
## 5 e         3     1     1
Renaming multiple column names using rename_with() function
If we want to replace the common string in the names with something else, we can use the rename_with() function in combination with a substitution function like gsub() to replace a pattern.
df %>% rename_with(function(x){gsub("var","variable",x)})
In this example, we have changed “var” in the column names to “variable”. Note that gsub() replaces every occurrence of “var”, in every column name, with “variable”.
## # A tibble: 5 × 4
##   variable_1 variable_2 variable_3 variable_4
##   <chr>           <int>      <int>      <int>
## 1 a                   4          3          2
## 2 b                   1          4          4
## 3 c                   5          2          3
## 4 d                   2          5          5
## 5 e                   3          1          1
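Because gsub() touches every occurrence, it can also hit columns we did not mean to rename. One way around that, sketched here, is the .cols argument of rename_with(), which restricts the renaming to a selection of columns. The data frame df_demo and its covar column are made up for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical data: "covar" also contains the string "var"
df_demo <- tibble(var_1 = letters[1:3], var_2 = 1:3, covar = 4:6)

# Only columns starting with "var" are passed to the renaming function,
# so "covar" is left untouched
renamed <- df_demo %>%
  rename_with(~ gsub("var", "variable", .x), .cols = starts_with("var"))

colnames(renamed)
```

Without .cols, the same gsub() call would also turn “covar” into “covariable”.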
However, what if we don’t simply want to replace part of a string in the column names? The above approach is not much help then. Ideally, we should be able to provide a character vector of our choice as the new column names.
Renaming multiple column names using named character vector and !!! splice operator
A better solution would be to have a dictionary (like the one Pandas’ rename() function takes in Python) or a lookup table containing the current column names and the new column names. The neat tip shared by Shannon Pileggi on Twitter does exactly that. One of the biggest advantages of this approach is that we can change the column names to anything we like. For example, if we want to rename the first column to “ID” and the rest to “variable_*”, here is how to do it.
We need to create a data frame with two columns, one containing the new column names and the second containing the old column names.
df_colnames <- tibble( "new_name" = c("ID",paste0("variable_",2:4)), "old_name" = colnames(df) )
df_colnames
## # A tibble: 4 × 2
##   new_name   old_name
##   <chr>      <chr>
## 1 ID         var_1
## 2 variable_2 var_2
## 3 variable_3 var_3
## 4 variable_4 var_4
Then we will use the deframe() function from tibble to convert the two-column data frame to a named vector.
var_names <- deframe(df_colnames)
The deframe() function uses the first column as the names and the second column as the values.
var_names
##         ID variable_2 variable_3 variable_4
##    "var_1"    "var_2"    "var_3"    "var_4"
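If the old and new names are already sitting in two plain vectors, the intermediate data frame is optional. As a rough sketch, base R’s setNames() builds the same kind of named vector directly. The names old_names, new_names, var_names_direct and var_names_deframe are just for illustration.

```r
library(tibble)

old_names <- c("var_1", "var_2", "var_3", "var_4")
new_names <- c("ID", paste0("variable_", 2:4))

# Named vector built directly: names are the new names, values are the old names
var_names_direct <- setNames(old_names, new_names)

# The deframe() route from a two-column tibble, for comparison
var_names_deframe <- deframe(tibble(new_name = new_names, old_name = old_names))

var_names_direct
var_names_deframe
```

The deframe() route is handy when the lookup table already lives in a data frame, say one read from a CSV file.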
Then we can use the rename() function with the magic “!!!” splice operator on the named vector that we just created to replace the old column names with the new names.
df %>% rename(!!!var_names)
## # A tibble: 5 × 4
##   ID    variable_2 variable_3 variable_4
##   <chr>      <int>      <int>      <int>
## 1 a              4          3          2
## 2 b              1          4          4
## 3 c              5          2          3
## 4 d              2          5          5
## 5 e              3          1          1
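To see what the splice is doing, it helps to compare it with typing each new = old pair by hand. The sketch below uses a small data frame with fixed values (df_small, made up for illustration). It also shows an alternative that should work in recent versions of dplyr and tidyselect, where all_of() accepts a named character vector; treat that part as an assumption about your installed versions.

```r
library(dplyr)
library(tibble)

# Hypothetical small data frame with fixed values
df_small <- tibble(var_1 = letters[1:3], var_2 = 1:3, var_3 = 4:6)
var_names <- c(ID = "var_1", variable_2 = "var_2", variable_3 = "var_3")

# Splicing the named vector ...
spliced <- df_small %>% rename(!!!var_names)

# ... passes the same arguments as writing each new = old pair by hand
by_hand <- df_small %>%
  rename(ID = "var_1", variable_2 = "var_2", variable_3 = "var_3")

# Alternative: all_of() with a named character vector
selected <- df_small %>% rename(all_of(var_names))

colnames(spliced)
colnames(by_hand)
colnames(selected)
```

All three calls should give the same column names. all_of() raises an error if any old name is missing from the data frame, whereas any_of() silently skips the missing ones.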
Isn’t this nice?