7 Tips to Add Columns to a DataFrame with add_column() in tidyverse

Often during data analysis, we need to add one or more new columns to an existing data frame. In this post we will learn how to add one or more columns to a dataframe in R. The tibble package in tidyverse has a lesser-known but powerful function, add_column(). We will learn 7 tips for using add_column() to add one or more columns at the right place while making sure we don’t overwrite an existing column.

Let us first load tidyverse and create a simple data frame using the tibble() function.

library(tidyverse)
df <- tibble(x=1:5,y=5:1)

Our simple dataframe contains two columns named x and y.

df

## # A tibble: 5 x 2
##       x     y
##   <int> <int>
## 1     1     5
## 2     2     4
## 3     3     3
## 4     4     2
## 5     5     1

1. How To Add a New Column?

We can add a new column to a dataframe with the add_column() function by providing the new column as a name-value argument. In this example, we add a new column named “z”, and we can see that it is appended after the existing columns.

df %>% 
  add_column(z=-2:2)

## # A tibble: 5 x 3
##       x     y     z
##   <int> <int> <int>
## 1     1     5    -2
## 2     2     4    -1
## 3     3     3     0
## 4     4     2     1
## 5     5     1     2
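One point worth spelling out: add_column() returns a new tibble and does not change df itself, so the result has to be assigned if we want to keep it. Here is a minimal sketch; the name df_with_z is only an illustrative choice.

```r
library(tidyverse)

df <- tibble(x = 1:5, y = 5:1)

# add_column() returns a new tibble; assign it to keep the result
df_with_z <- df %>%
  add_column(z = -2:2)

names(df)         # the original still has only "x" and "y"
names(df_with_z)  # the new tibble has "x", "y" and "z"
```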

2. How to Add a Column Before Another Column?

add_column() also lets us specify where the new column should go. For example, we can add the new column just before an existing column using the “.before” argument.

df %>% 
  add_column(before_y=-2:2, .before="y")

## # A tibble: 5 x 3
##       x before_y     y
##   <int>    <int> <int>
## 1     1       -2     5
## 2     2       -1     4
## 3     3        0     3
## 4     4        1     2
## 5     5        2     1

3. How to Add a Column After Another Column?

Similar to the “.before” argument, add_column() also has an “.after” argument, which we can use to add a column after another specific column.

In this example, we add a new column after the “x” column by passing .after="x" to add_column().

df %>% 
  add_column(after_x=-2:2, .after="x")

## # A tibble: 5 x 3
##       x after_x     y
##   <int>   <int> <int>
## 1     1      -2     5
## 2     2      -1     4
## 3     3       0     3
## 4     4       1     2
## 5     5       2     1
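Besides column names, “.before” and “.after” also accept a one-based column position. The sketch below illustrates this; the column names id and mid are made up for the example.

```r
library(tidyverse)

df <- tibble(x = 1:5, y = 5:1)

# .before = 1 puts the new column in front of the first column
df_first <- df %>%
  add_column(id = 101:105, .before = 1)
names(df_first)  # "id" "x" "y"

# .after = 1 puts the new column right after the first column
df_mid <- df %>%
  add_column(mid = -2:2, .after = 1)
names(df_mid)    # "x" "mid" "y"
```

Referring to columns by name is usually more robust, since a position silently points at a different column if the dataframe layout changes.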

4. How to Add a Column with the Same Value?

Often we need to add a new column that holds the same value in every row. With add_column() we add the column as in the previous examples, but this time we specify the value just once. We don’t need to create a vector that repeats the value for each row.

Here, we add a new column called “batch_id” with the value “batch1” repeated for all the rows.

df %>% 
  add_column(batch_id="batch1")

## # A tibble: 5 x 3
##       x     y batch_id
##   <int> <int> <chr>   
## 1     1     5 batch1  
## 2     2     4 batch1  
## 3     3     3 batch1  
## 4     4     2 batch1  
## 5     5     1 batch1
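The same single-value recycling works for other types too. As a small sketch, we can add a placeholder column of missing values to fill in later; the column name score is just an assumption for illustration.

```r
library(tidyverse)

df <- tibble(x = 1:5, y = 5:1)

# A single NA_real_ is recycled to every row, giving a numeric column of NAs
df_na <- df %>%
  add_column(score = NA_real_)

length(df_na$score)   # one value per row
all(is.na(df_na$score))
```

Using NA_real_ rather than plain NA makes the new column numeric from the start instead of logical.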

5. How to Add Multiple Columns?

To add multiple columns, we specify each column that we would like to add, separated by commas, as shown below.

df %>% 
  add_column(z=-2:2,
             batch_id="batch1")

We have added two columns with add_column() function.

## # A tibble: 5 x 4
##       x     y     z batch_id
##   <int> <int> <int> <chr>   
## 1     1     5    -2 batch1  
## 2     2     4    -1 batch1  
## 3     3     3     0 batch1  
## 4     4     2     1 batch1  
## 5     5     1     2 batch1
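Multiple columns can be combined with the positioning arguments from the earlier tips. In this sketch the two new columns are inserted together, in the order given, right after “x”.

```r
library(tidyverse)

df <- tibble(x = 1:5, y = 5:1)

# Both new columns are placed as a block after column "x"
df_multi <- df %>%
  add_column(z = -2:2,
             batch_id = "batch1",
             .after = "x")

names(df_multi)  # "x" "z" "batch_id" "y"
```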

6. How to Avoid Adding Duplicate Columns?

One of the concerns while adding a new column is that we might overwrite an existing column with the same name. The add_column() function offers multiple options to deal with duplicate columns.

For example, suppose we try to add a column whose name already exists in the dataframe:

df %>% 
  add_column(x=-2:2)

By default, we get an error telling us that the new column name cannot duplicate an existing one. In this case, we already have a column named “x” and we are trying to add another column with the name “x”.

Error: Column name `x` must not be duplicated.
Run `rlang::last_error()` to see where the error occurred.

However, sometimes you might want to add the new column anyway and deal with the duplicate names. The add_column() function has a “.name_repair” argument for this. It accepts one of four options: "check_unique" (the default, which raises the error above), "unique", "universal", and "minimal".

Here, when we specify .name_repair = "universal", add_column() renames the columns so that the names are distinct.

df %>% 
  add_column(x=-2:2,
             .name_repair = "universal")

add_column() prints a message telling us that it is changing the column names.

## New names:
## * x -> x...1
## * x -> x...3

Now, we can see that the original column named “x” is called “x...1” and the one we just added is named “x...3”. The numeric suffix is the column’s position in the dataframe.

## # A tibble: 5 x 3
##   x...1     y x...3
##   <int> <int> <int>
## 1     1     5    -2
## 2     2     4    -1
## 3     3     3     0
## 4     4     2     1
## 5     5     1     2
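To compare two of the other options, here is a short sketch. With "minimal" the names are left as they are, so the duplicate is kept; with "unique" the duplicated names get position suffixes, as in the output above.

```r
library(tidyverse)

df <- tibble(x = 1:5, y = 5:1)

# "minimal": no repair, so the result has two columns named "x"
df_min <- df %>%
  add_column(x = -2:2, .name_repair = "minimal")
names(df_min)     # "x" "y" "x"

# "unique": duplicated names are made distinct (a "New names:" message is printed)
df_unique <- df %>%
  add_column(x = -2:2, .name_repair = "unique")
names(df_unique)  # "x...1" "y" "x...3"
```

A dataframe with duplicate names is awkward to work with in most tidyverse functions, so "minimal" is best treated as a temporary state before renaming.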

7. Dealing with more/less observations in the new column

Another useful functionality of add_column() is that it guards us against adding a new column whose length differs from the number of rows of the dataframe.

For example, suppose we try to add a column with 6 elements to a dataframe with 5 rows:

df %>% 
  add_column(z=-2:3)

We get an error telling us that the sizes do not match:

Error: New columns must be compatible with `.data`.
x New column has 6 rows.
i `.data` has 5 rows.
Run `rlang::last_error()` to see where the error occurred.

We get a similar error if we try to add a column with fewer elements than the number of rows of the dataframe. The one exception is a single value, which is repeated for every row as we saw in tip 4.
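The length rule can be checked with a small sketch. The helper try_add() is hypothetical and written only for this illustration; it returns NULL when add_column() raises an error.

```r
library(tidyverse)

df <- tibble(x = 1:5, y = 5:1)

# Hypothetical helper: try to add `values` as column z, return NULL on error
try_add <- function(values) {
  tryCatch(
    df %>% add_column(z = values),
    error = function(e) NULL
  )
}

is.null(try_add(-2:3))  # 6 values for 5 rows: error
is.null(try_add(1:2))   # 2 values for 5 rows: error
is.null(try_add(0L))    # a single value is recycled: no error
```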