Python and R Tips

Learn Data Science with Python and R

7 Tips to Add Columns to a DataFrame with add_column() in tidyverse

March 12, 2021 by cmdlinetips

Often while doing data analysis, we need to add one or more new columns to an existing data frame. In this post we will learn how to add one or more columns to a dataframe in R. The tibble package in tidyverse has a lesser-known but powerful function, add_column(). We will learn 7 tips for using add_column() to add one or more columns at the right place while making sure we don’t overwrite an existing column.

Let us first load tidyverse and create a simple data frame using the tibble() function.

library(tidyverse)
df <- tibble(x=1:5,y=5:1)

Our simple dataframe contains two columns named x and y.

df

## # A tibble: 5 x 2
##       x     y
##   <int> <int>
## 1     1     5
## 2     2     4
## 3     3     3
## 4     4     2
## 5     5     1

1. How To Add a New Column?

We can add a new column to a dataframe with add_column() by passing the new column as a name-value argument. In this example, we add a new column named “z”, and the result shows it appended after the existing columns.

df %>% 
  add_column(z=-2:2)

## # A tibble: 5 x 3
##       x     y     z
##   <int> <int> <int>
## 1     1     5    -2
## 2     2     4    -1
## 3     3     3     0
## 4     4     2     1
## 5     5     1     2
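
One point worth making explicit: add_column() returns a new tibble and leaves the original untouched, so the result must be assigned to keep it. The sketch below assumes only the tibble package is loaded (tidyverse loads it as well) and calls add_column() directly instead of through the pipe; the name df_with_z is purely illustrative.

```r
library(tibble)  # add_column() lives in tibble, which tidyverse also loads

df <- tibble(x = 1:5, y = 5:1)

# add_column() returns a new tibble; it does not modify df in place
df_with_z <- add_column(df, z = -2:2)

names(df)         # the original still has only x and y
names(df_with_z)  # the new object has x, y and z
```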

2. How to Add a column before another column?

add_column() in tibble/tidyverse is powerful. We can also specify where to add the new column. For example, we can add the new column just before another existing column using the “.before” argument.

df %>% 
  add_column(before_y=-2:2, .before="y")

## # A tibble: 5 x 3
##       x before_y     y
##   <int>    <int> <int>
## 1     1       -2     5
## 2     2       -1     4
## 3     3        0     3
## 4     4        1     2
## 5     5        2     1
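
Besides a column name, “.before” also accepts a one-based column position. As a small sketch (the column name id and its values are made up for illustration), passing .before = 1 puts the new column at the very front of the dataframe:

```r
library(tibble)

df <- tibble(x = 1:5, y = 5:1)

# .before = 1 places the new column ahead of the first existing column
df_front <- add_column(df, id = 101:105, .before = 1)

names(df_front)
```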

3. How to Add a column after another column?

Similar to the “.before” argument, add_column() also has an “.after” argument, which we can use to add a column after another specific column.

In this example, we add a new column after the “x” column using the .after="x" argument to add_column().

df %>% 
  add_column(after_x=-2:2, .after="x")

## # A tibble: 5 x 3
##       x after_x     y
##   <int>   <int> <int>
## 1     1      -2     5
## 2     2      -1     4
## 3     3       0     3
## 4     4       1     2
## 5     5       2     1
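
Like “.before”, the “.after” argument can take a column position instead of a name, and the two arguments cannot be combined in one call. The sketch below illustrates both points; the variable names by_position and both are hypothetical, and tryCatch() is used only so the script keeps running after the expected error.

```r
library(tibble)

df <- tibble(x = 1:5, y = 5:1)

# .after = 1 inserts the new column right after the first column
by_position <- add_column(df, after_x = -2:2, .after = 1)
names(by_position)

# Supplying both .before and .after is an error; capture it instead of stopping
both <- tryCatch(
  add_column(df, w = 0L, .before = "y", .after = "x"),
  error = function(e) e
)
inherits(both, "error")
```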

4. How to Add a column with same values?

Often you might need to add a new column with the same value repeated for each row. With add_column() we add the column just as in the previous examples, but this time we specify the value only once; add_column() repeats it for every row, so we don’t need to build a vector of repeated values ourselves.

Here, we add a new column called “batch_id” with the value “batch1” repeated for all the rows.

df %>% 
  add_column(batch_id="batch1")

## # A tibble: 5 x 3
##       x     y batch_id
##   <int> <int> <chr>   
## 1     1     5 batch1  
## 2     2     4 batch1  
## 3     3     3 batch1  
## 4     4     2 batch1  
## 5     5     1 batch1
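
To see what the single value saves us, here is a minimal comparison with building the repeated vector by hand using rep(). The names recycled and explicit are just labels for this sketch.

```r
library(tibble)

df <- tibble(x = 1:5, y = 5:1)

# A single value is repeated for every row
recycled <- add_column(df, batch_id = "batch1")

# The manual equivalent: build a vector with one entry per row
explicit <- add_column(df, batch_id = rep("batch1", nrow(df)))

# Both approaches produce the same column
identical(recycled$batch_id, explicit$batch_id)
```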

5. How To Add multiple columns?

To add multiple columns, we specify each column that we would like to add, separated by commas, as shown below.

df %>% 
  add_column(z=-2:2,
             batch_id="batch1")

We have added two columns with add_column() function.

## # A tibble: 5 x 4
##       x     y     z batch_id
##   <int> <int> <int> <chr>   
## 1     1     5    -2 batch1  
## 2     2     4    -1 batch1  
## 3     3     3     0 batch1  
## 4     4     2     1 batch1  
## 5     5     1     2 batch1
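
Adding several columns can be combined with the positioning arguments from the earlier tips. In this sketch (the name df_multi is illustrative), both new columns are inserted together right after “x”, in the order they were given:

```r
library(tibble)

df <- tibble(x = 1:5, y = 5:1)

# Both new columns are inserted as a group immediately after "x"
df_multi <- add_column(df,
                       z = -2:2,
                       batch_id = "batch1",
                       .after = "x")

names(df_multi)
```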

6. How to Avoid Adding Duplicate Columns?

One of the concerns while adding a new column is that we might overwrite an existing column with the same name. add_column() offers multiple options to deal with duplicate columns.

For example, suppose we try to add a column whose name is already in use, like here:

df %>% 
  add_column(x=-2:2)

By default, we get an error telling us that the new column name cannot duplicate an existing one. In this case, we already have a column named “x” and we are trying to add another column with the name “x”.

Error: Column name `x` must not be duplicated.
Run `rlang::last_error()` to see where the error occurred.

However, sometimes you might want to add the new column anyway and let add_column() deal with the duplicate names. The “.name_repair” argument controls this, and it can take the values check_unique (the default), unique, universal, and minimal.

Here, when we specify .name_repair = "universal", add_column() changes the column names to make them distinct.

df %>% 
  add_column(x=-2:2,
             .name_repair = "universal")

add_column() prints a message telling us that it is changing the column names.

## New names:
## * x -> x...1
## * x -> x...3

Now, we can see that the first column with name “x” is called “x...1” and the recent one we added is named “x...3”.

## # A tibble: 5 x 3
##   x...1     y x...3
##   <int> <int> <int>
## 1     1     5    -2
## 2     2     4    -1
## 3     3     3     0
## 4     4     2     1
## 5     5     1     2
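
The other “.name_repair” options behave a little differently. As a rough sketch, "unique" also makes the duplicated names distinct, while "universal" additionally turns names that are not valid R identifiers into syntactic ones. The column name `my col` below is made up to show that second effect.

```r
library(tibble)

df <- tibble(x = 1:5, y = 5:1)

# "unique": duplicated names get a positional suffix
df_unique <- add_column(df, x = -2:2, .name_repair = "unique")
names(df_unique)

# "universal": names are also made syntactic, so the space becomes a dot
df_universal <- add_column(df, `my col` = -2:2, .name_repair = "universal")
names(df_universal)
```

Both calls print a “New names:” message describing the renaming.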

7. Dealing with more/less observations in the new column

Another useful functionality of add_column() is that it guards us against adding a new column whose length differs from the number of rows of the dataframe.

For example, suppose we try to add a column with 6 elements to a dataframe with 5 rows:

df %>% 
  add_column(z=-2:3)

We get an error telling us that the sizes do not match:

Error: New columns must be compatible with `.data`.
x New column has 6 rows.
ℹ `.data` has 5 rows.
Run `rlang::last_error()` to see where the error occurred.

We get a similar error if we try to add a column with fewer elements than the number of rows of the dataframe. The one exception is a single value, which is repeated for every row as in tip 4.
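
In a script, it can be handy to catch this error rather than let it stop the run. The sketch below wraps add_column() in tryCatch(); try_add() is a hypothetical helper written for this example, not a tibble function, and it simply returns NULL when the column cannot be added.

```r
library(tibble)

df <- tibble(x = 1:5, y = 5:1)

# Hypothetical helper (not part of tibble): NULL instead of an error
try_add <- function(data, ...) {
  tryCatch(add_column(data, ...), error = function(e) NULL)
}

too_long  <- try_add(df, z = -2:3)  # 6 values for 5 rows
too_short <- try_add(df, z = 1:2)   # 2 values for 5 rows
recycled  <- try_add(df, z = 0L)    # a single value is repeated for each row

is.null(too_long)
is.null(too_short)
ncol(recycled)
```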
