3 Ways to Add New Columns to a Pandas Dataframe

While wrangling or manipulating data, one often wants to add a new column or variable to an existing Pandas dataframe without changing anything else. The new column will have one value for each existing row of the data frame.

Let us see examples of three ways to add new columns to a Pandas data frame.

Let us first load the pandas library.

import pandas as pd

Let us use the gapminder data set to add a new column or variable in our examples. We will use the gapminder data from the Software Carpentry website, given as data_url below.

data_url = 'http://bit.ly/2cLzoxH'
# load the gapminder data from the web as a data frame
gapminder = pd.read_csv(data_url)
# select four columns
gapminder = gapminder[['country','year', 'gdpPercap', 'pop']]
# view the first few rows of the data frame
print(gapminder.head(3))
       country  year   gdpPercap         pop
0  Afghanistan  1952  779.445314   8425333.0
1  Afghanistan  1957  820.853030   9240934.0
2  Afghanistan  1962  853.100710  10267083.0

How To Add New Column to Pandas Dataframe by Indexing: Example 1

Let us say we want to create a new column from an existing column in the data frame. We can create a new column by indexing, using square bracket notation just as we do to access an existing column.

For example, we can create a new column with population values in millions, alongside the original variable:

# add new column using square bracket notation
gapminder['pop_in_millions'] = gapminder['pop']/1e06
# view the first few rows with the new column
print(gapminder.head(3))

       country  year   gdpPercap         pop  pop_in_millions
0  Afghanistan  1952  779.445314   8425333.0         8.425333
1  Afghanistan  1957  820.853030   9240934.0         9.240934
2  Afghanistan  1962  853.100710  10267083.0        10.267083
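As a minimal sketch of how bracket assignment treats different kinds of values, the example below uses a small made-up data frame (the country names and numbers are illustrative, not gapminder data). It assumes a reasonably recent pandas version, where a list of the wrong length is rejected with a ValueError.

```python
import pandas as pd

# Small made-up frame standing in for gapminder (values are illustrative only)
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "pop": [8_000_000.0, 9_500_000.0, 10_250_000.0],
})

# Derived column: element-wise arithmetic yields one value per row
df["pop_in_millions"] = df["pop"] / 1e6

# A scalar is broadcast to every row
df["source"] = "toy"

# A list whose length differs from the number of rows is rejected
try:
    df["bad"] = [1, 2]
    length_error = None
except ValueError as err:
    length_error = str(err)

print(df)
print("Error for wrong-length list:", length_error)
```

The same notation overwrites a column if the name already exists, so it is worth checking column names before assigning.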

How To Add New Column to Pandas Dataframe using loc: Example 2

Another way to add a new column to a dataframe is to use the “loc” indexer. Here we select all rows with “:” and specify the new column name and its values.

 
gapminder.loc[:,'pop_in_millions'] = gapminder['pop']/1e06
gapminder.head(3)

       country  year   gdpPercap         pop  pop_in_millions
0  Afghanistan  1952  779.445314   8425333.0         8.425333
1  Afghanistan  1957  820.853030   9240934.0         9.240934
2  Afghanistan  1962  853.100710  10267083.0        10.267083
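The sketch below, on a made-up three-row frame, suggests that for whole-column assignment “loc” and square brackets give the same result. It also shows one thing the row selector of “loc” allows: filling the new column only for selected rows, in which case the remaining rows are left as NaN. The frame and the year cutoff are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up data; not taken from gapminder
df = pd.DataFrame({
    "year": [1952, 1957, 1962],
    "pop": [8.0e6, 9.5e6, 10.25e6],
})

# Whole-column assignment with square brackets
by_bracket = df.copy()
by_bracket["pop_in_millions"] = by_bracket["pop"] / 1e6

# Whole-column assignment with loc (":" selects every row)
by_loc = df.copy()
by_loc.loc[:, "pop_in_millions"] = by_loc["pop"] / 1e6

same = by_bracket.equals(by_loc)
print("Same result:", same)

# Row-restricted assignment: only rows from 1957 onward get a value,
# the other rows of the new column are filled with NaN
partial = df.copy()
partial.loc[partial["year"] >= 1957, "pop_in_millions"] = partial["pop"] / 1e6
print(partial)
```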

How To Add New Column to Pandas Dataframe using assign: Example 3

Inspired by dplyr’s mutate function in R, Pandas added the “assign” method for adding new columns. We can simply chain “assign” to the data frame.

 
gapminder.assign(pop_in_millions=gapminder['pop']/1e06).head(3) 

       country  year   gdpPercap         pop  pop_in_millions
0  Afghanistan  1952  779.445314   8425333.0         8.425333
1  Afghanistan  1957  820.853030   9240934.0         9.240934
2  Afghanistan  1962  853.100710  10267083.0        10.267083

It returns a copy of the data frame as a new object with the new columns added; the original data frame is left unchanged. Remember that if you use the name of an existing column, that column will be overwritten in the returned copy.
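To make the copy behaviour concrete, here is a small sketch on a made-up two-row frame (names and values are illustrative). It checks that the original frame is untouched after “assign”, and that reusing an existing column name replaces that column only in the returned copy.

```python
import pandas as pd

# Made-up data; not taken from gapminder
df = pd.DataFrame({"country": ["A", "B"], "pop": [8.0e6, 9.5e6]})

# assign returns a new data frame; df itself does not gain the column
with_millions = df.assign(pop_in_millions=df["pop"] / 1e6)
print(list(df.columns))
print(list(with_millions.columns))

# Reusing an existing name overwrites that column in the returned copy only
overwritten = df.assign(pop=df["pop"] / 1e6)
print(overwritten["pop"].tolist())
print(df["pop"].tolist())
```

To keep the new column, bind the result to a name, for example by reassigning it to the original variable.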

With the assign method, we can also pass a function that computes the new column. Here we use a lambda function to create the new column with population in millions; the lambda receives the data frame that assign is called on.

gapminder.assign(pop_in_millions=lambda x: x['pop']/1e06).head()

With Python 3.6+ (and pandas 0.23 or later), one can create multiple new columns in the same assign statement so that one of the new columns uses another newly created column within the same assign statement.

For example, we can create two new variables such that the second new variable uses the first new column as shown below.

gapminder.assign(pop_in_millions=lambda x: x['pop']/1e6,
                pop_in_billions=lambda x: x['pop_in_millions']/1e3).head()
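The following sketch runs the same pattern on a made-up two-row frame so the values are easy to check by hand. It also illustrates why the lambda form is needed here: referring to the new column directly through the original frame fails, because that column does not exist there yet. It assumes Python 3.6+ and a pandas version that evaluates the keyword arguments in order.

```python
import pandas as pd

# Made-up data; not taken from gapminder
df = pd.DataFrame({"pop": [8.0e6, 1.5e9]})

# The second lambda sees the frame with pop_in_millions already added
result = df.assign(
    pop_in_millions=lambda x: x["pop"] / 1e6,
    pop_in_billions=lambda x: x["pop_in_millions"] / 1e3,
)
print(result)

# Without a lambda, the lookup runs against the original frame and fails
try:
    df.assign(
        pop_in_millions=df["pop"] / 1e6,
        pop_in_billions=df["pop_in_millions"] / 1e3,
    )
    direct_lookup_failed = False
except KeyError:
    direct_lookup_failed = True

print("Direct lookup failed:", direct_lookup_failed)
```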
