In this post, we will learn how to insert a column at a specific location in a Pandas dataframe. We will use the Pandas insert() function to insert a column into a DataFrame at a specified location with a specific name.
import numpy as np
import pandas as pd
pd.__version__
'1.0.0'
Let us create a data frame using NumPy’s random module.
# set random seed to reproduce the same data
np.random.seed(42)
# create Pandas data frame with 3 columns using numpy array
df = pd.DataFrame(np.random.randint(20, size=(8, 3)), columns=list('ABD'))
Our data frame looks like this. We have three columns with names A, B and D.
df.head()
    A   B   D
0   6  19  14
1  10   7   6
2  18  10  10
3   3   7   2
4   1  11   5
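Let us try to insert a new column C before the column D in the Pandas dataframe. We can use Pandas’ insert() function to insert a column. We need to specify three arguments to insert(): the integer location (counting from 0) where the new column should go, the name of the column, and the values of the column. Note that insert() modifies the dataframe in place and returns None, so we do not assign its result back to df.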
df.insert(2,"C",np.random.randint(20, size=8))
df.head()
    A   B   C   D
0   6  19  18  14
1  10   7  11   6
2  18  10  19  10
3   3   7   2   2
4   1  11   4   5
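Hard-coding the location 2 works here because we know D is the third column. As a minimal sketch, assuming we would rather look the position up by name, we can use the get_loc() method of the column index and pass the result to insert(). The variable name loc is just for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(20, size=(8, 3)), columns=list("ABD"))

# Look up the integer position of column D instead of hard-coding it
loc = df.columns.get_loc("D")

# Insert C at that position; D and any later columns shift to the right
df.insert(loc, "C", np.random.randint(20, size=8))

print(list(df.columns))
```

This keeps the code correct even if columns are added or removed before D later on.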
Now let us try to insert a column with a name that already exists in the dataframe, as shown below.
df.insert(2,"B",np.random.randint(20, size=8))
By default, we will get a ValueError, as shown below.
ValueError: cannot insert B, already exists
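If we do not want the error to stop our script, there are two simple ways to guard against it. The sketch below is one possible approach, not the only one: either check whether the name is already among the columns, or catch the ValueError. The names new_values and inserted are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(20, size=(8, 3)), columns=list("ABD"))
new_values = np.random.randint(20, size=8)

# Option 1: check for the name before inserting
if "B" not in df.columns:
    df.insert(2, "B", new_values)

# Option 2: attempt the insert and handle the error
try:
    df.insert(2, "B", new_values)
    inserted = True
except ValueError as err:
    inserted = False
    print(f"Insert skipped: {err}")
```

In both cases the dataframe is left unchanged, because B already exists.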
We can change the above behaviour by passing allow_duplicates=True when we insert a column. For example, we can insert a second B column with allow_duplicates=True.
df.insert(2, "B", np.random.randint(20, size=8), allow_duplicates=True)
df.head()
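And we get a Pandas dataframe with duplicate column names. The output below is for a dataframe with columns A, B and D only; if column C from the earlier step is still present, it will appear between the second B and D.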
    A   B   B   D
0   6  19   6  14
1  10   7  17   6
2  18  10   3  10
3   3   7  13   2
4   1  11  17   5
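Duplicate column names change how selection behaves, so it helps to know what to expect. The following is a small sketch, with made-up variable names, showing that selecting a duplicated name returns a DataFrame with both columns, and that we can fall back to positional selection with iloc to pick just one of them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(20, size=(8, 3)), columns=list("ABD"))
df.insert(2, "B", np.random.randint(20, size=8), allow_duplicates=True)

# Selecting a duplicated name returns a DataFrame, not a Series
both_b = df["B"]
print(type(both_b).__name__, both_b.shape)

# Flag the second and later occurrences of each column name
dup_mask = df.columns.duplicated()

# Select the newly inserted B by position to stay unambiguous
second_b = df.iloc[:, 2]
```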
Inserting a column into a dataframe by position can be tricky, because the values have to line up with the existing rows. Often a better way to add a column is to put it in a second dataframe that shares a common ID with the first and merge the two. One of the common applications of the Pandas insert() function is to move a column to the front of the dataframe.
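Both ideas can be sketched in a few lines. The example below is only an illustration on toy data: it moves a column to the front by combining pop() with insert(), and then adds a column by merging two dataframes on a shared key. The column names id and C and the dataframes left and right are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "D": [7, 8, 9]})

# Move D to the front: pop() removes the column and returns it as a Series,
# and insert() puts it back at position 0
df.insert(0, "D", df.pop("D"))
print(list(df.columns))

# Add a column by merging on a shared key instead of inserting by position
left = pd.DataFrame({"id": [1, 2, 3], "A": [10, 20, 30]})
right = pd.DataFrame({"id": [1, 2, 3], "C": [7, 8, 9]})  # hypothetical data
merged = left.merge(right, on="id", how="left")
print(list(merged.columns))
```

With merge, rows are matched by the value of id, so the order of rows in the second dataframe does not matter.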
This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.