How To Add an Identifier Column When Concatenating Pandas Dataframes?

The Pandas concat() function is great for concatenating two dataframes, or for appending one dataframe to another that has the same columns. Sometimes you might want to keep an identifier showing which dataframe each row came from. In this post, we will see an example of how to concatenate two dataframes with an identifier.

Let us first import Pandas and NumPy.

import pandas as pd
import numpy as np

Let us create two dataframes from scratch. We use NumPy’s random module to create some data and assign row names and column names.

df1 = pd.DataFrame(np.random.randint(20, size=(2,3)),
                   index=list('ij'),
                   columns=list('ABC'))
df1


    A   B   C
i   2   9   4
j  18  13  18

Here we create our second dataframe using Pandas DataFrame() function.

df2 = pd.DataFrame(np.random.randint(20, size=(2,3)),
                   index=list('mn'),
                   columns=list('ABC'))
df2

    A   B  C
m   5  16  9
n  11   0  9
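
Because the data is random, the numbers you get will differ from the ones shown here on every run. If you want repeatable values, one option is a seeded generator. This is a minimal sketch, assuming NumPy's default_rng() interface instead of the np.random.randint() call used above; the seed value 0 is arbitrary.

```python
import numpy as np
import pandas as pd

# Any fixed integer seed gives the same values on every run; 0 is arbitrary.
rng = np.random.default_rng(0)

# rng.integers(20, ...) draws integers from 0 to 19, like np.random.randint(20, ...).
df1 = pd.DataFrame(rng.integers(20, size=(2, 3)),
                   index=list('ij'),
                   columns=list('ABC'))
df2 = pd.DataFrame(rng.integers(20, size=(2, 3)),
                   index=list('mn'),
                   columns=list('ABC'))
```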

We can stack the two dataframes row-wise (row binding) with the Pandas concat() function. To add an identifier for each dataframe, we pass the identifiers as a list to the “keys” argument of concat(), one identifier per dataframe and in the same order as the dataframes.

pd.concat([df1, df2], keys=['t1', 't2'])

This creates a new multi-indexed Pandas dataframe containing the two concatenated dataframes. The outer level of the row index is the identifier we added, and the inner level is the row index from the input dataframes.

       A   B   C
t1 i   2   9   4
   j  18  13  18
t2 m   5  16   9
   n  11   0   9
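
Once the identifier is part of the row index, it can also be used to pull one of the original dataframes back out. Here is a minimal sketch, using fixed values in place of the random data so the result is predictable; the variable names combined, first and row_m are only illustrative.

```python
import pandas as pd

# Fixed values stand in for the random data used in this post.
df1 = pd.DataFrame([[2, 9, 4], [18, 13, 18]],
                   index=list('ij'), columns=list('ABC'))
df2 = pd.DataFrame([[5, 16, 9], [11, 0, 9]],
                   index=list('mn'), columns=list('ABC'))

combined = pd.concat([df1, df2], keys=['t1', 't2'])

# Select all rows that came from the first dataframe.
first = combined.loc['t1']

# Select a single row by (identifier, original row name).
row_m = combined.loc[('t2', 'm')]
```

Selecting with the outer key alone drops that index level, so first has the same row names and values as df1.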

We can use the Pandas reset_index() function to convert the multi-indexed dataframe to a regular dataframe.

pd.concat([df1, df2], keys=['t1', 't2']).reset_index()

Pandas’ reset_index() automatically names the new columns created from the row index levels: level_0 holds the identifier and level_1 holds the original row names.

  level_0 level_1   A   B   C
0      t1       i   2   9   4
1      t1       j  18  13  18
2      t2       m   5  16   9
3      t2       n  11   0   9
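
The default names level_0 and level_1 are not very descriptive. concat() also accepts a “names” argument for labelling the index levels, and reset_index() can be limited to a single level. Here is a small sketch, again with fixed values replacing the random data; the labels 'source' and 'row' are hypothetical names chosen only for illustration.

```python
import pandas as pd

# Fixed values stand in for the random data used in this post.
df1 = pd.DataFrame([[2, 9, 4], [18, 13, 18]],
                   index=list('ij'), columns=list('ABC'))
df2 = pd.DataFrame([[5, 16, 9], [11, 0, 9]],
                   index=list('mn'), columns=list('ABC'))

# names labels the two index levels: the identifier and the original row names.
combined = pd.concat([df1, df2], keys=['t1', 't2'], names=['source', 'row'])

# Both index levels become ordinary columns called 'source' and 'row'.
flat = combined.reset_index()

# Only the identifier becomes a column; the original row names stay as the index.
with_id = combined.reset_index(level='source')
```

The second form is handy when you want a single identifier column but would like to keep the original row names as the index.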

This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.