How to Collapse Multiple Columns in Pandas? Groupby with Dictionary

Often you may want to collapse two or multiple columns in a Pandas data frame into one column. For example, you may have a data frame with data for each year as columns and you might want to get a new column which summarizes multiple columns. One may need to have flexibility of collapsing columns of interest into one. If you guess, this is kind of “groupby operation” you are right.

Let us see an pictorial example of what we aim to do. Here we have a data frame with five columns and four of them are data from first two weeks of Jan and Feb. We want to create a new dataframe with just two columns one for Jan and the other for Feb, i.e combining the weekly data to monthly data.

Collapse columns with Groupby and Dictionary

Let us see a small example of collapsing columns of Pandas dataframe by combining multiple columns into one.

Let us first load NumPy and Pandas.

import numpy as np
import pandas as pd

We will use NumPy’s random module to create random data and use them to create a pandas data frame.

n = 3
Jan_week1 = np.random.randint(100, size=n).tolist()
Jan_week2 = np.random.randint(100, size=n).tolist()
Feb_week1 = np.random.randint(100, size=n).tolist()
Feb_week2 = np.random.randint(100, size=n).tolist()
sample_id = zip(["S"]*n,list(range(1,n+1)))
s_names=[''.join([w[0],str(w[1])]) for w in sample_id] 
s_names

Now we have all the columns we needed as lists. Let us convert multiple lists into a single data frame, first by creating a dictionary for each list with a name.

d = {'s_names':s_names, 'Jan_week1':Jan_week1,'Jan_week2':Jan_week2,
    'Feb_week1':Feb_week1, 'Feb_week2':Feb_week2}
d
{'Feb_week1': [32, 20, 38],
 'Feb_week2': [68, 7, 82],
 'Jan_week1': [8, 21, 65],
 'Jan_week2': [42, 33, 2],
 's_names': ['S1', 'S2', 'S3']}

Then use Pandas dataframe into dict. Now we get a data frame with four columns of data and one column for names.

df_reps = pd.DataFrame(d)
df_reps
	Feb_week1	Feb_week2	Jan_week1	Jan_week2	s_names
0	32	68	8	42	S1
1	20	7	21	33	S2
2	38	82	65	2	S3

How to Collapse/Combine Columns in Pandas Data Frame?

Now we have data frame with multiple columns and we want to collapse or combine multiple columns using specific rule. Ideally, we would like to clearly specify which columns we want to combine or collapse. We can do that by specifying the mapping as a dictionary, where the keys are the names of columns we would like to combine and the values are the names of resulting column.

mapping = {'Jan_week1':'Jan','Jan_week2':'Jan',
    'Feb_week1':'Feb', 'Feb_week2':'Feb'}
mapping
{'Feb_week1': 'Feb',
 'Feb_week2': 'Feb',
 'Jan_week1': 'Jan',
 'Jan_week2': 'Jan'}

We can use the mapping dictionary with in groupby function and specify axis=1 to groupby columns. Since we only want to collapse multiple columns of data not the “name” column, we first set it as row index and reset it later.

df =df_reps.set_index('s_names').groupby(mapping, axis=1).sum()
df.reset_index(level=0)
	s_names	Feb	Jan
0	S1	100	50
1	S2	27	54
2	S3	120	67

Now we have successfully combined multiple columns and have the collapsed data frame we wanted.

In the above example we collapsed multiple numerical columns into a single column. If you want to collapse multiple text columns into a single, check this post