• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / Python / Pandas DataFrame / groupby / How to Collapse Multiple Columns in Pandas? Groupby with Dictionary

How to Collapse Multiple Columns in Pandas? Groupby with Dictionary

September 15, 2018 by cmdlinetips

Often you may want to collapse two or multiple columns in a Pandas data frame into one column. For example, you may have a data frame with data for each year as columns and you might want to get a new column which summarizes multiple columns. One may need to have flexibility of collapsing columns of interest into one. If you guess, this is kind of “groupby operation” you are right.

Let us see an pictorial example of what we aim to do. Here we have a data frame with five columns and four of them are data from first two weeks of Jan and Feb. We want to create a new dataframe with just two columns one for Jan and the other for Feb, i.e combining the weekly data to monthly data.

Collapse columns with Groupby and Dictionary

Let us see a small example of collapsing columns of Pandas dataframe by combining multiple columns into one.

Let us first load NumPy and Pandas.

import numpy as np
import pandas as pd

We will use NumPy’s random module to create random data and use them to create a pandas data frame.

n = 3
Jan_week1 = np.random.randint(100, size=n).tolist()
Jan_week2 = np.random.randint(100, size=n).tolist()
Feb_week1 = np.random.randint(100, size=n).tolist()
Feb_week2 = np.random.randint(100, size=n).tolist()
sample_id = zip(["S"]*n,list(range(1,n+1)))
s_names=[''.join([w[0],str(w[1])]) for w in sample_id] 
s_names

Now we have all the columns we needed as lists. Let us convert multiple lists into a single data frame, first by creating a dictionary for each list with a name.

d = {'s_names':s_names, 'Jan_week1':Jan_week1,'Jan_week2':Jan_week2,
    'Feb_week1':Feb_week1, 'Feb_week2':Feb_week2}
d
{'Feb_week1': [32, 20, 38],
 'Feb_week2': [68, 7, 82],
 'Jan_week1': [8, 21, 65],
 'Jan_week2': [42, 33, 2],
 's_names': ['S1', 'S2', 'S3']}

Then use Pandas dataframe into dict. Now we get a data frame with four columns of data and one column for names.

df_reps = pd.DataFrame(d)
df_reps
	Feb_week1	Feb_week2	Jan_week1	Jan_week2	s_names
0	32	68	8	42	S1
1	20	7	21	33	S2
2	38	82	65	2	S3

How to Collapse/Combine Columns in Pandas Data Frame?

Now we have data frame with multiple columns and we want to collapse or combine multiple columns using specific rule. Ideally, we would like to clearly specify which columns we want to combine or collapse. We can do that by specifying the mapping as a dictionary, where the keys are the names of columns we would like to combine and the values are the names of resulting column.

mapping = {'Jan_week1':'Jan','Jan_week2':'Jan',
    'Feb_week1':'Feb', 'Feb_week2':'Feb'}
mapping
{'Feb_week1': 'Feb',
 'Feb_week2': 'Feb',
 'Jan_week1': 'Jan',
 'Jan_week2': 'Jan'}

We can use the mapping dictionary with in groupby function and specify axis=1 to groupby columns. Since we only want to collapse multiple columns of data not the “name” column, we first set it as row index and reset it later.

df =df_reps.set_index('s_names').groupby(mapping, axis=1).sum()
df.reset_index(level=0)
	s_names	Feb	Jan
0	S1	100	50
1	S2	27	54
2	S3	120	67

Now we have successfully combined multiple columns and have the collapsed data frame we wanted.

In the above example we collapsed multiple numerical columns into a single column. If you want to collapse multiple text columns into a single, check this post

  • How to Combine Two Text Columns into a Single Column in Pandas?


Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X

Related posts:

Default ThumbnailHow to Create a Pandas Dataframe from Lists Default ThumbnailHow To Collapse Multiple Text Columns in Dataframe Using Tidyverse? Default ThumbnailHow To Add a New Column Using a Dictionary in Pandas Data Frame ? Default ThumbnailHow To Drop Multiple Columns in Pandas Dataframe?

Filed Under: groupby, Pandas DataFrame, Python Tagged With: Pandas Collapse Columns, Pandas Combine Columns, Pandas Data Frame

Primary Sidebar

Subscribe to Python and R Tips and Learn Data Science

Learn Pandas in Python and Tidyverse in R

Tags

Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

RSS RSS

  • How to convert row names to a column in Pandas
  • How to resize an image with PyTorch
  • Fashion-MNIST data from PyTorch
  • Pandas case_when() with multiple examples
  • An Introduction to Statistical Learning: with Applications in Python Is Here
  • 10 Tips to customize ggplot2 title text
  • 8 Plot types with Matplotlib in Python
  • PCA on S&P 500 Stock Return Data
  • Linear Regression with Matrix Decomposition Methods
  • Numpy’s random choice() function

Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

Go to mobile version