• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / Python / Pandas DataFrame / Group-level Mean as New Column / How to Add Group-Level Summary Statistic as a New Column in Pandas?

How to Add Group-Level Summary Statistic as a New Column in Pandas?

October 8, 2019 by cmdlinetips

In this post, we will see an example adding results from one of aggregating functions like mean/median after group_by() on a specific column as a new column. In other words, we might have group-level summary values for a column and we might to add the summary values back to the original dataframe we computed group-level summary.

import pandas as pd

Let us work with gapminder data to show how to add grouped mean as a new column.

data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))

       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710

Let us say we want to add mean lifeExp values per each continent as a new column to the gapminder data frame. The values of the new column will be the same for within each continent. We know that Pandas aggregating function mean() can compute mean after group_by() on continent.

lifeExp_per_continent = gapminder.groupby('continent').lifeExp.mean()

lifeExp_per_continent
continent
Africa      48.865330
Americas    64.658737
Asia        60.064903
Europe      71.903686
Oceania     74.326208
Name: lifeExp, dtype: float64

How To Add Group Level Mean as New Column with Pandas transform() function?

Here we want to add these mean lifeExp values per continent to the gapminder dataframe. There are multiple ways to do that in Pandas. Here we will use Pandas transform() funtion to compute mean values and add it to the original dataframe.

We will first groupby() on continent and extract lifeExp values and apply transform() function to compute mean.

gapminder.groupby('continent').lifeExp.transform('mean')

0       60.064903
1       60.064903
2       60.064903
3       60.064903
4       60.064903
          ...    
1699    48.865330
1700    48.865330
1701    48.865330
1702    48.865330
1703    48.865330
Name: lifeExp, Length: 1704, dtype: float64

We can see that it creates a Series of the same length as our gaominder dataframe. We can add that as a new column to get what we wanted.

gapminder['lifeExp_mean'] = gapminder.groupby('continent').lifeExp.transform('mean')
country	year	pop	continent	lifeExp	gdpPercap	lifeExp_mean
0	Afghanistan	1952	8425333.0	Asia	28.801	779.445314	60.064903
1	Afghanistan	1957	9240934.0	Asia	30.332	820.853030	60.064903
2	Afghanistan	1962	10267083.0	Asia	31.997	853.100710	60.064903

How To Add Group Level Mean as New Column with Pandas map() function ?

Another way to add group-level mean as a new column is to use Pandas map() function and dictionary.

We first apply groupby and get group-level summary statistics, either mean or median. Then convert the summary dataframe to a dictionary. In our example, we have mean lifeExp per continent as a dataframe and we convert that into a dictionary using Pandas to_dict() function.

mean_dict = lifeExp_per_continent.to_dict()
mean_dict
{'Africa': 48.86533012820508,
 'Americas': 64.65873666666667,
 'Asia': 60.064903232323175,
 'Europe': 71.90368611111106,
 'Oceania': 74.32620833333333}

Now we can use map() function and provide the dictionary as argument to create a new column.

gapminder['mean'] = gapminder['continent'].map(mean_dict)

We can see the new column with mean lifeExp values per continent and it is the same as the previous approach.

	country	year	pop	continent	lifeExp	gdpPercap	lifeExp_mean	mean
0	Afghanistan	1952	8425333.0	Asia	28.801	779.445314	60.064903	60.064903
1	Afghanistan	1957	9240934.0	Asia	30.332	820.853030	60.064903	60.064903
2	Afghanistan	1962	10267083.0	Asia	31.997	853.100710	60.064903	60.064903

Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X

Related posts:

Default ThumbnailPandas Groupby and Computing Median Default ThumbnailHow To Add a New Column Using a Dictionary in Pandas Data Frame ? Default ThumbnailPandas Groupby and Compute Mean Default ThumbnailHow to Implement Pandas Groupby operation with NumPy?

Filed Under: Group-level Mean as New Column Tagged With: Python Tips

Primary Sidebar

Subscribe to Python and R Tips and Learn Data Science

Learn Pandas in Python and Tidyverse in R

Tags

Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

RSS RSS

  • How to convert row names to a column in Pandas
  • How to resize an image with PyTorch
  • Fashion-MNIST data from PyTorch
  • Pandas case_when() with multiple examples
  • An Introduction to Statistical Learning: with Applications in Python Is Here
  • 10 Tips to customize ggplot2 title text
  • 8 Plot types with Matplotlib in Python
  • PCA on S&P 500 Stock Return Data
  • Linear Regression with Matrix Decomposition Methods
  • Numpy’s random choice() function

Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

Go to mobile version