In this post, we will see an example adding results from one of aggregating functions like mean/median after group_by() on a specific column as a new column. In other words, we might have group-level summary values for a column and we might to add the summary values back to the original dataframe we computed group-level summary.
import pandas as pd
Let us work with gapminder data to show how to add grouped mean as a new column.
data_url = 'http://bit.ly/2cLzoxH' # read data from url as pandas dataframe gapminder = pd.read_csv(data_url) print(gapminder.head(3)) country year pop continent lifeExp gdpPercap 0 Afghanistan 1952 8425333.0 Asia 28.801 779.445314 1 Afghanistan 1957 9240934.0 Asia 30.332 820.853030 2 Afghanistan 1962 10267083.0 Asia 31.997 853.100710
Let us say we want to add mean lifeExp values per each continent as a new column to the gapminder data frame. The values of the new column will be the same for within each continent. We know that Pandas aggregating function mean() can compute mean after group_by() on continent.
lifeExp_per_continent = gapminder.groupby('continent').lifeExp.mean() lifeExp_per_continent continent Africa 48.865330 Americas 64.658737 Asia 60.064903 Europe 71.903686 Oceania 74.326208 Name: lifeExp, dtype: float64
How To Add Group Level Mean as New Column with Pandas transform() function?
Here we want to add these mean lifeExp values per continent to the gapminder dataframe. There are multiple ways to do that in Pandas. Here we will use Pandas transform() funtion to compute mean values and add it to the original dataframe.
We will first groupby() on continent and extract lifeExp values and apply transform() function to compute mean.
gapminder.groupby('continent').lifeExp.transform('mean') 0 60.064903 1 60.064903 2 60.064903 3 60.064903 4 60.064903 ... 1699 48.865330 1700 48.865330 1701 48.865330 1702 48.865330 1703 48.865330 Name: lifeExp, Length: 1704, dtype: float64
We can see that it creates a Series of the same length as our gaominder dataframe. We can add that as a new column to get what we wanted.
gapminder['lifeExp_mean'] = gapminder.groupby('continent').lifeExp.transform('mean')
country year pop continent lifeExp gdpPercap lifeExp_mean 0 Afghanistan 1952 8425333.0 Asia 28.801 779.445314 60.064903 1 Afghanistan 1957 9240934.0 Asia 30.332 820.853030 60.064903 2 Afghanistan 1962 10267083.0 Asia 31.997 853.100710 60.064903
How To Add Group Level Mean as New Column with Pandas map() function ?
Another way to add group-level mean as a new column is to use Pandas map() function and dictionary.
We first apply groupby and get group-level summary statistics, either mean or median. Then convert the summary dataframe to a dictionary. In our example, we have mean lifeExp per continent as a dataframe and we convert that into a dictionary using Pandas to_dict() function.
mean_dict = lifeExp_per_continent.to_dict() mean_dict {'Africa': 48.86533012820508, 'Americas': 64.65873666666667, 'Asia': 60.064903232323175, 'Europe': 71.90368611111106, 'Oceania': 74.32620833333333}
Now we can use map() function and provide the dictionary as argument to create a new column.
gapminder['mean'] = gapminder['continent'].map(mean_dict)
We can see the new column with mean lifeExp values per continent and it is the same as the previous approach.
country year pop continent lifeExp gdpPercap lifeExp_mean mean 0 Afghanistan 1952 8425333.0 Asia 28.801 779.445314 60.064903 60.064903 1 Afghanistan 1957 9240934.0 Asia 30.332 820.853030 60.064903 60.064903 2 Afghanistan 1962 10267083.0 Asia 31.997 853.100710 60.064903 60.064903