Python and R Tips

Learn Data Science with Python and R

Pandas Groupby and Compute Mean

June 3, 2020 by cmdlinetips

One of the most common uses of Pandas’ groupby function is to compute summary statistics on one or more variables in a dataframe. In this post, we will see how to compute the mean of all numerical variables, and of a single selected variable, after a groupby operation.

Let us first load the Pandas package.

import pandas as pd

We will use the gapminder data set, loading it directly from its GitHub page.

p2data = "https://raw.githubusercontent.com/cmdlinetips/data/master/gapminder-FiveYearData.csv"
gapminder=pd.read_csv(p2data)
gapminder.head()
	country	year	pop	continent	lifeExp	gdpPercap
0	Afghanistan	1952	8425333.0	Asia	28.801	779.445314
1	Afghanistan	1957	9240934.0	Asia	30.332	820.853030
2	Afghanistan	1962	10267083.0	Asia	31.997	853.100710
3	Afghanistan	1967	11537966.0	Asia	34.020	836.197138
4	Afghanistan	1972	13079460.0	Asia	36.088	739.981106

We learned earlier how to use the Pandas groupby function to split a dataframe into multiple smaller dataframes, or groups.
To split the dataframe this way, we call groupby on a categorical variable in the dataframe. In this example, we group by the “continent” variable in the gapminder dataset.

gapminder.groupby(["continent"])

This gives us a Pandas GroupBy object, which holds a smaller dataframe for each continent. To compute the mean of all the numerical variables in the dataframe, we chain the mean function to the groupby object as shown below.

Pandas Groupby and Mean on Multiple Variables

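gapminder.groupby(["continent"]).mean(numeric_only=True)

This computes the mean of year, pop, lifeExp, and gdpPercap for each continent in the gapminder dataset. Passing numeric_only=True restricts the computation to the numerical columns; without it, recent versions of Pandas (2.0 and later) raise an error on the text column country. The result therefore does not contain the country variable, since a mean is not defined for country names.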

	year	pop	lifeExp	gdpPercap
continent				
Africa	1979.5	9.916003e+06	48.865330	2193.754578
Americas	1979.5	2.450479e+07	64.658737	7136.110356
Asia	1979.5	7.703872e+07	60.064903	7902.150428
Europe	1979.5	1.716976e+07	71.903686	14469.475533
Oceania	1979.5	8.874672e+06	74.326208	18621.609223

Pandas Groupby and Mean on a Single Variable

Sometimes you don’t want the mean of every numerical variable, but only of a selected one. In the example below, we will see how to group by a variable and compute the mean of a single numerical variable.

We can select a single variable from the groupby object using the variable name in square brackets. This gives us a Pandas SeriesGroupBy object.

gapminder.groupby(["continent"])['lifeExp']
<pandas.core.groupby.generic.SeriesGroupBy object at 0x1a1e685190>
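
The SeriesGroupBy object prints only as a memory address, but we can peek inside it by looping over it. The sketch below uses a small made-up dataframe (toy is a hypothetical name, not the gapminder data) and groups by the plain string "continent" so that each group name comes back as a simple label.

```python
import pandas as pd

# Hypothetical toy data that reuses the gapminder column names.
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe"],
    "lifeExp": [60.0, 70.0, 75.0],
})

grouped = toy.groupby("continent")["lifeExp"]

# Iterating yields (group name, Series of that group's values) pairs.
group_sizes = {}
for name, values in grouped:
    group_sizes[name] = len(values)
    print(name, values.tolist())
```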

And as before, we can chain the mean() function to get the mean lifeExp for each continent.

gapminder.groupby(["continent"])['lifeExp'].mean()

Note that when we compute the mean of a single variable, we get a Series object in return.

continent
Africa      48.865330
Americas    64.658737
Asia        60.064903
Europe      71.903686
Oceania     74.326208
Name: lifeExp, dtype: float64
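
If a dataframe is more convenient than a Series, there are a couple of common ways to get one. The sketch below again uses a small made-up dataframe (toy is a hypothetical name, not the gapminder data): selecting with double brackets keeps the result as a dataframe, and reset_index() turns the continent index back into a regular column.

```python
import pandas as pd

# Hypothetical toy data that reuses the gapminder column names.
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe"],
    "lifeExp": [60.0, 70.0, 75.0],
})

# Single brackets: the result is a Series indexed by continent.
as_series = toy.groupby("continent")["lifeExp"].mean()

# Double brackets: the result is a one-column DataFrame.
as_frame = toy.groupby("continent")[["lifeExp"]].mean()

# reset_index() moves continent from the index into a column.
flat = as_series.reset_index()
print(flat)
```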

This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.