
Python and R Tips

Learn Data Science with Python and R


Pandas Groupby and Computing Median

June 14, 2020 by cmdlinetips

One of the common operations in data analysis is to group the data by a variable and compute some summary statistics on each subgroup of data. In this post, we will see an example of how to use the groupby() function in Pandas to group a dataframe into multiple smaller dataframes and compute the median of another variable in each smaller dataframe.

Pandas has multiple summary functions that can be applied to a groupby() object. Here we will use the median() function to compute the median.
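To illustrate that median() is just one of several summary functions available on a groupby object, here is a minimal sketch on a small dataframe. The `toy` data and its values are hypothetical, made up for illustration rather than taken from gapminder. It uses `agg()` to compute the median, mean and count for each group in one call.

```python
import pandas as pd

# Hypothetical toy data: a grouping column and a numeric column
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe"],
    "lifeExp": [30.0, 40.0, 80.0, 70.0, 74.0],
})

# Apply several summary functions to each group at once
summary = toy.groupby("continent")["lifeExp"].agg(["median", "mean", "count"])
print(summary)
```

For the made-up Asia group, the single large value (80.0) pulls the mean up to 50.0 while the median stays at 40.0, which is one reason the median is often preferred for skewed data.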

First, let us load Pandas and NumPy libraries.

import pandas as pd
import numpy as np

We will use the gapminder data to perform the groupby and compute the median. Let us load the gapminder data directly from the web, from cmdlinetips.com's GitHub page.

p2data = "https://raw.githubusercontent.com/cmdlinetips/data/master/gapminder-FiveYearData.csv"
gapminder=pd.read_csv(p2data)

gapminder.head()

       country  year         pop  continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0       Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0       Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0       Asia   31.997  853.100710
3  Afghanistan  1967  11537966.0       Asia   34.020  836.197138
4  Afghanistan  1972  13079460.0       Asia   36.088  739.981106

Let us perform a groupby() operation on the continent variable in the gapminder data. Under the hood, Pandas (conceptually) splits the dataframe into multiple smaller dataframes, one for each value of continent, and returns a groupby object.

gapminder.groupby(["continent"])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x1c1d9f6f50>
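To see the smaller dataframes a groupby object holds, we can iterate over it or pull out a single group with `get_group()`. The sketch below uses a small hypothetical `toy` dataframe standing in for gapminder (the rows and values are made up), and passes the column name as a plain string.

```python
import pandas as pd

# Hypothetical toy data standing in for gapminder
toy = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "France", "France", "Japan"],
    "continent": ["Asia", "Asia", "Europe", "Europe", "Asia"],
    "lifeExp": [28.8, 30.3, 67.4, 68.9, 63.0],
})

grouped = toy.groupby("continent")

# Iterating yields (group key, sub-dataframe) pairs, one per continent
for name, sub_df in grouped:
    print(name, len(sub_df))

# get_group() returns the smaller dataframe for a single key
asia = grouped.get_group("Asia")
print(asia)
```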

From the groupby object, we can select one or more columns of the dataframe. The result is another groupby object; selecting a single column, as here, gives a SeriesGroupBy object.

gapminder.groupby(["continent"])['lifeExp']
<pandas.core.groupby.generic.SeriesGroupBy object at 0x1c1d9f6110>
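The kind of groupby object you get back depends on how the columns are selected. In this minimal sketch on hypothetical toy data, a single column name gives a SeriesGroupBy, while a list of column names gives a DataFrameGroupBy, whose median() returns one column per selected variable.

```python
import pandas as pd

# Hypothetical toy data with two numeric columns
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "lifeExp": [30.0, 40.0, 70.0, 74.0],
    "gdpPercap": [800.0, 900.0, 5000.0, 7000.0],
})

grouped = toy.groupby("continent")

# One column name in single brackets gives a SeriesGroupBy
one_col = grouped["lifeExp"]
# A list of column names gives a DataFrameGroupBy restricted to those columns
two_cols = grouped[["lifeExp", "gdpPercap"]]

print(type(one_col).__name__)   # SeriesGroupBy
print(type(two_cols).__name__)  # DataFrameGroupBy

# median() on the DataFrameGroupBy returns a DataFrame, one column per variable
print(two_cols.median())
```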

Now we can apply a summary function such as median() to the selected column to compute a summary statistic for each value of the grouping variable.

In this example, we compute the median life expectancy (lifeExp) for each continent, and the result is returned as a Pandas Series indexed by continent.

gapminder.groupby(["continent"])['lifeExp'].median()
continent
Africa      48.865330
Americas    64.658737
Asia        60.064903
Europe      71.903686
Oceania     74.326208
Name: lifeExp, dtype: float64
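As a sanity check on what the grouped median() does, the sketch below (again on hypothetical toy data, not gapminder) compares it with a hand-rolled split-apply-combine that calls np.median on each sub-dataframe. It also shows the `as_index=False` option of groupby(), which returns a DataFrame with continent as a regular column instead of a Series indexed by continent.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe"],
    "lifeExp": [30.0, 40.0, 80.0, 70.0, 74.0],
})

# Built-in grouped median: a Series indexed by continent
by_pandas = toy.groupby("continent")["lifeExp"].median()

# The same result by hand: split by continent, apply np.median, combine
by_hand = pd.Series(
    {key: np.median(sub["lifeExp"]) for key, sub in toy.groupby("continent")},
    name="lifeExp",
)
by_hand.index.name = "continent"

# as_index=False keeps continent as a column and returns a DataFrame
as_frame = toy.groupby("continent", as_index=False)["lifeExp"].median()
print(as_frame)
```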

This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.
