Getting Started with Pandas Groupby

May 27, 2020 by cmdlinetips

The Pandas groupby() function is one of the most useful tools for data munging. In its simplest use, groupby() splits a bigger dataframe into multiple smaller dataframes, one for each distinct value of a single variable in the dataframe. Typically, after grouping by a variable, we perform some computation on each of the smaller dataframes.

In this post we will see examples of how to use the Pandas groupby() function. We will group by a single variable in the dataframe, examine the resulting grouped object, extract other variables from it, and perform simple summary computations like mean and median for each group.

Let us load Pandas to learn more about the groupby() function.

# import pandas
import pandas as pd
# import numpy
import numpy as np

We will use the gapminder data to explore the groupby() function. Here we load the data directly from GitHub with Pandas’ read_csv() function.

p2data = "https://raw.githubusercontent.com/cmdlinetips/data/master/gapminder-FiveYearData.csv"
gapminder=pd.read_csv(p2data)
gapminder.head()

Our data contains lifeExp (life expectancy), pop (population), and gdpPercap (GDP per capita) over the years for countries of the world. The gapminder data also records the continent each country belongs to.

	country	year	pop	continent	lifeExp	gdpPercap
0	Afghanistan	1952	8425333.0	Asia	28.801	779.445314
1	Afghanistan	1957	9240934.0	Asia	30.332	820.853030
2	Afghanistan	1962	10267083.0	Asia	31.997	853.100710
3	Afghanistan	1967	11537966.0	Asia	34.020	836.197138
4	Afghanistan	1972	13079460.0	Asia	36.088	739.981106

Let us use the groupby() function to group the gapminder data by the “continent” variable. We provide the variable that we want to group by as a list to groupby().

gapminder.groupby(["continent"])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x1a199f5690>

The Pandas groupby() function splits the gapminder dataframe into multiple groups, where each group corresponds to one continent in the data. In the grouped object, each continent is a smaller dataframe.
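One way to see that each group really is a smaller dataframe is to iterate over the grouped object. The sketch below is only an illustration: it uses a tiny hypothetical dataframe named toy with made-up values in place of gapminder, and it passes the column name as a plain string, since in recent Pandas versions a one-element list can make the group keys come back as one-element tuples.

```python
import pandas as pd

# Toy stand-in for gapminder; the values are made up for illustration only.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C", "C"],
    "continent": ["Asia", "Asia", "Africa", "Europe", "Europe"],
    "lifeExp": [30.0, 32.0, 45.0, 70.0, 72.0],
})

# Group by a single column, given as a plain string.
grouped = toy.groupby("continent")

# Iterating yields (group key, smaller dataframe) pairs.
group_sizes = {}
for continent, sub_df in grouped:
    group_sizes[continent] = len(sub_df)
    print(continent, sub_df.shape)
```

Each sub_df here is an ordinary dataframe holding only the rows for one continent.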

Getting Groups from Pandas Groupby Object

To check the groups in the grouped object, we can use the “groups” attribute as shown below. It returns a dictionary with each value of the grouping variable (here, each continent) as key and the index labels of the rows belonging to that group as value.

gapminder.groupby(["continent"]).groups
{'Africa': Int64Index([  24,   25,   26,   27,   28,   29,   30,   31,   32,   33,
             ...
             1694, 1695, 1696, 1697, 1698, 1699, 1700, 1701, 1702, 1703],
            dtype='int64', length=624),
 'Americas': Int64Index([  48,   49,   50,   51,   52,   53,   54,   55,   56,   57,
             ...
             1634, 1635, 1636, 1637, 1638, 1639, 1640, 1641, 1642, 1643],
            dtype='int64', length=300),
 'Asia': Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
             ...
             1670, 1671, 1672, 1673, 1674, 1675, 1676, 1677, 1678, 1679],
            dtype='int64', length=396),
 'Europe': Int64Index([  12,   13,   14,   15,   16,   17,   18,   19,   20,   21,
             ...
             1598, 1599, 1600, 1601, 1602, 1603, 1604, 1605, 1606, 1607],
            dtype='int64', length=360),
 'Oceania': Int64Index([  60,   61,   62,   63,   64,   65,   66,   67,   68,   69,   70,
               71, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101,
             1102, 1103],
            dtype='int64')}
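Because the values in this dictionary are row index labels, they can be passed to .loc to recover the rows of a group. Here is a minimal sketch, assuming a small hypothetical dataframe named toy with made-up values rather than the real gapminder data.

```python
import pandas as pd

# Toy stand-in for gapminder; the values are made up for illustration only.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C", "C"],
    "continent": ["Asia", "Asia", "Africa", "Europe", "Europe"],
    "lifeExp": [30.0, 32.0, 45.0, 70.0, 72.0],
})

groups = toy.groupby("continent").groups

# The keys are the distinct values of the grouping column.
labels = sorted(groups.keys())

# Each value holds the index labels of the rows in that group,
# so .loc can use it to pull those rows out of the original dataframe.
asia_rows = toy.loc[groups["Asia"]]
print(labels)
print(asia_rows)
```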

Getting a Specific Group as Dataframe from Pandas Groupby Object

We can also access the smaller dataframe corresponding to a single group with the get_group() function. For example, we used groupby() on the continent variable, so each continent corresponds to a smaller dataframe in the grouped object. We can access the dataframe for a specific continent by calling get_group() with the continent value as argument. Here we extract the dataframe corresponding to Africa with get_group().

gapminder.groupby(["continent"]).get_group('Africa').head()
	country	year	pop	continent	lifeExp	gdpPercap
24	Algeria	1952	9279525.0	Africa	43.077	2449.008185
25	Algeria	1957	10270856.0	Africa	45.685	3013.976023
26	Algeria	1962	11000948.0	Africa	48.303	2550.816880
27	Algeria	1967	12760499.0	Africa	51.407	3246.991771
28	Algeria	1972	14760787.0	Africa	54.518	4182.663766
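For a single group, get_group() selects the same rows as an ordinary boolean filter on the grouping column. The sketch below checks this on a small hypothetical dataframe named toy (made-up values, not gapminder), again grouping by a plain string so that the group key is a simple scalar.

```python
import pandas as pd

# Toy stand-in for gapminder; the values are made up for illustration only.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C", "C"],
    "continent": ["Asia", "Asia", "Africa", "Europe", "Europe"],
    "lifeExp": [30.0, 32.0, 45.0, 70.0, 72.0],
})

by_continent = toy.groupby("continent")

# Pull out the smaller dataframe for one group.
europe = by_continent.get_group("Europe")

# The same rows, selected with a boolean mask instead.
europe_filtered = toy[toy["continent"] == "Europe"]

# Both selections keep the original row index labels.
same_rows = europe.index.equals(europe_filtered.index)
print(europe)
print(same_rows)
```

The boolean filter is simpler when only one group is needed; get_group() is convenient when the grouped object already exists and several groups will be inspected.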

Let us subset a specific variable from each of the smaller dataframes in the grouped object. In the example below, we extract the lifeExp values for each continent from the grouped object. This slicing functionality is extremely useful in downstream analysis.

gapminder.groupby(["continent"])['lifeExp']
<pandas.core.groupby.generic.SeriesGroupBy object at 0x1a199f53d0>

Subsetting a column of the grouped object gives us a SeriesGroupBy object that can be used for additional analysis.
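As an example of such additional analysis, a SeriesGroupBy object can compute one summary value per group, such as the mean or median. The following is a minimal sketch on a small hypothetical dataframe named toy with made-up values; the same calls should apply to the gapminder lifeExp column.

```python
import pandas as pd

# Toy stand-in for gapminder; the values are made up for illustration only.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C", "C"],
    "continent": ["Asia", "Asia", "Africa", "Europe", "Europe"],
    "lifeExp": [30.0, 32.0, 45.0, 70.0, 72.0],
})

# Select one column from the grouped object.
life_exp = toy.groupby("continent")["lifeExp"]

# One summary value per continent, indexed by continent.
mean_life = life_exp.mean()
median_life = life_exp.median()

# Several summaries at once, returned as a dataframe.
summary = life_exp.agg(["mean", "median"])
print(summary)
```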

This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.
