
Python and R Tips

Learn Data Science with Python and R


How to Implement Pandas Groupby operation with NumPy?

May 8, 2019 by cmdlinetips

Pandas’ groupby function is the bread and butter of many data munging activities. Groupby enables one of the most widely used paradigms for data analysis, “Split-Apply-Combine”. Sometimes you will be working with NumPy arrays and may still want to perform groupby operations on them.

I recently wrote a blog post inspired by Jake’s post on groupby from scratch using a sparse matrix. A few weeks ago, I ran into a situation where I needed to implement a groupby function with NumPy.

Here is one way to implement Pandas’ groupby operation using NumPy.

import pandas as pd
import numpy as np

Let us use Pandas to load the gapminder data as a dataframe.

# link to gapminder data from Carpentries
data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
gapminder.head()

Let us say we want to compute the mean life expectancy for each continent. First, let us do it with Pandas’ groupby function. We can chain methods to select the two columns of interest, group the rows by continent, and compute the mean of each group.

gapminder[['continent','lifeExp']].groupby('continent').mean()

Here we have the mean life expectancy computed using Pandas’ groupby function.

             lifeExp
continent
Africa     48.865330
Americas   64.658737
Asia       60.064903
Europe     71.903686
Oceania    74.326208
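To try the same chain without downloading anything, here is a minimal sketch on a small made-up dataframe. The name toy and its numbers are assumptions for illustration only, not gapminder values. It selects the column after grouping, which gives a Series instead of a one-column dataframe.

```python
import pandas as pd

# Toy data, invented for illustration (not real gapminder numbers).
toy = pd.DataFrame({
    "continent": ["Asia", "Europe", "Africa", "Asia", "Africa"],
    "lifeExp": [60.0, 70.0, 50.0, 62.0, 48.0],
})

# Split rows by continent, apply mean to lifeExp, combine into a Series.
mean_by_continent = toy.groupby("continent")["lifeExp"].mean()
print(mean_by_continent)
```

By default, groupby sorts the group keys, so the result is indexed in alphabetical order of continent.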

Now let us use NumPy to perform the groupby operation. First, let us extract the columns of interest from the dataframe into NumPy arrays.

# NumPy array for lifeExp
life_exp = gapminder[['lifeExp']].values
# NumPy array for continent
conts = gapminder[['continent']].values
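One detail worth knowing: selecting with double brackets, as in gapminder[['lifeExp']], returns a one-column dataframe, so .values is a 2-D array of shape (n, 1). The sketch below, on a made-up dataframe called toy (an assumption for illustration), shows the shapes and why a boolean mask of the same shape still works for selecting rows.

```python
import pandas as pd

# Toy data, invented for illustration (not real gapminder numbers).
toy = pd.DataFrame({
    "continent": ["Asia", "Europe", "Africa", "Asia", "Africa"],
    "lifeExp": [60.0, 70.0, 50.0, 62.0, 48.0],
})

col_2d = toy[["lifeExp"]].values       # double brackets: 2-D, shape (5, 1)
col_1d = toy["lifeExp"].values         # single brackets: 1-D, shape (5,)
labels_2d = toy[["continent"]].values  # 2-D array of labels, shape (5, 1)

# The mask has the same (5, 1) shape as col_2d, so boolean indexing
# picks the matching elements and returns them as a flat 1-D array.
mask = labels_2d == "Asia"
selected = col_2d[mask]

print(col_2d.shape, col_1d.shape, selected)
```

Either form works for the groupby below, as long as the values array and the label array have matching shapes.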

Let us also get the groups, in this case the five continents, as an array.

>>> all_continents = gapminder['continent'].unique()
>>> all_continents
array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)
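Note that Pandas’ unique() returns the groups in order of first appearance, while NumPy’s np.unique returns them sorted. As a minimal NumPy-only sketch, on a made-up label array (an assumption for illustration), both orders can be obtained like this:

```python
import numpy as np

# Toy labels, invented for illustration.
labels = np.array(["Asia", "Europe", "Africa", "Asia", "Africa"])

# np.unique returns the distinct labels in sorted order; return_index
# also gives the position where each label first occurs.
sorted_groups, first_idx = np.unique(labels, return_index=True)

# Sorting those first positions recovers the order of first appearance.
appearance_groups = labels[np.sort(first_idx)]

print(sorted_groups)
print(appearance_groups)
```

The order only affects how the results are listed, not the per-group values.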

We can use a list comprehension to go through each continent and compute its mean life expectancy, using NumPy’s boolean indexing to select the rows for that continent and the mean method to aggregate them.

[(i, life_exp[conts==i].mean()) for i in all_continents]

Voilà, we have our results, and they match those obtained with Pandas’ groupby function.

[('Asia', 60.064903232323225),
 ('Europe', 71.9036861111111),
 ('Africa', 48.86533012820513),
 ('Americas', 64.65873666666667),
 ('Oceania', 74.32620833333333)]
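The list comprehension scans the whole array once per group. As an alternative sketch, not the approach used in this post, the same means can be computed in a single pass with np.unique and np.bincount. The arrays labels and vals are made-up toy data, assumed here for illustration.

```python
import numpy as np

# Toy data, invented for illustration (not real gapminder numbers).
labels = np.array(["Asia", "Europe", "Africa", "Asia", "Africa"])
vals = np.array([60.0, 70.0, 50.0, 62.0, 48.0])

# groups: sorted distinct labels; codes: integer group id for each row.
groups, codes = np.unique(labels, return_inverse=True)

# bincount with weights sums the values per group id;
# without weights it counts the rows per group id.
sums = np.bincount(codes, weights=vals)
counts = np.bincount(codes)
means = sums / counts

print(list(zip(groups.tolist(), means.tolist())))
```

This version returns the groups in sorted order and assumes 1-D input arrays, so a 2-D column such as the one from gapminder[['lifeExp']].values would need .ravel() first.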

In summary, we implemented Pandas’ groupby function from scratch using NumPy. In this example, we grouped by a single variable and computed the mean of just one other variable. Tune in for slightly more advanced groupby operations with NumPy.
