
Python and R Tips

Learn Data Science with Python and R


Pandas 0.25.0 is Here. What is New? Named aggregation, explode() and sparse dataframe

July 26, 2019 by cmdlinetips


If you are like me, you might have missed that the fantastic Pandas team has released a new version, Pandas 0.25.0.

As one would expect, there are quite a few new things in Pandas 0.25.0. A couple of the new enhancements concern pandas’ groupby aggregation. Here are a few new features that look really interesting.

To get started with pandas 0.25.0, upgrade pandas with pip:

python3 -m pip install --upgrade pandas

Then import pandas and check which version is loaded.

import pandas as pd
# make sure the version is pandas 0.25.0
print(pd.__version__)
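Named aggregation and explode() only exist from 0.25.0 onward, so a script that relies on them can fail early with a clear message. Here is a minimal sketch, assuming the version string starts with numeric major and minor parts (as released versions do). The helper name require_pandas is hypothetical.

```python
import pandas as pd

def require_pandas(minimum=(0, 25)):
    """Hypothetical helper: raise if the installed pandas is older than `minimum`."""
    # Keep only the numeric major.minor parts, e.g. "0.25.0" -> (0, 25)
    major, minor = (int(part) for part in pd.__version__.split(".")[:2])
    if (major, minor) < minimum:
        raise RuntimeError(
            f"pandas {pd.__version__} found; version "
            f"{minimum[0]}.{minimum[1]} or newer is required"
        )
    return major, minor

print(require_pandas())
```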

Named Aggregation with groupby

One of the most interesting updates is a new groupby behavior known as “named aggregation”. It lets you name the output columns when applying multiple aggregation functions to specific columns.

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})

For example, to compute both the minimum and maximum height for each animal kind and keep them as named output columns, we can use pd.NamedAgg as follows.

animals.groupby("kind").agg(
    min_height=pd.NamedAgg(column='height', aggfunc='min'),
    max_height=pd.NamedAgg(column='height', aggfunc='max'))

And we would get

      min_height  max_height
kind                        
cat          9.1         9.5
dog          6.0        34.0
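Named aggregation is not limited to a single input column. The sketch below uses the same animals data and pulls from both height and weight in one groupby. The output names max_height and mean_weight are only illustrative choices.

```python
import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})

# Each keyword becomes an output column; NamedAgg says which input
# column to read and which aggregation to apply to it.
summary = animals.groupby("kind").agg(
    max_height=pd.NamedAgg(column='height', aggfunc='max'),
    mean_weight=pd.NamedAgg(column='weight', aggfunc='mean'))
print(summary)
```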

Instead of calling pd.NamedAgg() explicitly, we can also pass the desired column names as **kwargs to .agg. In that case, each keyword value must be a tuple whose first element is the column to select and whose second element is the aggregation function to apply.

We will get the same result as above using the following code

animals.groupby("kind").agg(
   min_height=('height', 'min'),
   max_height=('height', 'max'))
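A quick way to confirm that the two spellings are interchangeable is to compare their results directly. The sketch below also shows the Series version of the feature. When a single column is selected from the groupby, each keyword value is a plain aggregation function rather than a tuple.

```python
import pandas as pd

animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})

# NamedAgg form and plain-tuple form of the same aggregation
with_namedagg = animals.groupby("kind").agg(
    min_height=pd.NamedAgg(column='height', aggfunc='min'),
    max_height=pd.NamedAgg(column='height', aggfunc='max'))
with_tuples = animals.groupby("kind").agg(
    min_height=('height', 'min'),
    max_height=('height', 'max'))
print(with_namedagg.equals(with_tuples))  # True

# On a single selected column (a SeriesGroupBy), each keyword maps
# directly to an aggregation function
height_range = animals.groupby("kind").height.agg(minimum='min', maximum='max')
print(height_range)
```

Because pd.NamedAgg is a named tuple of (column, aggfunc), the tuple shorthand is just a more compact way to write the same thing.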

Explode function to split list-like values to separate rows

Another interesting addition in Pandas 0.25.0 is the explode() method, available on both Series and DataFrame objects.

For example, you might have a DataFrame with a column whose values contain multiple items separated by a delimiter, so each value is effectively a list. Sometimes you want each element of that list in its own row.

The new explode() method is similar to the separate_rows() function in the tidyverse (tidyr package).
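explode() works on a Series as well. The sketch below uses made-up values to show how different kinds of entries are handled. Each list element gets its own row and repeats the original index label, a scalar passes through unchanged, and an empty list becomes a single missing value (NaN).

```python
import pandas as pd

s = pd.Series([[1, 2, 3], 'foo', [], [3, 4]])

# One row per list element; index labels are repeated,
# 'foo' passes through unchanged and [] turns into NaN
exploded = s.explode()
print(exploded)
```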

Here is an example of a DataFrame with comma-separated strings in a column, and how explode() can split them into separate rows.

df = pd.DataFrame([{'var1': 'a,b,c', 'var2': 1},
                   {'var1': 'd,e,f', 'var2': 2}])

    var1  var2
0  a,b,c     1
1  d,e,f     2

To put each comma-separated value in its own row, first turn every string into a list with str.split(','), then explode that column.

df.assign(var1=df.var1.str.split(',')).explode('var1')

  var1  var2
0    a     1
0    b     1
0    c     1
1    d     2
1    e     2
1    f     2
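The exploded rows keep their original index labels (0, 0, 0, 1, 1, 1). If a fresh 0 to n-1 index is preferred, one option is to chain reset_index(drop=True). pandas 1.1 and later also accept explode(..., ignore_index=True), which is not available in 0.25.0. This sketch also splits on a comma followed by optional spaces, an assumption about how the strings are formatted, so that values like 'a, b' don't keep a leading space. The sample strings are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([{'var1': 'a, b,c', 'var2': 1},
                   {'var1': 'd,e, f', 'var2': 2}])

# Split on a comma plus any following spaces (a regex), give each
# element its own row, then renumber the index from 0
tidy = (df.assign(var1=df.var1.str.split(r',\s*'))
          .explode('var1')
          .reset_index(drop=True))
print(tidy)
```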

SparseDataFrame is Deprecated

Another notable change is that pandas’ SparseDataFrame and SparseSeries subclasses are deprecated. Instead, a regular DataFrame (or Series) can hold sparse values directly.

Instead of using SparseDataFrame to create a sparse dataframe like

# Old Way
pd.SparseDataFrame({"A": [0, 1]})

in the new version of pandas, one would use

# New Way
pd.DataFrame({"A": pd.SparseArray([0, 1])})
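To check that the column really is stored sparsely, you can inspect its dtype and use the DataFrame’s .sparse accessor, which pandas 0.25.0 introduced alongside this change. The sketch uses pd.arrays.SparseArray, the same class as pd.SparseArray. Newer pandas releases deprecated the top-level name in favor of this one.

```python
import pandas as pd

# A DataFrame whose column "A" is backed by a SparseArray (fill value 0)
df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})

print(df.dtypes)              # column A has a Sparse[...] dtype
print(df.sparse.density)      # share of values actually stored
dense = df.sparse.to_dense()  # back to an ordinary dense DataFrame
print(dense.dtypes)
```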

Similarly, there is a new way to create a DataFrame from a SciPy sparse matrix.

Instead of the old approach

# Old way
from scipy import sparse
mat = sparse.eye(3)
df = pd.SparseDataFrame(mat, columns=['A', 'B', 'C'])

the new version of Pandas offers

# New way
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])
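A DataFrame built with from_spmatrix can be turned back into a SciPy matrix with the same accessor’s to_coo() method. Here is a minimal round-trip sketch, assuming SciPy and NumPy are installed.

```python
import numpy as np
import pandas as pd
from scipy import sparse

mat = sparse.eye(3)  # 3x3 sparse identity matrix
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])

print(df.sparse.density)  # 3 stored values out of 9
coo = df.sparse.to_coo()  # back to a scipy.sparse COO matrix
print(np.array_equal(coo.toarray(), np.eye(3)))  # True
```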
