A common step in data analysis is to group the data by a variable and compute some summary statistics for each subgroup of data. For example, one might be interested in the mean, the median, or the total sum per group. In this post, we will see an example of how to use the groupby() function in Pandas to […]
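As a rough illustration of the idea, here is a minimal sketch of per-group summary statistics. The toy dataframe and its column names (`group`, `value`) are made up for this example and are not taken from the post itself.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Group by one variable and compute mean, median, and sum per group.
summary = df.groupby("group")["value"].agg(["mean", "median", "sum"])
print(summary)
```

The result has one row per group and one column per statistic, so it is easy to inspect or merge back onto other tables.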
Getting Started with Pandas Groupby
The Pandas groupby function is one of the most useful functions, enabling a bunch of data munging activities. A simple use case of the groupby function is to split a bigger dataframe by a single variable in the dataframe into multiple smaller dataframes. Typically, after grouping by a variable, we perform some computations on each […]
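To make the "one big dataframe into several smaller ones" idea concrete, here is a small sketch. The data and the column names (`species`, `weight`) are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions for this sketch.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "dog", "bird"],
    "weight": [4.0, 20.0, 5.0, 25.0, 0.5],
})

grouped = df.groupby("species")

# Iterating over a groupby object yields (group name, sub-dataframe) pairs.
pieces = {name: sub_df for name, sub_df in grouped}

# A single group can also be pulled out directly.
dogs = grouped.get_group("dog")
print(dogs)
```

Each value in `pieces` is an ordinary dataframe holding only the rows for that group, so any further computation can be done on it as usual.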
Fun with Pandas Groupby, Aggregate, Multi-Index and Unstack
This post is titled “fun with Pandas Groupby, aggregate, and unstack”, but it addresses some of the pain points I face when doing mundane data-munging activities. Every time I do this I start from scratch and solve them in different ways. The purpose of this post is to record at least a couple of […]
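For readers who have not seen these pieces together, here is a minimal sketch of how groupby, an aggregation, a multi-index, and unstack fit into one pipeline. The dataframe and its columns (`year`, `region`, `sales`) are invented for this example and are not the data used in the post.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions for this sketch.
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2020],
    "region": ["east", "west", "east", "west", "west"],
    "sales": [10, 20, 30, 40, 50],
})

# Grouping by two keys and aggregating gives a Series with a two-level MultiIndex.
totals = df.groupby(["year", "region"])["sales"].sum()

# unstack() moves one index level into the columns, giving a wide table.
wide = totals.unstack("region")
print(wide)
```

Here `wide` has one row per year and one column per region, which is often the shape wanted for a quick table or plot.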
Pandas 1.0.0 is Here: Top New Features of Pandas You Should Know
Pandas 1.0.0 is ready for prime time now. The Pandas project has come a long way since the early release of Pandas version 0.4 in 2011. Back then it had contributions from 2 developers, including Wes McKinney; now Pandas has over 300 contributors. The latest version of Pandas can be installed from standard package managers like Anaconda, […]
11 Tips to Make Plots with Pandas
The Python Pandas library is well known for its amazing data munging capabilities. However, a somewhat underused feature of Pandas is its plotting capabilities. Yes, one can make better visualizations with Matplotlib or Seaborn or Altair. However, Pandas plotting capabilities can be extremely handy when you are in exploratory data analysis mode and want to quickly […]