One of the most common uses of Pandas’ groupby function is to compute summary statistics on one or more variables in a dataframe. In this post we will see an example of how to compute the mean of all numerical variables, and of a single selected variable, after a groupby operation. Let us first load the Pandas package. We will […]
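As a rough illustration of the idea in this excerpt, here is a minimal sketch of a group-wise mean on all numeric columns and on one selected column. The dataframe and its column names (`continent`, `lifeExp`, `pop`) are made up for this example and are not taken from the post.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "lifeExp": [60.0, 70.0, 75.0, 85.0],
    "pop": [100, 300, 50, 150],
})

# Mean of every numeric column within each group.
mean_all = df.groupby("continent").mean(numeric_only=True)

# Mean of a single selected column within each group.
mean_life = df.groupby("continent")["lifeExp"].mean()

print(mean_all)
print(mean_life)
```

Selecting the column before calling `mean()` avoids computing statistics that are then thrown away.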
dplyr 1.0.0 is here: Quick fun with Summarise() and rowwise()
The new version of dplyr, 1.0.0, is here. It was originally supposed to be available in early May and is finally out on CRAN. One of the cool things about dplyr 1.0.0 is its new logo. Jokes apart, dplyr 1.0.0 is loaded with new features and Hadley Wickham has started teasing […]
Getting Started with Pandas Groupby
Pandas’ groupby function is one of the most useful functions, enabling a wide range of data munging activities. A simple use case is splitting a bigger dataframe into multiple smaller dataframes by a single variable. Typically, after grouping by a variable, we perform some computations on each […]
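The splitting described here can be sketched as follows. This is only an illustration under assumed data: the dataframe and the names `continent` and `lifeExp` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia", "Europe", "Africa"],
    "lifeExp": [60.0, 75.0, 70.0, 85.0, 55.0],
})

grouped = df.groupby("continent")

# Iterating over the groupby object yields (group key, sub-dataframe) pairs.
pieces = {key: sub_df for key, sub_df in grouped}

# get_group pulls out a single sub-dataframe by its key.
asia = grouped.get_group("Asia")

# A simple per-group computation: the number of rows in each group.
sizes = grouped.size()

print(asia)
print(sizes)
```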
How To Add Identifier Column When Concatenating Pandas data frames?
Pandas’ concat() function is great for concatenating two dataframes or appending one dataframe to another with the same columns. Sometimes, you might want to keep an identifier for each appended dataframe. In this post, we will see an example of how to concatenate two dataframes with an identifier. Let us import Pandas and numpy to […]
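One possible way to keep such an identifier is the `keys` argument of `pd.concat`, sketched below. The post itself may use a different approach; the dataframes and the labels `first`, `second`, and `source` are assumptions made for this example.

```python
import pandas as pd

# Two hypothetical dataframes with the same columns.
df1 = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# keys= labels each input dataframe; names= names the resulting index levels.
combined = pd.concat(
    [df1, df2],
    keys=["first", "second"],
    names=["source", "row"],
)

# Move the identifier from the index into an ordinary column,
# then discard the leftover per-dataframe row index.
combined = combined.reset_index(level="source").reset_index(drop=True)

print(combined)
```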
Fun with Pandas Groupby, Aggregate, Multi-Index and Unstack
This post is titled “fun with Pandas Groupby, aggregate, and unstack”, but it addresses some of the pain points I face when doing mundane data-munging activities. Every time I do this, I start from scratch and solve them in different ways. The purpose of this post is to record at least a couple of […]
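For readers unfamiliar with how these pieces fit together, here is a small sketch of a two-key groupby, an aggregation, and an unstack. The data and the column names `continent`, `year`, and `lifeExp` are invented for illustration and do not come from the post.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe", "Europe"],
    "year": [2000, 2000, 2010, 2000, 2010, 2010],
    "lifeExp": [60.0, 70.0, 72.0, 75.0, 80.0, 84.0],
})

# Grouping by two columns yields a Series with a two-level MultiIndex.
means = df.groupby(["continent", "year"])["lifeExp"].agg("mean")

# unstack pivots the innermost index level (year) into columns,
# giving one row per continent and one column per year.
wide = means.unstack()

print(means)
print(wide)
```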


