How To Discretize/Bin a Variable in Python with NumPy and Pandas?

Sometimes you may have a quantitative variable in your data set that you want to discretize, bin, or categorize based on its values. For example, let us say you have measurements of height and want to discretize it such that it is 0 or 1 depending on…
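As a rough sketch of the idea, here is one way binning might look. The height values, the cutoff of 170, and the bin edges and labels are all made up for illustration and are not taken from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical height measurements in centimetres (illustrative data only).
heights = pd.Series([150.0, 162.5, 171.0, 180.2, 168.4])

# Assumed cutoff: heights above 170 become 1, everything else becomes 0.
threshold = 170
binary = np.where(heights > threshold, 1, 0)

# pd.cut handles more than two bins; these edges and labels are assumptions.
binned = pd.cut(
    heights,
    bins=[0, 160, 175, 250],
    labels=["short", "medium", "tall"],
)

print(binary)
print(binned)
```

np.where gives a simple two-way split, while pd.cut returns a categorical column, which is handy when we want more than two bins.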

How to Add Group-Level Summary Statistic as a New Column in Pandas?

In this post, we will see an example of adding the result of an aggregating function like mean/median, computed after groupby() on a specific column, as a new column. In other words, we might have group-level summary values for a column and we might want to add the summary values back to the original dataframe we computed group-level…
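One common way to do this is with transform(), which returns a result aligned to the original rows. This is only a sketch on a toy dataframe; the column names are hypothetical, and the post itself may use a different approach.

```python
import pandas as pd

# Toy data; the column names "group" and "value" are illustrative assumptions.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# transform("mean") computes the mean per group and broadcasts it back
# to every row of that group, so it can be assigned as a new column.
df["group_mean"] = df.groupby("group")["value"].transform("mean")

print(df)
```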

How to Drop Rows Based on a Column Value in Pandas Dataframe?

In this post we will see examples of how to drop rows of a dataframe based on the values of one or more columns in Pandas. The Pandas drop() function makes it easy to drop rows of a dataframe using index numbers or index names. We can use the Pandas drop() function to drop rows and columns…
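Here is a small sketch of two ways we might drop rows by column value: filtering with a boolean mask, and passing matching index labels to drop(). The dataframe and the cutoff of 50 are invented for illustration.

```python
import pandas as pd

# Toy data; names and scores are illustrative assumptions.
df = pd.DataFrame({
    "name": ["x", "y", "z", "w"],
    "score": [10, 55, 30, 80],
})

# Option 1: keep only the rows that satisfy a condition.
kept = df[df["score"] >= 50]

# Option 2: find the index labels of unwanted rows and pass them to drop().
dropped = df.drop(df[df["score"] < 50].index)

print(kept)
print(dropped)
```

Both options give the same result here; the boolean mask is usually shorter, while drop() is convenient when we already have the index labels.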

Pandas groupby: 13 Functions To Aggregate

Fun with Pandas Groupby, Agg,

The Pandas groupby function makes it easy to apply the “Split-Apply-Combine” data analysis paradigm. Basically, with Pandas groupby, we can split a Pandas data frame into smaller groups using one or more variables. Pandas has a number of aggregating functions that reduce the dimension of the grouped object. In this post we will see examples of using 13 aggregating function…
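To give a flavor of split-apply-combine, here is a minimal sketch that applies a few aggregating functions at once with agg(). The data and column names are hypothetical, and this shows only four aggregations, not the full list from the post.

```python
import pandas as pd

# Toy data; "continent" and "lifeExp" are illustrative column names.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "lifeExp": [60.0, 70.0, 75.0, 81.0],
})

# Split by continent, apply several aggregations, combine into one table.
summary = df.groupby("continent")["lifeExp"].agg(["count", "mean", "min", "max"])

print(summary)
```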

How To Drop Duplicate Rows in Pandas?

Pandas drop_duplicates(): remove duplicated rows from a dataframe

In this post, we will learn how to drop duplicate rows in a Pandas dataframe. We will use the Pandas drop_duplicates() function to delete duplicated rows, with multiple examples. One of the common data cleaning tasks is to make a decision on how to deal with duplicate rows in a data frame. If the whole…
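As a quick sketch, drop_duplicates() can compare whole rows or only a subset of columns. The dataframe below is made up for illustration.

```python
import pandas as pd

# Toy data; the first two rows are exact duplicates.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "city": ["Oslo", "Oslo", "Rome", "Pisa"],
})

# Drop rows that are identical across all columns.
unique_rows = df.drop_duplicates()

# Drop rows that repeat an "id", keeping the first occurrence of each.
unique_ids = df.drop_duplicates(subset=["id"], keep="first")

print(unique_rows)
print(unique_ids)
```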