How To Subset Pandas Dataframe Based on Values of a Column?

Often, you want to subset a pandas dataframe based on the values of a specific column. Essentially, we would like to filter rows based on a variable’s values, so that we keep all the columns but only certain rows. Here is how to filter rows in a pandas dataframe. Let us first […]
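As a rough illustration of the row filtering described above, here is a minimal sketch. The toy data frame and its column names are hypothetical and are not the data set the post loads.

```python
import pandas as pd

# Hypothetical toy data; the original post uses its own data set.
df = pd.DataFrame({
    "country": ["Kenya", "Peru", "Japan", "Chile"],
    "continent": ["Africa", "Americas", "Asia", "Americas"],
    "year": [2002, 2007, 2007, 2002],
})

# Build a boolean mask that is True for rows whose 'year' equals 2007,
# then use it to keep only those rows (all columns are retained).
is_2007 = df["year"] == 2007
df_2007 = df[is_2007]

# isin() keeps rows whose column value matches any item in a list.
americas_or_asia = df[df["continent"].isin(["Americas", "Asia"])]

print(df_2007)
print(americas_or_asia)
```

Both forms return a new data frame with every original column and only the matching rows.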

Introduction to Split-Apply-Combine with Pandas

In a classic paper published in 2011, Hadley Wickham asked: What do we do when we analyze data? What are common actions and what are common mistakes? He then went on to spell out one of the most common strategies used in data analysis, Split-Apply-Combine. Intuitively, while solving a big problem, […]
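One common way to express Split-Apply-Combine in pandas is with groupby. The sketch below assumes a made-up data frame with placeholder values; it is meant only to show the shape of the pattern, not the post's own example.

```python
import pandas as pd

# Hypothetical data with placeholder values, for illustration only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [50.0, 60.0, 70.0, 80.0],
})

# Split: groupby() partitions the rows by the 'group' column.
# Apply: mean() is computed within each partition.
# Combine: the per-group results are assembled into one Series.
group_means = df.groupby("group")["value"].mean()

print(group_means)
```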

How To Randomly Select Rows in Pandas?: Pandas Tutorial

Pandas’ sample function lets you randomly sample data from a pandas data frame. Here are three ways of using Pandas’ sample to randomly select rows. Let us first load the data. How to get a random subset of data: To randomly select rows from a pandas dataframe, we can use the sample function from pandas. For example, […]
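The sketch below shows a few ways DataFrame.sample can be called. The excerpt does not say which three ways the post covers, so the choice of n, frac, and replace here is an assumption, and the data frame is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical data frame with 10 rows, for illustration only.
df = pd.DataFrame({"id": range(10), "value": [i * i for i in range(10)]})

# Sample a fixed number of rows; random_state makes the draw reproducible.
three_rows = df.sample(n=3, random_state=1)

# Sample a fraction of the rows (here half of them).
half_rows = df.sample(frac=0.5, random_state=1)

# Sample with replacement, which allows more rows than the original has.
with_replacement = df.sample(n=15, replace=True, random_state=1)

print(three_rows)
```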

How to Get Frequency Counts of a Column in Pandas Dataframe: Pandas Tutorial

Often, while working with a pandas dataframe, you might have a column of categorical variables (strings or characters) and want to find the frequency count of each unique element in the column. Pandas’ value_counts() lets you get the frequency counts easily. Let us get started with an example from a real-world data set. Load gapminder […]
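A minimal sketch of value_counts() on a categorical column follows. The post uses the gapminder data; the small data frame here is a hypothetical stand-in so the example is self-contained.

```python
import pandas as pd

# Hypothetical stand-in for a real data set such as gapminder.
df = pd.DataFrame({
    "continent": ["Africa", "Asia", "Africa", "Europe", "Africa", "Asia"],
})

# Count how often each unique value occurs, most frequent first.
counts = df["continent"].value_counts()

# normalize=True returns proportions instead of raw counts.
proportions = df["continent"].value_counts(normalize=True)

print(counts)
```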

How to Load a Massive File as Small Chunks in Pandas?

The longer you work in data science, the higher the chance that you will have to work with a really big file with thousands or millions of lines. Trying to load all the data into memory at once may not work, as you can end up using all of your RAM and crashing your computer. […]
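One way to avoid loading everything at once is the chunksize parameter of pandas' read_csv, which returns an iterator of smaller data frames. In this sketch a tiny in-memory CSV stands in for a large file; the column names and the chunk size are assumptions chosen for illustration.

```python
import io

import pandas as pd

# A small in-memory CSV standing in for a massive file on disk (hypothetical).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

total = 0
n_chunks = 0

# With chunksize set, read_csv yields data frames of at most 4 rows each,
# so only one chunk needs to be held in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()
    n_chunks += 1

print(n_chunks, total)
```

For a real file, a path would replace the StringIO object and the chunk size would be much larger.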

How to Create Pandas Dataframe from Multiple Lists? Pandas Tutorial

NumPy is fantastic for numerical data. One can perform powerful operations on numerical data easily and quickly. However, if your data is of mixed type, with some columns holding strings and others numbers, a Pandas data frame is a better option. How to Create Pandas Dataframe from lists? Let us […]
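Two common ways to build a data frame from several lists are sketched below. The list contents and column names are hypothetical placeholders, not the post's data.

```python
import pandas as pd

# Hypothetical lists of mixed type: one of strings, one of numbers.
names = ["alpha", "beta", "gamma"]
scores = [1, 2, 3]

# Option 1: a dict mapping each column name to its list.
df_from_dict = pd.DataFrame({"name": names, "score": scores})

# Option 2: zip the lists into row tuples and name the columns.
df_from_zip = pd.DataFrame(list(zip(names, scores)), columns=["name", "score"])

print(df_from_dict)
```

Both approaches require the lists to have the same length; the dict form is usually easier to read when there are many columns.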