In this post we will see how to get the counts of missing values in each column of a Pandas dataframe. Dealing with missing values is one of the common tasks in data analysis with real data. A quick understanding of the number of missing values helps in deciding the next step of the analysis.
We will use Pandas’ isna() function to find whether each element in a Pandas dataframe is a missing value or not, and then use the results to get the counts of missing values in the dataframe.
Let us first load the libraries needed.
import pandas as pd
import seaborn as sns
We will use the Palmer Penguins data to count the missing values in each column. Recent versions of Seaborn include the Palmer Penguins dataset, and we will use that.
penguins = sns.load_dataset("penguins")
This is what the Penguins data looks like. We can see some missing values represented as NaN in the dataframe.
penguins.head()
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1           18.7              181.0       3750.0    Male
1  Adelie  Torgersen            39.5           17.4              186.0       3800.0  Female
2  Adelie  Torgersen            40.3           18.0              195.0       3250.0  Female
3  Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN
4  Adelie  Torgersen            36.7           19.3              193.0       3450.0  Female
We can use Pandas’ isna() function to find whether each element of the dataframe is a missing value or not.
penguins.isna()
When applied to a dataframe, Pandas’ isna() function returns a boolean dataframe of the same shape, with True where the element is a missing value and False where it is not.
   species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g    sex
0    False   False           False          False              False        False  False
1    False   False           False          False              False        False  False
2    False   False           False          False              False        False  False
3    False   False            True           True               True         True   True
4    False   False           False          False              False        False  False
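To see the behaviour of isna() without loading the penguins data, here is a minimal sketch on a small made-up dataframe. The dataframe `toy` and its values are assumptions for illustration only, not part of the penguins dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the penguins dataset)
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", None],
    "bill_length_mm": [39.1, np.nan, 46.5],
})

# isna() marks both None and NaN as missing
mask = toy.isna()
print(mask)
```

The result has the same shape as `toy`, with True only in the cells that hold a missing value.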
We can use Pandas’ sum() function to get the counts of missing values for each column in the dataframe.
penguins.isna().sum()
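By default, Pandas’ sum() works with axis=0, that is, it adds up the values down the rows of each column, counting True as 1 and False as 0. So we get a Pandas Series with the number of missing values for each column.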
species               0
island                0
bill_length_mm        2
bill_depth_mm         2
flipper_length_mm     2
body_mass_g           2
sex                  11
dtype: int64
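The effect of the axis argument is easier to see on a tiny example. The sketch below uses a made-up dataframe `toy` (an assumption for illustration) and compares the default column-wise count with a row-wise count and an overall total.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the penguins dataset)
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

# Default axis=0: one count per column
per_column = toy.isna().sum()

# axis=1: one count per row
per_row = toy.isna().sum(axis=1)

# Summing the per-column counts gives the total number of missing values
total = int(per_column.sum())

print(per_column)
print(per_row)
print(total)
```

Here column "a" has one missing value and column "b" has two, so the total is three.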
When you have a bigger dataframe, a quick bar plot made with Pandas’ plot.bar function gives a sense of where the missing values are. We use the dot operator to chain the result of isna().sum() to reset_index(), which names the count column, and then call plot.bar to make the bar plot.
(
    penguins.isna()
    .sum()
    .reset_index(name="n")
    .plot.bar(x="index", y="n", rot=45)
)
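It can help to look at the table that reset_index(name="n") produces before it is passed to plot.bar. The sketch below again uses a made-up dataframe `toy` (an assumption for illustration) and stops just before the plotting step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the penguins dataset)
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

# The Series of counts becomes a two-column dataframe:
# "index" holds the original column names, "n" holds the counts
counts = toy.isna().sum().reset_index(name="n")
print(counts)
```

The column names "index" and "n" are the ones used as x and y in the plot.bar call.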