How To Get Number of Missing Values in Each Column in Pandas

Missing Values Count with isna() in Pandas

In this post we will see how to get the count of missing values in each column of a Pandas dataframe. Dealing with missing values is one of the common tasks in data analysis with real data. A quick look at the number of missing values helps in deciding the next step of the analysis.

We will use Pandas’ isna() function to check whether each element of a dataframe is a missing value, and then use the result to count the missing values in each column.

Let us first load the libraries needed.

import pandas as pd
import seaborn as sns

We will use the Palmer Penguins data to count the missing values in each column. Recent versions of Seaborn make the Palmer Penguins data set available through load_dataset(), so we will load it from there.

penguins = sns.load_dataset("penguins")

This is what the penguins data looks like. We can see some missing values, represented as NaN, in the dataframe.

penguins.head()

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	Female
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
3	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female

We can use Pandas’ isna() function to find out whether each element of the dataframe is a missing value or not.

penguins.isna()

When applied to a dataframe, isna() returns a boolean dataframe of the same shape, with True where the element is a missing value and False where it is not.

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
0	False	False	False	False	False	False	False
1	False	False	False	False	False	False	False
2	False	False	False	False	False	False	False
3	False	False	True	True	True	True	True
4	False	False	False	False	False	False	False
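To see the same behaviour without loading the penguins data, here is a minimal sketch on a small made-up dataframe. The dataframe `toy` and its values are assumptions for illustration only; the point is that isna() flags both NaN in a numeric column and None in a text column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration, not the penguins data set.
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, np.nan, 46.5],   # NaN marks a missing number
    "sex": ["Male", None, "Female"],          # None is also treated as missing
})

# Boolean dataframe with the same shape as toy: True where a value is missing.
mask = toy.isna()
print(mask)
```

Only the second row of bill_length_mm and sex should come back as True here.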

We can then use Pandas’ sum() function on the boolean dataframe to get the count of missing values in each column.

penguins.isna().sum()

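By default, sum() uses axis=0, so it adds the True values down the rows of each column (True counts as 1 and False as 0). The result is a Pandas Series, indexed by column name, holding the number of missing values in each column.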

species               0
island                0
bill_length_mm        2
bill_depth_mm         2
flipper_length_mm     2
body_mass_g           2
sex                  11
dtype: int64
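The role of the axis argument is easier to see on a small example. The sketch below uses a made-up dataframe (`toy` and its column names are assumptions, not part of the penguins data) and compares the default per-column count with a per-row count obtained with axis=1.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": ["x", "y", "z"],
})

# Default axis=0: one count per column, returned as a Series.
per_column = toy.isna().sum()

# axis=1: one count per row instead.
per_row = toy.isna().sum(axis=1)

print(per_column)
print(per_row)
```

The per-column version is the one used in this post; the per-row version can be handy for spotting rows that are mostly empty, like row 3 of the penguins data.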

With a bigger dataframe, a quick bar plot gives a better sense of where the missing values are. We chain the result of isna().sum() to reset_index(name="n"), which turns the Series into a dataframe with the column names in a column called index and the counts in a column called n, and then use plot.bar to make a quick bar plot.

  
# Wrap the chain in parentheses so it can span several lines.
(penguins.isna()
         .sum()
         .reset_index(name="n")
         .plot.bar(x='index', y='n', rot=45))
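It can help to look at the intermediate dataframe that the plotting call receives. The sketch below stops just before plot.bar and uses a made-up dataframe (`toy` is an assumption for illustration) to show where the 'index' and 'n' column names come from.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": ["x", "y", "z"],
})

# isna().sum() gives a Series indexed by column name.
# reset_index(name="n") moves that index into a column called "index"
# and names the column of counts "n".
counts = toy.isna().sum().reset_index(name="n")
print(counts)
```

These two column names are what x='index' and y='n' refer to in the plot.bar call.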