Python and R Tips

Learn Data Science with Python and R

How to Filter a Pandas Dataframe Based on Null Values of a Column?

March 5, 2018 by cmdlinetips

Real datasets are messy and often contain missing data. Python’s pandas can easily handle missing data or NA values in a dataframe. A common task when dealing with missing data is to filter out the rows or columns that contain missing values, and pandas offers a few ways to do this.

For example, we might want to filter a pandas dataframe based on one column, keeping only the rows where that specific column has data and is not NA.

Let us consider a toy example to illustrate this. Let us first load the pandas library and create a pandas dataframe from multiple lists.

# import pandas
import pandas as pd

Our toy dataframe contains three columns and three rows. The column Last_Name has one missing value, denoted as “None”. The column Age has one missing value as well.

# create a pandas dataframe from multiple lists
>df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'], 
                   'First_Name': ['John', 'Mike', 'Bill'],
                   'Age': [35, 45, None]})

Since the dataframe is small, we can print it and see the data and the missing values. Note that pandas represents missing data in two ways here. The missing value in Last_Name is shown as None, and the missing value in Age is shown as NaN (Not a Number). This is because pandas stores missing values in numeric columns as NaN, which is also why the Age values are printed as floats, while object columns such as strings can keep None. Don’t worry, pandas treats both of them as missing values.

>print(df)
	Age	First_Name	Last_Name
0	35.0	John	Smith
1	45.0	Mike	None
2	NaN	Bill	Brown
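As a quick check that pandas treats both markers as missing, the following minimal sketch rebuilds the toy dataframe and counts the missing values per column with isna(), which is an alias of isnull(). The names missing_mask and missing_counts are only illustrative.

```python
import pandas as pd

# same toy dataframe as above
df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'First_Name': ['John', 'Mike', 'Bill'],
                   'Age': [35, 45, None]})

# isna() flags missing entries in both the text column and the numeric column
missing_mask = df.isna()

# summing the boolean mask counts the missing values in each column
missing_counts = missing_mask.sum()
print(missing_counts)
```

Here Last_Name and Age should each report one missing value and First_Name none.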

How to filter out rows based on missing values in a column?

To filter out the rows of the pandas dataframe that have missing values in the Last_Name column, we first build a boolean mask of the non-null values with the pandas notnull() function. It returns a boolean series that is True where the value is not null and False where the value is null or missing.

>df.Last_Name.notnull()
0     True
1    False
2     True
Name: Last_Name, dtype: bool

We can use this boolean series to filter the dataframe so that it keeps the rows with no missing data for the column ‘Last_Name’.

>df[df.Last_Name.notnull()]
	Age	First_Name	Last_Name
0	35.0	John	Smith
2	NaN	Bill	Brown
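The same filter can be sketched in a couple of other ways. The example below is an illustration rather than the only approach: it uses bracket access for the column, shows dropna() with the subset argument as an equivalent, and inverts the condition with isnull() to keep only the rows where Last_Name is missing. The variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'First_Name': ['John', 'Mike', 'Bill'],
                   'Age': [35, 45, None]})

# keep rows where Last_Name is present (bracket access instead of df.Last_Name)
has_last_name = df[df['Last_Name'].notnull()]

# equivalent: drop rows only when Last_Name is missing, ignoring other columns
has_last_name_alt = df.dropna(subset=['Last_Name'])

# opposite filter: keep only the rows where Last_Name is missing
missing_last_name = df[df['Last_Name'].isnull()]

print(has_last_name)
print(missing_last_name)
```

Note that the row for Bill is kept by the first two filters even though his Age is missing, because only Last_Name is checked.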

How to filter out all rows with missing values?

If you want to filter out all rows containing one or more missing values, pandas’ dropna() function is useful.

# drop rows with missing values
>df.dropna()
	Age	First_Name	Last_Name
0	35.0	John	Smith

Note that dropna() drops every row that contains missing data, and in this case only one row has no missing values. Dropping rows is the default behavior. If you want to drop the columns with missing values instead, you can specify axis=1.

# drop columns with missing values
>df.dropna(axis=1)
	First_Name
0	John
1	Mike
2	Bill

In this example, the only column without missing data is the First_Name column. So we end up with a dataframe with a single column after using axis=1 with dropna().
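To compare the two directions side by side, here is a small sketch under the same toy data. It also shows that dropna() returns a new dataframe by default and leaves the original untouched. The names rows_kept and cols_kept are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'First_Name': ['John', 'Mike', 'Bill'],
                   'Age': [35, 45, None]})

# default (axis=0): drop every row that has at least one missing value
rows_kept = df.dropna()

# axis=1: drop every column that has at least one missing value
cols_kept = df.dropna(axis=1)

print(rows_kept.shape)          # (1, 3)
print(list(cols_kept.columns))  # ['First_Name']
print(df.shape)                 # (3, 3), the original dataframe is unchanged
```

To keep the result, assign it to a variable as done here, since the original df still contains the missing values.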
