
Python and R Tips

Learn Data Science with Python and R


How To Remove Rows with Missing values using dplyr

October 31, 2020 by cmdlinetips

Missing data is a common problem in data analysis. Sometimes you might want to remove the missing data, and one approach is to remove the rows that contain missing values. In this post we will see examples of removing rows with missing values in R using drop_na(), a function from the tidyr package that is loaded together with dplyr as part of the tidyverse.
How To Remove Rows With Missing Values?

We will use the drop_na() function from tidyr (one of the core tidyverse packages, alongside dplyr) to remove rows that contain missing data. Let us load the tidyverse first.
library("tidyverse")

As in other tidyverse 101 examples, we will use the fantastic Palmer Penguins dataset, this time to illustrate how to remove rows with missing values. Let us load the data from cmdlinetips.com's GitHub page.

path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins <- readr::read_csv(path2data)
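Missing entries in a CSV file usually come through as NA because readr::read_csv() treats both empty fields and the literal string "NA" as missing by default (its `na` argument defaults to `c("", "NA")`). The sketch below shows this on a tiny made-up CSV string rather than the penguins file, assuming readr 2.0 or later, where wrapping a string in `I()` marks it as literal data instead of a file path.

```r
library(readr)

# Hypothetical inline CSV (not the penguins file): row 2 has an empty
# sex field and row 3 uses the literal string "NA" in both columns
csv_text <- "sex,bill_length_mm\nmale,39.1\n,40.3\nNA,NA\n"

# I() tells readr (>= 2.0) to parse the string itself, not a file path;
# na = c("", "NA") is read_csv()'s default, spelled out here for clarity
toy <- read_csv(I(csv_text), na = c("", "NA"), show_col_types = FALSE)

# Count missing values in each column
sum(is.na(toy$sex))             # 2
sum(is.na(toy$bill_length_mm))  # 1
```

If a file uses a different missing-value marker, such as ".", it can be added to the `na` vector so those entries also become NA and can later be dropped.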

Let us move the sex column, which has a number of missing values, to the front using dplyr's relocate() function.

# move sex column to first
penguins <- penguins %>% 
            relocate(sex)

We can see that our data frame has 344 rows in total and that a number of rows have missing values. Note that the fourth row has missing values in every column except species and island; missing values are displayed as NA (or <NA> in character columns such as sex).

penguins


## # A tibble: 344 x 7
##   sex   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##   <chr> <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
## 1 male  Adelie  Torge…           39.1          18.7              181        3750
## 2 fema… Adelie  Torge…           39.5          17.4              186        3800
## 3 fema… Adelie  Torge…           40.3          18                195        3250
## 4 <NA>  Adelie  Torge…           NA            NA                 NA          NA
## 5 fema… Adelie  Torge…           36.7          19.3              193        3450
## 6 male  Adelie  Torge…           39.3          20.6              190        3650
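Rather than spotting missing values in the printout, you can count them. The sketch below uses base R's is.na(), colSums() and complete.cases() on a small hypothetical tibble standing in for penguins, so it runs without downloading anything.

```r
library(tibble)

# Hypothetical toy data standing in for penguins (no download needed)
toy <- tibble(
  sex = c("male", "female", NA, "female"),
  bill_length_mm = c(39.1, 39.5, NA, 36.7),
  body_mass_g = c(3750, 3800, NA, NA)
)

# is.na() on a data frame gives a logical matrix; colSums() counts the TRUEs
na_per_column <- colSums(is.na(toy))
na_per_column  # sex: 1, bill_length_mm: 1, body_mass_g: 2

# complete.cases() is TRUE for rows with no NA in any column
rows_with_na <- sum(!complete.cases(toy))
rows_with_na   # 2 (rows 3 and 4)
```

On the real data the same calls would be `colSums(is.na(penguins))` and `sum(!complete.cases(penguins))`; the second count is exactly the number of rows that drop_na() with no arguments removes.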

Let us use tidyr's drop_na() function to remove rows that contain at least one missing value.

penguins %>% 
  drop_na()

The resulting data frame now contains 333 rows, so the 11 rows with at least one missing value have been removed. Note that the fourth row of the original data frame, which had missing values, is no longer present.

## # A tibble: 333 x 7
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Adelie  Torge…           39.1          18.7              181        3750
##  2 Adelie  Torge…           39.5          17.4              186        3800
##  3 Adelie  Torge…           40.3          18                195        3250
##  4 Adelie  Torge…           36.7          19.3              193        3450
##  5 Adelie  Torge…           39.3          20.6              190        3650
##  6 Adelie  Torge…           38.9          17.8              181        3625
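For comparison, here is a sketch of three equivalent ways to keep only fully complete rows: tidyr's drop_na(), dplyr's filter() combined with if_all() (assuming dplyr 1.0.4 or later, where if_all() is available), and base R's complete.cases(). The toy tibble is hypothetical data, not the penguins dataset.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy data: rows 2 and 3 each contain at least one NA
toy <- tibble(
  sex = c("male", "female", NA, "female"),
  bill_length_mm = c(39.1, NA, 40.3, 36.7),
  body_mass_g = c(3750, 3800, NA, 3450)
)

# tidyr: keep only rows with no missing value in any column
via_drop_na <- toy %>% drop_na()

# dplyr only: keep rows where every column is non-missing
via_filter <- toy %>% filter(if_all(everything(), ~ !is.na(.x)))

# base R: complete.cases() flags fully observed rows
via_base <- toy[complete.cases(toy), ]

nrow(via_drop_na)  # 2 (rows 1 and 4 of toy)
```

drop_na() is the most concise of the three; the filter() version is handy when you are already inside a longer dplyr pipeline and want to adjust the condition.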

How to Remove Rows Based on Missing Values in a Column?

Sometimes you might want to remove rows based only on missing values in one or more specific columns. To do this, pass the column names to drop_na(). Here we remove rows where bill_length_mm is missing.

penguins %>% 
  drop_na(bill_length_mm)

We have removed the rows with missing values in the bill_length_mm column, leaving 342 rows. In contrast to the previous example, the resulting data frame still contains missing values in other columns; for example, rows 8 to 10 below have NA in the sex column.

## # A tibble: 342 x 7
##    sex   species island bill_length_mm bill_depth_mm flipper_length_…
##    <chr> <chr>   <chr>           <dbl>         <dbl>            <dbl>
##  1 male  Adelie  Torge…           39.1          18.7              181
##  2 fema… Adelie  Torge…           39.5          17.4              186
##  3 fema… Adelie  Torge…           40.3          18                195
##  4 fema… Adelie  Torge…           36.7          19.3              193
##  5 male  Adelie  Torge…           39.3          20.6              190
##  6 fema… Adelie  Torge…           38.9          17.8              181
##  7 male  Adelie  Torge…           39.2          19.6              195
##  8 <NA>  Adelie  Torge…           34.1          18.1              193
##  9 <NA>  Adelie  Torge…           42            20.2              190
## 10 <NA>  Adelie  Torge…           37.8          17.1              186
## # … with 332 more rows, and 1 more variable: body_mass_g <dbl>
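drop_na() also accepts several columns, or tidyselect helpers such as starts_with(); a row is dropped if any of the selected columns is missing. The sketch below illustrates this on a small hypothetical tibble (not the penguins data) and shows the single-column case written with dplyr's filter() for comparison.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy data with NAs scattered across columns
toy <- tibble(
  sex = c("male", "female", NA, NA),
  bill_length_mm = c(39.1, NA, 40.3, 36.7),
  bill_depth_mm = c(18.7, 17.4, NA, 19.3),
  body_mass_g = c(3750, NA, 3250, 3450)
)

# Drop rows only where bill_length_mm is missing; NAs elsewhere survive
one_col <- toy %>% drop_na(bill_length_mm)                # 3 rows

# The same filter written with dplyr::filter()
one_col_filter <- toy %>% filter(!is.na(bill_length_mm))  # 3 rows

# Several columns: a row is dropped if ANY of the named columns is NA
two_cols <- toy %>% drop_na(bill_length_mm, sex)          # 1 row

# tidyselect helper: every column whose name starts with "bill"
bill_cols <- toy %>% drop_na(starts_with("bill"))         # 2 rows
```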
