• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / R / R Tips / How to Replace NAs with column mean or row means with tidyverse

How to Replace NAs with column mean or row means with tidyverse

January 15, 2022 by cmdlinetips

Just a quick rstat post on a simple imputation approach here for the future self. SVD/PCA is one of the first things I do for analyzing any new high dimensional data. Often such data are messy and have some missing values. Depending on the situation, I often resort to removing the rows with missing data or replace NAs with mean values before doing SVD/PCA.

Here is the #rstats tip to quickly replace all NAs in a dataframe with column means or row means using tidyverse functions.

Replace NAs with Column/Row Mean
Replace NAs with Column/Row Mean

Let us get started by loading the whole of tidyverse.

library(tidyverse)

Creating Data with NAs

First, let us create a toy dataset with some missing values using sample() functin in R. We store the data as a tibble.

set.seed(49)
data_df <- tibble(a = sample(c(1:3,NA), 5, replace = TRUE),
       b = sample(c(1:3,NA), 5, replace = TRUE),
       c = sample(c(1:3,NA), 5, replace = TRUE))

Our toy dataframe with missing values (NAs) has 3 columns and 5 rows.

data_df

## # A tibble: 5 × 3
##       a     b     c
##   <int> <int> <int>
## 1     3     3    NA
## 2     1    NA     3
## 3     3     2     1
## 4     1     3    NA
## 5     3     3     1

Let us also add add a unique row ID to the dataframe using row_number() function.

data_df <- data_df %>%
  mutate(row_id = row_number()) 
data_df 

## # A tibble: 5 × 4
##       a     b     c row_id
##   <int> <int> <int>  <int>
## 1     3     3    NA      1
## 2     1    NA     3      2
## 3     3     2     1      3
## 4     1     3    NA      4
## 5     3     3     1      5

Replace NAs with column means in tidyverse

A simple way to replace NAs with column means is to use group_by() on the column names and compute means for each column and use the mean column value to replace where the element has NA.

To start with, let us convert the dataframe/tibble into tidy dataframe by keeping the row ID in one column, column names in another column and the actual values in the third column.

data_df %>%
  pivot_longer(-row_id, names_to="column_id", values_to="val") 

## # A tibble: 15 × 3
##    row_id column_id   val
##     <int> <chr>     <int>
##  1      1 a             3
##  2      1 b             3
##  3      1 c            NA
##  4      2 a             1
##  5      2 b            NA
##  6      2 c             3
##  7      3 a             3
##  8      3 b             2
##  9      3 c             1
## 10      4 a             1
## 11      4 b             3
## 12      4 c            NA
## 13      5 a             3
## 14      5 b             3
## 15      5 c             1

Now we can group by the column names, and update the value column with column mean if there is NA. We use mutate function to update the current values and use ifelse() to check if the element is NA or not. If it is NA, we replace with that group’s mean value. Note that we use na.rm=TRUE while using mean function.

data_df %>%
  pivot_longer(-row_id,
               names_to="column_id",
               values_to="val") %>%
  group_by(column_id) %>%
  mutate(val= ifelse(is.na(val), mean(val, na.rm = T), val)) 

Now we have all NAs replaced by column means in long tidy form.

## # A tibble: 15 × 3
## # Groups:   column_id [3]
##    row_id column_id   val
##     <int> <chr>     <dbl>
##  1      1 a          3   
##  2      1 b          3   
##  3      1 c          1.67
##  4      2 a          1   
##  5      2 b          2.75
##  6      2 c          3   
##  7      3 a          3   
##  8      3 b          2   
##  9      3 c          1   
## 10      4 a          1   
## 11      4 b          3   
## 12      4 c          1.67
## 13      5 a          3   
## 14      5 b          3   
## 15      5 c          1

We can use pivot_wider() function from tidyr to convert the long tidy form to wider form data as we started with originally.

data_df %>%
  mutate(row_id = row_number()) %>%
  pivot_longer(-row_id, 
               names_to="column_id",
               values_to="val") %>%
  group_by(column_id) %>%
  mutate(val= ifelse(is.na(val), mean(val,na.rm = T), val)) %>%
  pivot_wider(names_from="column_id", values_from = "val")
## # A tibble: 5 × 4
##   row_id     a     b     c
##    <int> <dbl> <dbl> <dbl>
## 1      1     3  3     1.67
## 2      2     1  2.75  3   
## 3      3     3  2     1   
## 4      4     1  3     1.67
## 5      5     3  3     1

Replace NAs with row means

We can use similar approach to replace missing values by row means. The big difference is we will be grouping by row ID instead of column ID. Everything else is the same.

data_df %>%
  pivot_longer(-row_id, 
                names_to="column_id",
                values_to="val") %>%
  group_by(row_id)  %>%
  mutate(val= ifelse(is.na(val), mean(val,na.rm = T), val)) %>% 
  pivot_wider(names_from="column_id",
              values_from = "val") 

Here is our result after replacing NAs with row means.

## # A tibble: 5 × 4
## # Groups:   row_id [5]
##   row_id     a     b     c
##   <fct>  <dbl> <dbl> <dbl>
## 1 1          3     3     3
## 2 2          1     2     3
## 3 3          3     2     1
## 4 4          1     3     2
## 5 5          3     3     1

Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X

Related posts:

rowwise operationsRow-wise operations in R: compute row means in tidyverse How to Compute Column Means in R?How To Compute Column Means in R with tidyverse Default Thumbnail7 Tips to Add Columns to a DataFrame with add_column() in tidyverse Pandas Replace Multiple Column Values with DictionaryHow to Replace Multiple Column Values with Dictionary in Python

Filed Under: R, R Tips, tidyverse 101 Tagged With: replace NA with column mean

Primary Sidebar

Subscribe to Python and R Tips and Learn Data Science

Learn Pandas in Python and Tidyverse in R

Tags

Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

RSS RSS

  • How to convert row names to a column in Pandas
  • How to resize an image with PyTorch
  • Fashion-MNIST data from PyTorch
  • Pandas case_when() with multiple examples
  • An Introduction to Statistical Learning: with Applications in Python Is Here
  • 10 Tips to customize ggplot2 title text
  • 8 Plot types with Matplotlib in Python
  • PCA on S&P 500 Stock Return Data
  • Linear Regression with Matrix Decomposition Methods
  • Numpy’s random choice() function

Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

Go to mobile version