Python and R Tips

Learn Data Science with Python and R

How To Randomly Add NaN to Pandas Dataframe?

May 12, 2019 by cmdlinetips

In this post we will see an example of how to introduce missing values, i.e. NaNs, randomly into a data frame using Pandas.

Sometimes, while testing a method, you might want to create a Pandas dataframe with NaNs randomly distributed. Here we show how to do it.

How To Add Random NaNs in Pandas?

Let us load the packages we need.

import numpy as np
import pandas as pd
import seaborn as sns

Let us use the gapminder data in wide form to introduce NaNs randomly.

data_url = "https://goo.gl/ioc2Td"
gapminder = pd.read_csv(data_url)
print(gapminder.iloc[0:5,0:4])
  continent       country  gdpPercap_1952  gdpPercap_1957
0    Africa       Algeria     2449.008185     3013.976023
1    Africa        Angola     3520.610273     3827.940465
2    Africa         Benin     1062.752200      959.601080
3    Africa      Botswana      851.241141      918.232535
4    Africa  Burkina Faso      543.255241      617.183465

Let us drop the two non-numeric columns, continent and country, from the dataframe using Pandas “drop” function. The resulting dataframe contains only numeric data.

gapminder = gapminder.drop(['continent','country'], axis=1)

We can see that there are 5112 data points in total.

gapminder.count().sum()
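
As a small illustration of what this count measures, here is a sketch on a made-up dataframe (the name toy and its values are hypothetical, not part of the gapminder data). count() returns the number of non-missing cells per column, so its sum equals the total number of cells only when nothing is missing yet.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataframe with one missing value
toy = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

# count() gives non-missing cells per column; sum() adds them up
n_non_missing = toy.count().sum()

# size is the total number of cells, missing or not
n_cells = toy.size

print(n_non_missing, n_cells)
```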

Create a boolean NumPy array with a fixed percentage of True values

Let us create a boolean NumPy array of the same shape as our Pandas dataframe. We create the boolean 2d-array such that about 50% of its elements are True and the rest are False. We will use NumPy’s random module to generate uniform random numbers between 0 and 1 and compare them with 0.5 to get the boolean array.

nan_mat = np.random.random(gapminder.shape)<0.5
nan_mat

We can take a peek at the boolean array.

array([[False,  True,  True, ...,  True,  True,  True],
       [False,  True,  True, ...,  True, False,  True],
       [ True, False,  True, ..., False,  True, False],
       ...,
       [ True, False,  True, ..., False,  True,  True],
       [ True,  True, False, ..., False,  True, False],
       [ True,  True,  True, ..., False, False,  True]])

We can get the total number of True elements, i.e. the total number of NaNs we will be adding to the dataframe, using NumPy’s sum function.

nan_mat.sum()
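
To make the "about 50%" claim concrete, here is a minimal sketch that does not need the gapminder download. It assumes NumPy's default_rng generator with an arbitrary seed and a made-up shape; neither appears in the original post, which uses np.random.random without a seed.

```python
import numpy as np

# Assumption: a seeded Generator so the result is reproducible
rng = np.random.default_rng(42)

shape = (1000, 36)   # hypothetical shape, stands in for gapminder.shape
frac = 0.5           # desired fraction of True values

# Each entry is independently True with probability frac
nan_mat = rng.random(shape) < frac

n_true = nan_mat.sum()        # number of True entries
share_true = nan_mat.mean()   # fraction of True entries, close to frac

print(n_true, round(share_true, 3))
```

Changing frac to, say, 0.1 would give a mask that marks roughly 10% of the cells instead.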

Introduce random NAs using Pandas mask() function

Let us use Pandas mask() function to replace values with NaNs. The Pandas documentation describes the cond argument of mask() as follows:

Where cond is False, keep the original value. Where True, replace with corresponding value from other.

Pandas’ mask function checks each element in the dataframe. It keeps the element if the condition is False and replaces it if the condition is True. Since we do not specify other, the replaced elements become NaN. We provide the condition using the boolean array we created.

gapminder_NaN = gapminder.mask(nan_mat)

We could also have created the boolean array directly inside the mask function.

gapminder_NaN = gapminder.mask(np.random.random(gapminder.shape)<0.5)
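
The behaviour of mask() is easiest to see with a condition written by hand. The sketch below uses a hypothetical three-row dataframe (df and cond are illustrative names and values, not from the post) so that the result is fully predictable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataframe
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# Hand-written condition with the same shape as df
cond = np.array([[True, False],
                 [False, False],
                 [False, True]])

# Cells where cond is True become NaN; the others are kept
masked = df.mask(cond)

print(masked)
```

Note that mask() returns a new dataframe and leaves df itself unchanged.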

We can verify that the dataframe has NaNs introduced randomly as we intended.

gapminder_NaN.iloc[0:3,0:5]
   gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
0     2449.008185             NaN             NaN     3246.991771     4182.663766
1     3520.610273             NaN             NaN             NaN             NaN
2             NaN       959.60108             NaN     1035.831411             NaN

We can count the total number of nulls or NaNs and see that it is approximately 50% of all the values.

gapminder_NaN.isnull().sum(axis = 0).sum()
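
The whole recipe can be wrapped in a small helper and checked on toy data. This is only a sketch: the function name add_random_nans, its frac and seed parameters, and the toy dataframe are assumptions made for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd

def add_random_nans(df, frac=0.5, seed=None):
    """Return a copy of df in which each cell is NaN with probability frac."""
    rng = np.random.default_rng(seed)   # assumption: seeded Generator
    return df.mask(rng.random(df.shape) < frac)

# Hypothetical numeric dataframe with 200 rows and 10 columns
toy = pd.DataFrame(np.arange(2000, dtype=float).reshape(200, 10))

toy_nan = add_random_nans(toy, frac=0.5, seed=0)

# Fraction of cells that are now NaN, expected to be near 0.5
nan_share = toy_nan.isna().to_numpy().mean()

print(round(nan_share, 3))
```

Because each cell is drawn independently, the number of NaNs follows a binomial distribution around the target, which is why the share is close to 50% but not exactly 50%.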

In summary, we have added NaNs randomly to a Pandas dataframe. We used NumPy’s random module to create a random boolean array with approximately the desired fraction of True values, and Pandas mask function to turn the corresponding values in the dataframe into NaNs.
