Python and R Tips

Learn Data Science with Python and R

How To Randomly Select Rows in Pandas?

February 17, 2018 by cmdlinetips

Creating unbiased training and testing data sets is key for all Machine Learning tasks. Pandas’ sample function lets you randomly sample rows from a Pandas data frame and helps with creating unbiased sampled datasets. It is also a great way to get a downsampled data frame to work with.

In this post, we will learn three ways of using Pandas’ sample to randomly select/sample/resample rows. Let us first load the data.

import pandas as pd
data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head())

How to get a random subset of data

To randomly select rows from a pandas dataframe, we can use the sample function from Pandas. For example, to randomly select 3 rows, we call sample with the argument n=3.

>random_subset = gapminder.sample(n=3)
>print(random_subset.head())
        country  year         pop continent  lifeExp     gdpPercap
578       Ghana  1962   7355248.0    Africa   46.452   1190.041118
410     Denmark  1962   4646899.0    Europe   72.350  13583.313510
100  Bangladesh  1972  70759295.0      Asia   45.252    630.233627

Each time we run “sample”, we get 3 randomly selected rows from the Pandas dataframe, so the result will generally differ from run to run.

How to sample rows with replacement in Pandas?

By default, pandas’ sample randomly selects rows without replacement. Sampling with replacement is very useful for statistical techniques like bootstrapping. If we want to randomly sample rows with replacement, we can set the argument “replace” to True.

For example, to randomly select n=3 rows with replacement from the gapminder data:

>sample_with_replacement = gapminder.sample(n=3,replace=True)
>print(sample_with_replacement)
           country  year         pop continent  lifeExp    gdpPercap
1416         Spain  1952  28549870.0    Europe   64.940  3834.034742
201   Burkina Faso  1997  10352843.0    Africa   50.324   946.294962
1187        Panama  2007   3242173.0  Americas   75.537  9809.185636

Here we drew only 3 rows from a data frame with well over a thousand rows (the row labels above go up to 1416), so it is no surprise that no row appears twice.
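To actually see repeated rows, it helps to sample from something much smaller. The sketch below uses a hypothetical three-row toy frame (not the gapminder data; the values are borrowed from the output shown earlier) and draws 10 rows with replacement, so some rows have to come back more than once.

```python
import pandas as pd

# Hypothetical toy frame, small enough that repeats are guaranteed
toy = pd.DataFrame({
    "country": ["Ghana", "Denmark", "Bangladesh"],
    "lifeExp": [46.452, 72.350, 45.252],
})

# Drawing more rows than the frame holds is only possible with replace=True
resampled = toy.sample(n=10, replace=True)

# Count draws whose index label was already seen in an earlier draw
n_repeats = resampled.index.duplicated().sum()

print(resampled)
print("repeated draws:", n_repeats)
```

Asking for n=10 from this frame without replace=True would raise an error instead, since there are only 3 distinct rows to hand out.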

How to randomly select a percentage of rows in Pandas dataframe?

Often, you may want to sample a percentage of data rather than a fixed number of rows. Pandas’ sample has an argument “frac” that lets you specify the fraction of rows that you want to randomly select from the dataframe. Note that frac is a proportion, such as 0.5 for 50%, rather than a percentage like 50.

>fraction_of_rows = gapminder.sample(frac=0.003)
>print(fraction_of_rows)
          country  year         pop continent  lifeExp     gdpPercap
903         Libya  1967   1759224.0    Africa   50.227  18772.751690
1221  Philippines  1997  75012988.0      Asia   68.564   2536.534925
1565      Tunisia  1977   6005061.0    Africa   59.837   3120.876811
1003     Mongolia  1987   2015133.0      Asia   60.222   2338.008304
1157         Oman  1977   1004533.0      Asia   57.367  11848.343920

We can combine replace=True with frac to get a fraction of rows sampled with replacement. Note that frac and n cannot be used together in the same call.
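Both points can be checked on a small example. This is only a sketch: the ten-row toy frame and the variable names are made up for illustration, and the exact wording of the error message may vary between pandas versions.

```python
import pandas as pd

# Hypothetical toy frame with 10 rows
toy = pd.DataFrame({"value": range(10)})

# frac together with replace=True: half as many rows as the frame, repeats allowed
half_with_replacement = toy.sample(frac=0.5, replace=True)
print(half_with_replacement)

# Passing both n and frac is rejected by pandas with a ValueError
combined_error = None
try:
    toy.sample(n=3, frac=0.5)
except ValueError as err:
    combined_error = err
    print("pandas refused:", err)
```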

Another useful argument to sample is random_state. It sets the seed of the random number generator, so we can reproduce the same random sample. For example, by passing ‘random_state=99’ as an argument to sample, we get the same random sample every time, which helps us reproduce results.
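A minimal sketch of what that looks like, again on a hypothetical toy frame rather than the gapminder data. The second half shows one common pattern (not the only one) for turning a seeded sample into a reproducible training/testing split, assuming the index labels are unique.

```python
import pandas as pd

# Hypothetical toy frame with 100 rows and a unique default index
toy = pd.DataFrame({"value": range(100)})

# Same seed, same rows: the two samples are identical
first = toy.sample(n=5, random_state=99)
second = toy.sample(n=5, random_state=99)
print(first.equals(second))

# One possible reproducible split: sample 80% for training,
# then keep every row that was not sampled for testing
train = toy.sample(frac=0.8, random_state=99)
test = toy.drop(train.index)
print(len(train), len(test))
```

Dropping by index only works cleanly here because every row has its own index label; with duplicate labels, drop would remove more rows than intended.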
