• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / Python / Pandas DataFrame / How to Select Top N Rows with the Largest Values in a Column(s) in Pandas?

How to Select Top N Rows with the Largest Values in a Column(s) in Pandas?

March 18, 2019 by cmdlinetips

Sometimes, while doing data wrangling, we might need to get a quick look at the top rows with the largest or smallest values in a column. This kind of quick glance at the data reveal interesting information in a dataframe. Pandas dataframe easily enables one to have a quick look at the top rows either with largest or smallest values in a column.

Pandas library has function called nlargest makes it really easy to look at the top or bottom rows. Let us first load Pandas library.

import pandas as pd

Let us use gapminder data. Let us load the data from Carpentry’s github page and look at the data corresponding to the year 2007 alone.

# Carpentry url containing data
data_url = 'http://bit.ly/2cLzoxH'
# Load the data from Carpentry url
gapminder = pd.read_csv(data_url)
# filter the data to contain just year=2007
gapminder_2007 = gapminder[gapminder.year==2007]

Pandas nlargest function can take the number of rows we need as argument and the column name for which we are looking for largest values. Pandas nlargest function

Return the first n rows with the largest values in columns, in descending order. The columns that are not specified are returned as well, but not used for ordering.

Let us look at the top 3 rows of the dataframe with the largest population values using the column variable “pop”.

gapminder_2007.nlargest(3,'pop')

We just get two rows and see that China, India, and US are the top 3 countries with largest population.

country	year	pop	continent	lifeExp	gdpPercap
299	China	2007	1.318683e+09	Asia	72.961	4959.114854
707	India	2007	1.110396e+09	Asia	64.698	2452.210407
1619	United States	2007	3.011399e+08	Americas	78.242	42951.653090

The function nlargest also has an argument keep that allows one to deal with duplicate values. keep can take {‘first’, ‘last’, ‘all’}, where

first : prioritize the first occurrence(s)
last : prioritize the last occurrence(s)
all : does not drop any duplicates

How to Get Top N Rows Based on Largest Values in Multiple Columns in Pandas?

In the above example we saw getting top rows ordered by values of a single column. Pandas nlargest function can take more than one variable to order the top rows.

We can give a list of variables as input to nlargest and get first n rows ordered by the list of columns in descending order.

# top n rows ordered by multiple columns
gapminder_2007.nlargest(3,['lifeExp','gdpPercap'])

Here we get top 3 rows with largest values in column “lifeExp” and then “gdpPercap”.

	country	year	pop	continent	lifeExp	gdpPercap
803	Japan	2007	127467972.0	Asia	82.603	31656.06806
671	Hong Kong China	2007	6980412.0	Asia	82.208	39724.97867
695	Iceland	2007	301931.0	Europe	81.757	36180.78919

How to Get Top N Rows Based on Smallest Values of a Column in Pandas?

Just as you guessed, Pandas has the function nsmallest to select top rows of smallest values in one or more column, in descending order.

Let use see an example of using nsmallest on gapminder data. Here is how to get top 3 countries with smallest lifeExp.

gapminder_2007.nsmallest(3,'liefExp')
	country	year	pop	continent	lifeExp	gdpPercap
1463	Swaziland	2007	1133066.0	Africa	39.613	4513.480643
1043	Mozambique	2007	19951656.0	Africa	42.082	823.685621
1691	Zambia	2007	11746035.0	Africa	42.384	1271.211593

Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X

Related posts:

Default ThumbnailHow to Get Top N Rows with in Each Group in Pandas? Pandas Filter/Select Rows Based on Column ValuesHow To Filter Pandas Dataframe By Values of Column? Default ThumbnailHow To Randomly Select Rows in Pandas? Default Thumbnail6 ways to Sort Pandas Dataframe: Pandas Tutorial

Filed Under: Pandas DataFrame, Pandas nlargest, Python Tagged With: Pandas Dataframe, Pandas nlargest

Primary Sidebar

Subscribe to Python and R Tips and Learn Data Science

Learn Pandas in Python and Tidyverse in R

Tags

Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

RSS RSS

  • How to convert row names to a column in Pandas
  • How to resize an image with PyTorch
  • Fashion-MNIST data from PyTorch
  • Pandas case_when() with multiple examples
  • An Introduction to Statistical Learning: with Applications in Python Is Here
  • 10 Tips to customize ggplot2 title text
  • 8 Plot types with Matplotlib in Python
  • PCA on S&P 500 Stock Return Data
  • Linear Regression with Matrix Decomposition Methods
  • Numpy’s random choice() function

Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

Go to mobile version