How to Get Unique Values from a Column in a Pandas Data Frame?

In this tutorial, we will learn how to get the unique values of a column in a Pandas dataframe using two approaches: first with the Pandas unique() function, and then with the Pandas drop_duplicates() function.

Pandas unique() Function to Get Unique Values of a Column

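Pandas provides unique() both as a Series method, Series.unique(), and as a top-level function, pd.unique(), that accepts one-dimensional data. A dataframe itself has no unique() method, so we apply unique() to a single column. In this example we will learn how to use the Pandas unique() function on a column of a dataframe.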

For example, let us say we want to find the unique values of the column ‘continent’ in the data frame. This would give us every continent that appears in the dataframe. We can call unique() on the column of interest. For an object (string) column like this one, it returns a NumPy array with the unique values of the column.
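
As a minimal sketch of the two ways to call unique(), the snippet below uses a small hand-made dataframe rather than the gapminder data; the column values are invented purely for illustration.

```python
import pandas as pd

# Toy data invented for illustration (not the gapminder data set)
df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia", "Africa", "Europe"],
    "year": [1952, 1952, 1957, 1952, 1957],
})

# Series method: select one column, then call unique() on it
from_method = df["continent"].unique()

# Top-level function: pass the one-dimensional column to pd.unique()
from_function = pd.unique(df["continent"])

print(from_method)
print(from_function)
```

Both calls give the same values in the same order, so which one to use is mostly a matter of style.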

Pandas unique values examples using gapminder data set

Let us get started with some examples from a real-world data set. We will use the gapminder dataset to get the unique values of a character/categorical variable.

# import pandas as pd
import pandas as pd
# software carpentry url for gapminder data
gapminder_csv_url = 'http://bit.ly/2cLzoxH'
# load the data with pd.read_csv
gapminder = pd.read_csv(gapminder_csv_url)

Let us check the basic information of the data frame. We can see that the variables ‘continent’ and ‘country’ are objects/strings and we can find the number of unique values for them.

# check the data frame info
# (info() prints its summary itself and returns None, so no print() is needed)
gapminder.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
country      1704 non-null object
year         1704 non-null int64
pop          1704 non-null float64
continent    1704 non-null object
lifeExp      1704 non-null float64
gdpPercap    1704 non-null float64
dtypes: float64(3), int64(1), object(2)
memory usage: 79.9+ KB

Applying Pandas unique() to a column of a dataframe

>gapminder['continent'].unique()
array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)

Note that the unique values returned by the Pandas unique() function are not sorted; they are returned in the order in which they first appear in the dataframe.
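
If a sorted result is needed, one simple option is to pass the output of unique() to Python's built-in sorted(). The sketch below uses a small hand-made Series (an assumption for illustration, not the gapminder column) to contrast the two orderings.

```python
import pandas as pd

# Hand-made Series for illustration; the values repeat on purpose
continents = pd.Series(
    ["Asia", "Europe", "Africa", "Americas", "Oceania", "Asia", "Europe"]
)

# unique() keeps the order of first appearance
in_order = list(continents.unique())

# sorted() returns a new, alphabetically ordered list
alphabetical = sorted(continents.unique())

print(in_order)      # ['Asia', 'Europe', 'Africa', 'Americas', 'Oceania']
print(alphabetical)  # ['Africa', 'Americas', 'Asia', 'Europe', 'Oceania']
```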

We can also select the column with attribute (dot) notation, which gives the same Pandas Series, and chain unique() onto it to get the unique values.

>gapminder.continent.unique()
array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)
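
One caveat with dot notation: it only works when the column name does not clash with an existing dataframe attribute or method. The gapminder data has a column named ‘pop’, and pop is also a dataframe method, so that column needs bracket notation. The sketch below shows this on a small hand-made dataframe with invented values.

```python
import pandas as pd

# Toy data invented for illustration; 'pop' mirrors the gapminder column name
df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia"],
    "pop": [1.0, 2.0, 3.0],
})

# Dot notation works for 'continent'
continents = df.continent.unique()

# df.pop is the DataFrame.pop method, not the 'pop' column
pop_is_method = callable(df.pop)

# Bracket notation always selects the column
pop_values = df["pop"].unique()

print(continents)
print(pop_is_method)  # True
print(pop_values)
```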

Unique values of a column as a list

If we want the unique values of a column in a pandas data frame as a list, we can apply the function tolist() by chaining it to the previous command.

>gapminder['continent'].unique().tolist()
['Asia', 'Europe', 'Africa', 'Americas', 'Oceania']

If we try the unique function on the ‘country’ column from the dataframe, the result will be a big numpy array.

>gapminder['country'].unique()

Instead, we can simply count the number of unique values in the country column and find that there are 142 countries in the data set.

>len(gapminder['country'].unique().tolist())
142
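
Pandas also has a dedicated method for this count, nunique(), and value_counts() shows how many rows each unique value has. The sketch below uses a small hand-made dataframe with invented values, not the gapminder data. Note that nunique() skips missing values by default, while unique() keeps them, so the two counts can differ when a column contains NaN.

```python
import pandas as pd

# Toy data invented for illustration
df = pd.DataFrame({
    "country": ["India", "India", "France", "Kenya", "France"],
    "continent": ["Asia", "Asia", "Europe", "Africa", "Europe"],
})

# Number of unique values in one column
n_countries = df["country"].nunique()

# Number of unique values in every column at once
per_column = df.nunique()

# How many rows each unique value has
rows_per_country = df["country"].value_counts()

print(n_countries)  # 3
print(per_column)
print(rows_per_country)
```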

How To Get Unique Values of a Column with drop_duplicates()

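Another, slightly less intuitive, way to get the unique values of a column is to use the Pandas drop_duplicates() function. Calling drop_duplicates() on a variable/column removes all duplicated values, keeping the first occurrence of each, and returns a Pandas Series. The Series keeps the original row labels, which is why the index in the output below is not consecutive.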

For example, to get the unique values of the continent variable, we will use Pandas’ drop_duplicates() function as follows.

# unique values with drop_duplicates
gapminder.continent.drop_duplicates()

0         Asia
12      Europe
24      Africa
48    Americas
60     Oceania
Name: continent, dtype: object
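
Two follow-ups are often useful here, sketched below on a small hand-made dataframe with invented values: renumbering the result with reset_index(drop=True), and calling drop_duplicates() on several columns to get the unique combinations of values, which unique() cannot do on a dataframe.

```python
import pandas as pd

# Toy data invented for illustration
df = pd.DataFrame({
    "country": ["India", "India", "France", "Kenya", "France"],
    "continent": ["Asia", "Asia", "Europe", "Africa", "Europe"],
})

# First occurrences only; original row labels (0, 2, 3) are kept
unique_series = df["continent"].drop_duplicates()

# Renumber the result from 0 and discard the old labels
renumbered = unique_series.reset_index(drop=True)

# Unique (country, continent) combinations across two columns
pairs = df[["country", "continent"]].drop_duplicates()

print(unique_series)
print(renumbered)
print(pairs)
```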
