In this tutorial, we will learn how to get unique values of a column in a Pandas dataframe using two approaches. We will first use Pandas unique() function to get unique values of a column and then use Pandas drop_duplicates() function to get unique values of a column.
Pandas unique() Function to Get Unique Values of a Column
Pandas provides unique() both as a Series method and as the top-level function pd.unique(), which accepts one-dimensional data; both give the unique values of a variable. A dataframe itself has no unique() method, so we first select a column. In this example we will learn how to use Pandas unique() function on a column of a dataframe.
For example, let us say we want to find the unique values of column ‘continent’ in the data frame. This would give us every continent that appears in the dataframe. We can use pandas’ function unique() on the column of interest. For an object column like this one, in the pandas version used here, it returns a NumPy array with the unique values of the column.
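Before moving to real data, here is a minimal sketch of the idea. The data frame toy and its values are hypothetical, made up for illustration and not taken from gapminder.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia", "Africa", "Europe"],
    "year": [1952, 1952, 1957, 1952, 1957],
})

# Select the column (a Series), then ask for its unique values
continents = toy["continent"].unique()
print(continents)

# pd.unique() on the same column gives the same values
same_values = list(pd.unique(toy["continent"])) == list(continents)
print(same_values)
```

Each continent shows up once, no matter how many rows it occupies.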
Pandas unique values examples using gapminder data set
Let us get started with some examples from a real world data set. We will use the gapminder dataset to get unique values of a character/categorical variable.
# import pandas
import pandas as pd
# software carpentry url for gapminder data
gapminder_csv_url = 'http://bit.ly/2cLzoxH'
# load the data with pd.read_csv
gapminder = pd.read_csv(gapminder_csv_url)
Let us check the basic information of the data frame. We can see that the variables ‘continent’ and ‘country’ are objects/strings and we can find the number of unique values for them.
# check the data frame info
gapminder.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
country      1704 non-null object
year         1704 non-null int64
pop          1704 non-null float64
continent    1704 non-null object
lifeExp      1704 non-null float64
gdpPercap    1704 non-null float64
dtypes: float64(3), int64(1), object(2)
memory usage: 79.9+ KB
Applying Pandas unique() to a column of a dataframe
>gapminder['continent'].unique()
array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)
Note that the unique values returned by Pandas unique() function are not sorted; they are returned in order of appearance in the dataframe.
We can also select the column with attribute (dot) access, which gives the same Pandas Series, and call unique() on it to get the unique values.
>gapminder.continent.unique()
array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)
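If a sorted result is needed, one option is to sort the unique values afterwards. The sketch below uses a hypothetical Series s (invented for illustration) and Python's built-in sorted().

```python
import pandas as pd

# Hypothetical toy column, invented for illustration
s = pd.Series(["Europe", "Asia", "Europe", "Africa", "Asia"])

# unique() keeps the order of first appearance
in_order = s.unique().tolist()

# sorted() returns a new, alphabetically ordered Python list
alphabetical = sorted(s.unique())

print(in_order)
print(alphabetical)
```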
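One caveat with dot access is worth a short sketch: it only works when the column name is a valid Python identifier and does not clash with an existing dataframe attribute. The gapminder column ‘pop’ is such a case, since pop() is also a DataFrame method. The frame toy below is hypothetical and only reuses two gapminder column names.

```python
import pandas as pd

# Hypothetical toy frame reusing two gapminder column names
toy = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia"],
    "pop": [8.4e6, 1.3e6, 9.2e6],
})

# Bracket and dot access select the same column here
same = toy["continent"].equals(toy.continent)
print(same)

# 'pop' is also a DataFrame method, so dot access returns the method
pop_is_method = callable(toy.pop)
print(pop_is_method)

# Bracket access always returns the column
pop_values = toy["pop"].unique().tolist()
print(pop_values)
```

For that reason, bracket access is the safer default when column names are not known in advance.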
Unique values of a column as a list
If we want the unique values of a column in a pandas data frame as a list, we can apply the function tolist() by chaining it to the previous command.
>gapminder['continent'].unique().tolist()
['Asia', 'Europe', 'Africa', 'Americas', 'Oceania']
If we try the unique function on the ‘country’ column from the dataframe, the result will be a big numpy array.
>gapminder['country'].unique()
Instead, we can simply count the number of unique values in the country column and find that there are 142 countries in the data set.
>len(gapminder['country'].unique().tolist())
142
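Pandas also has a dedicated method for counting, nunique(), which works on a single column or on the whole dataframe. The sketch below uses hypothetical toy data. Note that nunique() ignores missing values by default, while unique() keeps them, so the two counts can differ when a column has missing entries.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration
toy = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Japan", "Chile", "Japan"],
    "continent": ["Africa", "Africa", "Asia", "Americas", "Asia"],
})

n_countries = toy["country"].nunique()   # count for one column
per_column = toy.nunique()               # count for every column
print(n_countries)
print(per_column)

# With a missing value, the two ways of counting disagree
with_missing = pd.Series(["a", None, "a"])
count_nunique = with_missing.nunique()       # missing value ignored
count_len = len(with_missing.unique())       # missing value counted
print(count_nunique, count_len)
```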
How To Get Unique Values of a Column with drop_duplicates()
Another, slightly less intuitive, way to get the unique values of a column is to use Pandas’ drop_duplicates() function. Applied to a variable/column, drop_duplicates() keeps the first occurrence of each value, removes the later duplicates, and returns a Pandas Series.
For example, to get unique values of the continent variable, we will use Pandas’ drop_duplicates() function as follows.
# unique values with drop_duplicates
gapminder.continent.drop_duplicates()
0         Asia
12      Europe
24      Africa
48    Americas
60     Oceania
Name: continent, dtype: object
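As the output shows, the result keeps the original row labels of the first occurrences. The sketch below, on hypothetical toy data, shows one way to reset that index and checks that the values match what unique() returns.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration
toy = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Japan", "Chile", "Japan"],
    "continent": ["Africa", "Africa", "Asia", "Americas", "Asia"],
})

# drop_duplicates keeps each first occurrence with its original index label
uniq = toy["continent"].drop_duplicates()
kept_labels = uniq.index.tolist()
print(kept_labels)

# Reset the index if the original row labels are not needed
uniq_clean = uniq.reset_index(drop=True)
print(uniq_clean)

# Same values, in the same order, as unique()
matches_unique = uniq.tolist() == toy["continent"].unique().tolist()
print(matches_unique)
```

Keeping the index can be handy when we want to look up the rows where each value first appears.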