Pandas value_counts: How To Get Counts of Unique Variables in a Dataframe?

As part of exploring a new data, often you might want to count unique values of one or more columns in a dataframe. Pandas value_counts() can get counts of unique values of columns in a Pandas dataframe. Starting from Pandas version 1.1.0, we can use value_counts() on a Pandas Series and dataframe as well.

In this tutorial, we will see examples of using Pandas value_counts on a single variable in a dataframe (i.e. Series value_counts()) first and then see how to use value_counts on a dataframe, i.e. with multiple variables.

Let us load Pandas and check the version of Pandas. With Pandas version 1.1.0 and above we can use value_counts() on Pandas dataframe and Series.

import pandas as pd
import numpy as np
pd.__version__
1.1.0

We will be using Penguins data set to count values for one or more variables. Let us load the data directly from github page.

data_url = "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
df = pd.read_csv(data_url)
df.head()
species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	male
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	female
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	female
3	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	female

For our examples illustrating value_counts(), we will mainly focus first on categorical variables and get their counts/frequency.

Pandas value_counts() on a single column

With Pandas value_counts() function we can compute the frequency of a variable from dataframe as shown below. In the example below, we are interested in “island” column and ask what are the counts of each unique island in the dataset. We can see that there are three different islands in the data and also count/frequency of each of them.

df.island.value_counts()
Biscoe       168
Dream        124
Torgersen     52
Name: island, dtype: int64

By default, the resulting Series object from Pandas value_counts()

will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.

Here is another example of using value_counts() on another categorical variable and their counts.

df.species.value_counts()

Adelie       152
Gentoo       124
Chinstrap     68
Name: species, dtype: int64

Pandas’ value_counts() to get proportion

By using normalize=True argument to Pandas value_counts() function, we can get the proportion of each value of the variable instead of the counts.

df.species.value_counts(normalize = True)

We can see that the resulting Series has relative frequencies of the unique values.


Adelie       0.441860
Gentoo       0.360465
Chinstrap    0.197674
Name: species, dtype: float64

How to convert value_counts() output into a data frame

Pandas’ value_counts() returns a Series object. Here is a way to convert the output Series from value_counts() into a Pandas Data Frame. We can use reset_index() function and a get data frame.

df.species.value_counts().reset_index()

index	species
0	Adelie	152
1	Gentoo	124
2	Chinstrap	68

Pandas reset_index() function uses the variable name for the counts. Often you would like to rename column names to more suitable names. Here we assign new column names using columns method on the dataframe.

df_species_counts = df.species.value_counts().reset_index()
df_species_counts.columns = ["island", "n"]

Now the results from value_counts() is a dataframe with suitable column names.

df_species_counts

island	n
0	Biscoe	168
1	Dream	124
2	Torgersen	52

Pandas value_counts() on multiple columns (or on a dataframe)

Sometimes you might want to tabulate counts of multiple variables. With Pandas version 1.1.0 and above we can use Pandas’ value_coiunts() function to get counts for multiple variable. For example, if we want to know the counts of each island and species combination, we can use value_counts as follows.

df[["island", "species"]].value_counts()

Pandas value_counts() on dataframe gives the result as multi-index Series.

island     species  
Biscoe     Gentoo       124
Dream      Chinstrap     68
           Adelie        56
Torgersen  Adelie        52
Biscoe     Adelie        44
dtype: int64

We can also use Pandas to_markdown() fucntion to nicely tabulate the results from value_counts()

print(df[["island", "species"]].value_counts().to_markdown())

|                         |   0 |
|:------------------------|----:|
| ('Biscoe', 'Gentoo')    | 124 |
| ('Dream', 'Chinstrap')  |  68 |
| ('Dream', 'Adelie')     |  56 |
| ('Torgersen', 'Adelie') |  52 |
| ('Biscoe', 'Adelie')    |  44 |

If you want the results from value_counts() as a simple data frame, not multi-indexed one, we can use reset_index() function to get a simple dataframe.

df[["island", "species"]].value_counts(sort=False).reset_index()
island	species	0
0	Biscoe	Adelie	44
1	Biscoe	Gentoo	124
2	Dream	Adelie	56
3	Dream	Chinstrap	68
4	Torgersen	Adelie	52

Another useful feature of value_counts() is that we can get normalized values of counts as proportions, the relative frequency by dividing each value by the sum of all values., using normalize=True argument.

df[["island", "species"]].value_counts(normalize=True)
island     species  
Biscoe     Gentoo       0.360465
Dream      Chinstrap    0.197674
           Adelie       0.162791
Torgersen  Adelie       0.151163
Biscoe     Adelie       0.127907
dtype: float64

If we want the result as a simple dataframe, we can use reset_index() function to convert the multi-indexed dataframe to a simple one.

df[["island", "species"]].value_counts(normalize=True).reset_index()
island	species	0
0	Biscoe	Gentoo	0.360465
1	Dream	Chinstrap	0.197674
2	Dream	Adelie	0.162791
3	Torgersen	Adelie	0.151163
4	Biscoe	Adelie	0.127907

Pandas value_counts() with groupby()

If you are using pandas version below 1.1.0 and stil want to compute counts of multiple variables, the solution is to use Pandas groupby function. We basically select the variables of interest from the data frame and use groupby on the variables and compute size. In the example below we have computed the counts for each combination of unique values of two variables.

df[["island", "species"]].groupby(["island", "species"]).size()
island     species  
Biscoe     Adelie        44
           Gentoo       124
Dream      Adelie        56
           Chinstrap     68
Torgersen  Adelie        52
dtype: int64

You can see that we get the same results, but with different order. As both value_counts() and groupby() functions have different sort options.