dplyr count(): Explore Variables with count in dplyr

In this tutorial, we will see examples of using the count() function from dplyr to explore variables in a dataframe. One of the first things to do after loading data is to perform simple exploratory data analysis. One typically starts data exploration with a quick look at the data using functions like glimpse() or head().

As a next step, you might want to know more about a specific variable. For example, if you have a categorical variable, you might want to count the number of observations for each of its values. dplyr’s count() function makes it easy to count observations by one or more variables.

Let us first load the tidyverse suite of R packages.

library("tidyverse")

We will use the fantastic Penguins dataset to illustrate how count() works. Let us load the data from cmdlinetips.com’s GitHub page.

path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins <- readr::read_csv(path2data)

The column specification that read_csv() prints shows that some of the variables in the penguins dataframe, like species, island, and sex, are character variables.

## Parsed with column specification:
## cols(
##   species = col_character(),
##   island = col_character(),
##   bill_length_mm = col_double(),
##   bill_depth_mm = col_double(),
##   flipper_length_mm = col_double(),
##   body_mass_g = col_double(),
##   sex = col_character()
## )

Count Observations by Single Group

Let us explore the categorical/character variables with count() in dplyr.

For example, if we want to know the number of observations for each penguin species, we can use the count() function as follows.

count(penguins, species)

And we get a new tibble with species as one column and the number of observations, named n, as another column.

## # A tibble: 3 x 2
##   species       n
##   <chr>     <int>
## 1 Adelie      152
## 2 Chinstrap    68
## 3 Gentoo      124
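Under the hood, count() is a shortcut for grouping by a variable and then counting the rows in each group. As a minimal sketch, here is the same count computed both ways on a small made-up tibble (toy and its values are hypothetical, not the penguins data):

```r
library(dplyr)

# Hypothetical toy data standing in for the penguins dataframe
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"),
  island  = c("Torgersen", "Biscoe", "Biscoe", "Dream", "Biscoe", "Dream")
)

# One call: count rows per species
via_count <- toy %>% count(species)

# The same result spelled out: group by species, then count rows per group
via_summarise <- toy %>%
  group_by(species) %>%
  summarise(n = n())

print(via_count)
print(via_summarise)
```

Both versions return one row per species (Adelie 3, Chinstrap 1, Gentoo 2); count() simply saves the group_by() and summarise() steps.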

Count Observations by Single Group Using the Pipe Operator

There is another way to use tidyverse functions that will be extremely useful later. In the above example, we provided the name of the dataframe and the variable in the dataframe as inputs to the count() function to compute the number of penguins in each species.

Instead, we can use the pipe operator %>% to connect the dataframe to the count() function. We write the name of the dataframe first, then the pipe operator %>%, and then the count() function with the variable name inside. One way to understand this is that the pipe passes the dataframe to count() as its first argument, so penguins %>% count(species) is the same as count(penguins, species).

penguins %>%
  count(species)

And we get exactly the same results as before.

## # A tibble: 3 x 2
##   species       n
##   <chr>     <int>
## 1 Adelie      152
## 2 Chinstrap    68
## 3 Gentoo      124

This style is especially useful when we perform several operations one after the other: we can feed the result of each step into the next using the %>% operator.
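For example, a two-step pipeline can drop missing values before counting. The sketch below uses a hypothetical toy tibble whose sex column contains a missing value; count() reports NA as a group of its own unless we filter those rows out first:

```r
library(dplyr)

# Hypothetical toy data; one penguin has a missing sex value
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"),
  sex     = c("male", "female", NA, "female", "male", "female")
)

# Counting directly: the missing value shows up as its own NA row
with_na <- toy %>% count(sex)

# Chained with the pipe: filter() drops rows with missing sex,
# and its result is passed straight to count()
without_na <- toy %>%
  filter(!is.na(sex)) %>%
  count(sex)

print(with_na)
print(without_na)
```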

Count Observations by Single Group and Sort the Results

We can sort the results by count, in descending order, with the sort=TRUE argument.

penguins %>%
  count(species, sort=TRUE)
## # A tibble: 3 x 2
##   species       n
##   <chr>     <int>
## 1 Adelie      152
## 2 Gentoo      124
## 3 Chinstrap    68
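The sort=TRUE argument is a convenience. A spelled-out equivalent counts first and then orders the rows by n with arrange(desc(n)). Here is a minimal sketch on a hypothetical toy tibble:

```r
library(dplyr)

# Hypothetical toy data: 3 Adelie, 2 Gentoo, 1 Chinstrap
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie")
)

# Sorting inside count()
sorted_count <- toy %>% count(species, sort = TRUE)

# The same thing in two steps: count, then arrange by n, largest first
sorted_manual <- toy %>%
  count(species) %>%
  arrange(desc(n))

print(sorted_count)
```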

Count Observations by Two Groups

The count() function in dplyr can also count observations by multiple variables. Here is an example where we count observations by two variables.

penguins %>%
  count(species, island)

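We get the number of observations for each combination of the two variables. In this example, we get the number of penguins of each species on each island. Only combinations that actually occur in the data are listed, which is why the result has 5 rows rather than one for every possible species and island pair.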

## # A tibble: 5 x 3
##   species   island        n
##   <chr>     <chr>     <int>
## 1 Adelie    Biscoe       44
## 2 Adelie    Dream        56
## 3 Adelie    Torgersen    52
## 4 Chinstrap Dream        68
## 5 Gentoo    Biscoe      124
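By default the count column is called n. count() also accepts a name argument to choose a different column name, which can make the result easier to read when it feeds into further steps. A minimal sketch on a hypothetical toy tibble (n_penguins is just an illustrative name):

```r
library(dplyr)

# Hypothetical toy data with two categorical variables
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"),
  island  = c("Torgersen", "Biscoe", "Biscoe", "Dream", "Biscoe", "Dream")
)

# Count by two variables and call the count column n_penguins instead of n
by_species_island <- toy %>%
  count(species, island, name = "n_penguins")

print(by_species_island)
```

As with the penguins data, only the species and island combinations present in toy appear in the result.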

Count Observations by Two Groups and Sort the Results

With the sort=TRUE argument, we can also sort the results from count() with two variables.

penguins %>%
  count(species, island, sort=TRUE)
## # A tibble: 5 x 3
##   species   island        n
##   <chr>     <chr>     <int>
## 1 Gentoo    Biscoe      124
## 2 Chinstrap Dream        68
## 3 Adelie    Dream        56
## 4 Adelie    Torgersen    52
## 5 Adelie    Biscoe       44

Check out the documentation for the count() function on dplyr’s website to learn more.