dplyr count(): Explore Variables with count in dplyr

July 5, 2020 by cmdlinetips

In this tutorial, we will see examples of using the count() function from dplyr to explore variables in a dataframe. One of the first things to do after loading a dataset is to perform simple exploratory data analysis. One typically starts data exploration with a quick look at the data with functions like glimpse() or head().

As a next step, you might want to know more about a specific variable. For example, if you have a categorical variable, you might want to count the number of observations for each value of that variable. dplyr's count() function makes it easy to count observations by one or more variables.

Let us first load the tidyverse suite of R packages.

library("tidyverse")

We will use the fantastic Penguins dataset to illustrate how count() works. Let us load the data from cmdlinetips.com's GitHub page.

path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins <- readr::read_csv(path2data)

The column specification printed by read_csv() shows that some of the variables in the penguins dataframe, like species, island, and sex, are character variables.

## Parsed with column specification:
## cols(
##   species = col_character(),
##   island = col_character(),
##   bill_length_mm = col_double(),
##   bill_depth_mm = col_double(),
##   flipper_length_mm = col_double(),
##   body_mass_g = col_double(),
##   sex = col_character()
## )

Count Observations by Single Group

Let us explore the categorical/character variables with count() in dplyr.

For example, if we want to know the number of observations for each of the penguin species, we can use the count() function as follows.

count(penguins, species)

And we get a new tibble with species as one column and the number of observations as another column.

## # A tibble: 3 x 2
##   species       n
##   <chr>     <int>
## 1 Adelie      152
## 2 Chinstrap    68
## 3 Gentoo      124
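
As a rough sketch of what count() does here, it behaves like grouping by the variable and then summarising with n(). The small toy tibble below is a hypothetical stand-in for the penguins data, used only so that the example runs without downloading anything.

```r
library(dplyr)

# Hypothetical toy data standing in for the penguins dataframe
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"),
  island  = c("Biscoe", "Dream", "Biscoe", "Dream", "Biscoe", "Torgersen")
)

# Count rows per species with count()
by_count <- count(toy, species)

# The same counts written out with group_by() and summarise()
by_summarise <- toy %>%
  group_by(species) %>%
  summarise(n = n())

# Both approaches should give the same counts per species
identical(by_count$n, by_summarise$n)
```

Seen this way, count() is mainly a shortcut that saves typing for a very common operation.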

Count Observations by Single Group Using the Pipe Operator

There is another way to use tidyverse functions that can be extremely useful later. In the above example, we provided the name of the dataframe and the variable in the dataframe as inputs to the count() function to compute the number of penguins in each species.

Instead, we can use the pipe operator %>% to connect the dataframe to the count() function. For example, we can write the name of the dataframe first, use the pipe operator %>% next, and then write the count() function with the variable name inside. The way to understand this is that the pipe passes the dataframe to count() as its first argument.

penguins %>%
  count(species)

And we get exactly the same results as before.

## # A tibble: 3 x 2
##   species       n
##   <chr>     <int>
## 1 Adelie      152
## 2 Chinstrap    68
## 3 Gentoo      124

This framework can be extremely useful if we are performing multiple operations one after the other. We can simply feed the results from one to another using the %>% operator.
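
Here is a minimal sketch of such a chain, assuming we want to drop rows with a missing value before counting. The toy tibble and its sex values are hypothetical and are not taken from the penguins data.

```r
library(dplyr)

# Hypothetical toy data with one missing value in sex
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"),
  sex     = c("male", NA, "female", "female", "male", "female")
)

# Each step receives the result of the previous step through the pipe
sex_counts <- toy %>%
  filter(!is.na(sex)) %>%  # keep only rows where sex is known
  count(sex)               # then count the remaining rows per sex

sex_counts
```

Without the filter() step, count() would report the missing values as a group of their own.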

Count Observations by Single Group and Sort the Results

We can sort the results in descending order of counts with the sort=TRUE argument.

penguins %>%
  count(species, sort=TRUE)

## # A tibble: 3 x 2
##   species       n
##   <chr>     <int>
## 1 Adelie      152
## 2 Gentoo      124
## 3 Chinstrap    68
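
One way to think about sort=TRUE is as a shortcut for counting first and then arranging by the count column in descending order. The sketch below compares the two on a hypothetical toy tibble in which no two groups have the same count.

```r
library(dplyr)

# Hypothetical toy data standing in for the penguins dataframe
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie")
)

# Sorting inside count()
sorted_arg <- toy %>%
  count(species, sort = TRUE)

# Counting first, then sorting explicitly with arrange()
sorted_manual <- toy %>%
  count(species) %>%
  arrange(desc(n))

# Both should list the species from most to least frequent
identical(sorted_arg$species, sorted_manual$species)
```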

Count Observations by Two Groups

The count() function in dplyr can also be used to count observations by multiple groups. Here is an example where we count observations by two variables.

penguins %>%
  count(species, island)

We get the number of observations for each combination of the two variables. In this example, we get the number of penguins of each species on each island.


## # A tibble: 5 x 3
##   species   island        n
##   <chr>     <chr>     <int>
## 1 Adelie    Biscoe       44
## 2 Adelie    Dream        56
## 3 Adelie    Torgersen    52
## 4 Chinstrap Dream        68
## 5 Gentoo    Biscoe      124
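
Note that the result has 5 rows rather than one row for every possible species and island pair. With character variables, count() returns only the combinations that actually occur in the data. The sketch below illustrates this on a hypothetical toy tibble.

```r
library(dplyr)

# Hypothetical toy data standing in for the penguins dataframe
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"),
  island  = c("Biscoe", "Dream", "Biscoe", "Dream", "Biscoe", "Torgersen")
)

# One row per species/island combination that is present in the data
combos <- toy %>%
  count(species, island)

# Number of combinations actually observed
nrow(combos)

# Number of combinations that would be possible in principle
n_distinct(toy$species) * n_distinct(toy$island)
```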

Count Observations by Two Groups and Sort the Results

With the sort=TRUE argument, we can also sort the results from count() with two groups.

penguins %>%
  count(species, island, sort=TRUE)

## # A tibble: 5 x 3
##   species   island        n
##   <chr>     <chr>     <int>
## 1 Gentoo    Biscoe      124
## 2 Chinstrap Dream        68
## 3 Adelie    Dream        56
## 4 Adelie    Torgersen    52
## 5 Adelie    Biscoe       44

Check out more on the count() function at dplyr's website.
