
How To Highlight Select Data Points with ggplot2 in R?

May 9, 2019 by cmdlinetips

The power of ggplot2 lies in making it easy to create great plots and to tweak them into exactly what one wants. Sometimes, one might want to highlight certain data points in a plot in a different color. Here we will see an example of highlighting specific data points in a plot.

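Let us first load the packages needed. We will mainly be using dplyr and ggplot2 here, plus readr for the read_csv() function.

library(dplyr)
library(ggplot2)
library(readr)  # provides read_csv(), used below to load the data
theme_set(theme_bw(base_size = 16))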

Let us use the gapminder data from the Carpentries website to make plots and highlight data points.

data_url = 'http://bit.ly/2cLzoxH'
gapminder = read_csv(data_url)
## Parsed with column specification:
## cols(
##   country = col_character(),
##   year = col_integer(),
##   pop = col_double(),
##   continent = col_character(),
##   lifeExp = col_double(),
##   gdpPercap = col_double()
## )

This is what our gapminder data looks like.

head(gapminder, n=3)

## # A tibble: 3 x 6
##   country      year      pop continent lifeExp gdpPercap
##   <chr>       <int>    <dbl> <chr>       <dbl>     <dbl>
## 1 Afghanistan  1952  8425333 Asia         28.8      779.
## 2 Afghanistan  1957  9240934 Asia         30.3      821.
## 3 Afghanistan  1962 10267083 Asia         32.0      853.

Let us use the data to make a simple scatter plot using ggplot. Let us plot lifeExp on the x-axis and gdpPercap on the y-axis. Since there are a lot of overlapping data points, let us set the transparency level (alpha) to 0.3.

gapminder %>% 
  ggplot(aes(x=lifeExp,y=gdpPercap)) + 
  geom_point(alpha=0.3)  

A quick look at the plot suggests that the gdpPercap outliers squish the points along the y-axis a lot. It is natural to seek out more information on the outliers. Also, we probably need to change the y-axis to log scale to spread out the data points along the y-axis.
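The log-scale idea mentioned above could be sketched as follows. This is only a minimal example, assuming scale_y_log10() is the transformation we want. The small data frame toy is made up so the snippet runs on its own; it is not real gapminder data and only borrows the column names.

```r
library(ggplot2)

# Made-up stand-in for gapminder (hypothetical values, same column names)
toy <- data.frame(
  lifeExp   = c(40, 55, 60, 70, 75, 58),
  gdpPercap = c(500, 2000, 8000, 30000, 45000, 110000)
)

p_log <- ggplot(toy, aes(x = lifeExp, y = gdpPercap)) +
  geom_point(alpha = 0.3) +
  scale_y_log10()  # put the y-axis on a log10 scale

p_log
```

With the gapminder data from this post, the same scale_y_log10() layer can be added to the scatter plot above.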

Let us highlight the outlier data points in red using ggplot2. To do so, we first make the scatter plot as we did before, and then create a new data frame containing only the data points we need to highlight. Here we can use dplyr's filter() function to create the new data frame from the gapminder data.

# filter the data frame to keep only the rows to be highlighted
highlight_df <- gapminder %>% 
  filter(gdpPercap >= 59000)
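Before plotting, it may help to check which rows the filter actually kept. Here is a minimal sketch, assuming a made-up data frame toy with hypothetical countries "A", "B" and "C" in place of the real gapminder data.

```r
library(dplyr)

# Made-up stand-in for gapminder (hypothetical values, same column names)
toy <- tibble(
  country   = c("A", "A", "B", "C"),
  year      = c(1952, 1957, 1952, 1952),
  gdpPercap = c(108000, 113000, 9000, 1200)
)

# same filter as above, applied to the toy data
toy_highlight <- toy %>% 
  filter(gdpPercap >= 59000)

# how many rows were kept, and from which countries?
toy_highlight %>% 
  count(country)
```

With the data from this post, the equivalent check would be highlight_df %>% count(country).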

We can use the new data frame containing the data points to be highlighted to add another layer of geom_point().

gapminder %>% 
  ggplot(aes(x=lifeExp,y=gdpPercap)) + 
  geom_point(alpha=0.3) +
  geom_point(data=highlight_df, 
             aes(x=lifeExp,y=gdpPercap), 
             color='red',
             size=3)

Note that we have two geom_point() layers, one for all the data and the other for only the data to be highlighted. In the second geom_point(), we use the new data frame, not the original data frame. We can see that the data points with gdpPercap at or above 59k are highlighted in red.

Highlight selected points with ggplot2 in R
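As a side note, a layer inherits the aesthetics set in ggplot() by default, so the second geom_point() should not need its own aes(x, y). Here is a minimal sketch of that shorter form, again on a made-up data frame toy rather than the real gapminder data.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for gapminder (hypothetical values, same column names)
toy <- tibble(
  lifeExp   = c(40, 55, 60, 70, 75, 58),
  gdpPercap = c(500, 2000, 8000, 30000, 45000, 110000)
)
toy_highlight <- toy %>% 
  filter(gdpPercap >= 59000)

p_inherit <- ggplot(toy, aes(x = lifeExp, y = gdpPercap)) + 
  geom_point(alpha = 0.3) +
  # no aes() here: x and y are inherited from ggplot()
  geom_point(data = toy_highlight, color = "red", size = 3)

p_inherit
```

Spelling out aes() in the second layer, as done in this post, is equally valid and makes the mapping explicit.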

We can also highlight by a variable/column in the dataframe to learn more about the highlighted data points. Let us color the highlighted data points by country.

gapminder %>% 
  ggplot(aes(x=lifeExp,y=gdpPercap)) + 
  geom_point(alpha=0.3) +
  geom_point(data=highlight_df,
             aes(x=lifeExp,y=gdpPercap, color=country),size=3)

We can see that all the highlighted points are from the country Kuwait.

Highlight select points in R
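If color alone is not enough to learn more about the highlighted points, one option is to label them as well. The sketch below is not part of the original example: it assumes geom_text() with the year column as the label, and uses a made-up data frame toy with hypothetical countries instead of the real gapminder data.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for gapminder (hypothetical values, same column names)
toy <- tibble(
  country   = c("A", "A", "B", "C"),
  year      = c(1952, 1957, 1952, 1952),
  lifeExp   = c(58, 60, 45, 70),
  gdpPercap = c(108000, 113000, 9000, 30000)
)
toy_highlight <- toy %>% 
  filter(gdpPercap >= 59000)

p_label <- ggplot(toy, aes(x = lifeExp, y = gdpPercap)) + 
  geom_point(alpha = 0.3) +
  geom_point(data = toy_highlight, aes(color = country), size = 3) +
  # write the year just above each highlighted point
  geom_text(data = toy_highlight, aes(label = year), vjust = -1)

p_label
```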

In summary, we saw examples of using ggplot2 to highlight certain data points of interest in a scatter plot. We created a new data frame from the original data frame to select the data points of interest and used it with geom_point() to add another layer to the plot.
