3 Different ways to add a regression line in ggplot2

In this post, we will learn three different ways to add a simple linear regression line to a scatter plot made with ggplot2 in R. This is something I have to google almost every time, so this post records the options for adding a linear regression line.

We will use the Palmer penguins data to make a scatter plot and then add regression lines to it. The three ways to add a regression line are:

  • geom_smooth() with method="lm"
  • geom_abline() using slope and intercept from linear regression model
  • geom_line() using fitted values

Let us get started by loading the packages we need and setting the ggplot theme to theme_bw() with a base font size of 16.

library(tidyverse)
library(palmerpenguins)
theme_set(theme_bw(16))

We will use the body mass and bill length columns from the penguins data to make the scatter plot. Here are the first five rows:

penguins %>% head(5)

## # A tibble: 5 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## # … with 1 more variable: year <int>

A simple scatter plot of the two numerical variables looks like this.

penguins %>% 
  ggplot(aes(body_mass_g, bill_length_mm))+
  geom_point()

We can see a clear positive association between body mass and bill length, and adding a regression line would make the relationship easier to understand.
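To put a number on that association, a quick check (not part of the original walkthrough) is the Pearson correlation between the two columns. The sketch below assumes only the palmerpenguins package. The argument use = "complete.obs" tells cor() to ignore the penguins with missing measurements.

```r
library(palmerpenguins)

# Pearson correlation between body mass and bill length,
# skipping rows where either value is missing
r <- cor(penguins$body_mass_g, penguins$bill_length_mm,
         use = "complete.obs")
r
```

A positive value confirms that heavier penguins tend to have longer bills, which is what the straight line in the next sections summarises.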

Adding a regression line using geom_smooth()

One of the easiest ways to add a regression line to a scatter plot with ggplot2 is geom_smooth(), added as an extra layer on top of the scatter plot. To get a linear regression line, we set the smoothing method to "lm".

penguins %>% 
  ggplot(aes(body_mass_g, bill_length_mm))+
  geom_point()+
  geom_smooth(method="lm")+
  labs(title="Add Regression Line using geom_smooth()")
ggsave("add_regression_line_using_geom_smooth_lm.png")

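By default, geom_smooth() also draws a shaded band around the regression line. The band is a 95% confidence interval for the fitted line, computed from its standard error. Its width is controlled by the level argument.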

Add Regression Line to scatter plot using geom_smooth()
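Reproducing the shaded band by hand makes it clearer what it represents. As far as I understand ggplot2's internals, for method = "lm" the band comes from predict() with interval = "confidence". It is evaluated at 80 evenly spaced x values, which is the default n in geom_smooth(). The sketch below assumes that. The names fit, grid and band are only illustrative.

```r
library(palmerpenguins)

# The same model that geom_smooth(method = "lm") fits for this plot
fit <- lm(bill_length_mm ~ body_mass_g, data = penguins)

# 80 evenly spaced body masses spanning the observed range
grid <- data.frame(
  body_mass_g = seq(min(penguins$body_mass_g, na.rm = TRUE),
                    max(penguins$body_mass_g, na.rm = TRUE),
                    length.out = 80)
)

# Confidence interval for the mean response: columns fit, lwr, upr
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
head(cbind(grid, band))
```

Passing a different level, for example geom_smooth(method = "lm", level = 0.99), widens the band. Changing level in predict() widens lwr and upr in the same way.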

We can turn off the shaded confidence band by adding the argument se=FALSE to geom_smooth().

penguins %>% 
  ggplot(aes(body_mass_g, bill_length_mm))+
  geom_point()+
  geom_smooth(method="lm", se=FALSE)+
  labs(title="Add Regression Line using geom_smooth()")
ggsave("add_regression_line_using_geom_smooth_no_se.png")
Add Regression Line to scatter plot using geom_smooth() without the confidence band (se=FALSE)

Adding a regression line using geom_abline()

For the remaining two ways, we need the results of a linear regression fit. Let us fit a linear regression model of bill length on body mass and save it as lm_fit. By default, lm() drops rows with missing values.

lm_fit <- lm(bill_length_mm ~ body_mass_g, data=penguins)
lm_fit

## 
## Call:
## lm(formula = bill_length_mm ~ body_mass_g, data = penguins)
## 
## Coefficients:
## (Intercept)  body_mass_g  
##   26.898872     0.004051

The two key estimates from the linear regression fit are the slope and the intercept. We can access them through the coefficients element of the fit object. The first entry is the intercept and the second is the slope.

lm_fit$coefficients[1]

## (Intercept) 
##    26.89887
lm_fit$coefficients[2]

## body_mass_g 
## 0.004051417
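Indexing by position works. Extracting the coefficients by name with coef() is an alternative that does not depend on the order of terms in the model. The sketch below is an alternative to the positional indexing above, not the post's own code. It also shows what the slope and intercept mean: plugging a body mass into intercept + slope * x gives the same answer as predict(). The 4000 g value is arbitrary.

```r
library(palmerpenguins)

lm_fit <- lm(bill_length_mm ~ body_mass_g, data = penguins)

# coef() returns the named coefficient vector; indexing by name
# does not rely on the position of each term
intercept <- coef(lm_fit)[["(Intercept)"]]
slope     <- coef(lm_fit)[["body_mass_g"]]

# The regression line is y = intercept + slope * x.
# Predicted bill length (mm) for a 4000 g penguin, computed two ways:
by_hand    <- intercept + slope * 4000
by_predict <- unname(predict(lm_fit, newdata = data.frame(body_mass_g = 4000)))
c(by_hand = by_hand, by_predict = by_predict)
```

With the coefficients printed above, both values come out at roughly 43.1 mm.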

Now that we have the slope and intercept estimates, we can use them to draw the regression line on the scatter plot. The handy geom for this in ggplot2 is geom_abline(), which takes slope and intercept arguments and draws the corresponding straight line. Unlike geom_smooth(), the line from geom_abline() spans the full width of the plot panel rather than stopping at the range of the data.

penguins %>%
  ggplot(aes(body_mass_g, bill_length_mm))+
  geom_point()+
  geom_abline(slope=lm_fit$coefficients[2],
              intercept=lm_fit$coefficients[1],
              color="blue",
              linewidth=1)  # ggplot2 >= 3.4.0; use size=1 on older versions
ggsave("add_regression_line_using_geom_abline.png")
Add regression line to scatterplot using geom_abline()

Adding a regression line using geom_line()

The third approach is to take the fitted values from the linear regression fit and draw them with geom_line(). The x-axis variable stays on the x-axis and the fitted values go on the y-axis. For that, we first need the fitted values from the lm fit.

One way to get fitted values from a model is the augment() function from the broom package. When we apply augment() to the linear regression fit, it returns a data frame with the original data used in the regression. It also includes per-observation model results such as the .fitted values.

broom::augment(lm_fit) %>% 
  head()

## # A tibble: 6 × 9
##   .rownames bill_length_mm body_mass_g .fitted .resid    .hat .sigma   .cooksd
##   <chr>              <dbl>       <int>   <dbl>  <dbl>   <dbl>  <dbl>     <dbl>
## 1 1                   39.1        3750    42.1 -2.99  0.00385   4.40 0.000900 
## 2 2                   39.5        3800    42.3 -2.79  0.00366   4.40 0.000745 
## 3 3                   40.3        3250    40.1  0.234 0.00705   4.40 0.0000101
## 4 5                   36.7        3450    40.9 -4.18  0.00550   4.39 0.00251  
## 5 6                   39.3        3650    41.7 -2.39  0.00431   4.40 0.000642 
## 6 7                   38.9        3625    41.6 -2.69  0.00444   4.40 0.000837 
## # … with 1 more variable: .std.resid <dbl>
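As a sanity check, the .fitted column from augment() lies exactly on the line y = intercept + slope * x. That is why the geom_line() and geom_abline() approaches draw the same line. The sketch below assumes the broom package is installed. The names aug, b and on_line are illustrative.

```r
library(palmerpenguins)
library(broom)

lm_fit <- lm(bill_length_mm ~ body_mass_g, data = penguins)
aug <- augment(lm_fit)

# lm() dropped rows with missing values, so aug has fewer rows than penguins
c(penguins = nrow(penguins), augmented = nrow(aug))

# Every fitted value equals intercept + slope * body_mass_g,
# i.e. it sits on the same line that geom_abline() draws
b <- coef(lm_fit)
on_line <- isTRUE(all.equal(unname(aug$.fitted),
                            unname(b[1] + b[2] * aug$body_mass_g)))
on_line
```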

We can use the resulting data frame with geom_line() to draw the regression line on top of the scatter plot. Row 4 is missing from the augment() output above because it has missing values and was dropped by lm().

penguins %>%
  ggplot(aes(body_mass_g, bill_length_mm))+
  geom_point()+
  geom_line(data = broom::augment(lm_fit),
            aes(x = body_mass_g, y = .fitted),
            color="blue",
            linewidth=1)  # ggplot2 >= 3.4.0; use size=1 on older versions
ggsave("add_regression_line_using_geom_line_fitted_values.png")
Add regression line to scatterplot using geom_line()

An equivalent way to get the fitted values is the fortify() function from ggplot2. However, using fortify() on model objects such as lm fits is deprecated, and the ggplot2 documentation recommends broom instead.
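You can also skip the extra package and use base R's predict() to supply the fitted values for geom_line(). This is a sketch, not the post's method. line_df is a hypothetical name, and linewidth assumes ggplot2 3.4.0 or later (use size on older versions).

```r
library(ggplot2)
library(palmerpenguins)

lm_fit <- lm(bill_length_mm ~ body_mass_g, data = penguins)

# Fitted values without broom: predict() over the body masses used in the fit
line_df <- data.frame(body_mass_g = model.frame(lm_fit)$body_mass_g)
line_df$.fitted <- predict(lm_fit, newdata = line_df)

p <- ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_point() +
  geom_line(data = line_df, aes(x = body_mass_g, y = .fitted),
            color = "blue", linewidth = 1)
p
```

Because model.frame() returns only the rows lm() actually used, line_df has no missing values. The plot should match the broom-based version.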