Python and R Tips

Learn Data Science with Python and R

3 Different Ways to Add a Regression Line in ggplot2

June 24, 2022 by cmdlinetips

In this post, we will learn three different ways to add a simple regression line to a scatter plot made with ggplot2 in R. This is something I have to google almost every time, so this post records the options for adding a linear regression line.

We will use the Palmer penguins data to make a scatter plot and then add regression lines. The three different ways to add a regression line are:

  • geom_smooth() with method="lm"
  • geom_abline() using the slope and intercept from a linear regression model
  • geom_line() using fitted values

Let us get started by loading the packages we need and setting the ggplot2 theme to theme_bw().

library(tidyverse)
library(palmerpenguins)
theme_set(theme_bw(16))

We will use the body mass and bill length columns from the penguins data to make a scatter plot.

penguins %>% head(5)

## # A tibble: 5 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## # … with 1 more variable: year <int>

A simple scatter plot of two numerical variables looks like this.

penguins %>% 
  ggplot(aes(body_mass_g, bill_length_mm))+
  geom_point()

We see a positive association between body mass and bill length, and adding a regression line would make the relationship easier to understand.
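
To put a rough number on that association, one option is the Pearson correlation. The snippet below is a minimal sketch using base R's cor(). The variable name r_mass_bill is only illustrative, and use = "complete.obs" is needed because the data contain missing values.

```r
library(palmerpenguins)

# Pearson correlation between body mass and bill length.
# use = "complete.obs" drops rows where either value is NA.
r_mass_bill <- cor(penguins$body_mass_g,
                   penguins$bill_length_mm,
                   use = "complete.obs")
r_mass_bill
```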

Adding regression line using geom_smooth()

One of the easiest ways to add a regression line to a scatter plot with ggplot2 is to use geom_smooth(), adding it as an additional layer to the scatter plot. To draw a linear regression line, we set the method argument to "lm".

penguins %>% 
  ggplot(aes(body_mass_g, bill_length_mm))+
  geom_point()+
  geom_smooth(method="lm")+
  labs(title="Add Regression Line using geom_smooth()")
ggsave("add_regression_line_using_geom_smooth_lm.png")

By default, geom_smooth() also draws a confidence band, based on the standard error, around the regression line drawn over the scatter plot.

Figure: Regression line added to the scatter plot using geom_smooth() with method="lm"

We can hide the standard error band by adding the argument se=FALSE to geom_smooth().

penguins %>% 
  ggplot(aes(body_mass_g, bill_length_mm))+
  geom_point()+
  geom_smooth(method="lm", se=FALSE)+
  labs(title="Add Regression Line using geom_smooth()")
ggsave("add_regression_line_using_geom_smooth_no_se.png")

Figure: Regression line added to the scatter plot using geom_smooth() without the standard error band
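
geom_smooth() fits the model internally, so it can be reassuring to check that its line is the ordinary lm() fit. The following is a minimal sketch of such a check, using ggplot2's layer_data() to pull out the values computed for the smooth layer. The names p, smooth_xy, check_fit and direct_y are illustrative and not part of the original post.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Layer 2 is the geom_smooth() layer; layer_data() returns the x/y
# values that ggplot2 computed for the line.
smooth_xy <- layer_data(p, 2)

# Fit the same model directly and predict at the same x positions.
check_fit <- lm(bill_length_mm ~ body_mass_g, data = penguins)
direct_y <- predict(check_fit,
                    newdata = data.frame(body_mass_g = smooth_xy$x))

# TRUE when both sets of values agree up to numerical tolerance.
isTRUE(all.equal(smooth_xy$y, unname(direct_y)))
```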

Adding regression line using geom_abline()

For the remaining two ways to add a regression line, we need the results of a linear regression fit. Let us fit a linear regression model to our data and save the results.

lm_fit <- lm(bill_length_mm ~ body_mass_g, data=penguins)
lm_fit

## 
## Call:
## lm(formula = bill_length_mm ~ body_mass_g, data = penguins)
## 
## Coefficients:
## (Intercept)  body_mass_g  
##   26.898872     0.004051

The two key estimates from the linear regression fit are the slope and the intercept. We can access them through the coefficients element of the fit object.

lm_fit$coefficients[1]

## (Intercept) 
##    26.89887

lm_fit$coefficients[2]

## body_mass_g 
## 0.004051417
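
As a small aside, the same estimates are available through the coef() accessor, and they are all we need to compute a prediction by hand. The sketch below assumes a hypothetical penguin of 4000 g. The names intercept, slope, by_hand and by_predict are illustrative.

```r
library(palmerpenguins)

lm_fit <- lm(bill_length_mm ~ body_mass_g, data = penguins)

# coef() returns the same named vector as lm_fit$coefficients.
intercept <- coef(lm_fit)[["(Intercept)"]]
slope     <- coef(lm_fit)[["body_mass_g"]]

# Predicted bill length (mm) for a hypothetical 4000 g penguin,
# computed by hand as intercept + slope * x.
by_hand <- intercept + slope * 4000

# The same prediction through predict().
by_predict <- predict(lm_fit, newdata = data.frame(body_mass_g = 4000))

c(by_hand = by_hand, by_predict = unname(by_predict))
```

With the estimates printed above, both values work out to about 43.1 mm.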

Now that we have the linear regression fit and its slope and intercept estimates, we can use them to draw the regression line on the scatter plot. The handy geom for this in ggplot2 is geom_abline(), which takes the slope and intercept as arguments and draws the corresponding line.

penguins %>%
  ggplot(aes(body_mass_g, bill_length_mm))+
  geom_point()+
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_abline(slope=lm_fit$coefficients[2],
              intercept=lm_fit$coefficients[1],
              color="blue",
              linewidth=1)
ggsave("add_regression_line_using_geom_abline.png")

Figure: Regression line added to the scatter plot using geom_abline() with the slope and intercept

Adding regression line using geom_line()

The third approach to adding a regression line to a scatter plot is to take the fitted values from the linear regression fit and draw them with geom_line(), with the fitted values on the y-axis and the original predictor on the x-axis. For that, we need to get the fitted values from the lm fit.

One way to get fitted values from a model is to use the broom package's augment() function. When we apply augment() to the linear regression fit, we get a dataframe containing the linear model results, including the .fitted values, alongside the original data used to fit the regression.

broom::augment(lm_fit) %>% 
       head()

## # A tibble: 6 × 9
##   .rownames bill_length_mm body_mass_g .fitted .resid    .hat .sigma   .cooksd
##   <chr>              <dbl>       <int>   <dbl>  <dbl>   <dbl>  <dbl>     <dbl>
## 1 1                   39.1        3750    42.1 -2.99  0.00385   4.40 0.000900 
## 2 2                   39.5        3800    42.3 -2.79  0.00366   4.40 0.000745 
## 3 3                   40.3        3250    40.1  0.234 0.00705   4.40 0.0000101
## 4 5                   36.7        3450    40.9 -4.18  0.00550   4.39 0.00251  
## 5 6                   39.3        3650    41.7 -2.39  0.00431   4.40 0.000642 
## 6 7                   38.9        3625    41.6 -2.69  0.00444   4.40 0.000837 
## # … with 1 more variable: .std.resid <dbl>
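
Note that the .rownames column above skips row 4: lm() drops rows with missing values by default, so the augmented dataframe has fewer rows than penguins. A minimal sketch to confirm this follows; the names aug, n_all, n_used and n_missing are illustrative.

```r
library(palmerpenguins)

lm_fit <- lm(bill_length_mm ~ body_mass_g, data = penguins)
aug <- broom::augment(lm_fit)

# Rows in the original data versus rows actually used in the fit.
n_all  <- nrow(penguins)
n_used <- nrow(aug)

# Rows with a missing value in either model variable.
model_vars <- penguins[, c("bill_length_mm", "body_mass_g")]
n_missing  <- sum(!complete.cases(model_vars))

c(all_rows = n_all, rows_in_fit = n_used, rows_with_na = n_missing)
```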

We can pass the resulting dataframe to geom_line() to draw the regression line on top of the scatter plot.

penguins %>%
  ggplot(aes(body_mass_g, bill_length_mm))+
  geom_point()+
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_line(data = broom::augment(lm_fit),
            aes(x = body_mass_g, y = .fitted),
            color="blue",
            linewidth=1)
ggsave("add_regression_line_using_geom_line_fitted_values.png")

Figure: Regression line added to the scatter plot using geom_line() with fitted values

An equivalent way of adding the regression line is to use the fortify() function in the ggplot2 package to get the fitted values. However, fortify() is deprecated, and using broom is recommended instead.
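
If neither broom nor fortify() is an option, fitted values can also be obtained with base R's predict(). The following is a sketch under that assumption, not a method from the original post. It predicts at just the two ends of the observed body mass range, which is enough to draw a straight line. The name line_df is illustrative, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)
library(palmerpenguins)

lm_fit <- lm(bill_length_mm ~ body_mass_g, data = penguins)

# A straight line needs only two points: the smallest and largest
# observed body mass (missing values ignored).
line_df <- data.frame(
  body_mass_g = range(penguins$body_mass_g, na.rm = TRUE)
)
line_df$bill_length_mm <- predict(lm_fit, newdata = line_df)

# line_df has the same column names as the plot's aesthetics,
# so geom_line() can inherit the x and y mappings.
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_point() +
  geom_line(data = line_df, color = "blue", linewidth = 1)
```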
