Linear Regression with Matrix Decomposition Methods

I still remember the first time I learned that we can perform linear regression using matrix decomposition techniques like QR decomposition. Totally mind blown. I clearly had no clue that was possible. This is a fun post on performing linear regression with QR decomposition, Cholesky decomposition, and Singular Value Decomposition (SVD) using #rstats. No theory behind it, just some R code using the wonders of linear algebra for matrix decomposition and multiplication.

Let us get started by loading the packages needed. In addition to tidyverse, we will load broom for tidying the lm model results and palmerpenguins for the penguin data we will use for fitting linear regression with lm() and the decomposition methods.

library(tidyverse)
library(broom)
library(palmerpenguins)
theme_set(theme_bw(16))

To keep it simple, we remove any rows with missing values using tidyr's drop_na().

penguins <- penguins %>%
  drop_na()

penguins %>% head() 


# A tibble: 6 × 8
  species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
  <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
1 Adelie  Torge…           39.1          18.7              181        3750 male 
2 Adelie  Torge…           39.5          17.4              186        3800 fema…
3 Adelie  Torge…           40.3          18                195        3250 fema…
4 Adelie  Torge…           36.7          19.3              193        3450 fema…
5 Adelie  Torge…           39.3          20.6              190        3650 male 
6 Adelie  Torge…           38.9          17.8              181        3625 fema…
# … with 1 more variable: year <int>

Linear Regression with lm()

We will focus on simple linear regression between two numerical variables. We will use body mass and flipper length, two variables with a strong correlation between them. Here is a scatter plot of the two variables.

penguins %>% 
  ggplot(aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point()  
ggsave("ggplot_scatter_pplot_lm.png")

In R, we use the lm() function to perform linear regression analysis. In this example, we fit a linear regression model with flipper length as the response and body mass as the predictor.

lm_penguins <- lm(flipper_length_mm ~ body_mass_g, penguins)

And we can look at the key results using broom's tidy() function. The estimate column is the one of interest to us: it gives us the intercept and slope estimated from our data by the linear regression.

lm_penguins %>% tidy()

# A tibble: 2 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept) 137.      2.00          68.6 1.13e-197
2 body_mass_g   0.0152  0.000467      32.6 3.13e-105

We just pull those estimates from the tibble.

lm_coeffs <- lm_penguins %>% 
  tidy() %>%
  pull(estimate)
lm_coeffs

[1] 137.03962089   0.01519526
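
If you prefer to skip the tibble, base R's coef() gives the same two numbers directly:

coef(lm_penguins)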

And we use the coefficients to draw the regression line on top of the scatter plot with the geom_abline() function.

penguins %>% 
  ggplot(aes(y = flipper_length_mm, x = body_mass_g)) +
  geom_point() +
  geom_abline(slope = lm_coeffs[2],
              intercept = lm_coeffs[1],
              size = 2,
              color = "dodgerblue") +
  labs(title = "Regression line using coefficients from lm()")
ggsave("scatter_plot_with_regression_line_using_lm.png")
Regression line added using coefficients from the lm() results

Linear Regression with QR Decomposition

Using the lm() function in R is the easiest way to do linear regression analysis. But did you know that, under the hood, the lm() function uses a matrix factorisation (decomposition) method called QR decomposition? Note the method argument on lm()'s help page: its default is method = "qr".

lm() in R uses QR decomposition by default
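
In fact, the fitted lm object keeps the QR decomposition it used, so we can peek at it directly (this little check is my addition, not from the help page):

# the lm fit stores the QR decomposition of the model matrix as a "qr" object
class(lm_penguins$qr)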

Let us now use QR decomposition manually to perform the linear regression analysis. Basically, by applying QR decomposition to our data, we will estimate the betas, i.e. the coefficients.

What is QR decomposition

QR decomposition, also known as QR factorization, is one of the methods for decomposing a matrix A into the product of two matrices Q and R, where Q is an orthogonal matrix (meaning that its columns are orthogonal and have unit length) and R is an upper triangular matrix (meaning that all entries below the main diagonal are zero).

In R, we can use the qr() function to perform QR decomposition. This function takes a matrix as input and returns an object of class “qr” which contains the QR decomposition of the matrix. Let us prepare the data for performing linear regression using QR decomposition.

First we add a column of ones to take care of the intercept.

X <- penguins %>% 
  mutate(intercept = rep(1,n())) %>%
  select(intercept, body_mass_g)  %>%
  as.matrix() 
X %>% head()

     intercept body_mass_g
[1,]         1        3750
[2,]         1        3800
[3,]         1        3250
[4,]         1        3450
[5,]         1        3650
[6,]         1        3625

y <- penguins %>% 
       pull(flipper_length_mm)

With the data in the format we need, we can call the qr() function on X to do the decomposition.

# perform QR decomposition
qr_res <- qr(X)
str(qr_res)

List of 4
 $ qr   : num [1:333, 1:2] -18.2483 0.0548 0.0548 0.0548 0.0548 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:2] "intercept" "body_mass_g"
 $ rank : int 2
 $ qraux: num [1:2] 1.05 1.03
 $ pivot: int [1:2] 1 2
 - attr(*, "class")= chr "qr"

We can use the method solve.qr() for solving a system of linear equations using qr objects. Here we apply the solve.qr() function to the result from the QR decomposition by qr(), and what we get back is the estimates of the coefficients.

betahat_qr <- solve.qr(qr_res, y)
betahat_qr

   intercept  body_mass_g 
137.03962089   0.01519526 
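
Under the hood, this amounts to back-solving the triangular system R %*% beta = t(Q) %*% y. Here is a minimal sketch of the same step written out explicitly (my rewrite, relying on the fact that pivot is 1 2 here, so no column reordering is needed):

# solve R %*% beta = t(Q) %*% y by back substitution
backsolve(qr.R(qr_res), crossprod(qr.Q(qr_res), y))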

Comparing the coefficients estimated by lm() with those from the QR decomposition above, we can see that they are exactly the same.

Linear Regression with Cholesky Decomposition

Cholesky decomposition is another popular decomposition method. It takes a symmetric (more generally, Hermitian) positive-definite matrix and decomposes it into the product of a lower triangular matrix and its conjugate transpose. It is named after the French mathematician André-Louis Cholesky, who was killed in action during World War I.

The Cholesky decomposition of a matrix A is a decomposition of the form:

A = L * L^H

where L is a lower triangular matrix and L^H is the conjugate transpose of L.

The matrix L from the Cholesky decomposition has good properties and is useful for efficiently solving a number of problems, including systems of linear equations.

As the input to Cholesky needs to be a symmetric positive-definite matrix, we solve the normal equations t(X) %*% X %*% beta = t(X) %*% y rather than working with X directly. The cross-product matrix t(X) %*% X is symmetric:

t(X)%*%X

            intercept body_mass_g
intercept         333     1400950
body_mass_g   1400950  6109136250

and then apply the chol() function in R to do the Cholesky decomposition. Note that in R, chol() returns the upper triangular factor U such that t(U) %*% U gives back the original matrix.

chol(t(X)%*%X)

            intercept body_mass_g
intercept    18.24829    76771.59
body_mass_g   0.00000    14671.73
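
A quick check (my addition) that the factor really is an upper triangular "square root" of the matrix:

U <- chol(t(X) %*% X)
t(U) %*% U   # gives back t(X) %*% X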

We can use chol2inv() to compute the inverse of t(X) %*% X from its Cholesky factor

chol2inv(chol(t(X)%*%X))

              [,1]          [,2]
[1,]  0.0852261590 -1.954410e-05
[2,] -0.0000195441  4.645552e-09

and multiply it by t(X) %*% y to get the beta estimates of the linear regression.

chol2inv(chol(t(X)%*%X)) %*% t(X) %*% y

             [,1]
[1,] 137.03962089
[2,]   0.01519526

Here we save the beta estimates from Cholesky decomposition.

betahat_chol <-  chol2inv(chol(t(X)%*%X)) %*% t(X) %*% y
betahat_chol

             [,1]
[1,] 137.03962089
[2,]   0.01519526

And they are exactly what we got from lm() and QR decomposition.
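
Instead of explicitly forming the inverse with chol2inv(), we could also solve the normal equations with two triangular solves, which is generally the more numerically stable route. A minimal sketch using the same X and y (this variant is my addition, not part of the recipe above):

U <- chol(crossprod(X))                    # upper triangular factor, t(U) %*% U = t(X) %*% X
z <- forwardsolve(t(U), crossprod(X, y))   # solve t(U) %*% z = t(X) %*% y
backsolve(U, z)                            # solve U %*% beta = z, giving the same estimates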

Linear Regression with SVD

Singular Value Decomposition or SVD is a matrix decomposition method where a matrix M is decomposed into the product of three matrices, M = U * S * t(V) (for a real matrix M). These matrices have the following properties:

U is a matrix of the left singular vectors of M.
S is a diagonal matrix of the singular values of M.
V is a matrix of the right singular vectors of M.

Let us chug along to get the estimates of the coefficients from linear regression using SVD. We use the same X matrix we created earlier as input to SVD.

svd_res <- svd(X)

Here we take quick look at the three matrices we get after applying SVD.

svd_res$v

             [,1]          [,2]
[1,] 0.0002293205  0.9999999737
[2,] 0.9999999737 -0.0002293205
svd_res$d

[1] 78160.965777     3.425418
svd_res$u
             [,1]          [,2]
  [1,] 0.04797791  0.0408850086
  [2,] 0.04861762  0.0375376727
  [3,] 0.04158086  0.0743583670
  ...
  ...
  [332,] 0.05245585  0.0174536577
  [333,] 0.04829777  0.0392113406
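
A quick sanity check (my addition) that these three pieces multiply back to X, as the definition above promises:

X_rebuilt <- svd_res$u %*% diag(svd_res$d) %*% t(svd_res$v)
range(X_rebuilt - X)   # ~ zero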

We can get the estimates of the betas in linear regression as V %*% diag(1/d) %*% t(U) %*% y, using the singular values d, the singular vectors, and the y vector. First we compute the diag(1/d) %*% t(U) %*% y piece.

(1/svd_res$d * t(svd_res$u)) %*% y

             [,1]
[1,]   0.04662125
[2,] 137.03961380


Multiplying this by the matrix of right singular vectors V completes the calculation, and we can easily see that SVD gives us the same estimates of beta as the other methods we saw before.

betahat_svd <- svd_res$v %*% (1/svd_res$d * t(svd_res$u)) %*% y
betahat_svd

             [,1]
[1,] 137.03962089
[2,]   0.01519526

In summary, we have learned how to use multiple decomposition methods (QR, Cholesky, and SVD) to perform simple linear regression in R and compared the results with what we get from lm().
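
As one last check (my addition), we can line up the estimates from all four approaches side by side:

tibble(
  term     = c("(Intercept)", "body_mass_g"),
  lm       = lm_coeffs,
  qr       = as.numeric(betahat_qr),
  cholesky = as.numeric(betahat_chol),
  svd      = as.numeric(betahat_svd)
)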