One of the nice things about Twitter, when you follow awesome people, is that you come across tweets that just blow your mind. Last week was one such week, with some fantastic and funny tweetorials.
One of the tweetorials was from Prof. Daniela Witten for @WomenInStat. It starts like this and beautifully explains why Singular Value Decomposition, SVD for short, is the one matrix decomposition that rules over all the others.
So yesterday I asked you all what you wanted to hear about from me this week, and one answer stood out from all the others: the SINGULAR VALUE DECOMPOSITION (SVD). https://t.co/9Nyqp2Kr9k
1/
— Women in Statistics and Data Science (@WomenInStat) July 21, 2020
Having always been in the #teamSVD camp, here is a blog post that tries to unpack some of the tweets with a real data example using the awesome Palmer Penguins data.
Before we jump into SVD, let us load tidyverse, the suite of R packages.
library("tidyverse") theme_set(theme_bw(16))
We will use the Palmer Penguins data collated by Allison Horst, loaded directly from cmdlinetips.com's github page.
p2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins_df <- read_csv(p2data)
penguins_df %>% head()

## # A tibble: 6 x 7
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
##   <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl> <chr>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…             NA          NA                 NA          NA <NA>
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male
Let us separate the numerical data part of the penguins data from the columns with penguin information.
penguins_meta <- penguins_df %>%
  drop_na() %>%
  select_if(is.character)
penguins_meta %>% head()

## # A tibble: 6 x 3
##   species island    sex
##   <chr>   <chr>     <chr>
## 1 Adelie  Torgersen male
## 2 Adelie  Torgersen female
## 3 Adelie  Torgersen female
## 4 Adelie  Torgersen female
## 5 Adelie  Torgersen male
## 6 Adelie  Torgersen female
We store the numerical variables describing penguins in a new variable and use it to do SVD.
penguins <- penguins_df %>%
  drop_na() %>%
  select_if(is.numeric)
penguins %>% head()

## # A tibble: 6 x 4
##   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##            <dbl>         <dbl>             <dbl>       <dbl>
## 1           39.1          18.7               181        3750
## 2           39.5          17.4               186        3800
## 3           40.3          18                 195        3250
## 4           36.7          19.3               193        3450
## 5           39.3          20.6               190        3650
## 6           38.9          17.8               181        3625
What is Singular Value Decomposition (SVD)?
First of all, what does the SVD do? You give me an $n \times p$ matrix $X$, and I’ll give you back 3 matrices $U$, $D$, $V$. Without loss of generality let’s assume that $n \geq p$. Then $U$ is an $n \times p$ matrix, and $D$ and $V$ have dimension $p \times p$.
7/
— Women in Statistics and Data Science (@WomenInStat) July 21, 2020
We can use the svd() function in R to do SVD. We provide the data matrix as input to the svd() function.
# perform SVD using svd()
penguins_svd <- svd(penguins)
What happens when we apply svd() to a data matrix?
$U$, $D$, $V$ “decompose” the matrix $X$ because $X=UDV^T$. Simple as that. But, what makes this decomposition special (and unique) is the particular set of properties of $U$, $D$, and $V$.
— Women in Statistics and Data Science (@WomenInStat) July 21, 2020
In R, the svd() function returns a list of three objects corresponding to the matrix decomposition: the singular values d and the singular vectors u and v.
str(penguins_svd)

## List of 3
##  $ d: num [1:4] 78250.5 499.9 74 41.8
##  $ u: num [1:333, 1:4] -0.048 -0.0486 -0.0416 -0.0442 -0.0467 ...
##  $ v: num [1:4, 1:4] -0.01022 -0.00389 -0.04657 -0.99886 0.1985 ...
Let us take a look at the singular values and vectors and understand their properties. For the sake of simplicity, let us give simple names to the singular values and vectors.
The matrix of left singular vectors U has dimension 333 x 4, the same dimension as our data.
U <- penguins_svd$u
U %>% dim()

## [1] 333   4
And it looks like this.
U %>% head()

##             [,1]       [,2]        [,3]        [,4]
## [1,] -0.04798188 0.01338848 0.004108109  0.07663240
## [2,] -0.04862309 0.01809574 0.014438916  0.03365464
## [3,] -0.04160792 0.08855716 0.025312973 -0.01876370
## [4,] -0.04415929 0.06451464 0.067224723  0.03968621
## [5,] -0.04671088 0.04099607 0.024614580  0.08738297
## [6,] -0.04638621 0.02499507 0.006737752  0.04717352
Similarly, the matrix of right singular vectors V is of dimension 4 x 4.
V <- penguins_svd$v
And the singular vectors in V look like this.
V %>% head()

##              [,1]        [,2]         [,3]         [,4]
## [1,] -0.010218185  0.19849831 -0.975661291 -0.092623187
## [2,] -0.003891394  0.14068499 -0.065122188  0.987902673
## [3,] -0.046569777  0.96877576  0.209389404 -0.124341735
## [4,] -0.998855196 -0.04774608  0.000472226  0.002896004
And finally, the singular values D form a vector of length 4.
D <- penguins_svd$d
D

## [1] 78250.54165   499.88402    74.04578    41.84098
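Before exploring their properties, we can sanity-check that these three pieces really do reconstruct the original data, i.e. that $X = UDV^T$. Here is a quick sketch; the maximum absolute difference should be essentially zero, just floating point error.

# verify the decomposition: X equals U %*% diag(D) %*% t(V)
X_reconstructed <- U %*% diag(D) %*% t(V)
# maximum absolute difference from the original data; essentially zero
max(abs(X_reconstructed - as.matrix(penguins)))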
What is special about these singular matrices?
The singular matrices U and V are orthogonal. What this means is that if you multiply the transpose of U with U, you get an identity matrix, with ones along the diagonal and zeros elsewhere. The same is true for the singular matrix V.
Let us verify the orthogonal properties of the singular matrices from the penguin data.
t(U) %*% U
##              [,1]         [,2]          [,3]          [,4]
## [1,] 1.000000e+00 2.038300e-16  3.816392e-17  6.126555e-16
## [2,] 2.038300e-16 1.000000e+00  2.289835e-16  2.167862e-16
## [3,] 3.816392e-17 2.289835e-16  1.000000e+00 -1.156844e-16
## [4,] 6.126555e-16 2.167862e-16 -1.156844e-16  1.000000e+00

We can see that the diagonal elements are ones and the off-diagonal elements are pretty close to zero. And this is true of the singular matrix V as well.
t(V) %*% V

##               [,1]          [,2]          [,3]          [,4]
## [1,]  1.000000e+00 -4.857226e-17  1.919038e-17 -3.473784e-16
## [2,] -4.857226e-17  1.000000e+00  2.058629e-17  5.583641e-18
## [3,]  1.919038e-17  2.058629e-17  1.000000e+00 -1.406308e-17
## [4,] -3.473784e-16  5.583641e-18 -1.406308e-17  1.000000e+00
Also, for the square matrix V this holds if you change the order of multiplication as well. (Note that this is not true for the tall matrix U, as we will see below.)
V %*% t(V)

##               [,1]         [,2]          [,3]          [,4]
## [1,]  1.000000e+00 5.551115e-17  2.255141e-17 -6.179952e-18
## [2,]  5.551115e-17 1.000000e+00  8.326673e-17  3.486794e-16
## [3,]  2.255141e-17 8.326673e-17  1.000000e+00 -1.534146e-17
## [4,] -6.179952e-18 3.486794e-16 -1.534146e-17  1.000000e+00
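In contrast, U has orthonormal columns but is not square, so reversing the order does not give the identity. Here is a quick sketch of what we get instead: U %*% t(U) is a 333 x 333 projection matrix.

# U %*% t(U) is 333 x 333, not the 4 x 4 identity
P <- U %*% t(U)
dim(P)
# P is idempotent (P %*% P equals P), the hallmark of a projection matrix
all.equal(P %*% P, P)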
Another special property of the singular vectors is that they are orthonormal. What this means is that the squared elements of each column of U sum to 1, and the inner product (dot product) between any pair of columns of U equals 0. The same is true for the columns of V.
Let us first check if the squared elements of each column sum to 1 for U and V.
colSums(U^2)

## [1] 1 1 1 1
colSums(V^2)

## [1] 1 1 1 1
We see that the sum of squares for each column of U and V is 1.
Now let us check if the column vectors are orthogonal, i.e. whether the inner product of a pair of U columns (and also V columns) equals zero.

sum(U[,1]*U[,2])

## [1] 1.715343e-16

sum(V[,1]*V[,4])

## [1] -3.474868e-16

The inner products are essentially zero, so we see that the column vectors of U and V are orthogonal.
Since both properties, orthogonality and normality, hold true for U and V, we say that the singular vectors are orthonormal.
Reconstructing the Original Data with a Few Singular Vectors
One of the common applications of SVD is data compression or dimensionality reduction. A large data matrix can be approximated well using just one or a few singular vectors.
Let us reconstruct an approximate version of the original data using the first column vectors of U and V and the first singular value. This gives a rank-1 approximation of the data.
(D[1] * U[,1] %*% t(V[,1])) %>% head()

##          [,1]     [,2]     [,3]     [,4]
## [1,] 38.36528 14.61066 174.8513 3750.310
## [2,] 38.87798 14.80591 177.1879 3800.427
## [3,] 33.26880 12.66977 151.6239 3252.115
## [4,] 35.30882 13.44667 160.9213 3451.533
## [5,] 37.34901 14.22364 170.2196 3650.967
## [6,] 37.08941 14.12477 169.0365 3625.591
Let us compare the top few rows of the rank-1 approximation with the original data. We can see that the rank-1 approximation is pretty close to the original data.
penguins %>% head()

## # A tibble: 6 x 4
##   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##            <dbl>         <dbl>             <dbl>       <dbl>
## 1           39.1          18.7               181        3750
## 2           39.5          17.4               186        3800
## 3           40.3          18                 195        3250
## 4           36.7          19.3               193        3450
## 5           39.3          20.6               190        3650
## 6           38.9          17.8               181        3625
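To put a number on "pretty close", we can compute the relative error of the rank-1 approximation in the Frobenius norm. A small sketch; since the first singular value dwarfs the others, the relative error works out to well under 1%.

# relative Frobenius-norm error of the rank-1 approximation
X <- as.matrix(penguins)
X1 <- D[1] * U[,1] %*% t(V[,1])
sqrt(sum((X - X1)^2)) / sqrt(sum(X^2))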
We can improve the reconstruction by using more singular vectors.
In this case, let us compute a rank-2 approximation of the data matrix using the first two singular vectors and singular values.
((D[1] * U[,1] %*% t(V[,1])) +
 (D[2] * U[,2] %*% t(V[,2]))) %>% head()

##          [,1]     [,2]     [,3]     [,4]
## [1,] 39.69377 15.55222 181.3350 3749.991
## [2,] 40.67355 16.07852 185.9512 3799.995
## [3,] 42.05598 18.89765 194.5099 3250.001
## [4,] 41.71036 17.98374 192.1642 3449.993
## [5,] 41.41689 17.10673 190.0730 3649.989
## [6,] 39.56958 15.88258 181.1410 3624.994
When we compare this to the original penguins data, we can see that it is a very good approximation.
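Why does a rank-1 or rank-2 approximation already work so well here? The singular values tell us: the squared singular values measure how much of the data's total squared "energy" each component carries, and here the first component captures over 99.99% of it, largely because the raw body_mass_g values are so much bigger than the other columns.

# proportion of total squared singular values captured by each component
round(D^2 / sum(D^2), 6)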
Computing PCA from SVD
Principal component analysis is just a special case of SVD. Applying SVD to a data matrix whose columns are mean centered is doing PCA: the columns of V are the PC loading vectors, and the columns of UD are the PC score vectors.
Let us apply SVD on mean centered Penguins data. Mean centering subtracts each column's mean, so the decomposition captures variation around the means rather than the raw magnitudes of the variables. (Note that centering alone does not put the variables on the same scale; to keep a variable with a huge range, like body_mass_g here, from dominating, we would also need to divide each column by its standard deviation.)
# mean centering the columns
data4pca <- apply(penguins, 2, function(x){x - mean(x)})

# apply SVD
pca_by_svd <- svd(data4pca)

# get a peek at V singular vectors
pca_by_svd$v %>% head()

##              [,1]        [,2]        [,3]          [,4]
## [1,]  0.004003162 -0.31927773 -0.94126475 -0.1098470736
## [2,] -0.001154327  0.08684753 -0.14449479  0.9856862728
## [3,]  0.015194547 -0.94354238  0.30518986  0.1278908060
## [4,]  0.999875876  0.01571702 -0.00103611 -0.0003657482
Let us also apply PCA on the Penguins data using the prcomp() function. Notice that the rotation matrix from PCA is the same as the V singular vectors from SVD on the mean centered data.
pca_obj <- prcomp(penguins)
pca_obj$rotation %>% head()

##                            PC1         PC2         PC3           PC4
## bill_length_mm     0.004003162 -0.31927773 -0.94126475 -0.1098470736
## bill_depth_mm     -0.001154327  0.08684753 -0.14449479  0.9856862728
## flipper_length_mm  0.015194547 -0.94354238  0.30518986  0.1278908060
## body_mass_g        0.999875876  0.01571702 -0.00103611 -0.0003657482
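We can also check the score side of the claim above. A quick sketch; in general individual columns can differ by a sign flip between svd() and prcomp(), though here the loadings matched exactly.

# PC scores from prcomp should match U %*% diag(D) from SVD on centered data
scores_svd <- pca_by_svd$u %*% diag(pca_by_svd$d)
head(pca_obj$x)
head(scores_svd)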
As an aside, SVD is closely related to the eigendecomposition of the cross-product matrix: the columns of V are the eigenvectors of t(X) %*% X, and the squared singular values are its eigenvalues. Here is the cross-product matrix of the (uncentered) penguins data.

t(as.matrix(penguins)) %*% as.matrix(penguins)

##                   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## bill_length_mm          654405.7     250640.99           2960705    62493450
## bill_depth_mm           250641.0      99400.11           1143413    23798630
## flipper_length_mm      2960705.0    1143412.60          13514330   284815600
## body_mass_g           62493450.0   23798630.00         284815600  6109136250
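As a sketch, we can verify this relationship on the mean centered data using eigen(): the eigenvalues of the cross-product matrix should match the squared singular values, and the eigenvectors should match V up to sign flips.

# eigendecomposition of the cross-product of the centered data
eig <- eigen(t(data4pca) %*% data4pca)
# these two should agree
eig$values
pca_by_svd$d^2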
SVD from Scratch: A Pair of Singular Vectors at a Time
What is all the more interesting is that writing a function from scratch to perform SVD is relatively easy. We can write a function that computes a pair of U and V vectors using an iterative approach. The idea is to iteratively compute the first U and V singular vectors from the data, then remove the rank-1 approximation from the data and apply the same approach to compute the second U and V singular vectors.
Here is an R function that computes the first singular vectors of SVD from scratch.
svd_from_scratch <- function(data){
  # initialize u and v with random numbers
  u <- rnorm(nrow(data))
  v <- rnorm(ncol(data))
  # alternate power iterations: multiply by the data and normalize to unit length
  for (iter in 1:500){
    u <- data %*% v
    u <- u / sqrt(sum(u^2))
    v <- t(data) %*% u
    v <- v / sqrt(sum(v^2))
  }
  return(list(u = u, v = v))
}
svdx <- svd_from_scratch(as.matrix(penguins))
cor(U[,1], svdx$u)

##      [,1]
## [1,]   -1

cor(V[,1], svdx$v)

##      [,1]
## [1,]   -1

The correlation is -1 rather than 1 because singular vectors are only determined up to a sign flip; our random initialization happened to converge to the flipped pair.
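To complete the idea described above, here is a sketch of the deflation step (the variable names are illustrative): estimate the first singular value, subtract the rank-1 approximation from the data, and rerun the power iterations to recover the second pair of singular vectors.

X <- as.matrix(penguins)
# first singular value, estimated from the converged unit vectors
d1 <- as.numeric(t(svdx$u) %*% X %*% svdx$v)
# deflate: remove the rank-1 approximation
X2 <- X - d1 * svdx$u %*% t(svdx$v)
svdx2 <- svd_from_scratch(X2)
# should be 1 or -1, again up to a sign flip
cor(U[,2], svdx2$u)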
If you love the tweetorial from Prof. Daniela Witten, you will love "An Introduction to Statistical Learning with Applications in R", co-authored by Prof. Witten. And it is freely available online.
And I can't wait to get my hands on the Second Edition of the book.
If you like #islr first edition then you will love our second edition… coming soon!! https://t.co/9w1DSYJAgH
— Daniela Witten (@daniela_witten) May 16, 2020