Often you may deal with large matrices that are sparse, with only a few non-zero elements. In such scenarios, keeping the data in a full dense matrix and working with it is not efficient.
A better way to deal with such sparse matrices is to use special data structures that store the sparse data efficiently. In R, the Matrix package offers great solutions for dealing with large sparse matrices.
In this post we will see simple step-by-step examples of using the Matrix library. We will get started with using Sparse Matrices in R by addressing the following questions.
- How to create a sparse matrix from a dense matrix?
- How to visualize a sparse matrix?
- How to create a sparse matrix from data that is already in sparse form?
- How to save a sparse matrix to a file?
In a later post, we will dig deeper into other functionality available in the Matrix package.
library(Matrix)
Let us create a matrix with sparse data from scratch. We will first create data, a vector of a million random numbers drawn from a normal distribution with zero mean and unit variance.
data <- rnorm(1e6)
The above data vector is not sparse and contains data in all elements. Let us randomly select 900,000 of the indices and set those elements to zero.
data <- rnorm(1e6)
zero_index <- sample(1e6)[1:9e5]
data[zero_index] <- 0
Now we have created a vector of a million elements, but 90% of the elements are zeros. Let us make it into a dense matrix.
mat <- matrix(data, ncol=1000)
mat[1:5,1:5]
We can see that the matrix has very few non-zero elements.
##      [,1]     [,2] [,3] [,4] [,5]
## [1,]    0 0.000000    0    0    0
## [2,]    0 0.000000    0    0    0
## [3,]    0 1.817244    0    0    0
## [4,]    0 1.580687    0    0    0
## [5,]    0 0.000000    0    0    0
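To go beyond eyeballing one corner of the matrix, the fraction of non-zero entries can be computed directly. The sketch below is a scaled-down version of the steps above; the seed, the smaller size, and the names v, m and frac_nonzero are assumptions made for illustration.

```r
# Scaled-down version of the steps above (sizes and seed are illustrative)
set.seed(1)
v <- rnorm(1000)
v[sample(1000, 900)] <- 0   # zero out 900 distinct positions
m <- matrix(v, ncol = 10)

# Fraction of entries that are non-zero
frac_nonzero <- mean(m != 0)
frac_nonzero
```

Since exactly 900 of the 1000 positions are set to zero, the remaining fraction should be 10%, matching the sparsity level used in this post.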
We can use the R function object.size to check the size of the dense matrix.
print(object.size(mat),units="auto")
The dense matrix is close to 8 Mb.
## 7.6 Mb
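This figure can be sanity-checked by hand, assuming each element is stored as an 8-byte double. The names n_elem and est_mb below are made up for this sketch.

```r
# Each double takes 8 bytes, so a million of them need about 8e6 bytes
n_elem <- 1e6
est_mb <- n_elem * 8 / 1024^2   # object.size reports "Mb" in units of 1024^2 bytes
round(est_mb, 1)

# Compare with an actual 1000 x 1000 numeric matrix (adds a small header)
as.numeric(object.size(matrix(0, 1000, 1000)))
```

The estimate rounds to 7.6, which agrees with the reported size. The real object is slightly larger because R stores a small header alongside the values.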
How To Create a Sparse Matrix from a Dense Matrix in R?
Let us use the Matrix package to convert the dense matrix to a sparse matrix.
mat_sparse <- Matrix(mat, sparse=TRUE)
Let us check how the data is stored in the sparse matrix. We can see that zero elements are shown as dots.
mat_sparse[1:5,1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
##
## [1,] . .        . . .
## [2,] . .        . . .
## [3,] . 1.817244 . . .
## [4,] . 1.580687 . . .
## [5,] . .        . . .
It tells us that our sparse matrix belongs to the class “dgCMatrix”. There are different types of sparse matrices, and each type suits certain mathematical operations and certain ways of reading, writing and storing data. In the class name, “d” indicates that the values are of type double, “g” stands for a general matrix (no special structure such as symmetric or triangular), and “C” means the matrix is stored in CSC, Compressed Sparse Column, format. CSC is a column-oriented format, implemented such that the non-zero elements within each column are sorted in increasing row order.
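To make the CSC layout concrete, here is a minimal sketch that looks at the slots of a tiny dgCMatrix. The 4 x 3 matrix and the names dense and sp are hypothetical, chosen only so the slots are easy to read.

```r
library(Matrix)

# A small 4 x 3 example (hypothetical values) so the slots are easy to read
dense <- matrix(c(0, 0, 3, 0,
                  0, 1, 0, 2,
                  0, 0, 0, 0), nrow = 4)
sp <- Matrix(dense, sparse = TRUE)

sp@x   # non-zero values, column by column
sp@i   # 0-based row index of each non-zero value
sp@p   # 0-based offsets into x and i where each column starts
```

Here x holds 3, 1, 2, i holds 2, 1, 3, and p holds 0, 1, 3, 3. The repeated 3 at the end of p says that the third column contributes no entries, and within the second column the row indices 1 and 3 appear in increasing order.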
Let us check the size of our sparse matrix.
print(object.size(mat_sparse),units="auto")
## 1.1 Mb
The sparse matrix stores the same data in just about 1 Mb, roughly seven times smaller than the dense matrix and far more memory efficient.
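The reported size can be roughly reproduced from the CSC layout. This back-of-the-envelope sketch assumes 8 bytes per stored value, 4 bytes per row index, and 4 bytes per column pointer, and it ignores the small per-object overhead; the names nnz, n_col and est_bytes are made up for illustration.

```r
# Rough CSC storage estimate: values (8 bytes each), row indices (4 bytes each),
# and one column pointer (4 bytes) per column plus one
nnz   <- 1e5    # 10% of a million entries are non-zero
n_col <- 1000
est_bytes <- nnz * 8 + nnz * 4 + (n_col + 1) * 4
est_bytes / 1024^2
```

The estimate comes to a little over 1.1 Mb, in line with what object.size reports. It also shows that storage grows with the number of non-zero values and columns, not with the total number of cells.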
How To Visualize a Sparse Matrix in R?
Let us quickly visualize a small portion of the sparse matrix using the function image in R. The plot is predominantly white, meaning that most cells are empty.
image(mat_sparse[1:10,1:10])
How To Create a Sparse Matrix from Scratch in R?
Our example of creating a sparse matrix was kind of silly. We started with a dense matrix and converted it into a sparse matrix. We did that to illustrate the benefit of a sparse matrix. In real life, the data often arrives already in sparse form.
A better way to create a sparse matrix is to start with data in sparse format. The simplest way to store the data in sparse form is to keep the co-ordinates of only the non-zero elements. Basically, we need three vectors of the same length. The first two vectors specify the co-ordinates (i,j) of each non-zero element, where i is the row index and j is the column index. The third vector stores the actual non-zero values.
Let us create a sparse matrix corresponding to a 10×10 dense matrix, such that the matrix contains just 5 non-zero elements.
# 5 random row indices
i <- sample(10,5)
# 5 random column indices
j <- sample(10,5)
# 5 random numbers
x <- rpois(5,10)
We can use the sparseMatrix function, giving the i, j, and x values as arguments and also specifying the dimensions of the matrix.
sp_matrix <- sparseMatrix(i=i,j=j,x=x,dims=c(10,10))
The sparseMatrix function creates the sparse matrix for us, and we can see its content by simply printing it.
sp_matrix
## 10 x 10 sparse Matrix of class "dgCMatrix"
##
##  [1,] .  . .  . . . 11 . . .
##  [2,] 5  . .  . . .  . . . .
##  [3,] .  . .  . . .  . . . .
##  [4,] .  . .  . . .  . . . .
##  [5,] . 10 .  . . .  . . . .
##  [6,] .  . .  . . .  . . . .
##  [7,] .  . .  . . .  . . . 8
##  [8,] .  . . 10 . .  . . . .
##  [9,] .  . .  . . .  . . . .
## [10,] .  . .  . . .  . . . .
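Because the indices above are random, each run gives a different matrix. For a reproducible variant, here is a small sketch with fixed coordinates; the particular values and the names sp_small and triplets are hypothetical. It also uses summary to get the (i, j, x) triplets back out of the sparse matrix.

```r
library(Matrix)

# Fixed (hypothetical) coordinates instead of random ones, for reproducibility
i <- c(1, 2, 5)
j <- c(7, 1, 2)
x <- c(11, 5, 10)
sp_small <- sparseMatrix(i = i, j = j, x = x, dims = c(10, 10))

sp_small[1, 7]      # value stored at row 1, column 7
nnzero(sp_small)    # number of non-zero entries

# Recover the (i, j, x) triplets from the sparse matrix
triplets <- summary(sp_small)
triplets
```

Going back and forth between triplets and the sparse matrix this way is a handy check that the row and column indices were not swapped by mistake.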
How To Save a Sparse Matrix to a File?
You may also want to save the sparse matrix and use it later. One way to save a sparse matrix is to write it to an .mtx file, which stores the matrix in MatrixMarket format.
We can use the writeMM function to save the sparse matrix object into a file. In this example, we save our toy sparse matrix into a file named “sparse_matrix.mtx”.
writeMM(obj = sp_matrix, file="sparse_matrix.mtx")
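We can load the saved data back into a sparse matrix using the readMM function. The result holds the same data as what we saved, but this time it is of class dgTMatrix, which stores the entries as (i, j, x) triplets. (The output printed here appears to come from a separate run with a different random draw, so its entries differ from the matrix printed earlier.)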
sp_matrix_read <- readMM("sparse_matrix.mtx")
sp_matrix_read
## 10 x 10 sparse Matrix of class "dgTMatrix"
##
##  [1,] . .  .  . . . . . . .
##  [2,] . .  .  . . . 4 . . .
##  [3,] . .  .  . . . . . . .
##  [4,] . .  . 18 . . . . . .
##  [5,] . . 10  . . . . . . .
##  [6,] . .  .  . . . . . . .
##  [7,] . .  .  . 8 . . . . .
##  [8,] . .  .  . . . . . . .
##  [9,] . .  .  . . . . . . .
## [10,] . .  .  . . . . 9 . .
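To confirm that a write and read round trip preserves the data, one option is to convert the result back to column-compressed form and compare it with the original. This is a minimal sketch with fixed, hypothetical values; the names sp_out, sp_in, sp_csc, path and same are made up, and a temporary file is used instead of “sparse_matrix.mtx”.

```r
library(Matrix)

# Small matrix with fixed, integer-valued entries (hypothetical values)
sp_out <- sparseMatrix(i = c(1, 2, 5), j = c(7, 1, 2), x = c(11, 5, 10),
                       dims = c(10, 10))

# Write to a temporary file so nothing is left behind in the working directory
path <- tempfile(fileext = ".mtx")
writeMM(obj = sp_out, file = path)

# Read it back and convert from triplet form to column-compressed form
sp_in  <- readMM(path)
sp_csc <- as(sp_in, "CsparseMatrix")

# Element-wise comparison of the original and the round-tripped matrix
same <- all(as.matrix(sp_out) == as.matrix(sp_csc))
same
unlink(path)
```

Converting to dense with as.matrix is fine for a toy matrix like this one, but it would defeat the purpose on a large sparse matrix.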
In summary, in this post we learned how to get started with the sparse matrix data structure in R. More specifically, we learned to create a sparse matrix from a dense matrix, to visualize a portion of a sparse matrix, to create a sparse matrix in R from three vectors, to write a sparse matrix to a file, and to load a sparse matrix stored in MatrixMarket format back into a sparse matrix data structure.
Tune in for a future post on how to use the sparse matrix in common statistical and machine learning applications relevant to data science practice.