Sparse matrices make it practical to work with large matrices in which most entries are zero or missing. A sparse matrix stores only the non-zero entries, so it has a much smaller memory footprint than the equivalent full (dense) matrix. With SciPy’s sparse module, one can use sparse matrices directly for common arithmetic operations, like addition, subtraction, multiplication, and division, as well as more complex matrix operations.
Among the many types of sparse matrices available in the SciPy package, we will see examples of creating a sparse matrix in Coordinate format, or COO format.
Coordinate list format, or COO format, stores data as a list of tuples with three elements: row, column, value. The first element is the row index, the second element is the column index, and the third element is the value stored at that row and column. As you can imagine, a tuple is present only for non-zero elements. The biggest advantages of a sparse matrix in COO format are that it can be constructed very quickly and that it can be converted efficiently to other sparse matrix formats, like Compressed Sparse Row (CSR) and Compressed Sparse Column (CSC).
Let us load the sparse matrix module from SciPy to access the sparse matrix functions. Let us also load NumPy; we will use NumPy’s random module to generate random numbers.
# load coo_matrix from scipy.sparse module
from scipy.sparse import coo_matrix
# import numpy
import numpy as np
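As a quick illustration of the COO layout described earlier, the sketch below builds a tiny matrix and reads back the three parallel arrays (row, col, data) that SciPy keeps for it, then converts it to CSR and CSC. The matrix values and the variable names (dense, M, triples) are made up for illustration only.

```python
import numpy as np
from scipy.sparse import coo_matrix

# A small dense matrix with three non-zero entries (illustrative values).
dense = np.array([[0, 0, 7],
                  [0, 5, 0],
                  [2, 0, 0]])

M = coo_matrix(dense)

# COO keeps three parallel arrays: row indices, column indices, values.
triples = list(zip(M.row.tolist(), M.col.tolist(), M.data.tolist()))
print(triples)

# Conversion to the compressed formats is a single method call.
M_csr = M.tocsr()
M_csc = M.tocsc()
print(M_csr.format, M_csc.format)
```

Only the three non-zero entries are stored; the zeros exist only implicitly through the matrix shape.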
1. How To Construct an Empty Sparse Matrix in COO format?
We can construct an empty sparse matrix in COO format using coo_matrix() from scipy.sparse. To create an empty COO matrix of size 4×5, we pass the shape as a tuple:
# create empty COO matrix
A = coo_matrix((4, 5))
print(A)
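When we print the empty matrix we will see no entries, as there are no non-zero elements in the sparse matrix (recent SciPy versions may still print a one-line summary of the matrix). To see the full matrix, including the zeros, we can convert it to a NumPy array with toarray().
>A.toarray()
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])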
Similarly, we can use the todense() function to get the full content of a sparse matrix; it returns a NumPy matrix instead of an array.
>A.todense()
matrix([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
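The difference between the two conversions can be checked directly. The following is a minimal sketch, assuming the coo_matrix class used throughout this post (the newer coo_array class behaves differently); the names arr and mat are illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix

A = coo_matrix((4, 5))

arr = A.toarray()   # plain NumPy ndarray
mat = A.todense()   # NumPy matrix (a 2-D-only subclass of ndarray)

print(type(arr).__name__)
print(type(mat).__name__)

# An empty sparse matrix has a shape but no stored entries.
print(A.shape, A.nnz)
```

For most downstream NumPy code, toarray() is the more convenient choice because it returns an ordinary array.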
We can also specify the data type of the elements in the empty sparse matrix with dtype. Here we construct an empty sparse matrix of size 3×4 with 8-bit integers.
>coo_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)
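The dtype argument also applies when the matrix is not empty, and it determines how many bytes each stored value takes. Here is a small sketch with made-up values; the names small, large, and as_float are illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Illustrative data: three non-zero values that fit in 8 bits.
dense = np.array([[0, 1, 0],
                  [2, 0, 3]])

small = coo_matrix(dense, dtype=np.int8)
large = coo_matrix(dense, dtype=np.float64)

# itemsize is the number of bytes used per stored value.
print(small.dtype, small.data.itemsize)
print(large.dtype, large.data.itemsize)

# An existing sparse matrix can be cast to another dtype with astype().
as_float = small.astype(np.float32)
print(as_float.dtype)
```

Choosing the smallest dtype that safely holds the values reduces the size of the data array, although the row and column index arrays are unaffected.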
2. How To Construct COO matrix from a Dense Matrix?
Sometimes we may have the data already as a dense matrix and we might like to convert the dense matrix into a sparse one so that we can store the data efficiently.
Let us create a dense matrix with ones and zeroes using NumPy’s random module.
A_dense = np.random.randint(2, size=(3, 4))
We can print the dense matrix and see its content. The values are random, so your output will differ.
>print(A_dense)
[[1 1 1 0]
 [1 0 0 1]
 [0 1 0 0]]
We can use the coo_matrix() function to convert the dense matrix to a sparse matrix in COO format.
A_coo = coo_matrix(A_dense)
We can then print the sparse matrix to see its content: the (i,j,v) tuples for the elements with non-zero values.
>print(A_coo)
  (0, 0)    1
  (0, 1)    1
  (0, 2)    1
  (1, 0)    1
  (1, 3)    1
  (2, 1)    1
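The 3×4 example is too small to show any storage benefit, so here is a rough sketch on a larger matrix. It assumes a 1000×1000 matrix with at most 10,000 non-zero entries (about 1%) and a fixed random seed; these sizes, and the simple byte count over the three COO arrays, are illustrative choices rather than a benchmark.

```python
import numpy as np
from scipy.sparse import coo_matrix

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability

# Dense 1000 x 1000 matrix of zeros with up to 10,000 entries set to 1.0.
dense = np.zeros((1000, 1000))
r = rng.integers(0, 1000, size=10_000)
c = rng.integers(0, 1000, size=10_000)
dense[r, c] = 1.0

sparse = coo_matrix(dense)

# Bytes used by the dense array versus the three COO arrays.
dense_bytes = dense.nbytes
sparse_bytes = sparse.data.nbytes + sparse.row.nbytes + sparse.col.nbytes

print("non-zero entries:", sparse.nnz)
print("dense bytes:     ", dense_bytes)
print("sparse bytes:    ", sparse_bytes)
```

The sparse version needs storage proportional to the number of non-zero entries, while the dense array always pays for every cell.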
3. How To Construct COO matrix from data in (i,j,v) format?
Sometimes the sparse data is already available in coordinate list format, as (row, col, value) tuples, where row and col are the row and column indices. With SciPy’s sparse module we can easily create a sparse matrix in COO format from such data.
Let us first create some data in (i,j,v) format. The row, col, and data elements are stored as NumPy arrays.
# Constructing a matrix using ijv format
row = np.array([0, 3, 1, 2, 3, 2])
col = np.array([0, 1, 1, 2, 0, 1])
data = np.array([10, 3, 88, 9, 2, 6])
Let us provide the row, col, and data arrays as input arguments to the coo_matrix function and also specify the dimensions of the sparse matrix.
>B = coo_matrix((data, (row, col)), shape=(4, 4))
When we print the COO matrix we will see the data in sparse (row, col, val) format.
>print(B)
  (0, 0)    10
  (3, 1)    3
  (1, 1)    88
  (2, 2)    9
  (3, 0)    2
  (2, 1)    6
If we want to see the data in matrix form, we can convert it with toarray().
>B.toarray()
array([[10,  0,  0,  0],
       [ 0, 88,  0,  0],
       [ 0,  6,  9,  0],
       [ 2,  3,  0,  0]])
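One detail worth knowing when building a COO matrix from (i,j,v) data is what happens if the same (row, col) pair appears more than once. As a sketch with made-up values: the duplicates are kept as separate stored entries, they are summed when the matrix is converted to a dense array, and sum_duplicates() merges them in place.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Illustrative data: (0, 0) and (1, 2) each appear twice.
row = np.array([0, 0, 1, 1])
col = np.array([0, 0, 2, 2])
data = np.array([1, 2, 3, 4])

D = coo_matrix((data, (row, col)), shape=(2, 3))

stored_before = D.nnz      # duplicates are still separate stored entries
dense = D.toarray()        # duplicates are summed during conversion
print(stored_before)
print(dense)

D.sum_duplicates()         # merge duplicate entries in place
stored_after = D.nnz
print(stored_after)
```

This summing behavior is convenient when accumulating values, but it can be surprising if the duplicates were unintended.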
SciPy’s sparse module also has a lot of utility functions to work with sparse matrices. For example, scipy.sparse.issparse can tell if the matrix is sparse or not.
>from scipy.sparse import issparse, isspmatrix_coo
>issparse(B)
True
We can also specifically check if it is a COO matrix or not with the isspmatrix_coo() function.
>isspmatrix_coo(B) True
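To see how these checks behave across formats, the sketch below runs them on a COO matrix, on its CSR conversion, and on a dense array. It uses a 3×3 identity matrix as illustrative input; the checks dictionary is just a convenient way to collect the results.

```python
import numpy as np
from scipy.sparse import coo_matrix, issparse, isspmatrix_coo, isspmatrix_csr

B = coo_matrix(np.eye(3))   # illustrative input
B_csr = B.tocsr()

checks = {
    "issparse(B)": issparse(B),
    "issparse(B.toarray())": issparse(B.toarray()),
    "isspmatrix_coo(B)": isspmatrix_coo(B),
    "isspmatrix_coo(B_csr)": isspmatrix_coo(B_csr),
    "isspmatrix_csr(B_csr)": isspmatrix_csr(B_csr),
}
for name, result in checks.items():
    print(name, result)

# The format attribute gives the same information as a short string.
print(B.format, B_csr.format)
```

The format attribute is often handier than the individual isspmatrix_* functions when the code needs to branch on the storage format.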