Sometimes, while working with large sparse matrices in Python, you might want to select certain rows or certain columns of a sparse matrix. As we saw earlier, SciPy offers many types of sparse matrices, and each type is optimized for specific operations.
We will see examples of slicing a sparse matrix by row and by column. We will create a random sparse matrix and select a subset of its rows or columns using SciPy and NumPy.
Let us load the modules needed.
from scipy import sparse
import numpy as np
from scipy import stats
Let us create a random sparse matrix using the random function from SciPy's sparse module. Here we generate a 5 x 5 sparse matrix with density 0.5, so about half of the entries are stored, with the stored values drawn from a Poisson distribution with mean 10, shifted by 10.
A = sparse.random(5, 5, density=0.5, data_rvs=stats.poisson(10, loc=10).rvs)
We can view the contents of the sparse matrix by converting it to a dense matrix with the todense() function and printing it.
print(A.todense())
[[ 0. 18. 23. 19.  0.]
 [ 0. 20. 23.  0. 14.]
 [ 0.  0.  0. 17. 17.]
 [17.  0. 25.  0. 20.]
 [ 0. 22.  0.  0.  0.]]
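Because the matrix is random, your values will differ from the ones printed above. As a minimal sketch for following along with the same numbers, the matrix can be rebuilt from the printed output. The name dense_values is only an illustrative choice and is not part of the original example.

```python
import numpy as np
from scipy import sparse

# Values copied from the printed output above (dense_values is an illustrative name).
dense_values = np.array([
    [ 0., 18., 23., 19.,  0.],
    [ 0., 20., 23.,  0., 14.],
    [ 0.,  0.,  0., 17., 17.],
    [17.,  0., 25.,  0., 20.],
    [ 0., 22.,  0.,  0.,  0.],
])

# Building a COO matrix from a dense array stores only the nonzero entries.
A = sparse.coo_matrix(dense_values)

print(A.shape)  # (5, 5)
print(A.nnz)    # 12 stored elements
```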
Let us say we are interested in rows or columns with even indices.
select_ind = np.array([0,2,4])
How to Select Rows from a Sparse Matrix?
We can subset our original sparse matrix using a slice operation. Note that the sparse.random function creates a sparse matrix in COO format by default, and the COO format is not designed for slicing.
So we first convert the COO sparse matrix to a CSR (Compressed Sparse Row) matrix using the tocsr() function. We can then select rows using the array of row indices we created.
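If you are unsure which format a matrix is in, its format attribute is one way to check. The sketch below uses a fresh random matrix with default values, so it does not reproduce the numbers above. It also shows the format argument of sparse.random, which may save the conversion step if you know up front that you need CSR.

```python
from scipy import sparse

# sparse.random returns a COO matrix unless another format is requested.
A = sparse.random(5, 5, density=0.5)
print(A.format)      # coo

# tocsr() returns a new matrix; A itself stays in COO format.
A_csr = A.tocsr()
print(A_csr.format)  # csr
print(A.format)      # coo

# Alternatively, ask for CSR directly when creating the matrix.
B = sparse.random(5, 5, density=0.5, format="csr")
print(B.format)      # csr
```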
A.tocsr()[select_ind,:]
<3x5 sparse matrix of type '<class 'numpy.float64'>'
	with 6 stored elements in Compressed Sparse Row format>
We can see that after slicing we get a sparse matrix of size 3×5 in CSR format. To see the contents of the sliced sparse matrix, we can use the todense() function. Now we have just three rows instead of five.
A.tocsr()[select_ind,:].todense()
matrix([[ 0., 18., 23., 19.,  0.],
        [ 0.,  0.,  0., 17., 17.],
        [ 0., 22.,  0.,  0.,  0.]])
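As a small sketch of the row selection above, the example below rebuilds the matrix from the printed values (an assumption made so the result is reproducible) and shows two other ways to pick the same rows: a stepped slice and a boolean mask converted to indices. The variable names are illustrative.

```python
import numpy as np
from scipy import sparse

# Matrix rebuilt from the values printed earlier, stored directly as CSR.
A_csr = sparse.csr_matrix(np.array([
    [ 0., 18., 23., 19.,  0.],
    [ 0., 20., 23.,  0., 14.],
    [ 0.,  0.,  0., 17., 17.],
    [17.,  0., 25.,  0., 20.],
    [ 0., 22.,  0.,  0.,  0.],
]))
select_ind = np.array([0, 2, 4])

# 1. Index array, as in the text.
rows_by_index = A_csr[select_ind, :]

# 2. Stepped slice: every second row starting at row 0.
rows_by_slice = A_csr[::2, :]

# 3. Boolean mask over rows, turned into integer indices first.
even_mask = np.arange(A_csr.shape[0]) % 2 == 0
rows_by_mask = A_csr[np.flatnonzero(even_mask), :]

print(rows_by_index.shape)  # (3, 5)
print(np.array_equal(rows_by_index.toarray(), rows_by_slice.toarray()))  # True
print(np.array_equal(rows_by_index.toarray(), rows_by_mask.toarray()))   # True
```

All three results stay sparse, so only the final toarray() calls used for the comparison create dense arrays.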
How to Select Columns from a Sparse Matrix?
We can do the same to slice the columns of a sparse matrix. We first convert to a CSR or CSC (Compressed Sparse Column) matrix and then use a slice operation to select the columns we are interested in. CSC is generally the more efficient of the two for column slicing.
Let us use tocsr() like before and select the columns with even indices.
A.tocsr()[:,select_ind].todense()
matrix([[ 0., 23.,  0.],
        [ 0., 23., 14.],
        [ 0.,  0., 17.],
        [17., 25., 20.],
        [ 0.,  0.,  0.]])
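The same column selection can be sketched with the CSC format, which stores the matrix column by column. The example rebuilds the matrix from the printed values (an assumption for reproducibility) and checks that the CSC and CSR routes give the same columns.

```python
import numpy as np
from scipy import sparse

# Matrix rebuilt from the values printed earlier, in COO format.
A = sparse.coo_matrix(np.array([
    [ 0., 18., 23., 19.,  0.],
    [ 0., 20., 23.,  0., 14.],
    [ 0.,  0.,  0., 17., 17.],
    [17.,  0., 25.,  0., 20.],
    [ 0., 22.,  0.,  0.,  0.],
]))
select_ind = np.array([0, 2, 4])

# Convert to CSC and pick the columns with even indices.
cols_csc = A.tocsc()[:, select_ind]

# The CSR route used in the text, for comparison.
cols_csr = A.tocsr()[:, select_ind]

print(cols_csc.shape)  # (5, 3)
print(cols_csc.nnz)    # 7 stored elements
print(np.array_equal(cols_csc.toarray(), cols_csr.toarray()))  # True
```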
Another option, when the sparse matrix is small, is to convert it to a dense matrix and slice the rows or columns there. This approach is inefficient for large matrices, and it may be impossible when the dense version does not fit in memory.
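A rough sketch of the dense route follows, again with the matrix rebuilt from the printed values. The helper dense_bytes is a hypothetical function written for this illustration, and it assumes 8 bytes per entry (float64).

```python
import numpy as np
from scipy import sparse

A = sparse.coo_matrix(np.array([
    [ 0., 18., 23., 19.,  0.],
    [ 0., 20., 23.,  0., 14.],
    [ 0.,  0.,  0., 17., 17.],
    [17.,  0., 25.,  0., 20.],
    [ 0., 22.,  0.,  0.,  0.],
]))
select_ind = np.array([0, 2, 4])

# Dense route: convert everything first, then use ordinary NumPy indexing.
dense_rows = A.toarray()[select_ind, :]

# Sparse route: slice in CSR format, convert only the small result.
sparse_rows = A.tocsr()[select_ind, :].toarray()

print(np.array_equal(dense_rows, sparse_rows))  # True


def dense_bytes(n_rows, n_cols, itemsize=8):
    """Memory needed for a dense n_rows x n_cols array (hypothetical helper)."""
    return n_rows * n_cols * itemsize


# A 100,000 x 100,000 float64 matrix would need about 80 GB when dense.
print(dense_bytes(100_000, 100_000) / 1e9)  # 80.0
```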