How To Save a Sparse Matrix in Python to Mtx and Npz Files

Sparse matrices in Python are very useful when you work with datasets that are high-dimensional and sparse, that is, where most of the values are zero. Python’s SciPy library has a set of tools for working with the most commonly used sparse matrix formats. In this post, we will learn how to save a sparse matrix in Mtx format and in npz format, two common ways to store sparse matrices on disk.

Let us first load the Python modules we need. To work with sparse matrices we need SciPy’s sparse module, and to read and write sparse matrices in different formats we use SciPy’s io module. We also import SciPy’s stats module and NumPy to generate random data for the example.

import scipy.sparse as sparse
import scipy.io as sio
import scipy.stats as stats
import numpy as np

First, we will create a sparse matrix using the sparse.random() function from SciPy’s sparse module. The non-zero values are random numbers drawn from a Poisson distribution using SciPy’s stats module.

With SciPy’s sparse module we can generate a sparse matrix of a specific format and sparsity. In this example, we create a 500×25 sparse matrix in CSR format with 25% density, meaning that a quarter of its entries are non-zero.

np.random.seed(42)
rvs = stats.poisson(15, loc=10).rvs
sparse_matrix = sparse.random(500, 
                  25,
                  density=0.25,
                  data_rvs=rvs,
                  format="csr")
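A quick sanity check of the density arithmetic: sparse.random() stores density × rows × cols values (rounded to an integer), so 0.25 × 500 × 25 = 3125, which is the stored-element count reported for this matrix. The sketch below recreates the matrix and checks that number; `expected_nnz` is just an illustrative name.

```python
import numpy as np
import scipy.sparse as sparse
import scipy.stats as stats

# Recreate the example matrix: Poisson(15) values shifted by loc=10
np.random.seed(42)
rvs = stats.poisson(15, loc=10).rvs
sparse_matrix = sparse.random(500, 25, density=0.25, data_rvs=rvs, format="csr")

rows, cols = sparse_matrix.shape
expected_nnz = round(0.25 * rows * cols)   # 0.25 * 500 * 25 = 3125
print(sparse_matrix.nnz, expected_nnz)     # 3125 3125

# Fraction of entries that are actually stored
print(sparse_matrix.nnz / (rows * cols))   # 0.25

# loc=10 shifts the Poisson support, so every stored value is at least 10
print(sparse_matrix.data.min() >= 10)      # True
```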

We can verify that the variable is a sparse matrix using sparse.isspmatrix().

sparse.isspmatrix(sparse_matrix)

True
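For comparison, a minimal sketch with a small hypothetical matrix: isspmatrix() checks specifically for SciPy's sparse *matrix* classes, while sparse.issparse() is the more general check for any SciPy sparse object (in recent SciPy versions this includes the newer sparse array classes). Both return False for an ordinary dense NumPy array, and the .format attribute names the storage scheme.

```python
import numpy as np
import scipy.sparse as sparse

dense = np.zeros((3, 4))
sparse_version = sparse.csr_matrix(dense)

# isspmatrix(): True only for SciPy sparse matrix objects
print(sparse.isspmatrix(sparse_version))  # True
print(sparse.isspmatrix(dense))           # False

# issparse(): True for SciPy sparse objects, False for NumPy arrays
print(sparse.issparse(sparse_version))    # True
print(sparse.issparse(dense))             # False

# The .format attribute names the storage scheme
print(sparse_version.format)              # csr
```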

Displaying the variable shows that our sparse matrix has dimensions 500×25, holds 3125 stored (non-zero) elements, and uses the Compressed Sparse Row (CSR) format.

sparse_matrix

<500x25 sparse matrix of type '<class 'numpy.float64'>'
	with 3125 stored elements in Compressed Sparse Row format>

If we print the matrix, we get its stored entries in (row, column) value format. Only the first few lines of the output are shown here.

print(sparse_matrix)

  (0, 0)	20.0
  (0, 1)	23.0
  (0, 6)	26.0
  (0, 11)	25.0
  (0, 12)	28.0
  (0, 21)	22.0
  (1, 3)	29.0
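To work with these (row, column) value triples in code rather than reading printed output, one option is to convert the matrix to COO (coordinate) format, whose row, col and data attributes hold exactly this information. The sketch uses a small hypothetical 3×3 matrix so the whole result fits on one line.

```python
import numpy as np
import scipy.sparse as sparse

# A small CSR matrix so the full set of triples is easy to read
small = sparse.csr_matrix(np.array([[0., 5., 0.],
                                    [7., 0., 0.],
                                    [0., 0., 9.]]))

# tocoo() exposes the same (row, column) value triples that print() shows
coo = small.tocoo()
triples = list(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))
print(triples)  # [(0, 1, 5.0), (1, 0, 7.0), (2, 2, 9.0)]
```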

Using the todense() method, we can also convert the sparse matrix into a full (dense) 2D matrix.

sparse_matrix.todense()

matrix([[20., 23.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0., 24., ...,  0.,  0.,  0.],
        ...,
        [ 0.,  0.,  0., ..., 20., 27.,  0.],
        [ 0.,  0., 24., ..., 25.,  0., 22.],
        [ 0.,  0.,  0., ...,  0.,  0., 25.]])
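As the output above shows, todense() on a SciPy sparse matrix returns a numpy.matrix. If you prefer a plain NumPy ndarray, toarray() holds the same values. A minimal sketch with a small hypothetical matrix; keep in mind that either dense copy allocates rows × cols values, which can be large for high-dimensional data.

```python
import numpy as np
import scipy.sparse as sparse

sm = sparse.csr_matrix(np.array([[1., 0., 0.],
                                 [0., 0., 2.]]))

dense_matrix = sm.todense()   # numpy.matrix for sparse matrix input
dense_array = sm.toarray()    # plain numpy.ndarray

print(type(dense_matrix).__name__)  # matrix
print(type(dense_array).__name__)   # ndarray

# Both dense versions contain exactly the same values
print(np.array_equal(np.asarray(dense_matrix), dense_array))  # True
```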

How to Write a Sparse Matrix as an Mtx File?

SciPy’s io module has a number of functions for writing a sparse matrix to a file. To write the sparse matrix as an Mtx file, we use io’s mmwrite() function with the file name and the sparse matrix. The Mtx extension is short for Matrix Market, a plain-text format that is widely used across different programming languages.

sio.mmwrite("sparse_matrix.mtx", sparse_matrix)
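Because Matrix Market is plain text, you can open the written file in any editor. The sketch below writes a small hypothetical 2×3 matrix to a hypothetical file `small_matrix.mtx` and prints it: a `%%MatrixMarket` banner line, optional `%` comment lines, a "rows cols entries" size line, then one "row col value" line per stored entry. Note that Matrix Market indices are 1-based, unlike Python's 0-based indexing; the exact number formatting may vary between SciPy versions.

```python
import numpy as np
import scipy.io as sio
import scipy.sparse as sparse

sm = sparse.csr_matrix(np.array([[0., 5., 0.],
                                 [7., 0., 0.]]))
sio.mmwrite("small_matrix.mtx", sm)

# Read the file back as raw text to inspect the Matrix Market layout
with open("small_matrix.mtx") as f:
    text = f.read()
print(text)
```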

To load a sparse matrix saved as an Mtx file, we can use the mmread() function, which reads it back as a sparse matrix in COO (coordinate) format.

sp_matrix = sio.mmread("sparse_matrix.mtx")
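Since mmread() gives back a COO-format object rather than the CSR matrix we saved, a round trip typically ends with a conversion such as tocsr(). A minimal sketch, assuming a hypothetical file name `roundtrip.mtx`, that also checks the values survive the write/read cycle:

```python
import numpy as np
import scipy.io as sio
import scipy.sparse as sparse

original = sparse.random(50, 10, density=0.2, format="csr")
sio.mmwrite("roundtrip.mtx", original)

loaded = sio.mmread("roundtrip.mtx")
print(loaded.format)  # coo

# Convert back to CSR for fast row slicing and arithmetic
loaded_csr = loaded.tocsr()
print(loaded_csr.format)  # csr

# The values match the original matrix
print(np.allclose(loaded_csr.toarray(), original.toarray()))  # True
```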

SciPy’s io module also has an mminfo() function that returns basic information about a file saved in Mtx format. Here we see the dimensions of the matrix (500×25), the number of stored entries (3125), the storage format ('coordinate'), the data type ('real'), and the symmetry ('general').

sio.mminfo("sparse_matrix.mtx")

(500, 25, 3125, 'coordinate', 'real', 'general')
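Because mminfo() only reads the file header, it is a cheap way to check a file's size before loading it. Unpacking the returned 6-tuple into named variables makes the fields easier to use; the names below and the file name `info_demo.mtx` are illustrative choices.

```python
import scipy.io as sio
import scipy.sparse as sparse

sm = sparse.random(500, 25, density=0.25, format="csr")
sio.mmwrite("info_demo.mtx", sm)

# mminfo() returns (rows, cols, entries, format, field, symmetry)
rows, cols, entries, fmt, field, symmetry = sio.mminfo("info_demo.mtx")
print(rows, cols, entries)   # 500 25 3125
print(fmt, field, symmetry)  # coordinate real general
```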

How to Write a Sparse Matrix as an .npz File?

Another way to store a sparse matrix in Python is to write it in npz format. The .npz file format is a “zipped archive of files named after the variables they contain”. We can use the sparse module’s save_npz() function to write a sparse matrix to a file in npz format.

sparse.save_npz('sparse_matrix.npz', sparse_matrix)
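Since an .npz file is a zip archive of NumPy arrays, np.load() can list what save_npz() put inside. For a CSR matrix, the sketch below shows that it stores the matrix's internal arrays (data, indices, indptr) together with its shape and format name. save_npz() also takes a compressed argument, True by default; passing False skips zip compression. The file name `sparse_matrix_demo.npz` is hypothetical.

```python
import numpy as np
import scipy.sparse as sparse

sm = sparse.random(500, 25, density=0.25, format="csr")

# compressed=True (the default) zip-compresses the stored arrays
sparse.save_npz("sparse_matrix_demo.npz", sm, compressed=True)

# List the arrays stored in the archive
with np.load("sparse_matrix_demo.npz") as archive:
    stored_names = sorted(archive.files)
print(stored_names)  # includes 'data', 'format', 'indices', 'indptr', 'shape'
```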

Similarly, we can load a saved .npz file using the load_npz() function. It takes the path to the .npz file and returns the sparse matrix.

sparse_matrix = sparse.load_npz('sparse_matrix.npz')

Here, it returns a sparse matrix in CSR format, since that was the format of the matrix we saved.

sparse_matrix

<500x25 sparse matrix of type '<class 'numpy.float64'>'
	with 3125 stored elements in Compressed Sparse Row format>
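The format name is stored in the file, so load_npz() restores whatever format was saved, not only CSR. A minimal sketch with a CSC matrix and a hypothetical file name `csc_matrix.npz`; because the raw arrays are stored in binary, the values come back exactly.

```python
import numpy as np
import scipy.sparse as sparse

sm = sparse.random(100, 20, density=0.1, format="csc")
sparse.save_npz("csc_matrix.npz", sm)

reloaded = sparse.load_npz("csc_matrix.npz")
print(reloaded.format)  # csc
print(reloaded.shape)   # (100, 20)

# Exact round trip of the stored values
print(np.array_equal(reloaded.toarray(), sm.toarray()))  # True
```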