Introduction to Probabilistic Programming with PyStan

Introduction to PyStan

Stan, developed by a team led by Andrew Gelman, is one of the leading languages for probabilistic programming. At the core of probabilistic programming lies Bayesian statistics.

Stan is named in honor of Stanislaw Ulam, co-inventor of the Monte Carlo method, the computational engine behind most Bayesian computing.

Stan is a C++ package providing full Bayesian inference using the No-U-Turn Sampler (NUTS), a variant of Hamiltonian Monte Carlo (HMC). PyStan is the Python interface to Stan.

A Simple PyStan Example

In much of statistics, we start with observed data and try to infer the process that generated it. One approach is to assume the data was generated by a probabilistic model with one or more parameters, and we are typically interested in estimating those parameters from the observed data.

The Bayesian approach treats the observed data as fixed and the unknown parameters as random variables with their own probability distributions. Bayes' theorem is then used to combine the prior distribution of the parameters with the data to obtain the posterior distribution.
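
Schematically, writing θ for the parameters and y for the data, Bayes' theorem says the posterior is the likelihood times the prior, normalized by the marginal likelihood of the data:

p(θ | y) = p(y | θ) p(θ) / p(y)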

Without getting into the details of the statistics or the computation behind PyStan, in this post we will illustrate a simple example of using it.

We will look at a classic statistics problem: we have observed data and want to estimate the parameters of the model that generated it. Starting with a parametric model, we can estimate the parameters of a simple model in many ways.

For example, we can use maximum likelihood estimation to get a point estimate of the parameters defining the model. The advantage of Stan's Bayesian approach is that we get the full distribution of each parameter, not just its most likely value.

Let us make sure we have PyStan installed and working. Check out PyStan's GitHub page, https://github.com/stan-dev/pystan, for instructions on installing PyStan. The PyStan version used in this example is 2.19.1.1 (you can check the installed version using pystan.__version__ after importing pystan).
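
For example, a quick sanity check (this assumes the PyStan 2.x interface used throughout this post):

# verify PyStan is importable and check its version
import pystan
print(pystan.__version__)  # e.g. 2.19.1.1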

Let us import pystan and other modules needed for working through the simple example.

 
# load pystan and supporting libraries
import pystan
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
%matplotlib inline

Let us simulate our observed data from a normal probability distribution. A normal distribution is specified by two parameters: mu, the mean, and sigma, the standard deviation. Let us also set a random seed to make our results reproducible.

 
# set seed
np.random.seed(1234)
# Number of observed data
n = 100
# true mean
mu = 10
# true standard deviation
sd = 2
# simulate data from normal distributions
data = np.random.normal(mu, sd, n)

Now that we have our data, we are ready to estimate the parameters of the distribution it was generated from.

 
data[0:5]
array([10.9429,  7.618 , 12.8654,  9.3747,  8.5588])
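
For comparison with the Bayesian results below, the maximum likelihood estimates for a normal model are simply the sample mean and standard deviation; a quick sketch:

# MLE for a normal model: sample mean and (population) standard deviation
mu_mle = data.mean()
sigma_mle = data.std(ddof=0)
print(mu_mle, sigma_mle)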

Three Must-Have Code Blocks in PyStan

The key step in using Stan is writing out the data-generating probabilistic model in the Stan programming language. In PyStan, we wrap the Stan model code in a string. A Stan model typically has at least three code blocks. We need

  1. data block
  2. parameters block
  3. model block

The data block declares the names, types, and dimensions of the variables used in the Bayesian computation. In our simple example, the data block declares n as an integer and y as a real array holding the data sampled from the distribution.

The parameters block declares the names and the bounds of the parameters used in the model. In our example, the parameters are mu and sigma.

The distribution statement goes in the model block; in our case it states that y is normally distributed with mean mu and standard deviation sigma.

 
norm_code = """
data {
    int<lower=0> n;
    real y[n];
}
parameters {
    real<lower=0, upper=100> mu;
    real<lower=0, upper=10> sigma;
}
model {
    y ~ normal(mu, sigma);
}
"""

Though not included here, it is also possible to specify transformed data and transformed parameters blocks, as well as blocks for functions and generated quantities.
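
As an illustration only (not used in the rest of this post), a generated quantities block could draw posterior predictive replicates of the data:

generated quantities {
    // one replicated dataset per posterior draw
    real y_rep[n];
    for (i in 1:n)
        y_rep[i] = normal_rng(mu, sigma);
}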

Let us use this code to build a Stan model with the StanModel command. Note that this step compiles the model to C++, so it can take a little while.

 
model = pystan.StanModel(model_code=norm_code)

We provide the data as a Python dictionary whose keys match the variable names declared in the data block.

 
norm_data = {
             'n': n,
             'y': data,
            }

Fitting the model with PyStan

We then fit the model to our data using the sampling() method, providing the data dictionary.

fit = model.sampling(data=norm_data)
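
By default, sampling() runs 4 chains of 2,000 iterations each, with the first 1,000 iterations of each chain used as warmup. A sketch of the same call with the defaults spelled out, plus a seed for reproducibility:

# equivalent to the defaults, with an explicit seed
fit = model.sampling(data=norm_data, iter=2000, chains=4, warmup=1000, seed=1234)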

We can inspect the fit object containing the results from PyStan. Let us print the fit object.

 
fit

It will show a summary of the sampled parameters. Let us focus on the two parameters of interest, mu and sigma.

We can see that the posterior means of mu and sigma are 10.07 and 2.03, both pretty close to the true values. These are analogous to the MLE point estimates of the parameters.

 
Inference for Stan model: anon_model_01c78a21aa51883a9aabd4563199e4c5.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

        mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
mu     10.07  3.6e-3    0.2   9.66   9.93  10.07  10.21  10.46   3195    1.0
sigma   2.03  2.9e-3   0.15   1.76   1.92   2.02   2.12   2.36   2782    1.0
lp__  -117.2    0.02   1.06 -120.1 -117.6 -116.9 -116.5 -116.2   1838    1.0

Samples were drawn using NUTS at Mon Dec  9 18:53:48 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The biggest advantage of doing Bayesian analysis with PyStan is that we get the full posterior distributions of the parameters mu and sigma.

We can extract the posterior draws of mu from the fit object as

 
fit['mu']
array([10.4385, 10.4692, 10.0316, ..., 10.0665,  9.9765, 10.3807])

and the parameter sigma as

 
fit['sigma']
array([1.9407, 2.1506, 1.9739, ..., 2.094 , 2.1223, 2.183 ])
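
We can also pull all the draws at once with extract(), which returns a dictionary of arrays, and summarize them ourselves. For example, a posterior median and 95% credible interval for mu:

# all posterior draws, keyed by parameter name
posterior = fit.extract()
# 2.5th, 50th, and 97.5th percentiles of the mu draws
print(np.percentile(posterior['mu'], [2.5, 50, 97.5]))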

Visualizing the results from PyStan

Let us extract the parameter draws from the Stan fit and visualize them.

Let us first visualize the trace and distribution of mu.

 
mu = fit['mu']

The trace of mu shows how the sampled values of mu change over the iterations.

 
bplot = sns.lineplot(x=range(len(mu)), y=mu)
bplot.axhline(np.median(mu), ls='--', color="red")
plt.ylabel("mu", fontsize=15)
plt.xlabel("Samples", fontsize=15)
Trace of mu (mean) in PyStan example

Similarly, the distribution of mu shows the uncertainty in our estimate of mu across the samples.

 
bplot = sns.distplot(mu)
bplot.axvline(np.median(mu), color='r', linestyle='dashed', linewidth=2)
plt.xlabel("mu", fontsize=15)
plt.ylabel("Density", fontsize=15)

Distribution of mu: PyStan example

We can do the same for the parameter sigma.

 
sigma = fit['sigma']

The trace of sigma shows the sigma draws over the iterations.

 
bplot = sns.lineplot(x=range(len(sigma)), y=sigma)
bplot.axhline(np.median(sigma), ls='--', color="red")
plt.ylabel("sigma", fontsize=15)
plt.xlabel("Samples", fontsize=15)

Trace of sigma: PyStan Example

And the distribution of sigma shows how variable it is.

 
bplot = sns.distplot(sigma)
bplot.axvline(np.median(sigma), ls='--', color="red")
plt.xlabel("sigma", fontsize=15)
plt.ylabel("Density", fontsize=15)
Distribution of sigma: PyStan Example

To summarize, we have seen a simple example of using PyStan to perform probabilistic computing. We looked at the classic statistical inference problem of estimating, from observed data, the parameters of the model that generated it. We used PyStan to estimate the posterior distributions of the parameters and saw that the results make sense. Look out for more examples of probabilistic programming with PyStan here soon.