In [1]:
from __future__ import division
import os
import sys
import glob
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
%precision 4
plt.style.use('ggplot')
In [2]:
np.random.seed(1234)
import pystan
import scipy.stats as stats
PyStan¶
Install PyStan
with
pip install pystan
The nice thing about PyMC
is that everything is in Python. With
PyStan
, however, you need to use a domain specific language based on
C++ syntax to specify the model and the data, which is less flexible and
more work. However, in exchange you get an extremely powerful HMC
package (only does HMC) that can be used in R and Python.
Useful links¶
In [3]:
import warnings
warnings.simplefilter('ignore', FutureWarning)
Coin toss¶
We’ll repeat the example of determining the bias of a coin from observed coin tosses. The likelihood is binomial, and we use a beta prior.
In [4]:
%%capture
coin_code = """
data {
int<lower=0> n; // number of tosses
int<lower=0> y; // number of heads
}
transformed data {}
parameters {
real<lower=0, upper=1> p;
}
transformed parameters {}
model {
p ~ beta(2, 2);
y ~ binomial(n, p);
}
generated quantities {}
"""
coin_dat = {
'n': 100,
'y': 61,
}
fit = pystan.stan(model_code=coin_code, data=coin_dat, iter=1000, chains=1)
Loading from a file¶
The string in coin_code can also be in a file - say coin_code.stan
-
then we can use it like so
fit = pystan.stan(file='coin_code.stan', data=coin_dat, iter=1000, chains=1)
In [5]:
print(fit)
Inference for Stan model: anon_model_7f1947cd2d39ae427cd7b6bb6e6ffd77.
1 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=500.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
p 0.61 3.6e-3 0.05 0.51 0.58 0.61 0.65 0.69 171 1.0
lp__ -70.24 0.04 0.66 -72.04 -70.41 -69.98 -69.79 -69.74 236 1.0
Samples were drawn using NUTS at Fri Apr 12 08:10:00 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
In [6]:
coin_dict = fit.extract()
coin_dict.keys()
# lp_ is the log posterior
Out[6]:
odict_keys(['p', 'lp__'])
In [7]:
fit.plot('p');
plt.tight_layout()
WARNING: Deprecation warning. In future, use ArviZ library (`pip install arviz`)

Estimating mean and standard deviation of normal distribution¶
In [8]:
%%capture
norm_code = """
data {
int<lower=0> n;
real y[n];
}
transformed data {}
parameters {
real<lower=0, upper=100> mu;
real<lower=0, upper=10> sigma;
}
transformed parameters {}
model {
y ~ normal(mu, sigma);
}
generated quantities {}
"""
norm_dat = {
'n': 100,
'y': np.random.normal(10, 2, 100),
}
fit = pystan.stan(model_code=norm_code, data=norm_dat, iter=1000, chains=1)
In [9]:
fit
Out[9]:
Inference for Stan model: anon_model_3318343d5265d1b4ebc1e443f0228954.
1 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=500.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu 10.06 0.01 0.19 9.71 9.94 10.07 10.2 10.45 310 1.0
sigma 2.02 6.5e-3 0.15 1.76 1.92 2.02 2.11 2.35 504 1.0
lp__ -117.1 0.06 0.94 -119.5 -117.5 -116.8 -116.4 -116.2 237 1.0
Samples were drawn using NUTS at Fri Apr 12 08:11:10 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
In [10]:
trace = fit.extract()
In [11]:
plt.figure(figsize=(10,4))
plt.subplot(1,2,1);
plt.hist(trace['mu'][:], 25, histtype='step');
plt.subplot(1,2,2);
plt.hist(trace['sigma'][:], 25, histtype='step');

Optimization (finding MAP)¶
In [26]:
%%capture
sm = pystan.StanModel(model_code=norm_code)
op = sm.optimizing(data=norm_dat)
op
Reusing fitted objects¶
In [13]:
new_dat = {
'n': 100,
'y': np.random.normal(10, 2, 100),
}
In [14]:
fit2 = pystan.stan(fit=fit, data=new_dat, chains=1)
In [15]:
fit2
Out[15]:
Inference for Stan model: anon_model_3318343d5265d1b4ebc1e443f0228954.
1 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu 9.9 8.5e-3 0.2 9.52 9.77 9.9 10.04 10.3 546 1.0
sigma 1.99 6.1e-3 0.14 1.73 1.89 1.98 2.07 2.29 525 1.0
lp__ -115.4 0.05 1.0 -118.3 -115.8 -115.1 -114.7 -114.5 398 1.0
Samples were drawn using NUTS at Fri Apr 12 08:12:20 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Saving compiled models¶
We can also compile Stan models and save them to file, so as to reload them for later use without needing to recompile.
In [16]:
def save(obj, filename):
"""Save compiled models for reuse."""
import pickle
with open(filename, 'wb') as f:
pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
def load(filename):
"""Reload compiled models for reuse."""
import pickle
return pickle.load(open(filename, 'rb'))
In [27]:
%%capture
model = pystan.StanModel(model_code=norm_code)
save(model, 'norm_model.pic')
In [18]:
new_model = load('norm_model.pic')
fit4 = new_model.sampling(new_dat, chains=1)
fit4
Out[18]:
Inference for Stan model: anon_model_3318343d5265d1b4ebc1e443f0228954.
1 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu 9.89 8.8e-3 0.2 9.48 9.75 9.9 10.02 10.27 530 1.0
sigma 1.98 6.7e-3 0.15 1.71 1.89 1.97 2.08 2.3 470 1.0
lp__ -115.5 0.05 1.03 -118.3 -115.9 -115.2 -114.7 -114.5 422 1.0
Samples were drawn using NUTS at Fri Apr 12 08:13:30 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Estimating parameters of a linear regression model¶
We will show how to estimate regression parameters using a simple linear model
We can restate the linear model
as sampling from a probability distribution
We will assume the following priors
In [19]:
%%capture
lin_reg_code = """
data {
int<lower=0> n;
real x[n];
real y[n];
}
transformed data {}
parameters {
real a;
real b;
real sigma;
}
transformed parameters {
real mu[n];
for (i in 1:n) {
mu[i] <- a*x[i] + b;
}
}
model {
sigma ~ uniform(0, 20);
y ~ normal(mu, sigma);
}
generated quantities {}
"""
n = 11
_a = 6
_b = 2
x = np.linspace(0, 1, n)
y = _a*x + _b + np.random.randn(n)
lin_reg_dat = {
'n': n,
'x': x,
'y': y
}
fit = pystan.stan(model_code=lin_reg_code, data=lin_reg_dat, iter=1000, chains=1)
In [20]:
fit
Out[20]:
Inference for Stan model: anon_model_4bdbb0aeaf5fafee91f7aa8b257093f2.
1 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=500.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
a 7.42 0.08 0.88 5.75 6.89 7.36 7.93 9.13 132 1.0
b 1.44 0.04 0.53 0.47 1.11 1.45 1.77 2.41 149 1.0
sigma 0.89 0.02 0.25 0.55 0.74 0.85 1.0 1.47 195 1.0
mu[1] 1.44 0.04 0.53 0.47 1.11 1.45 1.77 2.41 149 1.0
mu[2] 2.18 0.04 0.46 1.34 1.89 2.2 2.5 3.03 160 1.0
mu[3] 2.92 0.03 0.39 2.21 2.67 2.94 3.19 3.64 182 1.0
mu[4] 3.66 0.02 0.33 3.08 3.44 3.66 3.89 4.27 222 1.0
mu[5] 4.4 0.02 0.29 3.86 4.2 4.4 4.6 4.95 315 1.0
mu[6] 5.15 0.01 0.27 4.62 4.96 5.13 5.32 5.68 486 1.0
mu[7] 5.89 0.01 0.28 5.38 5.7 5.87 6.06 6.46 535 1.0
mu[8] 6.63 0.02 0.32 6.05 6.42 6.61 6.83 7.29 386 1.0
mu[9] 7.37 0.02 0.37 6.64 7.13 7.36 7.62 8.09 233 1.0
mu[10] 8.11 0.03 0.43 7.28 7.84 8.1 8.4 8.98 191 1.0
mu[11] 8.85 0.04 0.5 7.91 8.54 8.84 9.17 9.88 170 1.0
lp__ -3.26 0.12 1.52 -6.95 -3.87 -2.96 -2.21 -1.5 164 1.0
Samples were drawn using NUTS at Fri Apr 12 08:14:40 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
In [21]:
fit.plot(['a', 'b']);
plt.tight_layout()
WARNING: Deprecation warning. In future, use ArviZ library (`pip install arviz`)

Simple Logistic model¶
We have observations of height and weight and want to use a logistic model to guess the sex.
In [22]:
# observed data
df = pd.read_csv('data/HtWt.csv')
df.head()
Out[22]:
male | height | weight | |
---|---|---|---|
0 | 0 | 63.2 | 168.7 |
1 | 0 | 68.7 | 169.8 |
2 | 0 | 64.8 | 176.6 |
3 | 0 | 67.9 | 246.8 |
4 | 1 | 68.9 | 151.6 |
In [23]:
%%capture
log_reg_code = """
data {
int<lower=0> n;
int male[n];
real weight[n];
real height[n];
}
transformed data {}
parameters {
real a;
real b;
real c;
}
transformed parameters {}
model {
a ~ normal(0, 10);
b ~ normal(0, 10);
c ~ normal(0, 10);
for(i in 1:n) {
male[i] ~ bernoulli(inv_logit(a*weight[i] + b*height[i] + c));
}
}
generated quantities {}
"""
log_reg_dat = {
'n': len(df),
'male': df.male,
'height': df.height,
'weight': df.weight
}
fit = pystan.stan(model_code=log_reg_code, data=log_reg_dat, iter=2000, chains=1)
In [24]:
fit
Out[24]:
Inference for Stan model: anon_model_bbf283522d1e199c049bb423b4a6e0da.
1 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
a 9.2e-3 5.1e-4 9.0e-3-8.3e-3 3.1e-3 9.5e-3 0.02 0.03 315 1.0
b 0.38 6.4e-3 0.1 0.19 0.32 0.38 0.45 0.58 237 1.0
c -26.9 0.38 6.27 -40.1 -30.98 -26.97 -22.39 -15.19 267 1.0
lp__ -35.36 0.08 1.23 -38.44 -35.96 -35.07 -34.45 -33.91 268 1.0
Samples were drawn using NUTS at Fri Apr 12 08:15:52 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
In [25]:
df_trace = pd.DataFrame(fit.extract(['c', 'b', 'a']))
pd.plotting.scatter_matrix(df_trace[:], diagonal='kde');

In [ ]: