Using PyMC3

PyMC3 is a Python package for doing MCMC using a variety of samplers, including Metropolis, Slice and Hamiltonian Monte Carlo. See Probabilistic Programming in Python using PyMC for a description. The GitHub site also has many examples and links for further exploration.

Installation

! pip install --process-dependency-links git+https://github.com/pymc-devs/pymc3

Other resources

Some examples are adapted from:

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns

import pymc3 as pm
import numpy.random as rng
import scipy.stats as stats
from sklearn.preprocessing import StandardScaler

Coin bias

We start with the simplest model - that of determining the bias of a coin from observed outcomes.

Setting up the model

In [2]:
n = 100
heads = 61

Analytical solution
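
With a Beta(\(a, b\)) prior and \(h\) heads observed in \(n\) tosses, the binomial likelihood is conjugate to the prior, so the posterior is available in closed form

\[p \mid h \sim \text{Beta}(a + h,\; b + n - h)\]

which is what the next cell evaluates with scipy.stats.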

In [3]:
a, b = 10, 10
prior = stats.beta(a, b)
post = stats.beta(heads+a, n-heads+b)
ci = post.interval(0.95)

xs = np.linspace(0, 1, 100)
plt.plot(xs, prior.pdf(xs), label='Prior')
plt.plot(xs, post.pdf(xs), label='Posterior')
plt.axvline(heads/n, c='red', alpha=0.4, label='MLE')
plt.xlim([0, 1])
plt.hlines(0.3, ci[0], ci[1], color='black', linewidth=2, label='95% CI')
plt.legend()
pass
_images/16C_PyMC3_6_0.png

Introduction to PyMC3

In [4]:
niter = 2000
with pm.Model() as coin_context:
    p = pm.Beta('p', alpha=2, beta=2)
    y = pm.Binomial('y', n=n, p=p, observed=heads)
    trace = pm.sample(niter)
Applied logodds-transform to p and added transformed p_logodds to model.
Assigned NUTS to p_logodds
 [-----------------100%-----------------] 2000 of 2000 complete in 0.8 sec

Specifying start, sampler (step) and multiple chains

In [5]:
with coin_context:
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(niter, step=step, start=start, njobs=4, random_seed=123)
 [-----------------100%-----------------] 2000 of 2000 complete in 0.4 sec

MAP estimate

In [6]:
start
Out[6]:
{'p_logodds': array(0.42956266113384295)}
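
find_MAP works on the transformed variable, so the estimate is reported on the log-odds scale; a small sketch of the back-transform to the probability scale:

p_map = 1/(1 + np.exp(-start['p_logodds']))  # inverse logit of the MAP value, about 0.61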

Summary of results

Discard 50% as burn-in

In [7]:
t = trace[niter//2:]
t['p'].shape
Out[7]:
(4000,)

Getting values from the trace

In [8]:
p = trace.get_values('p', burn=niter//2, combine=True, chains=[0,2])
p.shape
Out[8]:
(2000,)

Autocorrelation plot

In [9]:
pm.autocorrplot(t, varnames=['p'])
pass
_images/16C_PyMC3_19_0.png

Calculate effective sample size

\[\hat{n}_{eff} = \frac{mn}{1 + 2 \sum_{t=1}^T \hat{\rho}_t}\]

where \(m\) is the number of chains, \(n\) the number of steps per chain, \(T\) the time when the autocorrelation first becomes negative, and \(\hat{\rho}_t\) the autocorrelation at lag \(t\).
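
A rough sketch of this formula in plain numpy, assuming chains is an \((m, n)\) array of post burn-in draws for a single parameter (PyMC3's implementation handles the lag truncation more carefully):

def effective_sample_size(chains):
    m, n = chains.shape
    x = chains - chains.mean()
    rho = []
    for lag in range(1, n//2):
        # average autocorrelation at this lag across the m chains
        r = np.mean([np.corrcoef(c[:-lag], c[lag:])[0, 1] for c in x])
        if r < 0:          # stop at T, the first lag with negative autocorrelation
            break
        rho.append(r)
    return m*n / (1 + 2*np.sum(rho))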

In [10]:
pm.effective_n(t)
Out[10]:
{'p': 767, 'p_logodds': 765}

Evaluate convergence

Gelman-Rubin
\[\hat{R} = \frac{\hat{V}}{W}\]

where \(W\) is the within-chain variance and \(\hat{V}\) is the posterior variance estimate for the pooled traces. Values greater than one indicate that one or more chains have not yet converged.

Discard the burn-in steps for each chain. The idea is to see if the starting values of each chain come from the same distribution as the stationary state.

  • \(W\) is the average of the within-chain variances over the \(m\) chains
  • \(B\) is the number of steps \(n \times\) the variance of the chain means
  • \(\hat{V}\) is the weighted average \((1 - \frac{1}{n})W + \frac{1}{n}B\)

The idea is that \(\hat{V}\) is an unbiased estimate of the marginal posterior variance if the starting values of each chain come from the same distribution as the stationary state, in which case \(W\) and \(\hat{V}\) agree and \(\hat{R} \approx 1\). Hence if \(\hat{R}\) differs significantly from 1, there is probably no convergence and we need more iterations. This is done for each parameter \(\theta\).
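
A minimal sketch of \(\hat{R}\) as defined above, assuming chains is an \((m, n)\) array of post burn-in draws for one parameter (pm.gelman_rubin does this for every variable in the trace):

def gelman_rubin(chains):
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # average within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # n times the variance of the chain means
    V_hat = (1 - 1/n)*W + B/n                  # pooled posterior variance estimate
    return V_hat / W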

In [11]:
pm.gelman_rubin(t)
Out[11]:
{'p': 0.99949987493746095, 'p_logodds': 0.99949987493746095}
Geweke

Compares the mean of an initial segment of the trace with the means of later segments for each parameter, expressed as a z-score. At convergence the scores should be small in absolute value; here we flag anything outside ±1 (a stricter bound than the ±2 that is often used).
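
A rough sketch of a single Geweke z-score, assuming approximately independent draws (the real diagnostic estimates the variances from the spectral density, and pm.geweke repeats the comparison over many starting points, which is what gets plotted below):

def geweke_z(x, first=0.1, last=0.5):
    a = x[:int(first*len(x))]              # early segment (first 10%)
    b = x[int((1 - last)*len(x)):]         # late segment (last 50%)
    return (a.mean() - b.mean()) / np.sqrt(a.var()/len(a) + b.var()/len(b))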

In [12]:
plt.plot(pm.geweke(t['p'])[:,1], 'o')
plt.axhline(1, c='red')
plt.axhline(-1, c='red')
plt.gca().margins(0.05)
pass
_images/16C_PyMC3_26_0.png

Textual summary

In [13]:
pm.summary(t, varnames=['p'])

p:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------

  0.615            0.050            0.002            [0.517, 0.698]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|

  0.517          0.581          0.616          0.654          0.698

Visual summary

In [14]:
pm.traceplot(t, varnames=['p'])
pass
_images/16C_PyMC3_30_0.png

Posterior predictive samples

In [15]:
with coin_context:
    ppc = pm.sample_ppc(t, samples=100)
ppc['y'].ravel()
Out[15]:
array([73, 76, 58, 65, 59, 61, 72, 66, 59, 60, 51, 61, 69, 68, 68, 64, 66,
       56, 70, 63, 63, 73, 49, 56, 52, 59, 55, 55, 63, 56, 66, 55, 61, 59,
       64, 63, 68, 49, 58, 57, 55, 59, 69, 62, 65, 51, 73, 61, 67, 62, 51,
       51, 59, 51, 64, 58, 67, 64, 58, 68, 64, 63, 68, 69, 70, 71, 66, 68,
       58, 57, 66, 48, 63, 67, 57, 60, 65, 64, 56, 62, 55, 56, 56, 58, 62,
       59, 69, 53, 50, 67, 58, 63, 57, 64, 62, 72, 73, 66, 67, 70])
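
A common use of these draws is a quick posterior predictive check, for example comparing the simulated head counts with the observed count (a minimal sketch):

plt.hist(ppc['y'].ravel(), bins=20, alpha=0.5)
plt.axvline(heads, c='red', label='observed')
plt.legend()
pass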

Saving traces

In [16]:
from pymc3.backends import Text

niter = 2000
with pm.Model() as text_save_demo:
    p = pm.Beta('p', alpha=2, beta=2)
    y = pm.Binomial('y', n=n, p=p, observed=heads)
    db = Text('trace')
    trace = pm.sample(niter, trace=db)
Applied logodds-transform to p and added transformed p_logodds to model.
Assigned NUTS to p_logodds
 [-----------------100%-----------------] 2000 of 2000 complete in 0.8 sec
In [17]:
with text_save_demo:
    trace = pm.backends.text.load('trace')
    pm.traceplot(trace, varnames=['p'])
_images/16C_PyMC3_36_0.png

If you are fitting a large complex model that may not fit in memory, you can use the SQLite3 backend to save the trace incrementally to disk.

In [18]:
from pymc3.backends import SQLite

niter = 2000
with pm.Model() as sqlite3_save_demo:
    p = pm.Beta('p', alpha=2, beta=2)
    y = pm.Binomial('y', n=n, p=p, observed=heads)
    db = SQLite('trace.db')
    trace = pm.sample(niter, trace=db)
Applied logodds-transform to p and added transformed p_logodds to model.
Assigned NUTS to p_logodds
 [-----------------100%-----------------] 2000 of 2000 complete in 0.8 sec
In [19]:
with sqlite3_save_demo:
    trace = pm.backends.sqlite.load('trace.db')
    pm.traceplot(trace, varnames=['p'])
_images/16C_PyMC3_39_0.png

Sampling from prior

Just omit the observed= argument.

In [20]:
with pm.Model() as prior_context:
    sigma = pm.Gamma('gamma', alpha=2.0, beta=1.0)
    mu = pm.Normal('mu', mu=0, sd=sigma)
    trace = pm.sample(niter, step=pm.Metropolis())
Applied log-transform to gamma and added transformed gamma_log to model.
 [-----------------100%-----------------] 2000 of 2000 complete in 0.5 sec

Univariate normal distribution

In [21]:
xs = rng.normal(loc=5, scale=2, size=100)
In [22]:
sns.distplot(xs, rug=True)
pass
_images/16C_PyMC3_44_0.png
In [23]:
pm.traceplot(trace, varnames=['mu'])
pass
_images/16C_PyMC3_45_0.png

Sampling from posterior

In [24]:
niter = 2000
with pm.Model() as normal_context:
    mu = pm.Normal('mu', mu=0, sd=100)
    sd = pm.HalfCauchy('sd', beta=2)
    y = pm.Normal('y', mu=mu, sd=sd, observed=xs)
    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(niter, start=start, step=step)
Applied log-transform to sd and added transformed sd_log to model.
 [-----------------100%-----------------] 2000 of 2000 complete in 1.2 sec
In [25]:
pm.traceplot(trace, varnames=['mu', 'sd'])
pass
_images/16C_PyMC3_48_0.png

Linear regression

We will show how to estimate regression parameters using a simple linear model

\[y \sim ax + b\]

We can restate the linear model

\[y = ax + b + \epsilon\]

as sampling from a probability distribution

\[y \sim \mathcal{N}(ax + b, \sigma^2)\]

Now we can use PyMC3 to estimate the parameters \(a\), \(b\) and \(\sigma\). We will assume the following priors

\begin{align} a \sim \mathcal{N}(0, 100) \\ b \sim \mathcal{N}(0, 100) \\ \sigma \sim |\mathcal{N}(0, 1)| \end{align}
In [26]:
# observed data
np.random.seed(123)
n = 11
_a = 6
_b = 2
x = np.linspace(0, 1, n)
y = _a*x + _b + np.random.randn(n)
In [27]:
niter = 10000
with pm.Model() as linreg:
    a = pm.Normal('a', mu=0, sd=100)
    b = pm.Normal('b', mu=0, sd=100)
    sigma = pm.HalfNormal('sigma', sd=1)

    y_est = a*x + b
    likelihood = pm.Normal('y', mu=y_est, sd=sigma, observed=y)

    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(niter, step, start, random_seed=123)
Applied log-transform to sigma and added transformed sigma_log to model.
 [-----------------100%-----------------] 10000 of 10000 complete in 10.7 sec
In [28]:
t = trace[niter//2:]
pm.traceplot(trace, varnames=['a', 'b'])
pass
_images/16C_PyMC3_52_0.png
In [29]:
plt.scatter(x, y, s=30, label='data')
for a_, b_ in zip(t['a'][-100:], t['b'][-100:]):
    plt.plot(x, a_*x + b_, c='gray', alpha=0.1)
plt.plot(x, _a*x + _b, label='true regression line', lw=3., c='red')
plt.legend(loc='best')
pass
_images/16C_PyMC3_53_0.png

Using the GLM module

In [30]:
df = DataFrame({'x': x, 'y': y})
df.head()
Out[30]:
x y
0 0.0 0.914369
1 0.1 3.597345
2 0.2 3.482978
3 0.3 2.293705
4 0.4 3.821400
In [31]:
with pm.Model() as model:
    pm.glm.glm('y ~ x', df)
    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(2000, start=start, step=step)
Applied log-transform to sd and added transformed sd_log to model.
 [-----------------100%-----------------] 2000 of 2000 complete in 2.9 sec
In [32]:
pm.traceplot(trace, varnames=['Intercept', 'x'])
pass
_images/16C_PyMC3_57_0.png

Robust linear regression

In [33]:
# observed data
np.random.seed(123)
n = 11
_a = 6
_b = 2
x = np.linspace(0, 1, n)
y = _a*x + _b + np.random.randn(n)
y[5] *=10
df = DataFrame({'x': x, 'y': y})
df.head()
Out[33]:
x y
0 0.0 0.914369
1 0.1 3.597345
2 0.2 3.482978
3 0.3 2.293705
4 0.4 3.821400
In [34]:
x, y = StandardScaler().fit_transform(df[['x', 'y']]).T
In [35]:
niter = 10000
with pm.Model() as linreg:
    a = pm.Normal('a', mu=0, sd=100)
    b = pm.Normal('b', mu=0, sd=100)
    sigma = pm.HalfNormal('sigma', sd=1)

    y_est = a*x + b
    y_obs = pm.Normal('y_obs', mu=y_est, sd=sigma, observed=y)

    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(niter, step, start, random_seed=123)
Applied log-transform to sigma and added transformed sigma_log to model.
 [-----------------100%-----------------] 10000 of 10000 complete in 7.7 sec
In [36]:
t = trace[niter//2:]
plt.scatter(x, y, s=30, label='data')
for a_, b_ in zip(t['a'][-100:], t['b'][-100:]):
    plt.plot(x, a_*x + b_, c='gray', alpha=0.1)
plt.legend(loc='upper left')
pass
_images/16C_PyMC3_64_0.png
In [37]:
niter = 10000
with pm.Model() as robust_linreg:
    beta = pm.Normal('beta', 0, 10, shape=2)
    nu = pm.Exponential('nu', 1/len(x))
    sigma = pm.HalfCauchy('sigma', beta=1)

    y_est = beta[0] + beta[1]*x
    y_obs = pm.StudentT('y_obs', mu=y_est, sd=sigma, nu=nu, observed=y)

    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(niter, step, start, random_seed=123)
Applied log-transform to nu and added transformed nu_log to model.
Applied log-transform to sigma and added transformed sigma_log to model.
 [-----------------100%-----------------] 10000 of 10000 complete in 15.1 sec
In [38]:
t = trace[niter//2:]
plt.scatter(x, y, s=30, label='data')
for a_, b_ in zip(t['beta'][-100:, 1], t['beta'][-100:, 0]):
    plt.plot(x, a_*x + b_, c='gray', alpha=0.1)
plt.legend(loc='upper left')
pass
_images/16C_PyMC3_67_0.png

Logistic regression

We will look at the effect of strongly correlated variables using a data set from Kruschke’s book.

In [39]:
df = pd.read_csv('HtWt.csv')
df.head()
Out[39]:
male height weight
0 0 63.2 168.7
1 0 68.7 169.8
2 0 64.8 176.6
3 0 67.9 246.8
4 1 68.9 151.6
In [40]:
niter = 1000
with pm.Model() as model:
    pm.glm.glm('male ~ height + weight', df, family=pm.glm.families.Binomial())
    start = pm.find_MAP()
    step = pm.Slice()
    trace = pm.sample(niter, start=start, step=step, njobs=4, random_seed=123)
Applied logodds-transform to p and added transformed p_logodds to model.
 [-----------------100%-----------------] 1000 of 1000 complete in 21.1 sec
In [41]:
df_trace = pm.trace_to_dataframe(trace[niter//2:])
pd.scatter_matrix(df_trace.ix[-niter//2:, ['height', 'weight', 'Intercept']], diagonal='kde')
plt.tight_layout()
pass
_images/16C_PyMC3_72_0.png
In [42]:
niter = 1000
with pm.Model() as model:
    pm.glm.glm('male ~ height + weight', df, family=pm.glm.families.Binomial())
    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace = pm.sample(niter, start=start, step=step, njobs=4, random_seed=123)
Applied logodds-transform to p and added transformed p_logodds to model.
 [-----------------100%-----------------] 1000 of 1000 complete in 13.0 sec
In [43]:
df_trace = pm.trace_to_dataframe(trace[niter//2:])
pd.scatter_matrix(df_trace.ix[-niter//2:, ['height', 'weight', 'Intercept']], diagonal='kde')
plt.tight_layout()
pass
_images/16C_PyMC3_75_0.png

Logistic regression for classification

In [44]:
height, weight, intercept = df_trace[['height', 'weight', 'Intercept']].mean(0)

def predict(w, h, intercept=intercept, height=height, weight=weight):
    """Predict gender given weight (w) and height (h) values."""
    v = intercept + height*h + weight*w
    return np.exp(v)/(1+np.exp(v))

# calculate predictions on grid
xs = np.linspace(df.weight.min(), df.weight.max(), 100)
ys = np.linspace(df.height.min(), df.height.max(), 100)
X, Y = np.meshgrid(xs, ys)
Z = predict(X, Y)

plt.figure(figsize=(6,6))
# plot 0.5 contour line - classify as male if above this line
plt.contour(X, Y, Z, levels=[0.5])

# classify all subjects
colors = ['lime' if i else 'yellow' for i in df.male]
ps = predict(df.weight, df.height)
errs = ((ps < 0.5) & df.male) |((ps >= 0.5) & (1-df.male))
plt.scatter(df.weight[errs], df.height[errs], facecolors='red', s=150)
plt.scatter(df.weight, df.height, facecolors=colors, edgecolors='k', s=50, alpha=1);
plt.xlabel('Weight', fontsize=16)
plt.ylabel('Height', fontsize=16)
plt.title('Gender classification by weight and height', fontsize=16)
plt.tight_layout();
_images/16C_PyMC3_77_0.png

Estimating parameters of a logistic model

Gelman’s book has an example where the dose of a drug may be related to the number of rat deaths in an experiment.

Dose (log g/ml)   # Rats   # Deaths
-0.896            5        0
-0.296            5        1
-0.053            5        3
 0.727            5        5

We will model the number of deaths as a random sample from a binomial distribution, where \(n\) is the number of rats and \(p\) the probability of a rat dying. We are given \(n = 5\), but we believe that \(p\) may be related to the drug dose \(x\). As \(x\) increases the number of rats dying seems to increase, and since \(p\) is a probability, we use the following model:

\begin{align} y \sim \text{Bin}(n, p) \\ \text{logit}(p) = \alpha + \beta x \\ \alpha \sim \mathcal{N}(0, 5) \\ \beta \sim \mathcal{N}(0, 10) \end{align}

where we set vague priors for \(\alpha\) and \(\beta\), the parameters of the logistic model (in the code below \(\beta\) is actually given an improper flat prior, which serves the same purpose).

In [45]:
n = 5 * np.ones(4)
x = np.array([-0.896, -0.296, -0.053, 0.727])
y = np.array([0, 1, 3, 5])
In [46]:
def invlogit(x):
    return pm.exp(x) / (1 + pm.exp(x))

with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=5)
    beta = pm.Flat('beta')

    p = invlogit(alpha + beta*x)
    y_obs = pm.Binomial('y_obs', n=n, p=p, observed=y)

    start = pm.find_MAP()
    step = pm.NUTS()
    trace = pm.sample(niter, step, start, random_seed=123)
 [-----------------100%-----------------] 1000 of 1000 complete in 1.1 sec
In [47]:
f = lambda a, b, xp: np.exp(a + b*xp)/(1 + np.exp(a + b*xp))

xp = np.linspace(-1, 1, 100)
a = trace.get_values('alpha').mean()
b = trace.get_values('beta').mean()
plt.plot(xp, f(a, b, xp), c='red')
plt.scatter(x, y/5, s=50);
plt.xlabel('Log dose of drug')
plt.ylabel('Risk of death')
pass
_images/16C_PyMC3_82_0.png
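
In Gelman’s treatment of this example, a standard quantity of interest is the LD50, the dose at which the probability of death is 0.5; on the logit scale this is \(-\alpha/\beta\), so its posterior can be summarized directly from the trace (a sketch):

ld50 = -trace.get_values('alpha') / trace.get_values('beta')
plt.hist(ld50, bins=30, alpha=0.5)
plt.xlabel('LD50 (log dose)')
pass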

Hierarchical model

This uses the Gelman radon data set and is based on this IPython notebook. Radon levels were measured in houses from all counties in several states. Here we want to know if the presence of a basement affects the level of radon, and if this is affected by which county the house is located in.

The data set provided is just for the state of Minnesota, which has 85 counties with 2 to 116 measurements per county. We only need 3 columns for this example: county, log_radon and floor, where floor=0 indicates that the measurement was taken in a basement.

We will perform simple linear regression on log_radon as a function of county and floor.

In [48]:
radon = pd.read_csv('radon.csv')[['county', 'floor', 'log_radon']]
radon.head()
Out[48]:
county floor log_radon
0 AITKIN 1 0.832909
1 AITKIN 0 0.832909
2 AITKIN 0 1.098612
3 AITKIN 0 0.095310
4 ANOKA 0 1.163151

With a hierarchical model, there is an \(a_c\) and a \(b_c\) for each county \(c\), just as in the individual-county model, but they are no longer independent; instead, they are assumed to come from a common group distribution

\begin{align} a_c \sim \mathcal{N}(\mu_a, \sigma_a^2) \\ b_c \sim \mathcal{N}(\mu_b, \sigma_b^2) \end{align}

We further assume that the hyperparameters come from the following distributions

\begin{align} \mu_a \sim \mathcal{N}(0, 10^2) \\ \sigma_a \sim |\text{Cauchy}(1)| \\ \mu_b \sim \mathcal{N}(0, 10^2) \\ \sigma_b \sim |\text{Cauchy}(1)| \\ \end{align}
In [49]:
county = pd.Categorical(radon['county']).codes

niter = 1000
with pm.Model() as hm:
    # County hyperpriors
    mu_a = pm.Normal('mu_a', mu=0, sd=10)
    sigma_a = pm.HalfCauchy('sigma_a', beta=1)
    mu_b = pm.Normal('mu_b', mu=0, sd=10)
    sigma_b = pm.HalfCauchy('sigma_b', beta=1)

    # County slopes and intercepts
    a = pm.Normal('slope', mu=mu_a, sd=sigma_a, shape=len(set(county)))
    b = pm.Normal('intercept', mu=mu_b, sd=sigma_b, shape=len(set(county)))

    # Household errors
    sigma = pm.Gamma("sigma", alpha=10, beta=1)

    # Model prediction of radon level
    mu = a[county] + b[county] * radon.floor.values

    # Data likelihood
    y = pm.Normal('y', mu=mu, sd=sigma, observed=radon.log_radon)

    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    hm_trace = pm.sample(niter, step, start=start)
Applied log-transform to sigma_a and added transformed sigma_a_log to model.
Applied log-transform to sigma_b and added transformed sigma_b_log to model.
Applied log-transform to sigma and added transformed sigma_log to model.
 [-----------------100%-----------------] 1001 of 1000 complete in 7623.6 sec
In [50]:
plt.figure(figsize=(8, 60))
pm.forestplot(hm_trace, varnames=['slope', 'intercept'])
pass
_images/16C_PyMC3_87_0.png

Multivariate normal

Adapted from Fitting a Multivariate Normal Model in PyMC3 with an LKJ Prior

In [51]:
from matplotlib.patches import Ellipse
In [52]:
def make_ellipse(mu, cov, ci=0.95):
    """Return an Ellipse patch covering the ci probability region of a 2D normal."""
    e, v = np.linalg.eig(cov)                  # eigenvalues and eigenvectors of the covariance
    angle = 180/np.pi * np.arccos(v[0, 0])     # orientation of the first principal axis in degrees
    q = stats.chi2(2).ppf(ci)                  # chi-square quantile giving the ci-level contour
    e = Ellipse(mu, 2*np.sqrt(q*e[0]), 2*np.sqrt(q*e[1]), angle=angle)
    return e
In [53]:
np.random.seed(123)
n, p = 200, 2
mu_true = np.random.uniform(-5, 5, 2)
cov_sqrt = np.random.uniform(0,2, (2,2))
cov_true = cov_sqrt.T @ cov_sqrt
y = np.random.multivariate_normal(mu_true, cov_true, n)
In [54]:
import theano.tensor as T
In [55]:
niter = 10000
with pm.Model() as mvn:
    # priors on standard deviation
    sigma = pm.Lognormal('sigma', mu=np.zeros(2), tau=np.ones(2), shape=2)
    # prior on LKJ shape
    nu = pm.Uniform('nu', 0, 5)
    # LKJ prior for correlation matrix as upper triangular vector
    C_triu = pm.LKJCorr('C_triu', n=nu, p=2)
    # convert to matrix form
    C = T.fill_diagonal(C_triu[np.zeros((2,2), 'int')], 1)
    sigma_diag = T.nlinalg.diag(sigma)
    # induced covariance matrix
    cov = pm.Deterministic('cov', T.nlinalg.matrix_dot(sigma_diag, C, sigma_diag))
    tau = pm.Deterministic('tau', T.nlinalg.matrix_inverse(cov))
    mu = pm.MvNormal('mu', mu=0, tau=tau, shape=2)
    Y_obs_ = pm.MvNormal('Y_obs', mu=mu, tau=tau, observed=y)

    start = pm.find_MAP()

    trace = pm.sample(niter, start=start)
Applied log-transform to sigma and added transformed sigma_log to model.
Applied interval-transform to nu and added transformed nu_interval to model.
Assigned NUTS to sigma_log
Assigned NUTS to nu_interval
Assigned NUTS to C_triu
Assigned NUTS to mu
 [-----------------100%-----------------] 10001 of 10000 complete in 559.3 sec
In [56]:
burn = niter//2
trace = trace[burn:]
mu_hat = trace['mu'].mean(0)
cov_hat = trace['cov'].mean(0)
In [57]:
e = make_ellipse(mu_hat, cov_hat)
e.set_alpha(0.5)
e.set_facecolor('gray')
e.set_zorder(10)

ax = plt.subplot(111)
ax.add_artist(e);
ax.scatter(y[:,0], y[:,1])
pass
_images/16C_PyMC3_95_0.png

Gaussian Mixture Model

In [58]:
from theano.compile.ops import  as_op
import theano.tensor as T
In [59]:
np.random.seed(1)
y = np.r_[np.random.normal(-6, 2, 500),
          np.random.normal(0, 1, 200),
          np.random.normal(4, 1, 300)]
n = y.shape[0]
In [60]:
plt.hist(y, bins=20, normed=True, alpha=0.5)
pass
_images/16C_PyMC3_99_0.png
In [61]:
@as_op(itypes=[T.dvector, T.lvector], otypes=[T.dvector])
def mu_k(mu, z):
    return mu[z]

@as_op(itypes=[T.dvector, T.lvector], otypes=[T.dvector])
def sigma_k(sigma, z):
    return sigma[z]
In [62]:
k = 3
niter = 10000
with pm.Model() as gmm:
    p = pm.Dirichlet('p', np.array([1]*k), testval=np.ones(k)/k)
    z = pm.Categorical('z', p=p, shape=n)
    mu0 = pm.Normal('mu0', mu=0, sd=5*y.std(), shape=k)
    sd0 = pm.HalfCauchy('sd0', beta=1, shape=k)
    mu = pm.Normal('mu', mu=mu0, sd=sd0, shape=k)
    sigma = pm.HalfCauchy('sigma', beta=1, shape=k)
    Y_obs = pm.Normal('Y_obs', mu=mu_k(mu, z), sd=sigma_k(sigma, z), observed=y)

    step1 = pm.ElemwiseCategoricalStep(vars=[z], values=range(k))
    step2 = pm.Metropolis(vars=[p, mu0, sd0, mu, sigma])

    trace = pm.sample(niter, step=[step1, step2])
Applied stickbreaking-transform to p and added transformed p_stickbreaking to model.
Applied log-transform to sd0 and added transformed sd0_log to model.
Applied log-transform to sigma and added transformed sigma_log to model.
 [-----------------100%-----------------] 10000 of 10000 complete in 289.2 sec
In [63]:
burn = niter//2
trace = trace[burn:]
pm.traceplot(trace, varnames=['p', 'mu', 'sigma'])
pass
_images/16C_PyMC3_102_0.png

Changepoint detection

In [64]:
early = 10
late = 12
y = np.r_[np.random.poisson(early, 25), np.random.poisson(late, 75)]
In [65]:
niter = 10000
t = np.arange(len(y))   # time index used to switch between the two rates
with pm.Model() as change_point:
    cp = pm.DiscreteUniform('change_point', lower=0, upper=len(y), testval=len(y)//2)
    mu0 = pm.Exponential('mu0', 1/y.mean())
    mu1 = pm.Exponential('mu1', 1/y.mean())
    mu = T.switch(t < cp, mu0, mu1)
    Y_obs = pm.Poisson('Y_obs', mu=mu, observed=y)
    trace = pm.sample(niter)
Applied log-transform to mu0 and added transformed mu0_log to model.
Applied log-transform to mu1 and added transformed mu1_log to model.
Assigned Metropolis to change_point
Assigned NUTS to mu0_log
Assigned NUTS to mu1_log
 [-----------------100%-----------------] 10000 of 10000 complete in 8.8 sec
In [66]:
pm.traceplot(trace, varnames=['change_point', 'mu0', 'mu1'])
pass
_images/16C_PyMC3_106_0.png

Bandits

A one-armed bandit is a slot machine in a casino. In the multi-armed bandit problem you are faced with multiple slot machines, each with a potentially different payoff - perhaps there is one machine where you could win a lot of money and others where you would lose money in the long run. You need to balance exploration (trying different machines) against exploitation (sticking to the best machine found so far) in an optimal way to maximize your gains. Suppose you were looking for a mate among N candidates - how do you optimize your dating strategy as a good Bayesian?

Adapted from Bayesian Methods for Hackers
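
The simulation below is a form of Thompson sampling: each arm’s unknown payoff probability gets a Beta posterior, and at every step we draw one value from each posterior and pull the arm with the largest draw. A single step looks like this (a sketch with hypothetical win/trial counts):

wins = np.array([1, 5, 20, 2, 30, 0])      # hypothetical wins per arm so far
trials = np.array([5, 10, 30, 8, 40, 3])   # hypothetical pulls per arm so far
theta = stats.beta(1 + wins, 1 + trials - wins).rvs()  # one draw from each Beta posterior
arm = np.argmax(theta)                                 # play the most promising arm this round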

In [67]:
k = 6
bandits = np.random.uniform(0, 1, k)

a = np.ones(k)
b = np.ones(k)
wins = np.zeros(k)
trials = np.zeros(k)
dist = stats.beta(a, b)

n = 1000
dists = [dist]
for _ in range(n-1):
    i = np.argmax(dist.rvs())
    print(i, end=', ')
    payoff = np.random.rand() < bandits[i]
    wins[i] += payoff
    trials[i] += 1
    dist = stats.beta(a + wins, b + trials - wins)
    dists.append(dist)

plt.figure(figsize=(4*3, 5*2))
x = np.repeat(np.linspace(0, 1, 100), k).reshape(-1,k)
for i, dist in enumerate(dists[::(n//20)], 1):
    plt.subplot(20//4,4,i)
    plt.plot(x, dist.pdf(x))
    if i==1: plt.legend(range(k))
    plt.yticks([])
    plt.xticks([])
plt.tight_layout()
0, 4, 1, 4, 5, 2, 1, 3, 3, 1, 5, 3, 2, 2, 2, 5, 0, 0, 0, 0, 0, 2, 4, 2, 2, 3, 2, 3, 2, 2, 1, 2, 1, 1, 1, 1, 4, 3, 2, 4, ... (the printout of the arm chosen at each step continues; after the early exploratory pulls the choices are dominated by arms 2 and 4, the two arms with the highest payoff probabilities)
_images/16C_PyMC3_108_1.png
In [68]:
np.set_printoptions(formatter={'all': lambda x: '%5.2f' % x})
bandits
Out[68]:
array([ 0.04,  0.74,  0.76,  0.18,  0.85,  0.28])
In [69]:
%load_ext version_information
%version_information numpy, scipy, matplotlib, seaborn, pymc3
Out[69]:
Software     Version
Python       3.5.1 64bit [GCC 4.2.1 (Apple Inc. build 5577)]
IPython      4.1.2
OS           Darwin 15.3.0 x86_64 i386 64bit
numpy        1.10.4
scipy        0.17.0
matplotlib   1.5.1
seaborn      0.7.0
pymc3        3.0
Fri Mar 11 11:37:14 2016 EST