Using PyMC3

PyMC3 is a Python package for MCMC using a variety of samplers, including Metropolis, Slice, and Hamiltonian Monte Carlo. See Probabilistic Programming in Python using PyMC3 for a description. The GitHub site also has many examples and links for further exploration.

! pip install --quiet arviz
! pip install --quiet pymc3
! pip install --quiet daft
! pip install --quiet seaborn
! conda install --yes --quiet mkl-service
In [1]:
import warnings

warnings.simplefilter('ignore', UserWarning)

Other resources

Some examples are adapted from:

In [2]:
%matplotlib inline
In [3]:
import numpy as np
import numpy.random as rng
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import pymc3 as pm
import scipy.stats as stats
import daft
import arviz as az
In [4]:
import theano
theano.config.warn.round=False
In [5]:
sns.set_context('notebook')
plt.style.use('seaborn-darkgrid')

Introduction to PyMC3

Distributions in pymc3

In [6]:
print('\n'.join([d for d in dir(pm.distributions) if d[0].isupper()]))
AR
AR1
Bernoulli
Beta
BetaBinomial
Binomial
Bound
Categorical
Cauchy
ChiSquared
Constant
ConstantDist
Continuous
DensityDist
Dirichlet
Discrete
DiscreteUniform
DiscreteWeibull
Distribution
ExGaussian
Exponential
Flat
GARCH11
Gamma
GaussianRandomWalk
Geometric
Gumbel
HalfCauchy
HalfFlat
HalfNormal
HalfStudentT
Interpolated
InverseGamma
KroneckerNormal
Kumaraswamy
LKJCholeskyCov
LKJCorr
Laplace
Logistic
LogitNormal
Lognormal
MatrixNormal
Mixture
Multinomial
MvGaussianRandomWalk
MvNormal
MvStudentT
MvStudentTRandomWalk
NegativeBinomial
NoDistribution
Normal
NormalMixture
OrderedLogistic
Pareto
Poisson
Rice
SkewNormal
StudentT
TensorType
Triangular
TruncatedNormal
Uniform
VonMises
Wald
Weibull
Wishart
WishartBartlett
ZeroInflatedBinomial
ZeroInflatedNegativeBinomial
ZeroInflatedPoisson
In [7]:
d = pm.Normal.dist(mu=0, sd=1)
In [8]:
d.dist()
Out[8]:
$\text{None} \sim \text{Normal}(\mathit{mu}=0,~\mathit{sd}=1.0)$

Random samples

In [9]:
d.random(size=5)
Out[9]:
array([-1.04374916,  0.82779499,  0.63210758, -1.25758947,  0.54865016])

Log probability

In [10]:
d.logp(0).eval()
Out[10]:
array(-0.91893853)

Custom distributions

The pymc3 algorithms generally only work with the log probability of a distribution. Hence it is easy to define custom distributions to use in your models by providing a logp function.

In [11]:
def logp(x, μ=0, σ=1):
    """Normal distribtuion."""
    return -0.5*np.log(2*np.pi) - np.log(σ) - (x-μ)**2/(2*σ**2)
In [12]:
d = pm.DensityDist.dist(logp)
In [13]:
d.logp(0)
Out[13]:
-0.9189385332046727

Example: Estimating coin bias

We start with the simplest model - that of determining the bias of a coin from observed outcomes.

Setting up the model

In [14]:
n = 100
heads = 61

Analytical solution

In [15]:
a, b = 10, 10
prior = stats.beta(a, b)
post = stats.beta(heads+a, n-heads+b)
ci = post.interval(0.95)

xs = np.linspace(0, 1, 100)
plt.plot(xs, prior.pdf(xs), label='Prior')
plt.plot(xs, post.pdf(xs), c='green', label='Posterior')
plt.axvline(heads/n, c='red', alpha=0.4, label='MLE')
plt.xlim([0, 1])
plt.hlines(0.3, ci[0], ci[1], colors='black', linewidth=2, label='95% CI')
plt.legend()
pass
[Figure: prior and posterior densities for the coin bias, with the MLE and 95% credible interval]

Graphical model

In [16]:
pgm = daft.PGM(shape=[2.5, 3.0], origin=[0, -0.5])

pgm.add_node(daft.Node("alpha", r"$\alpha$", 0.5, 2, fixed=True))
pgm.add_node(daft.Node("beta", r"$\beta$", 1.5, 2, fixed=True))
pgm.add_node(daft.Node("p", r"$p$", 1, 1))
pgm.add_node(daft.Node("n", r"$n$", 2, 0, fixed=True))
pgm.add_node(daft.Node("y", r"$y$", 1, 0, observed=True))

pgm.add_edge("alpha", "p")
pgm.add_edge("beta", "p")
pgm.add_edge("n", "y")
pgm.add_edge("p", "y")

pgm.render()
pass
[Figure: graphical model for the coin-bias example]

The Model context

When you specify a model, you are adding nodes to a computation graph, which is compiled via Theano when executed. PyMC3 uses the Model context manager to automatically add new nodes to the current model.

In [17]:
niter = 2000
with pm.Model() as coin_context:
    p = pm.Beta('p', alpha=2, beta=2)
    y = pm.Binomial('y', n=n, p=p, observed=heads)
    trace = pm.sample(niter)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [p]
Sampling 2 chains: 100%|██████████| 5000/5000 [00:03<00:00, 1456.13draws/s]
In [18]:
coin_context
Out[18]:
$$ \begin{array}{rcl} \text{p} &\sim & \text{Beta}(\mathit{alpha}=2,~\mathit{beta}=2)\\\text{y} &\sim & \text{Binomial}(\mathit{n}=100,~\mathit{p}=\text{p}) \end{array} $$

Transformed prior variables

In [19]:
coin_context.free_RVs
Out[19]:
[p_logodds__]

Prior variables

In [20]:
coin_context.deterministics
Out[20]:
[p]

Variables in likelihood

In [21]:
coin_context.observed_RVs
Out[21]:
[y]

Under the hood

Theano

Theano builds functions as mathematical expression graphs and compiles them into C for actual computation, making use of GPU resources where available.

Performing calculations in Theano generally involves three steps (from the official docs):

  • declare variables (a,b) and give their types

  • build expressions for how to put those variables together

  • compile expression graphs to functions in order to use them for computation.

In [22]:
import theano
import theano.tensor as tt
theano.config.compute_test_value = "off"

This part builds symbolic expressions.

In [23]:
a = tt.iscalar('a')
b = tt.iscalar('b')
c = a + b

This step compiles a function whose inputs are [a, b] and outputs are [c].

In [24]:
f = theano.function([a, b], [c])
In [25]:
f
Out[25]:
<theano.compile.function_module.Function at 0x1c2a3d6fd0>
In [26]:
f(3, 4)
Out[26]:
[array(7, dtype=int32)]

Within a model context,

  • when you add an unbounded variable, it is defined as a Theano variable and added to the prior part of the log posterior function

  • when you add a bounded variable, a transformed version is defined as a Theano variable and added to the log posterior function

    • the inverse transformation is used to define the original variable - this is a deterministic variable

  • when you add an observed variable bound to data, the data is added to the likelihood part of the log posterior

See PyMC3 and Theano for details; a small sketch illustrating this follows.
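The toy model below is hypothetical (not part of the running coin example); it simply shows where an unbounded prior, a bounded prior, and an observed variable end up in the model's bookkeeping.

data = rng.normal(1.0, 0.5, size=20)   # toy observations

with pm.Model() as demo:
    mu = pm.Normal('mu', mu=0, sd=10)                        # unbounded prior: added as-is
    sigma = pm.HalfNormal('sigma', sd=1)                     # bounded prior: sampled on the log scale
    obs = pm.Normal('obs', mu=mu, sd=sigma, observed=data)   # observed variable: likelihood term

print(demo.free_RVs)        # [mu, sigma_log__]  - the transformed version of sigma
print(demo.deterministics)  # [sigma]            - recovered via the inverse transformation
print(demo.observed_RVs)    # [obs]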

In [27]:
help(pm.sample)
Help on function sample in module pymc3.sampling:

sample(draws=500, step=None, init='auto', n_init=200000, start=None, trace=None, chain_idx=0, chains=None, cores=None, tune=500, nuts_kwargs=None, step_kwargs=None, progressbar=True, model=None, random_seed=None, live_plot=False, discard_tuned_samples=True, live_plot_kwargs=None, compute_convergence_checks=True, use_mmap=False, **kwargs)
    Draw samples from the posterior using the given step methods.

    Multiple step methods are supported via compound step methods.

    Parameters
    ----------
    draws : int
        The number of samples to draw. Defaults to 500. The number of tuned samples are discarded
        by default. See discard_tuned_samples.
    step : function or iterable of functions
        A step function or collection of functions. If there are variables without a step methods,
        step methods for those variables will be assigned automatically.
    init : str
        Initialization method to use for auto-assigned NUTS samplers.

        * auto : Choose a default initialization method automatically.
          Currently, this is `'jitter+adapt_diag'`, but this can change in the future.
          If you depend on the exact behaviour, choose an initialization method explicitly.
        * adapt_diag : Start with a identity mass matrix and then adapt a diagonal based on the
          variance of the tuning samples. All chains use the test value (usually the prior mean)
          as starting point.
        * jitter+adapt_diag : Same as `adapt_diag`, but add uniform jitter in [-1, 1] to the
          starting point in each chain.
        * advi+adapt_diag : Run ADVI and then adapt the resulting diagonal mass matrix based on the
          sample variance of the tuning samples.
        * advi+adapt_diag_grad : Run ADVI and then adapt the resulting diagonal mass matrix based
          on the variance of the gradients during tuning. This is **experimental** and might be
          removed in a future release.
        * advi : Run ADVI to estimate posterior mean and diagonal mass matrix.
        * advi_map: Initialize ADVI with MAP and use MAP as starting point.
        * map : Use the MAP as starting point. This is discouraged.
        * nuts : Run NUTS and estimate posterior mean and mass matrix from the trace.
    n_init : int
        Number of iterations of initializer. Only works for 'nuts' and 'ADVI'.
        If 'ADVI', number of iterations, if 'nuts', number of draws.
    start : dict, or array of dict
        Starting point in parameter space (or partial point)
        Defaults to trace.point(-1)) if there is a trace provided and model.test_point if not
        (defaults to empty dict). Initialization methods for NUTS (see `init` keyword) can
        overwrite the default. For 'SMC' if should be a list of dict with length `chains`.
    trace : backend, list, or MultiTrace
        This should be a backend instance, a list of variables to track, or a MultiTrace object
        with past values. If a MultiTrace object is given, it must contain samples for the chain
        number `chain`. If None or a list of variables, the NDArray backend is used.
        Passing either "text" or "sqlite" is taken as a shortcut to set up the corresponding
        backend (with "mcmc" used as the base name). Ignored when using 'SMC'.
    chain_idx : int
        Chain number used to store sample in backend. If `chains` is greater than one, chain
        numbers will start here. Ignored when using 'SMC'.
    chains : int
        The number of chains to sample. Running independent chains is important for some
        convergence statistics and can also reveal multiple modes in the posterior. If `None`,
        then set to either `cores` or 2, whichever is larger. For SMC the number of chains is the
        number of draws.
    cores : int
        The number of chains to run in parallel. If `None`, set to the number of CPUs in the
        system, but at most 4 (for 'SMC' defaults to 1). Keep in mind that some chains might
        themselves be multithreaded via openmp or BLAS. In those cases it might be faster to set
        this to 1.
    tune : int
        Number of iterations to tune, defaults to 500. Ignored when using 'SMC'. Samplers adjust
        the step sizes, scalings or similar during tuning. Tuning samples will be drawn in addition
        to the number specified in the `draws` argument, and will be discarded unless
        `discard_tuned_samples` is set to False.
    nuts_kwargs : dict
        Options for the NUTS sampler. See the docstring of NUTS for a complete list of options.
        Common options are:

        * target_accept: float in [0, 1]. The step size is tuned such that we approximate this
          acceptance rate. Higher values like 0.9 or 0.95 often work better for problematic
          posteriors.
        * max_treedepth: The maximum depth of the trajectory tree.
        * step_scale: float, default 0.25
          The initial guess for the step size scaled down by `1/n**(1/4)`.

        If you want to pass options to other step methods, please use `step_kwargs`.
    step_kwargs : dict
        Options for step methods. Keys are the lower case names of the step method, values are
        dicts of keyword arguments. You can find a full list of arguments in the docstring of the
        step methods. If you want to pass arguments only to nuts, you can use `nuts_kwargs`.
    progressbar : bool
        Whether or not to display a progress bar in the command line. The bar shows the percentage
        of completion, the sampling speed in samples per second (SPS), and the estimated remaining
        time until completion ("expected time of arrival"; ETA).
    model : Model (optional if in `with` context)
    random_seed : int or list of ints
        A list is accepted if `cores` is greater than one.
    live_plot : bool
        Flag for live plotting the trace while sampling. Ignored when using 'SMC'.
    live_plot_kwargs : dict
        Options for traceplot. Example: live_plot_kwargs={'varnames': ['x']}.
        Ignored when using 'SMC'
    discard_tuned_samples : bool
        Whether to discard posterior samples of the tune interval. Ignored when using 'SMC'
    compute_convergence_checks : bool, default=True
        Whether to compute sampler statistics like gelman-rubin and effective_n.
        Ignored when using 'SMC'
    use_mmap : bool, default=False
        Whether to use joblib's memory mapping to share numpy arrays when sampling across multiple
        cores. Ignored when using 'SMC'

    Returns
    -------
    trace : pymc3.backends.base.MultiTrace
        A `MultiTrace` object that contains the samples.

    Examples
    --------
    .. code:: ipython

        >>> import pymc3 as pm
        ... n = 100
        ... h = 61
        ... alpha = 2
        ... beta = 2

    .. code:: ipython

        >>> with pm.Model() as model: # context management
        ...     p = pm.Beta('p', alpha=alpha, beta=beta)
        ...     y = pm.Binomial('y', n=n, p=p, observed=h)
        ...     trace = pm.sample(2000, tune=1000, cores=4)
        >>> pm.summary(trace)
               mean        sd  mc_error   hpd_2.5  hpd_97.5
        p  0.604625  0.047086   0.00078  0.510498  0.694774

Specifying sampler (step) and multiple chains

In [28]:
with coin_context:
    step = pm.Metropolis()
    t = pm.sample(niter, step=step, chains=8, random_seed=123)
Multiprocess sampling (8 chains in 2 jobs)
Metropolis: [p]
Sampling 8 chains: 100%|██████████| 20000/20000 [00:05<00:00, 3712.51draws/s]
The number of effective samples is smaller than 25% for some parameters.

Samplers available

For continuous distributions, it is hard to beat NUTS and hence this is the default. To learn more, see A Conceptual Introduction to Hamiltonian Monte Carlo.

In [29]:
print('\n'.join(m for m in dir(pm.step_methods) if m[0].isupper()))
BinaryGibbsMetropolis
BinaryMetropolis
CSG
CategoricalGibbsMetropolis
CauchyProposal
CompoundStep
DEMetropolis
ElemwiseCategorical
EllipticalSlice
HamiltonianMC
LaplaceProposal
Metropolis
MultivariateNormalProposal
NUTS
NormalProposal
PoissonProposal
SGFS
SMC
Slice

Generally, the sampler will be selected automatically based on the type of the variable (discrete or continuous), but there are many samplers that you can specify explicitly if you want to learn more about them or understand why an alternative might be better than the default for your problem. A small sketch with a discrete latent variable follows.
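For example, a model with a discrete latent variable (a hypothetical model, not used elsewhere in this notebook) typically gets a compound step assigned automatically:

with pm.Model() as mixed_model:
    p = pm.Beta('p', alpha=2, beta=2)     # continuous: NUTS is assigned
    k = pm.Binomial('k', n=10, p=p)       # discrete latent: a Metropolis variant is assigned
    mixed_trace = pm.sample(1000)         # PyMC3 combines the two in a CompoundStep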

In [30]:
niter = 2000
with pm.Model() as normal_context:
    mu = pm.Normal('mu', mu=0, sd=100)
    sd = pm.HalfCauchy('sd', beta=2)
    y = pm.Normal('y', mu=mu, sd=sd, observed=xs)

    step1 = pm.Slice(vars=mu)
    step2 = pm.Metropolis(vars=sd)

    t = pm.sample(niter, step=[step1, step2])
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Slice: [mu]
>Metropolis: [sd]
Sampling 2 chains: 100%|██████████| 5000/5000 [00:02<00:00, 1691.48draws/s]
The number of effective samples is smaller than 10% for some parameters.
In [32]:
pm.traceplot(t)
pass
[Figure: trace plots for mu and sd]

MAP estimate

In [33]:
with pm.Model() as m:
    p = pm.Beta('p', alpha=2, beta=2)
    y = pm.Binomial('y', n=n, p=p, observed=heads)
    θ = pm.find_MAP()
/Users/cliburn/anaconda3/lib/python3.6/site-packages/pymc3/tuning/starting.py:61: UserWarning: find_MAP should not be used to initialize the NUTS sampler, simply call pymc3.sample() and it will automatically initialize NUTS in a better way.
  warnings.warn('find_MAP should not be used to initialize the NUTS sampler, simply call pymc3.sample() and it will automatically initialize NUTS in a better way.')
logp = -4.5407, ||grad|| = 11: 100%|██████████| 6/6 [00:00<00:00, 563.70it/s]
In [34]:
θ
Out[34]:
{'p_logodds__': array(0.43825493), 'p': array(0.60784314)}

Getting values from the trace

All the information about the posterior is in the trace, and it also provides statistics about the sampler.

In [35]:
help(trace)
Help on MultiTrace in module pymc3.backends.base object:

class MultiTrace(builtins.object)
 |  Main interface for accessing values from MCMC results
 |
 |  The core method to select values is `get_values`. The method
 |  to select sampler statistics is `get_sampler_stats`. Both kinds of
 |  values can also be accessed by indexing the MultiTrace object.
 |  Indexing can behave in four ways:
 |
 |  1. Indexing with a variable or variable name (str) returns all
 |     values for that variable, combining values for all chains.
 |
 |     >>> trace[varname]
 |
 |     Slicing after the variable name can be used to burn and thin
 |     the samples.
 |
 |     >>> trace[varname, 1000:]
 |
 |     For convenience during interactive use, values can also be
 |     accessed using the variable as an attribute.
 |
 |     >>> trace.varname
 |
 |  2. Indexing with an integer returns a dictionary with values for
 |     each variable at the given index (corresponding to a single
 |     sampling iteration).
 |
 |  3. Slicing with a range returns a new trace with the number of draws
 |     corresponding to the range.
 |
 |  4. Indexing with the name of a sampler statistic that is not also
 |     the name of a variable returns those values from all chains.
 |     If there is more than one sampler that provides that statistic,
 |     the values are concatenated along a new axis.
 |
 |  For any methods that require a single trace (e.g., taking the length
 |  of the MultiTrace instance, which returns the number of draws), the
 |  trace with the highest chain number is always used.
 |
 |  Methods defined here:
 |
 |  __getattr__(self, name)
 |
 |  __getitem__(self, idx)
 |
 |  __init__(self, straces)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __len__(self)
 |
 |  __repr__(self)
 |      Return repr(self).
 |
 |  add_values(self, vals, overwrite=False)
 |      add variables to traces.
 |
 |      Parameters
 |      ----------
 |      vals : dict (str: array-like)
 |           The keys should be the names of the new variables. The values are expected to be
 |           array-like object. For traces with more than one chain the length of each value
 |           should match the number of total samples already in the trace (chains * iterations),
 |           otherwise a warning is raised.
 |      overwrite : bool
 |          If `False` (default) a ValueError is raised if the variable already exists.
 |          Change to `True` to overwrite the values of variables
 |
 |  get_sampler_stats(self, varname, burn=0, thin=1, combine=True, chains=None, squeeze=True)
 |      Get sampler statistics from the trace.
 |
 |      Parameters
 |      ----------
 |      varname : str
 |      sampler_idx : int or None
 |      burn : int
 |      thin : int
 |
 |      Returns
 |      -------
 |      If the `sampler_idx` is specified, return the statistic with
 |      the given name in a numpy array. If it is not specified and there
 |      is more than one sampler that provides this statistic, return
 |      a numpy array of shape (m, n), where `m` is the number of
 |      such samplers, and `n` is the number of samples.
 |
 |  get_values(self, varname, burn=0, thin=1, combine=True, chains=None, squeeze=True)
 |      Get values from traces.
 |
 |      Parameters
 |      ----------
 |      varname : str
 |      burn : int
 |      thin : int
 |      combine : bool
 |          If True, results from `chains` will be concatenated.
 |      chains : int or list of ints
 |          Chains to retrieve. If None, all chains are used. A single
 |          chain value can also be given.
 |      squeeze : bool
 |          Return a single array element if the resulting list of
 |          values only has one element. If False, the result will
 |          always be a list of arrays, even if `combine` is True.
 |
 |      Returns
 |      -------
 |      A list of NumPy arrays or a single NumPy array (depending on
 |      `squeeze`).
 |
 |  point(self, idx, chain=None)
 |      Return a dictionary of point values at `idx`.
 |
 |      Parameters
 |      ----------
 |      idx : int
 |      chain : int
 |          If a chain is not given, the highest chain number is used.
 |
 |  points(self, chains=None)
 |      Return an iterator over all or some of the sample points
 |
 |      Parameters
 |      ----------
 |      chains : list of int or N
 |          The chains whose points should be inlcuded in the iterator.  If
 |          chains is not given, include points from all chains.
 |
 |  remove_values(self, name)
 |      remove variables from traces.
 |
 |      Parameters
 |      ----------
 |      name : str
 |          Name of the variable to remove. Raises KeyError if the variable is not present
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
 |
 |  chains
 |
 |  nchains
 |
 |  report
 |
 |  stat_names
 |
 |  varnames

In [36]:
trace.stat_names
Out[36]:
{'depth',
 'diverging',
 'energy',
 'energy_error',
 'max_energy_error',
 'mean_tree_accept',
 'model_logp',
 'step_size',
 'step_size_bar',
 'tree_size',
 'tune'}
In [37]:
sns.distplot(trace.get_sampler_stats('model_logp'))
pass
[Figure: distribution of the model_logp sampler statistic]
In [38]:
p = trace.get_values('p')
p.shape
Out[38]:
(4000,)
In [39]:
trace['p'].shape
Out[39]:
(4000,)

Convert to pandas data frame for downstream processing

In [40]:
df = pm.trace_to_dataframe(trace)
df.head()
Out[40]:
p
0 0.597814
1 0.588101
2 0.576538
3 0.577302
4 0.550194

Posterior distribution

In [41]:
sns.distplot(trace['p'])
pass
[Figure: posterior distribution of p]

Autocorrelation plot

In [43]:
pm.autocorrplot(trace, varnames=['p'])
pass
[Figure: autocorrelation plot for p]

Calculate effective sample size

\[\hat{n}_{eff} = \frac{mn}{1 + 2 \sum_{t=1}^T \hat{\rho}_t}\]

where \(m\) is the number of chains, \(n\) the number of steps per chain, \(T\) the time when the autocorrelation first becomes negative, and \(\hat{\rho}_t\) the autocorrelation at lag \(t\).
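A rough single-chain (\(m = 1\)) sketch of this formula, not the estimator that pm.effective_n actually uses, assuming samples is a 1-D array such as trace.get_values('p', chains=0):

def effective_sample_size(samples):
    """Crude n_eff for a single chain using the formula above."""
    n = len(samples)
    x = samples - samples.mean()
    rho = np.correlate(x, x, mode='full')[n - 1:] / (x @ x)   # autocorrelations at lags 0..n-1
    neg = np.where(rho < 0)[0]
    T = neg[0] if len(neg) else n        # first lag where the autocorrelation turns negative
    return n / (1 + 2 * rho[1:T].sum())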

In [44]:
pm.effective_n(trace)
Out[44]:
{'p': 1639.0804767244163}

Evaluate convergence

\[\hat{R} = \frac{\hat{V}}{W}\]

where \(W\) is the within-chain variance and \(\hat{V}\) is the posterior variance estimate for the pooled traces. Values greater than one indicate that one or more chains have not yet converged.

Discard the burn-in steps for each chain. The idea is to check whether, after burn-in, all chains are sampling from the same stationary distribution.

  • \(W\) is the average of the \(m\) within-chain variances

  • \(B\) is the number of steps \(n \times\) the variance of the chain means

  • \(\hat{V}\) is the weighted average \((1 - \frac{1}{n})W + \frac{1}{n}B\)

Both \(\hat{V}\) and \(W\) estimate the posterior variance once the chains have reached the stationary distribution, so \(\hat{R} \approx 1\) at convergence. If \(\hat{R}\) differs appreciably from 1, the chains have probably not converged and more iterations are needed. This is computed for each parameter \(\theta\); a sketch of the computation follows.
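The sketch below computes \(\hat{R}\) directly from these definitions (PyMC3's own estimator differs in detail); chains is assumed to be an array of shape \((m, n)\), e.g. np.array([trace.get_values('p', chains=c) for c in trace.chains]):

def gelman_rubin_rhat(chains):
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # mean of the within-chain variances
    B = n * chains.mean(axis=1).var(ddof=1)     # n times the variance of the chain means
    V_hat = (1 - 1/n) * W + B/n                 # pooled posterior variance estimate
    return V_hat / W                            # R-hat as defined above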

In [45]:
pm.gelman_rubin(trace)
Out[45]:
{'p': 1.0011843767338002}

The Geweke diagnostic compares the mean of an initial segment of a trace with the means of later segments for a parameter. At convergence the resulting z-scores should be small; the red lines in the plot below mark \(\pm 1\).

In [46]:
plt.plot(pm.geweke(trace['p'])[:,1], 'o')
plt.axhline(1, c='red')
plt.axhline(-1, c='red')
plt.gca().margins(0.05)
pass
[Figure: Geweke z-scores for p with reference lines at ±1]
In [47]:
pm.summary(trace, varnames=['p'])
Out[47]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
p 0.607161 0.047751 0.001174 0.513838 0.699736 1639.080477 1.001184
In [50]:
pm.traceplot(trace, varnames=['p'])
pass
[Figure: trace plot for p]
In [53]:
pm.forestplot(trace)
pass
[Figure: forest plot of the trace]
In [54]:
pm.plot_posterior(trace)
pass
[Figure: posterior plot for p]
In [55]:
with coin_context:
    ps = pm.sample_prior_predictive(samples=1000)
In [56]:
sns.distplot(ps['y'])
plt.axvline(heads, c='red')
pass
[Figure: prior predictive distribution of y, with the observed number of heads marked]
In [57]:
with coin_context:
    ps = pm.sample_posterior_predictive(trace, samples=1000)
100%|██████████| 1000/1000 [00:00<00:00, 1386.64it/s]
In [58]:
sns.distplot(ps['y'])
plt.axvline(heads, c='red')
pass
[Figure: posterior predictive distribution of y, with the observed number of heads marked]

Saving traces

In [59]:
pm.save_trace(trace, 'my_trace', overwrite=True)
Out[59]:
'my_trace'

You need to re-initialize the model when reloading.

In [60]:
with pm.Model() as my_trace:
    p = pm.Beta('p', alpha=2, beta=2)
    y = pm.Binomial('y', n=n, p=p, observed=heads)
    tr = pm.load_trace('my_trace')
In [61]:
pm.summary(tr)
Out[61]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
p 0.607161 0.047751 0.001174 0.513838 0.699736 1639.080477 1.001184

It is good practice to wrap model construction in a function so that the model can be conveniently rebuilt, for example when reloading a saved trace.

In [62]:
def build_model():
    with pm.Model() as m:
        p = pm.Beta('p', alpha=2, beta=2)
        y = pm.Binomial('y', n=n, p=p, observed=heads)
    return m
In [63]:
m = build_model()
In [64]:
with m:
    tr1 = pm.load_trace('my_trace')
In [65]:
pm.summary(tr1)
Out[65]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
p 0.607161 0.047751 0.001174 0.513838 0.699736 1639.080477 1.001184

Sampling from prior

Just omit the observed= argument.

In [66]:
with pm.Model() as prior_context:
    sigma = pm.Gamma('sigma', alpha=2.0, beta=1.0)
    mu = pm.Normal('mu', mu=0, sd=sigma)
    trace = pm.sample(niter, step=pm.Metropolis())
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Metropolis: [mu]
>Metropolis: [sigma]
Sampling 2 chains: 100%|██████████| 5000/5000 [00:01<00:00, 2753.43draws/s]
The number of effective samples is smaller than 10% for some parameters.
In [68]:
pm.traceplot(trace, varnames=['mu', 'sigma'])
pass
[Figure: trace plots for mu and sigma sampled from the prior]

Sampling from posterior

In [69]:
niter = 2000
with pm.Model() as normal_context:
    mu = pm.Normal('mu', mu=0, sd=100)
    sd = pm.HalfCauchy('sd', beta=2)
    y = pm.Normal('y', mu=mu, sd=sd, observed=xs)
    trace = pm.sample(niter)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
Sampling 2 chains: 100%|██████████| 5000/5000 [00:03<00:00, 1307.56draws/s]
In [70]:
hpd = pm.hpd(trace['mu'], alpha=0.05)
hpd
Out[70]:
array([0.44162207, 0.55971851])
In [72]:
ax = pm.traceplot(trace, varnames=['mu'],)

ymin, ymax = ax[0,0].get_ylim()
y = ymin + 0.05*(ymax-ymin)
ax[0, 0].plot(hpd, [y,y], c='red')
pass
[Figure: trace plot for mu with the 95% HPD interval marked]

Evaluating goodness-of-fit

DIC, WAIC, and BPIC are approximations to the out-of-sample prediction error and can be used for model comparison. The likelihood itself increases with model complexity and should not be used for model comparison. A sketch comparing two models by WAIC follows the examples below.

In [73]:
post_mean = pm.summary(trace, varnames=trace.varnames)['mean']
post_mean
Out[73]:
mu          0.500164
sd_log__   -1.218045
sd          0.296556
Name: mean, dtype: float64
In [74]:
normal_context.logp(post_mean)
Out[74]:
array(-26.57766616)
In [75]:
with normal_context:
    print(pm.loo(trace))
LOO_r(LOO=40.743463301462626, LOO_se=8.891566223871754, p_LOO=1.3902023282742455, shape_warn=0)
In [76]:
with normal_context:
    print(pm.waic(trace))
WAIC_r(WAIC=40.74205569255692, WAIC_se=8.891356706673374, p_WAIC=1.3894985238213904, var_warn=0)
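As a sketch of using WAIC for model comparison, we can fit an alternative model to the same data (the Student-t likelihood here is a hypothetical alternative, purely for illustration) and compare; lower WAIC suggests better expected out-of-sample fit. PyMC3 also provides pm.compare for a fuller comparison table.

with pm.Model() as t_context:
    mu = pm.Normal('mu', mu=0, sd=100)
    sd = pm.HalfCauchy('sd', beta=2)
    nu = pm.Exponential('nu', lam=1/30)
    y = pm.StudentT('y', nu=nu, mu=mu, sd=sd, observed=xs)
    t_trace = pm.sample(niter)

with normal_context:
    waic_normal = pm.waic(trace)
with t_context:
    waic_t = pm.waic(t_trace)

print(waic_normal.WAIC, waic_t.WAIC)   # lower WAIC is preferred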

Using a custom likelihood

In [77]:
def logp(x, μ=0, σ=1):
    """Normal distribtuion."""
    return -0.5*np.log(2*np.pi) - np.log(σ) - (x-μ)**2/(2*σ**2)
In [78]:
with pm.Model() as prior_context:
    mu = pm.Normal('mu', mu=0, sd=100)
    sd = pm.HalfCauchy('sd', beta=2)
    y = pm.DensityDist('y', logp, observed=dict(x=xs, μ=mu, σ=sd))
    custom_trace = pm.sample(niter)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
Sampling 2 chains: 100%|██████████| 5000/5000 [00:03<00:00, 1316.85draws/s]
In [79]:
pm.trace_to_dataframe(custom_trace).mean()
Out[79]:
mu    0.499692
sd    0.296759
dtype: float64

Variational methods available

To use a variational method, use pm.fit instead of pm.sample; a minimal sketch follows the listing of available methods below. We'll see fuller examples of usage in another notebook.

In [80]:
print('\n'.join(m for m in dir(pm.variational) if m[0].isupper()))
ADVI
ASVGD
Approximation
Empirical
FullRank
FullRankADVI
Group
ImplicitGradient
Inference
KLqp
MeanField
NFVI
NormalizingFlow
SVGD
Stein
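
A minimal sketch of the assumed usage, fitting mean-field ADVI to the coin model defined earlier:

with coin_context:
    approx = pm.fit(n=20000, method='advi')   # optimize the ELBO with ADVI
    advi_trace = approx.sample(2000)          # draw samples from the fitted approximation

pm.summary(advi_trace)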
In [ ]: