Using PyMC3

PyMC3 is a Python package for MCMC using a variety of samplers, including Metropolis, Slice, and Hamiltonian Monte Carlo. See Probabilistic Programming in Python using PyMC3 for a description. The GitHub site also has many examples and links for further exploration.

! pip install --quiet arviz
! pip install --quiet pymc3
! pip install --quiet daft
! pip install --quiet seaborn
! conda install --yes --quiet mkl-service
In [1]:
import warnings

warnings.simplefilter('ignore', UserWarning)

Other resources

Some examples are adapted from:

In [2]:
%matplotlib inline
In [3]:
import numpy as np
import numpy.random as rng
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import pymc3 as pm
import scipy.stats as stats
import daft
import arviz as az
In [4]:
import theano
theano.config.warn.round=False
In [5]:
sns.set_context('notebook')
plt.style.use('seaborn-darkgrid')

Introduction to PyMC3

Distributions in pymc3

In [6]:
print('\n'.join([d for d in dir(pm.distributions) if d[0].isupper()]))
AR
AR1
Bernoulli
Beta
BetaBinomial
Binomial
Bound
Categorical
Cauchy
ChiSquared
Constant
ConstantDist
Continuous
DensityDist
Dirichlet
Discrete
DiscreteUniform
DiscreteWeibull
Distribution
ExGaussian
Exponential
Flat
GARCH11
Gamma
GaussianRandomWalk
Geometric
Gumbel
HalfCauchy
HalfFlat
HalfNormal
HalfStudentT
Interpolated
InverseGamma
KroneckerNormal
Kumaraswamy
LKJCholeskyCov
LKJCorr
Laplace
Logistic
LogitNormal
Lognormal
MatrixNormal
Mixture
Multinomial
MvGaussianRandomWalk
MvNormal
MvStudentT
MvStudentTRandomWalk
NegativeBinomial
NoDistribution
Normal
NormalMixture
OrderedLogistic
Pareto
Poisson
Rice
SkewNormal
StudentT
TensorType
Triangular
TruncatedNormal
Uniform
VonMises
Wald
Weibull
Wishart
WishartBartlett
ZeroInflatedBinomial
ZeroInflatedNegativeBinomial
ZeroInflatedPoisson
In [7]:
d = pm.Normal.dist(mu=0, sd=1)
In [8]:
d.dist()
Out[8]:
$\text{None} \sim \text{Normal}(\mathit{mu}=0,~\mathit{sd}=1.0)$

Random samples

In [9]:
d.random(size=5)
Out[9]:
array([-1.04374916,  0.82779499,  0.63210758, -1.25758947,  0.54865016])

Log probability

In [10]:
d.logp(0).eval()
Out[10]:
array(-0.91893853)

Custom distributions

The pymc3 algorithms generally only work with the log probability of a distribution. Hence it is easy to define custom distributions to use in your models by providing a logp function.

In [11]:
def logp(x, μ=0, σ=1):
    """Normal distribtuion."""
    return -0.5*np.log(2*np.pi) - np.log(σ) - (x-μ)**2/(2*σ**2)
In [12]:
d = pm.DensityDist.dist(logp)
In [13]:
d.logp(0)
Out[13]:
-0.9189385332046727

Example: Estimating coin bias

We start with the simplest model - that of determining the bias of a coin from observed outcomes.

Setting up the model

In [14]:
n = 100
heads = 61

Analytical solution

In [15]:
a, b = 10, 10
prior = stats.beta(a, b)
post = stats.beta(heads+a, n-heads+b)
ci = post.interval(0.95)

xs = np.linspace(0, 1, 100)
plt.plot(xs, prior.pdf(xs), label='Prior')
plt.plot(xs, post.pdf(xs), c='green', label='Posterior')
plt.axvline(heads/n, c='red', alpha=0.4, label='MLE')
plt.xlim([0, 1])
plt.hlines(0.3, ci[0], ci[1], colors='black', linewidth=2, label='95% CI')
plt.legend()
pass
[Figure: prior and posterior densities for the coin bias, with the MLE and 95% credible interval]

Graphical model

In [16]:
pgm = daft.PGM(shape=[2.5, 3.0], origin=[0, -0.5])

pgm.add_node(daft.Node("alpha", r"$\alpha$", 0.5, 2, fixed=True))
pgm.add_node(daft.Node("beta", r"$\beta$", 1.5, 2, fixed=True))
pgm.add_node(daft.Node("p", r"$p$", 1, 1))
pgm.add_node(daft.Node("n", r"$n$", 2, 0, fixed=True))
pgm.add_node(daft.Node("y", r"$y$", 1, 0, observed=True))

pgm.add_edge("alpha", "p")
pgm.add_edge("beta", "p")
pgm.add_edge("n", "y")
pgm.add_edge("p", "y")

pgm.render()
pass
[Figure: graphical model for the coin-bias example]

The Model context

When you specify a model, you are adding nodes to a computation graph, which is compiled via Theano when executed. PyMC3 uses the Model context manager to automatically add new nodes to the current model.

In [17]:
niter = 2000
with pm.Model() as coin_context:
    p = pm.Beta('p', alpha=2, beta=2)
    y = pm.Binomial('y', n=n, p=p, observed=heads)
    trace = pm.sample(niter)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [p]
Sampling 2 chains: 100%|██████████| 5000/5000 [00:03<00:00, 1456.13draws/s]
In [18]:
coin_context
Out[18]:
$$ \begin{array}{rcl} \text{p} &\sim & \text{Beta}(\mathit{alpha}=2,~\mathit{beta}=2)\\\text{y} &\sim & \text{Binomial}(\mathit{n}=100,~\mathit{p}=\text{p}) \end{array} $$

Transformed prior variables

In [19]:
coin_context.free_RVs
Out[19]:
[p_logodds__]

Prior variables

In [20]:
coin_context.deterministics
Out[20]:
[p]

Variables in likelihood

In [21]:
coin_context.observed_RVs
Out[21]:
[y]

Under the hood

Theano

Theano builds functions as mathematical expression graphs and compiles them into C for actual computation, making use of GPU resources where available.

Performing calculations in Theano generally involves three steps (from the official docs):

  • declare variables (a,b) and give their types

  • build expressions for how to put those variables together

  • compile expression graphs to functions in order to use them for computation.

In [22]:
import theano
import theano.tensor as tt
theano.config.compute_test_value = "off"

This part builds symbolic expressions.

In [23]:
a = tt.iscalar('a')
b = tt.iscalar('b')
c = a + b

This step compiles a function whose inputs are [a, b] and outputs are [c].

In [24]:
f = theano.function([a, b], [c])
In [25]:
f
Out[25]:
<theano.compile.function_module.Function at 0x1c2a3d6fd0>
In [26]:
f(3, 4)
Out[26]:
[array(7, dtype=int32)]

Within a model context,

  • when you add an unbounded variable, it is defined as a Theano variable and added to the prior part of the log posterior function

  • when you add a bounded variable, a transformed version is defined as a Theano variable and added to the log posterior function

    • the inverse transformation is used to define the original variable - this is a deterministic variable

  • when you add an observed variable bound to data, the data is added to the likelihood part of the log posterior

See PyMC3 and Theano for details; a small sketch illustrating this follows.
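The toy model below is hypothetical (not part of the running coin example); it simply shows where an unbounded prior, a bounded prior, and an observed variable end up in the model's bookkeeping.

data = rng.normal(1.0, 0.5, size=20)   # toy observations

with pm.Model() as demo:
    mu = pm.Normal('mu', mu=0, sd=10)                        # unbounded prior: added as-is
    sigma = pm.HalfNormal('sigma', sd=1)                     # bounded prior: sampled on the log scale
    obs = pm.Normal('obs', mu=mu, sd=sigma, observed=data)   # observed variable: likelihood term

print(demo.free_RVs)        # [mu, sigma_log__]  - the transformed version of sigma
print(demo.deterministics)  # [sigma]            - recovered via the inverse transformation
print(demo.observed_RVs)    # [obs]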

In [27]:
help(pm.sample)
Help on function sample in module pymc3.sampling:

sample(draws=500, step=None, init='auto', n_init=200000, start=None, trace=None, chain_idx=0, chains=None, cores=None, tune=500, nuts_kwargs=None, step_kwargs=None, progressbar=True, model=None, random_seed=None, live_plot=False, discard_tuned_samples=True, live_plot_kwargs=None, compute_convergence_checks=True, use_mmap=False, **kwargs)
    Draw samples from the posterior using the given step methods.

    Multiple step methods are supported via compound step methods.

    Parameters
    ----------
    draws : int
        The number of samples to draw. Defaults to 500. The number of tuned samples are discarded
        by default. See discard_tuned_samples.
    step : function or iterable of functions
        A step function or collection of functions. If there are variables without a step methods,
        step methods for those variables will be assigned automatically.
    init : str
        Initialization method to use for auto-assigned NUTS samplers.

        * auto : Choose a default initialization method automatically.
          Currently, this is `'jitter+adapt_diag'`, but this can change in the future.
          If you depend on the exact behaviour, choose an initialization method explicitly.
        * adapt_diag : Start with a identity mass matrix and then adapt a diagonal based on the
          variance of the tuning samples. All chains use the test value (usually the prior mean)
          as starting point.
        * jitter+adapt_diag : Same as `adapt_diag`, but add uniform jitter in [-1, 1] to the
          starting point in each chain.
        * advi+adapt_diag : Run ADVI and then adapt the resulting diagonal mass matrix based on the
          sample variance of the tuning samples.
        * advi+adapt_diag_grad : Run ADVI and then adapt the resulting diagonal mass matrix based
          on the variance of the gradients during tuning. This is **experimental** and might be
          removed in a future release.
        * advi : Run ADVI to estimate posterior mean and diagonal mass matrix.
        * advi_map: Initialize ADVI with MAP and use MAP as starting point.
        * map : Use the MAP as starting point. This is discouraged.
        * nuts : Run NUTS and estimate posterior mean and mass matrix from the trace.
    n_init : int
        Number of iterations of initializer. Only works for 'nuts' and 'ADVI'.
        If 'ADVI', number of iterations, if 'nuts', number of draws.
    start : dict, or array of dict
        Starting point in parameter space (or partial point)
        Defaults to trace.point(-1)) if there is a trace provided and model.test_point if not
        (defaults to empty dict). Initialization methods for NUTS (see `init` keyword) can
        overwrite the default. For 'SMC' if should be a list of dict with length `chains`.
    trace : backend, list, or MultiTrace
        This should be a backend instance, a list of variables to track, or a MultiTrace object
        with past values. If a MultiTrace object is given, it must contain samples for the chain
        number `chain`. If None or a list of variables, the NDArray backend is used.
        Passing either "text" or "sqlite" is taken as a shortcut to set up the corresponding
        backend (with "mcmc" used as the base name). Ignored when using 'SMC'.
    chain_idx : int
        Chain number used to store sample in backend. If `chains` is greater than one, chain
        numbers will start here. Ignored when using 'SMC'.
    chains : int
        The number of chains to sample. Running independent chains is important for some
        convergence statistics and can also reveal multiple modes in the posterior. If `None`,
        then set to either `cores` or 2, whichever is larger. For SMC the number of chains is the
        number of draws.
    cores : int
        The number of chains to run in parallel. If `None`, set to the number of CPUs in the
        system, but at most 4 (for 'SMC' defaults to 1). Keep in mind that some chains might
        themselves be multithreaded via openmp or BLAS. In those cases it might be faster to set
        this to 1.
    tune : int
        Number of iterations to tune, defaults to 500. Ignored when using 'SMC'. Samplers adjust
        the step sizes, scalings or similar during tuning. Tuning samples will be drawn in addition
        to the number specified in the `draws` argument, and will be discarded unless
        `discard_tuned_samples` is set to False.
    nuts_kwargs : dict
        Options for the NUTS sampler. See the docstring of NUTS for a complete list of options.
        Common options are:

        * target_accept: float in [0, 1]. The step size is tuned such that we approximate this
          acceptance rate. Higher values like 0.9 or 0.95 often work better for problematic
          posteriors.
        * max_treedepth: The maximum depth of the trajectory tree.
        * step_scale: float, default 0.25
          The initial guess for the step size scaled down by `1/n**(1/4)`.

        If you want to pass options to other step methods, please use `step_kwargs`.
    step_kwargs : dict
        Options for step methods. Keys are the lower case names of the step method, values are
        dicts of keyword arguments. You can find a full list of arguments in the docstring of the
        step methods. If you want to pass arguments only to nuts, you can use `nuts_kwargs`.
    progressbar : bool
        Whether or not to display a progress bar in the command line. The bar shows the percentage
        of completion, the sampling speed in samples per second (SPS), and the estimated remaining
        time until completion ("expected time of arrival"; ETA).
    model : Model (optional if in `with` context)
    random_seed : int or list of ints
        A list is accepted if `cores` is greater than one.
    live_plot : bool
        Flag for live plotting the trace while sampling. Ignored when using 'SMC'.
    live_plot_kwargs : dict
        Options for traceplot. Example: live_plot_kwargs={'varnames': ['x']}.
        Ignored when using 'SMC'
    discard_tuned_samples : bool
        Whether to discard posterior samples of the tune interval. Ignored when using 'SMC'
    compute_convergence_checks : bool, default=True
        Whether to compute sampler statistics like gelman-rubin and effective_n.
        Ignored when using 'SMC'
    use_mmap : bool, default=False
        Whether to use joblib's memory mapping to share numpy arrays when sampling across multiple
        cores. Ignored when using 'SMC'

    Returns
    -------
    trace : pymc3.backends.base.MultiTrace
        A `MultiTrace` object that contains the samples.

    Examples
    --------
    .. code:: ipython

        >>> import pymc3 as pm
        ... n = 100
        ... h = 61
        ... alpha = 2
        ... beta = 2

    .. code:: ipython

        >>> with pm.Model() as model: # context management
        ...     p = pm.Beta('p', alpha=alpha, beta=beta)
        ...     y = pm.Binomial('y', n=n, p=p, observed=h)
        ...     trace = pm.sample(2000, tune=1000, cores=4)
        >>> pm.summary(trace)
               mean        sd  mc_error   hpd_2.5  hpd_97.5
        p  0.604625  0.047086   0.00078  0.510498  0.694774

Specifying sampler (step) and multiple chains

In [28]:
with coin_context:
    step = pm.Metropolis()
    t = pm.sample(niter, step=step, chains=8, random_seed=123)
Multiprocess sampling (8 chains in 2 jobs)
Metropolis: [p]
Sampling 8 chains: 100%|██████████| 20000/20000 [00:05<00:00, 3712.51draws/s]
The number of effective samples is smaller than 25% for some parameters.

Samplers available

For continuous distributions, it is hard to beat NUTS and hence this is the default. To learn more, see A Conceptual Introduction to Hamiltonian Monte Carlo.

In [29]:
print('\n'.join(m for m in dir(pm.step_methods) if m[0].isupper()))
BinaryGibbsMetropolis
BinaryMetropolis
CSG
CategoricalGibbsMetropolis
CauchyProposal
CompoundStep
DEMetropolis
ElemwiseCategorical
EllipticalSlice
HamiltonianMC
LaplaceProposal
Metropolis
MultivariateNormalProposal
NUTS
NormalProposal
PoissonProposal
SGFS
SMC
Slice

Generally, the sampler will be selected automatically based on the type of the variable (discrete or continuous), but there are many samplers that you can specify explicitly if you want to learn more about them or understand why an alternative might be better than the default for your problem. A small sketch with a discrete latent variable follows.
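For example, a model with a discrete latent variable (a hypothetical model, not used elsewhere in this notebook) typically gets a compound step assigned automatically:

with pm.Model() as mixed_model:
    p = pm.Beta('p', alpha=2, beta=2)     # continuous: NUTS is assigned
    k = pm.Binomial('k', n=10, p=p)       # discrete latent: a Metropolis variant is assigned
    mixed_trace = pm.sample(1000)         # PyMC3 combines the two in a CompoundStep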

In [30]:
niter = 2000
with pm.Model() as normal_context:
    mu = pm.Normal('mu', mu=0, sd=100)
    sd = pm.HalfCauchy('sd', beta=2)
    y = pm.Normal('y', mu=mu, sd=sd, observed=xs)

    step1 = pm.Slice(vars=mu)
    step2 = pm.Metropolis(vars=sd)

    t = pm.sample(niter, step=[step1, step2])
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Slice: [mu]
>Metropolis: [sd]
Sampling 2 chains: 100%|██████████| 5000/5000 [00:02<00:00, 1691.48draws/s]
The number of effective samples is smaller than 10% for some parameters.
In [32]:
pm.traceplot(t)
pass
[Figure: trace plots for mu and sd]

MAP estimate

In [33]:
with pm.Model() as m:
    p = pm.Beta('p', alpha=2, beta=2)
    y = pm.Binomial('y', n=n, p=p, observed=heads)
    θ = pm.find_MAP()
/Users/cliburn/anaconda3/lib/python3.6/site-packages/pymc3/tuning/starting.py:61: UserWarning: find_MAP should not be used to initialize the NUTS sampler, simply call pymc3.sample() and it will automatically initialize NUTS in a better way.
  warnings.warn('find_MAP should not be used to initialize the NUTS sampler, simply call pymc3.sample() and it will automatically initialize NUTS in a better way.')
logp = -4.5407, ||grad|| = 11: 100%|██████████| 6/6 [00:00<00:00, 563.70it/s]
In [34]:
θ
Out[34]:
{'p_logodds__': array(0.43825493), 'p': array(0.60784314)}

Getting values from the trace

All the information about the posterior is in the trace, and it also provides statistics about the sampler.

In [35]:
help(trace)
Help on MultiTrace in module pymc3.backends.base object:

class MultiTrace(builtins.object)
 |  Main interface for accessing values from MCMC results
 |
 |  The core method to select values is `get_values`. The method
 |  to select sampler statistics is `get_sampler_stats`. Both kinds of
 |  values can also be accessed by indexing the MultiTrace object.
 |  Indexing can behave in four ways:
 |
 |  1. Indexing with a variable or variable name (str) returns all
 |     values for that variable, combining values for all chains.
 |
 |     >>> trace[varname]
 |
 |     Slicing after the variable name can be used to burn and thin
 |     the samples.
 |
 |     >>> trace[varname, 1000:]
 |
 |     For convenience during interactive use, values can also be
 |     accessed using the variable as an attribute.
 |
 |     >>> trace.varname
 |
 |  2. Indexing with an integer returns a dictionary with values for
 |     each variable at the given index (corresponding to a single
 |     sampling iteration).
 |
 |  3. Slicing with a range returns a new trace with the number of draws
 |     corresponding to the range.
 |
 |  4. Indexing with the name of a sampler statistic that is not also
 |     the name of a variable returns those values from all chains.
 |     If there is more than one sampler that provides that statistic,
 |     the values are concatenated along a new axis.
 |
 |  For any methods that require a single trace (e.g., taking the length
 |  of the MultiTrace instance, which returns the number of draws), the
 |  trace with the highest chain number is always used.
 |
 |  Methods defined here:
 |
 |  __getattr__(self, name)
 |
 |  __getitem__(self, idx)
 |
 |  __init__(self, straces)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __len__(self)
 |
 |  __repr__(self)
 |      Return repr(self).
 |
 |  add_values(self, vals, overwrite=False)
 |      add variables to traces.
 |
 |      Parameters
 |      ----------
 |      vals : dict (str: array-like)
 |           The keys should be the names of the new variables. The values are expected to be
 |           array-like object. For traces with more than one chain the length of each value
 |           should match the number of total samples already in the trace (chains * iterations),
 |           otherwise a warning is raised.
 |      overwrite : bool
 |          If `False` (default) a ValueError is raised if the variable already exists.
 |          Change to `True` to overwrite the values of variables
 |
 |  get_sampler_stats(self, varname, burn=0, thin=1, combine=True, chains=None, squeeze=True)
 |      Get sampler statistics from the trace.
 |
 |      Parameters
 |      ----------
 |      varname : str
 |      sampler_idx : int or None
 |      burn : int
 |      thin : int
 |
 |      Returns
 |      -------
 |      If the `sampler_idx` is specified, return the statistic with
 |      the given name in a numpy array. If it is not specified and there
 |      is more than one sampler that provides this statistic, return
 |      a numpy array of shape (m, n), where `m` is the number of
 |      such samplers, and `n` is the number of samples.
 |
 |  get_values(self, varname, burn=0, thin=1, combine=True, chains=None, squeeze=True)
 |      Get values from traces.
 |
 |      Parameters
 |      ----------
 |      varname : str
 |      burn : int
 |      thin : int
 |      combine : bool
 |          If True, results from `chains` will be concatenated.
 |      chains : int or list of ints
 |          Chains to retrieve. If None, all chains are used. A single
 |          chain value can also be given.
 |      squeeze : bool
 |          Return a single array element if the resulting list of
 |          values only has one element. If False, the result will
 |          always be a list of arrays, even if `combine` is True.
 |
 |      Returns
 |      -------
 |      A list of NumPy arrays or a single NumPy array (depending on
 |      `squeeze`).
 |
 |  point(self, idx, chain=None)
 |      Return a dictionary of point values at `idx`.
 |
 |      Parameters
 |      ----------
 |      idx : int
 |      chain : int
 |          If a chain is not given, the highest chain number is used.
 |
 |  points(self, chains=None)
 |      Return an iterator over all or some of the sample points
 |
 |      Parameters
 |      ----------
 |      chains : list of int or N
 |          The chains whose points should be inlcuded in the iterator.  If
 |          chains is not given, include points from all chains.
 |
 |  remove_values(self, name)
 |      remove variables from traces.
 |
 |      Parameters
 |      ----------
 |      name : str
 |          Name of the variable to remove. Raises KeyError if the variable is not present
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
 |
 |  chains
 |
 |  nchains
 |
 |  report
 |
 |  stat_names
 |
 |  varnames

In [36]:
trace.stat_names
Out[36]:
{'depth',
 'diverging',
 'energy',
 'energy_error',
 'max_energy_error',
 'mean_tree_accept',
 'model_logp',
 'step_size',
 'step_size_bar',
 'tree_size',
 'tune'}
In [37]:
sns.distplot(trace.get_sampler_stats('model_logp'))
pass
[Figure: distribution of the model_logp sampler statistic]
In [38]:
p = trace.get_values('p')
p.shape
Out[38]:
(4000,)
In [39]:
trace['p'].shape
Out[39]:
(4000,)

Convert to pandas data frame for downstream processing

In [40]:
df = pm.trace_to_dataframe(trace)
df.head()
Out[40]:
p
0 0.597814
1 0.588101
2 0.576538
3 0.577302
4 0.550194

Posterior distribution

In [41]:
sns.distplot(trace['p'])
pass
[Figure: posterior distribution of p]

Autocorrelation plot

In [43]:
pm.autocorrplot(trace, varnames=['p'])
pass
[Figure: autocorrelation plot for p]

Calculate effective sample size

\[\hat{n}_{eff} = \frac{mn}{1 + 2 \sum_{t=1}^T \hat{\rho}_t}\]

where \(m\) is the number of chains, \(n\) the number of steps per chain, \(T\) the time when the autocorrelation first becomes negative, and \(\hat{\rho}_t\) the autocorrelation at lag \(t\).
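A rough single-chain (\(m = 1\)) sketch of this formula, not the estimator that pm.effective_n actually uses, assuming samples is a 1-D array such as trace.get_values('p', chains=0):

def effective_sample_size(samples):
    """Crude n_eff for a single chain using the formula above."""
    n = len(samples)
    x = samples - samples.mean()
    rho = np.correlate(x, x, mode='full')[n - 1:] / (x @ x)   # autocorrelations at lags 0..n-1
    neg = np.where(rho < 0)[0]
    T = neg[0] if len(neg) else n        # first lag where the autocorrelation turns negative
    return n / (1 + 2 * rho[1:T].sum())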

In [44]:
pm.effective_n(trace)
Out[44]:
{'p': 1639.0804767244163}

Evaluate convergence

\[\hat{R} = \frac{\hat{V}}{W}\]

where \(W\) is the within-chain variance and \(\hat{V}\) is the posterior variance estimate for the pooled traces. Values greater than one indicate that one or more chains have not yet converged.

Discard the burn-in steps for each chain. The idea is to check whether, after burn-in, all chains are sampling from the same stationary distribution.

  • \(W\) is the average of the \(m\) within-chain variances

  • \(B\) is the number of steps \(n \times\) the variance of the chain means

  • \(\hat{V}\) is the weighted average \((1 - \frac{1}{n})W + \frac{1}{n}B\)

Both \(\hat{V}\) and \(W\) estimate the posterior variance once the chains have reached the stationary distribution, so \(\hat{R} \approx 1\) at convergence. If \(\hat{R}\) differs appreciably from 1, the chains have probably not converged and more iterations are needed. This is computed for each parameter \(\theta\); a sketch of the computation follows.
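The sketch below computes \(\hat{R}\) directly from these definitions (PyMC3's own estimator differs in detail); chains is assumed to be an array of shape \((m, n)\), e.g. np.array([trace.get_values('p', chains=c) for c in trace.chains]):

def gelman_rubin_rhat(chains):
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # mean of the within-chain variances
    B = n * chains.mean(axis=1).var(ddof=1)     # n times the variance of the chain means
    V_hat = (1 - 1/n) * W + B/n                 # pooled posterior variance estimate
    return V_hat / W                            # R-hat as defined above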

In [45]:
pm.gelman_rubin(trace)
Out[45]:
{'p': 1.0011843767338002}

The Geweke diagnostic compares the mean of an initial segment of a trace with the means of later segments for a parameter. At convergence the resulting z-scores should be small; the red lines in the plot below mark \(\pm 1\).

In [46]:
plt.plot(pm.geweke(trace['p'])[:,1], 'o')
plt.axhline(1, c='red')
plt.axhline(-1, c='red')
plt.gca().margins(0.05)
pass
[Figure: Geweke z-scores for p with reference lines at ±1]
In [47]:
pm.summary(trace, varnames=['p'])
Out[47]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
p 0.607161 0.047751 0.001174 0.513838 0.699736 1639.080477 1.001184
In [50]:
pm.traceplot(trace, varnames=['p'])
pass
[Figure: trace plot for p]
In [53]:
pm.forestplot(trace)
pass
[Figure: forest plot of the trace]
In [54]:
pm.plot_posterior(trace)
pass
[Figure: posterior plot for p]
In [55]:
with coin_context:
    ps = pm.sample_prior_predictive(samples=1000)
In [56]:
sns.distplot(ps['y'])
plt.axvline(heads, c='red')
pass
[Figure: prior predictive distribution of y, with the observed number of heads marked]
In [57]:
with coin_context:
    ps = pm.sample_posterior_predictive(trace, samples=1000)
100%|██████████| 1000/1000 [00:00<00:00, 1386.64it/s]
In [58]:
sns.distplot(ps['y'])
plt.axvline(heads, c='red')
pass
[Figure: posterior predictive distribution of y, with the observed number of heads marked]

Saving traces

In [59]:
pm.save_trace(trace, 'my_trace', overwrite=True)
Out[59]:
'my_trace'

You need to re-initialize the model when reloading.

In [60]:
with pm.Model() as my_trace:
    p = pm.Beta('p', alpha=2, beta=2)
    y = pm.Binomial('y', n=n, p=p, observed=heads)
    tr = pm.load_trace('my_trace')
In [61]:
pm.summary(tr)
Out[61]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
p 0.607161 0.047751 0.001174 0.513838 0.699736 1639.080477 1.001184

It is good practice to wrap model construction in a function so that the model can be conveniently rebuilt, for example when reloading a saved trace.

In [62]:
def build_model():
    with pm.Model() as m:
        p = pm.Beta('p', alpha=2, beta=2)
        y = pm.Binomial('y', n=n, p=p, observed=heads)
    return m
In [63]:
m = build_model()
In [64]:
with m:
    tr1 = pm.load_trace('my_trace')
In [65]:
pm.summary(tr1)
Out[65]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
p 0.607161 0.047751 0.001174 0.513838 0.699736 1639.080477 1.001184

Sampling from prior

Just omit the observed= argument.

In [66]:
with pm.Model() as prior_context:
    sigma = pm.Gamma('sigma', alpha=2.0, beta=1.0)
    mu = pm.Normal('mu', mu=0, sd=sigma)
    trace = pm.sample(niter, step=pm.Metropolis())
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Metropolis: [mu]
>Metropolis: [sigma]
Sampling 2 chains: 100%|██████████| 5000/5000 [00:01<00:00, 2753.43draws/s]
The number of effective samples is smaller than 10% for some parameters.
In [68]:
pm.traceplot(trace, varnames=['mu', 'sigma'])
pass
[Figure: trace plots for mu and sigma sampled from the prior]

Sampling from posterior

In [69]:
niter = 2000
with pm.Model() as normal_context:
    mu = pm.Normal('mu', mu=0, sd=100)
    sd = pm.HalfCauchy('sd', beta=2)
    y = pm.Normal('y', mu=mu, sd=sd, observed=xs)
    trace = pm.sample(niter)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
Sampling 2 chains: 100%|██████████| 5000/5000 [00:03<00:00, 1307.56draws/s]
In [70]:
hpd = pm.hpd(trace['mu'], alpha=0.05)
hpd
Out[70]:
array([0.44162207, 0.55971851])
In [72]:
ax = pm.traceplot(trace, varnames=['mu'],)

ymin, ymax = ax[0,0].get_ylim()
y = ymin + 0.05*(ymax-ymin)
ax[0, 0].plot(hpd, [y,y], c='red')
pass
[Figure: trace plot for mu with the 95% HPD interval marked]

Evaluating goodness-of-fit

DIC, WAIC, and BPIC are approximations to the out-of-sample prediction error and can be used for model comparison. The likelihood itself increases with model complexity and should not be used for model comparison. A sketch comparing two models by WAIC follows the examples below.

In [73]:
post_mean = pm.summary(trace, varnames=trace.varnames)['mean']
post_mean
Out[73]:
mu          0.500164
sd_log__   -1.218045
sd          0.296556
Name: mean, dtype: float64
In [74]:
normal_context.logp(post_mean)
Out[74]:
array(-26.57766616)
In [75]:
with normal_context:
    print(pm.loo(trace))
LOO_r(LOO=40.743463301462626, LOO_se=8.891566223871754, p_LOO=1.3902023282742455, shape_warn=0)
In [76]:
with normal_context:
    print(pm.waic(trace))
WAIC_r(WAIC=40.74205569255692, WAIC_se=8.891356706673374, p_WAIC=1.3894985238213904, var_warn=0)
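As a sketch of using WAIC for model comparison, we can fit an alternative model to the same data (the Student-t likelihood here is a hypothetical alternative, purely for illustration) and compare; lower WAIC suggests better expected out-of-sample fit. PyMC3 also provides pm.compare for a fuller comparison table.

with pm.Model() as t_context:
    mu = pm.Normal('mu', mu=0, sd=100)
    sd = pm.HalfCauchy('sd', beta=2)
    nu = pm.Exponential('nu', lam=1/30)
    y = pm.StudentT('y', nu=nu, mu=mu, sd=sd, observed=xs)
    t_trace = pm.sample(niter)

with normal_context:
    waic_normal = pm.waic(trace)
with t_context:
    waic_t = pm.waic(t_trace)

print(waic_normal.WAIC, waic_t.WAIC)   # lower WAIC is preferred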

Using a custom likelihood

In [77]:
def logp(x, μ=0, σ=1):
    """Normal distribtuion."""
    return -0.5*np.log(2*np.pi) - np.log(σ) - (x-μ)**2/(2*σ**2)
In [78]:
with pm.Model() as prior_context:
    mu = pm.Normal('mu', mu=0, sd=100)
    sd = pm.HalfCauchy('sd', beta=2)
    y = pm.DensityDist('y', logp, observed=dict(x=xs, μ=mu, σ=sd))
    custom_trace = pm.sample(niter)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, mu]
Sampling 2 chains: 100%|██████████| 5000/5000 [00:03<00:00, 1316.85draws/s]
In [79]:
pm.trace_to_dataframe(custom_trace).mean()
Out[79]:
mu    0.499692
sd    0.296759
dtype: float64

Variational methods available

To use a variational method, use pm.fit instead of pm.sample; a minimal sketch follows the listing of available methods below. We'll see fuller examples of usage in another notebook.

In [80]:
print('\n'.join(m for m in dir(pm.variational) if m[0].isupper()))
ADVI
ASVGD
Approximation
Empirical
FullRank
FullRankADVI
Group
ImplicitGradient
Inference
KLqp
MeanField
NFVI
NormalizingFlow
SVGD
Stein
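
A minimal sketch of the assumed usage, fitting mean-field ADVI to the coin model defined earlier:

with coin_context:
    approx = pm.fit(n=20000, method='advi')   # optimize the ELBO with ADVI
    advi_trace = approx.sample(2000)          # draw samples from the fitted approximation

pm.summary(advi_trace)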
In [ ]: