Bayesian Inference With PyMC3 - 101

This is a simple example that shows the Bayesian inference workflow with PyMC3:
- Generate a Gaussian distribution with μ = 0.5 and σ = 0.35.
- Take a random sample from this distribution.
- Estimate the parameters with a 95% high density interval.
- Compare the estimated parameters with the true parameters.

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import arviz as az

RANDOM_SEED = 1235811
rng = np.random.default_rng(RANDOM_SEED)
print(f"Runing on PyMC3 v{pm.__version__}")
Runing on PyMC3 v3.11.4

Generate data from a normal distribution with μ = 0.5 and σ = 0.35

mu = 0.5
stddev = 0.35
true_y = rng.normal(mu, stddev, 10000)
print(f"True mean = {np.round(true_y.mean(),2)}")
print(f"True standard deviation = {np.round(true_y.std(), 2)}")
True mean = 0.5
True standard deviation = 0.35

Plot true distribution

f, (ax1, ax2) = plt.subplots(1, 2, figsize=(13,5))
ax1.plot(true_y)
ax1.set_title("True y Values")
ax1.set_ylabel("y")
ax2.hist(true_y, 50)
ax2.set_ylabel("frequency")
ax2.set_title("True y histogram")
plt.show()

[figure: True y Values (line plot) and True y histogram]

Set Priors and check Prior Distribution

We need context to set informed priors, even weak ones. Say the y values in this data correspond to the error we get from some measurement. From experience, we believe the error data follows a Gaussian-like distribution. We can start with this belief/prior and check at every step that the assumption holds.

You can get characters such as μ and σ by writing \mu or \sigma in a code cell and then hitting the Tab key.

with pm.Model() as model:
    # Set Prior
    μ = pm.Uniform('μ', lower=-1, upper=3)
    σ = pm.HalfNormal("σ", sigma=1)    
    y = pm.Normal('y', mu=μ, sd=σ)

    # Sample from the prior predictive distribution (conditioned only on the priors)
    trace = pm.sample_prior_predictive(1000, random_seed=RANDOM_SEED)
    prior_y = trace["y"]
    prior_mu = trace["μ"]
    prior_stddev = trace["σ"]

    # Graph the distribution for each to make sure the model makes sense    
    f, ax = plt.subplots(2, 2, figsize=(13,7))
    ax[0,0].plot(prior_y)
    ax[0,0].set_title("Prior y Values")
    ax[0,0].set_ylabel("y")

    ax[0,1].hist(prior_y, 50)
    ax[0,1].set_ylabel("frequency")
    ax[0,1].set_title("Prior y histogram")

    ax[1,0].hist(prior_mu, 50)
    ax[1,0].set_ylabel("frequency")
    ax[1,0].set_title("Prior mean histogram")

    ax[1,1].hist(prior_stddev, 50)
    ax[1,1].set_ylabel("frequency")
    ax[1,1].set_title("Prior standard deviation histogram")

    plt.show()

[figure: Prior y values, prior y histogram, prior mean histogram, prior standard deviation histogram]

Using ArviZ for plotting the trace

Instead of matplotlib, we can also use ArviZ to plot the trace, which requires less code.

az.plot_trace(trace, var_names=['y', 'μ', 'σ'], figsize=(13,9));

[figure: ArviZ trace plots for y, μ and σ]

This looks like a reasonable prior.

Sample data from true distribution

We will estimate the true mean and standard deviation from this sample.

obs_y = rng.choice(true_y, 200)  # use the seeded generator for reproducibility
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(13,5))
ax1.plot(obs_y)
ax1.set_title("Observed y Values")
ax1.set_ylabel("y")
ax2.hist(obs_y, 50)
ax2.set_ylabel("frequency")
ax2.set_title("Observed y histogram")
plt.show()

[figure: Observed y Values (line plot) and Observed y histogram]

Inference: Estimate parameters

Here we are using MCMC, but for a normal likelihood we could also obtain the posterior analytically, using a conjugate prior.
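
For instance, with the standard deviation treated as known, a normal prior on the mean is conjugate and the posterior has a closed form. Below is a minimal sketch of that update; it is an illustration with an assumed N(0, 1) prior on the mean, not the model we fit next, whose Uniform/HalfNormal priors are not conjugate.

def conjugate_normal_mean(data, sigma, mu0=0.0, sigma0=1.0):
    """Posterior of μ for a N(μ, sigma²) likelihood with known sigma
    and a conjugate N(mu0, sigma0²) prior on μ."""
    n = len(data)
    post_precision = 1 / sigma0**2 + n / sigma**2   # precisions add
    post_var = 1 / post_precision
    post_mean = post_var * (mu0 / sigma0**2 + data.sum() / sigma**2)
    return post_mean, np.sqrt(post_var)

# Pretend sigma is known and equal to the true value
m, s = conjugate_normal_mean(obs_y, sigma=0.35)
print(f"Analytic posterior for μ: mean={m:.3f}, sd={s:.3f}")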

options = dict(return_inferencedata=False, random_seed=RANDOM_SEED)
with pm.Model() as model2:
    # Set Prior
    μ = pm.Uniform('μ', lower=-1, upper=3)
    σ = pm.HalfNormal("σ", sigma=1)

    # likelihood: Given our priors, the likelihood of observed data
    y =  pm.Normal('y', mu=μ, sd=σ, observed=obs_y)    

    # MCMC approximates the posterior distribution; draw 500 samples per chain from it
    trace = pm.sample(500, **options)    
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [σ, μ]
100.00% [6000/6000 00:01<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 500 draw iterations (4_000 + 2_000 draws total) took 13 seconds.

Check convergence

Instead of using a single Markov chain, PyMC3 runs multiple chains. We can then compare the results across chains to make sure they are consistent.

with model2:
    az.plot_forest(trace, var_names=["μ", "σ"]) 

[figure: forest plot of μ and σ for the 4 chains]

We can see that all 4 independent chains are consistent in their estimates of the mean and standard deviation.
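
We can also check convergence numerically. A short sketch using ArviZ's standard diagnostics (values of r_hat near 1.0 and a reasonably large effective sample size indicate the chains have mixed well):

with model2:
    # r_hat compares within-chain and between-chain variance; ~1.0 means the chains agree
    print(az.rhat(trace, var_names=["μ", "σ"]))
    # Effective sample size: how many independent draws the autocorrelated samples are worth
    print(az.ess(trace, var_names=["μ", "σ"]))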

Plot Posterior Distribution with Highest-Posterior Density (High Density Interval)

The term confidence interval doesn't apply in probabilistic programming: a 95% confidence interval is a statement about the procedure (intervals constructed this way contain the fixed true parameter 95% of the time) and says nothing about the distribution of the estimated parameter. The highest-posterior density interval, on the other hand, directly states that the parameter lies inside the interval with 95% probability. For our case, this implies the expected error is between a and b with 95% probability.
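
We can also read the interval bounds off numerically. A short sketch, assuming ArviZ's az.hdi (the successor of the older az.hpd):

with model2:
    # 95% highest density interval for each parameter
    print(az.hdi(trace, hdi_prob=0.95, var_names=["μ", "σ"]))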

with model2:
    az.plot_posterior(trace, hdi_prob=0.95, var_names=["μ", "σ"])
    display(az.summary(trace))
|   | mean  | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|-------|-------|--------|---------|-----------|---------|----------|----------|-------|
| μ | 0.500 | 0.022 | 0.455  | 0.539   | 0.001     | 0.0     | 1890.0   | 1409.0   | 1.0   |
| σ | 0.322 | 0.017 | 0.291  | 0.353   | 0.000     | 0.0     | 1466.0   | 1232.0   | 1.0   |

[figure: posterior distributions of μ and σ with 95% HDI]

Given our sample of 200 random examples from the true distribution, we can estimate that:
- the mean of the true distribution is between 0.45 and 0.54 with 95% probability.
- the standard deviation of the true distribution is between 0.29 and 0.36 with 95% probability.

Both intervals cover the true distribution's parameters, μ = 0.5 and σ = 0.35.

Posterior Predictive Check - Distribution of predicted samples

Based on the estimated parameters and the observed data, we generate predicted y. We can compare observed y and predicted y to spot differences between the two sets. For each predicted data point (see the by-hand sketch after this list):
- We sample a value of (μ, σ) from the posterior.
- We feed that value of (μ, σ) to the likelihood, thus obtaining a data point: predicted y.
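
A minimal by-hand version of those two steps, assuming the MultiTrace returned above exposes per-draw values of μ and σ (pm.sample_posterior_predictive below does the same job, vectorized over the observed data):

# For each posterior draw (μ, σ), simulate one data point from the likelihood
manual_pred = np.array([rng.normal(m, s) for m, s in zip(trace["μ"], trace["σ"])])
print(manual_pred.shape)  # one predicted y per posterior draw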

with model2:
    y_pred = pm.sample_posterior_predictive(trace, random_seed=RANDOM_SEED)
100.00% [2000/2000 00:01<00:00]
with model2:
    data_ppc = az.from_pymc3(trace=trace, posterior_predictive=y_pred)
    ax = az.plot_ppc(data_ppc, figsize=(12, 6), mean=False)
    ax.set_title("Comparison of observed and simulated data based on the posterior")
    ax.legend(fontsize=15);

[figure: posterior predictive samples overlaid on observed y]

We see that the mean of the simulated data is slightly displaced to the right. In real life we wouldn't have the true distribution to compare against, but since we generated it ourselves, let's compare the simulated distribution with the true distribution.

ax = az.plot_dist(true_y, figsize=(12, 6))
ax.set_title("True Distribution");

[figure: True Distribution density plot]
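
To put the two densities side by side, a small sketch that overlays them on one axis (assuming y_pred["y"] holds the posterior predictive draws returned above):

f, ax = plt.subplots(figsize=(12, 6))
az.plot_dist(true_y, color="C0", label="true", ax=ax)
az.plot_dist(y_pred["y"].flatten(), color="C1", label="posterior predictive", ax=ax)
ax.set_title("True vs simulated distribution")
ax.legend();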

It is not a perfect model, but its estimated parameters are close to the true parameters, and the simulated dataset's distribution doesn't show any major deviation from the true distribution. If you read this far, I hope you enjoyed it. If you have any constructive feedback, don't hesitate to let me know.
References:
- PyMC
- Bayesian Analysis with Python by Osvaldo Martin