(Generalized) Linear and Hierarchical Linear Models in PyMC3¶
[1]:
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
import theano
from pandas.plotting import scatter_matrix
from pymc3 import *
from statsmodels.formula.api import glm as glm_sm
[2]:
%config InlineBackend.figure_format = 'retina'
az.style.use("arviz-darkgrid")
Linear Regression¶
Let's generate some data with a known slope and intercept and fit a simple linear GLM.
[3]:
size = 50
true_intercept = 1
true_slope = 2
x = np.linspace(0, 1, size)
y = true_intercept + x * true_slope + np.random.normal(scale=0.5, size=size)
data = {"x": x, "y": y}
The glm.LinearComponent class can be used to generate the output variable y_est and the coefficients of the specified linear model.
[4]:
with Model() as model:
    lm = glm.LinearComponent.from_formula("y ~ x", data)
    sigma = Uniform("sigma", 0, 20)
    y_obs = Normal("y_obs", mu=lm.y_est, sigma=sigma, observed=y)
    trace = sample(2000, cores=2)
plt.figure(figsize=(5, 5))
plt.plot(x, y, "x")
plot_posterior_predictive_glm(trace)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sigma, x, Intercept]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 7 seconds.
The acceptance probability does not match the target. It is 0.8974642870765553, but should be close to 0.8. Try to increase the number of tuning steps.

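As a quick sanity check we can compare the posterior means against the known true values used to simulate the data; a minimal sketch using the trace from the cell above:

# Posterior means should land near true_intercept = 1 and true_slope = 2.
print("posterior mean intercept:", trace["Intercept"].mean())
print("posterior mean slope:", trace["x"].mean())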
Since a handful of generalized linear models come up over and over again (normally distributed noise, logistic regression, etc.), the GLM class simplifies the step above by creating the likelihood and its priors (e.g. sd) for us. Because we are working inside a model context, the random variables are all added to the model behind the scenes. Note that the call to GLM.from_formula() below produces essentially the same model as above, just more succinctly.
[5]:
with Model() as model:
    GLM.from_formula("y ~ x", data)
    trace = sample(2000, cores=2)
plt.figure(figsize=(5, 5))
plt.plot(x, y, "x")
plot_posterior_predictive_glm(trace)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, x, Intercept]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 5 seconds.
The acceptance probability does not match the target. It is 0.8878450567290662, but should be close to 0.8. Try to increase the number of tuning steps.

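For intuition, here is roughly what GLM.from_formula("y ~ x", data) builds when written out by hand. This is a sketch only: the coefficient and sd priors below are assumed weakly-informative choices, not the glm module's exact defaults.

# Hand-written counterpart of the formula-based model above (sketch; priors are assumptions).
with Model() as model_manual:
    intercept = Normal("Intercept", mu=0, sigma=20)
    slope = Normal("x", mu=0, sigma=20)
    sd = HalfCauchy("sd", beta=10)
    mu = intercept + slope * x
    Normal("y", mu=mu, sigma=sd, observed=y)
    trace_manual = sample(2000, cores=2)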
Robust GLM¶
Let's try the same model, but with a few outliers in the data.
[6]:
x_out = np.append(x, [0.1, 0.15, 0.2])
y_out = np.append(y, [8, 6, 9])
data_outlier = dict(x=x_out, y=y_out)
[7]:
with Model() as model:
    GLM.from_formula("y ~ x", data_outlier)
    trace = sample(2000, cores=2)
plt.figure(figsize=(5, 5))
plt.plot(x_out, y_out, "x")
plot_posterior_predictive_glm(trace)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, x, Intercept]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 5 seconds.

Because the normal distribution does not have much mass in its tails, an outlier will affect the fit strongly.
Instead, we can replace the Normal likelihood with a Student's T distribution, which has heavier tails and is more robust to outliers. While this could be done with the LinearComponent class and a manually defined T likelihood (a hand-rolled version is sketched after the sampling output below), GLM.from_formula() offers more automation. By default it uses a Normal likelihood; to use a T distribution instead, we pass a family object that specifies how the output is linked to y_est (here we explicitly use the Identity link function, which is also the default) and what the priors for the T distribution are. Here we fix the degrees of freedom nu to 1.5.
[8]:
with Model() as model_robust:
    family = glm.families.StudentT(
        link=glm.families.Identity(), priors={"nu": 1.5, "lam": Uniform.dist(0, 20)}
    )
    GLM.from_formula("y ~ x", data_outlier, family=family)
    trace = sample(2000, cores=2)
plt.figure(figsize=(5, 5))
plt.plot(x_out, y_out, "x")
plot_posterior_predictive_glm(trace)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [lam, x, Intercept]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 6 seconds.

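The same robust model can also be written out by hand, which makes the role of the family object explicit. A minimal sketch (the coefficient priors here are assumptions, not what GLM.from_formula uses):

# Manual robust regression: StudentT likelihood with nu fixed to 1.5 (sketch).
with Model() as model_robust_manual:
    intercept = Normal("Intercept", mu=0, sigma=20)
    slope = Normal("x", mu=0, sigma=20)
    lam = Uniform("lam", 0, 20)
    mu = intercept + slope * x_out
    StudentT("y", nu=1.5, mu=mu, lam=lam, observed=y_out)
    trace_robust_manual = sample(2000, cores=2)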
Hierarchical GLM¶
[9]:
sat_data = pd.read_csv(get_data("Guber1999data.txt"))
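In the model below, the regression coefficients for spend, stu_tea_rat, salary, and prcnt_take share a common group mean (grp_mean) and group standard deviation (grp_sd), which get their own priors. This partial pooling across coefficients is what makes the model hierarchical.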
[10]:
with Model() as model_sat:
    grp_mean = Normal("grp_mean", mu=0, sigma=10)
    grp_sd = Uniform("grp_sd", 0, 200)
    # Define priors for intercept and regression coefficients.
    priors = {
        "Intercept": Normal.dist(mu=sat_data.sat_t.mean(), sigma=sat_data.sat_t.std()),
        "spend": Normal.dist(mu=grp_mean, sigma=grp_sd),
        "stu_tea_rat": Normal.dist(mu=grp_mean, sigma=grp_sd),
        "salary": Normal.dist(mu=grp_mean, sigma=grp_sd),
        "prcnt_take": Normal.dist(mu=grp_mean, sigma=grp_sd),
    }
    GLM.from_formula("sat_t ~ spend + stu_tea_rat + salary + prcnt_take", sat_data, priors=priors)
    trace_sat = sample(2000, cores=2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, prcnt_take, salary, stu_tea_rat, spend, Intercept, grp_sd, grp_mean]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 31 seconds.
There were 7 divergences after tuning. Increase `target_accept` or reparameterize.
There were 36 divergences after tuning. Increase `target_accept` or reparameterize.
The number of effective samples is smaller than 25% for some parameters.
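The divergences reported above are a warning sign. One common remedy (a sketch, not something run in this notebook) is to re-run NUTS with a higher target acceptance rate so it takes smaller steps:

# Re-sample with a higher target_accept to reduce divergences (illustrative only).
with model_sat:
    trace_sat_higher_accept = sample(2000, cores=2, target_accept=0.9)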
[11]:
scatter_matrix(trace_to_dataframe(trace_sat), figsize=(12, 12));

[12]:
with Model() as model_sat:
    grp_mean = Normal("grp_mean", mu=0, sigma=10)
    grp_prec = Gamma("grp_prec", alpha=1, beta=0.1, testval=1.0)
    # The "Regressor" key below applies this StudentT prior to every slope coefficient.
    slope = StudentT.dist(mu=grp_mean, lam=grp_prec, nu=1)
    intercept = Normal.dist(mu=sat_data.sat_t.mean(), sigma=sat_data.sat_t.std())
    GLM.from_formula(
        "sat_t ~ spend + stu_tea_rat + salary + prcnt_take",
        sat_data,
        priors={"Intercept": intercept, "Regressor": slope},
    )
    trace_sat = sample(2000, cores=2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, prcnt_take, salary, stu_tea_rat, spend, Intercept, grp_prec, grp_mean]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 34 seconds.
The number of effective samples is smaller than 25% for some parameters.
[13]:
scatter_matrix(trace_to_dataframe(trace_sat), figsize=(12, 12));

[14]:
tdf_gain = 5.0
with Model() as model_sat:
    grp_mean = Normal("grp_mean", mu=0, sigma=10)
    grp_prec = Gamma("grp_prec", alpha=1, beta=0.1, testval=1.0)
    slope = StudentT.dist(mu=grp_mean, lam=grp_prec, nu=1)  # grp_df)
    intercept = Normal.dist(mu=sat_data.sat_t.mean(), sigma=sat_data.sat_t.std())
    GLM.from_formula(
        "sat_t ~ spend + stu_tea_rat + salary + prcnt_take",
        sat_data,
        priors={"Intercept": intercept, "Regressor": slope},
    )
    trace_sat = sample(2000, cores=2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, prcnt_take, salary, stu_tea_rat, spend, Intercept, grp_prec, grp_mean]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 35 seconds.
The number of effective samples is smaller than 25% for some parameters.
[15]:
scatter_matrix(trace_to_dataframe(trace_sat), figsize=(12, 12));

Logistic Regression¶
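For reference we first fit a frequentist logistic regression with statsmodels, then the equivalent Bayesian model with GLM.from_formula() using a Binomial family.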
[16]:
htwt_data = pd.read_csv(get_data("HtWt.csv"))
htwt_data.head()
[16]:
|   | male | height | weight |
|---|------|--------|--------|
| 0 | 0 | 63.2 | 168.7 |
| 1 | 0 | 68.7 | 169.8 |
| 2 | 0 | 64.8 | 176.6 |
| 3 | 0 | 67.9 | 246.8 |
| 4 | 1 | 68.9 | 151.6 |
[17]:
m = glm_sm("male ~ height + weight", htwt_data, family=sm.families.Binomial()).fit()
print(m.summary())
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: male No. Observations: 70
Model: GLM Df Residuals: 67
Model Family: Binomial Df Model: 2
Link Function: logit Scale: 1.0000
Method: IRLS Log-Likelihood: -28.298
Date: Mon, 15 Jun 2020 Deviance: 56.597
Time: 19:32:13 Pearson chi2: 62.8
No. Iterations: 6
Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -45.2059 10.887 -4.152 0.000 -66.545 -23.867
height 0.6571 0.164 4.018 0.000 0.337 0.978
weight 0.0096 0.011 0.892 0.372 -0.012 0.031
==============================================================================
[18]:
with Model() as model_htwt:
    GLM.from_formula("male ~ height + weight", htwt_data, family=glm.families.Binomial())
    trace_htwt = sample(
        2000, cores=2, init="adapt_diag"
    )  # the default init with jitter can cause problems
Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [weight, height, Intercept]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 45 seconds.
The acceptance probability does not match the target. It is 0.895678925061463, but should be close to 0.8. Try to increase the number of tuning steps.
[19]:
trace_df = trace_to_dataframe(trace_htwt)
print(trace_df.describe().drop("count").T)
scatter_matrix(trace_df, figsize=(8, 8))
print("P(weight < 0) = ", (trace_df["weight"] < 0).mean())
print("P(height < 0) = ", (trace_df["height"] < 0).mean())
mean std min 25% 50% 75% \
Intercept -49.645288 11.533136 -95.224788 -57.259990 -48.702766 -41.405379
height 0.721918 0.172671 0.267888 0.599994 0.709910 0.834965
weight 0.010389 0.010970 -0.025444 0.002678 0.009994 0.017556
max
Intercept -19.162911
height 1.399758
weight 0.049458
P(weight < 0) = 0.18075
P(height < 0) = 0.0
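Unlike the frequentist p-values above, the posterior gives direct probability statements: P(weight < 0) is simply the fraction of posterior samples in which the weight coefficient is negative.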

Bayesian Logistic Lasso¶
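The lasso corresponds to placing a Laplace (double-exponential) prior on the regression coefficients: its sharp peak at zero shrinks coefficients towards zero, analogous to an L1 penalty. Below we first plot this prior and then use it in the logistic regression from the previous section.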
[20]:
lp = Laplace.dist(mu=0, b=0.05)
x_eval = np.linspace(-0.5, 0.5, 300)
plt.plot(x_eval, theano.tensor.exp(lp.logp(x_eval)).eval())
plt.xlabel("x")
plt.ylabel("Probability")
plt.title("Laplace distribution");

[21]:
with Model() as model_lasso:
    # Define priors for intercept and regression coefficients.
    priors = {"Intercept": Normal.dist(mu=0, sigma=50), "Regressor": Laplace.dist(mu=0, b=0.05)}
    GLM.from_formula(
        "male ~ height + weight", htwt_data, family=glm.families.Binomial(), priors=priors
    )
    trace_lasso = sample(500, cores=2, init="adapt_diag")
trace_df = trace_to_dataframe(trace_lasso)
scatter_matrix(trace_df, figsize=(8, 8))
print(trace_df.describe().drop("count").T)
Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [weight, height, Intercept]
Sampling 2 chains for 1_000 tune and 500 draw iterations (2_000 + 1_000 draws total) took 19 seconds.
mean std min 25% 50% 75% \
Intercept -25.300464 6.891574 -50.959988 -29.493681 -25.036081 -20.592226
height 0.353333 0.104866 0.020546 0.284880 0.347689 0.418672
weight 0.011656 0.009010 -0.013940 0.005477 0.011623 0.017894
max
Intercept -5.973093
height 0.754475
weight 0.038506

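Note how the Laplace prior shrinks the coefficients relative to the unregularized logistic regression above: the posterior mean for height drops from roughly 0.72 to roughly 0.35.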
[22]:
%load_ext watermark
%watermark -n -u -v -iv -w
theano 1.0.4
logging 0.5.1.2
pandas 1.0.4
numpy 1.18.5
statsmodels.api 0.11.1
arviz 0.8.3
platform 1.0.8
last updated: Mon Jun 15 2020
CPython 3.7.7
IPython 7.15.0
watermark 2.0.2