causalpy.pymc_experiments

Experiment routines for PyMC models.

  • ExperimentalDesign base class

  • Pre-Post Fit

  • Interrupted Time Series

  • Synthetic Control

  • Difference in differences

  • Regression Discontinuity

  • Pretest/Posttest Nonequivalent Group Design

class causalpy.pymc_experiments.DifferenceInDifferences

A class to analyse data from difference-in-differences (DiD) settings.

Note

DiD makes no distinction between pre- and post-intervention data; the model is fit to all the available data.

Parameters
  • data – A pandas dataframe

  • formula – A statistical model formula

  • time_variable_name – Name of the data column for the time variable

  • group_variable_name – Name of the data column for the group variable

  • model – A PyMC model for difference in differences

>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False,
...         }
...     )
...  )
>>> result.summary() 
===========================Difference in Differences============================
Formula: y ~ 1 + group*post_treatment

Results:
Causal impact = 0.5, $CI_{94%}$[0.4, 0.6]
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
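
The summary above reports a causal impact equal to the `group:post_treatment[T.True]` coefficient. The sketch below illustrates why: it rebuilds the four cell means implied by the formula from the rounded posterior means printed above and takes the difference of the within-group changes. It assumes `group` is coded 0/1; the helper `cell_mean` is hypothetical and not part of CausalPy.

```python
# Rounded posterior means copied from the summary output above.
coef = {
    "Intercept": 1.0,
    "post_treatment": 0.9,
    "group": 0.1,
    "group:post_treatment": 0.5,
}


def cell_mean(group: int, post: int) -> float:
    """Expected outcome for one group/period cell (hypothetical helper)."""
    return (
        coef["Intercept"]
        + coef["post_treatment"] * post
        + coef["group"] * group
        + coef["group:post_treatment"] * group * post
    )


# Change over time within each group.
treated_change = cell_mean(1, 1) - cell_mean(1, 0)
control_change = cell_mean(0, 1) - cell_mean(0, 0)

# The difference in differences isolates the interaction coefficient.
did = treated_change - control_change
print(round(did, 3))
```

The intercept, the `group` term, and the `post_treatment` term all cancel in the double difference, leaving only the interaction term.
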
__init__(data, formula, time_variable_name, group_variable_name, model=None, **kwargs)
Parameters
  • data (DataFrame) –

  • formula (str) –

  • time_variable_name (str) –

  • group_variable_name (str) –

expt_type = None
property idata

Access to the model's InferenceData object

model = None
plot()

Plot the results. Creating the combined mean + HDI legend entries is a bit involved.

print_coefficients()

Prints the model coefficients

>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...             sample_kwargs={
...                 "draws": 2000,
...                 "random_seed": seed,
...                 "progressbar": False
...             }),
...  )
>>> result.print_coefficients() 
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
Return type

None

summary()

Print text output summarising the results

Return type

None

class causalpy.pymc_experiments.ExperimentalDesign

Base class for other experiment types

See subclasses for examples of most methods

__init__(model=None, **kwargs)
expt_type = None
property idata

Access to the model's InferenceData object

model = None
print_coefficients()

Prints the model coefficients

>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...             sample_kwargs={
...                 "draws": 2000,
...                 "random_seed": seed,
...                 "progressbar": False
...             }),
...  )
>>> result.print_coefficients() 
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
Return type

None

class causalpy.pymc_experiments.InstrumentalVariable

A class to analyse instrumental variable style experiments.

Parameters
  • instruments_data – A pandas dataframe of instruments for our treatment variable. Should contain instruments Z, and treatment t

  • data – A pandas dataframe of covariates for fitting the focal regression of interest. Should contain covariates X including treatment t and outcome y

  • instruments_formula – A statistical model formula for the instrumental stage regression e.g. t ~ 1 + z1 + z2 + z3

  • formula – A statistical model formula for the focal regression e.g. y ~ 1 + t + x1 + x2 + x3

  • model – A PyMC model

  • priors – An optional dictionary of priors for the mus and sigmas of both regressions. If priors are not specified, MLE estimates are substituted for the beta coefficients. Greater control can be achieved by specifying the priors directly e.g. priors = {"mus": [0, 0], "sigmas": [1, 1], "eta": 2, "lkj_sd": 2}

>>> import pandas as pd
>>> import causalpy as cp
>>> from causalpy.pymc_experiments import InstrumentalVariable
>>> from causalpy.pymc_models import InstrumentalVariableRegression
>>> import numpy as np
>>> N = 100
>>> e1 = np.random.normal(0, 3, N)
>>> e2 = np.random.normal(0, 1, N)
>>> Z = np.random.uniform(0, 1, N)
>>> # Ensure the endogeneity of the treatment variable
>>> X = -1 + 4 * Z + e2 + 2 * e1
>>> y = 2 + 3 * X + 3 * e1
>>> test_data = pd.DataFrame({"y": y, "X": X, "Z": Z})
>>> sample_kwargs = {
...     "tune": 1,
...     "draws": 5,
...     "chains": 1,
...     "cores": 4,
...     "target_accept": 0.95,
...     "progressbar": False,
...     }
>>> instruments_formula = "X ~ 1 + Z"
>>> formula = "y ~ 1 + X"
>>> instruments_data = test_data[["X", "Z"]]
>>> data = test_data[["y", "X"]]
>>> iv = InstrumentalVariable(
...         instruments_data=instruments_data,
...         data=data,
...         instruments_formula=instruments_formula,
...         formula=formula,
...         model=InstrumentalVariableRegression(sample_kwargs=sample_kwargs),
... )
__init__(instruments_data, data, instruments_formula, formula, model=None, priors=None, **kwargs)
Parameters
  • instruments_data (DataFrame) –

  • data (DataFrame) –

  • instruments_formula (str) –

  • formula (str) –

expt_type = None
get_2SLS_fit()

Two Stage Least Squares Fit

This function is called by the experiment. Its results are used as priors if none are provided.

get_naive_OLS_fit()

Naive Ordinary Least Squares

This function is called by the experiment.
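
To illustrate the difference between the two fits, here is a minimal pure-Python sketch of two-stage least squares next to a naive OLS slope, for the simplest case of one instrument and one treatment. It is not CausalPy's implementation; the toy data and the helpers `mean` and `cov` are assumptions made for the example, with the data-generating equations mirroring the class example above.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Covariance with 1/n scaling (the scaling cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


# Toy data: u is an unobserved confounder, uncorrelated with the instrument z.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
u = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]
x = [-1 + 4 * zi + 2 * ui for zi, ui in zip(z, u)]  # treatment depends on z and u
y = [2 + 3 * xi + 3 * ui for xi, ui in zip(x, u)]  # true effect of x on y is 3

# Naive OLS slope of y on x: biased because x and y share the confounder u.
ols_slope = cov(x, y) / cov(x, x)

# Stage 1: regress the treatment x on the instrument z and form fitted values.
stage1_slope = cov(z, x) / cov(z, z)
stage1_intercept = mean(x) - stage1_slope * mean(z)
x_hat = [stage1_intercept + stage1_slope * zi for zi in z]

# Stage 2: regress the outcome y on the fitted values from stage 1.
tsls_slope = cov(x_hat, y) / cov(x_hat, x_hat)

print(f"naive OLS: {ols_slope:.3f}, 2SLS: {tsls_slope:.3f}")
```

On this toy data the 2SLS slope recovers the true effect of 3, while the naive OLS slope overshoots because the confounder pushes the treatment and the outcome in the same direction.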

property idata

Access to the model's InferenceData object

model = None
print_coefficients()

Prints the model coefficients

>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...             sample_kwargs={
...                 "draws": 2000,
...                 "random_seed": seed,
...                 "progressbar": False
...             }),
...  )
>>> result.print_coefficients() 
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
Return type

None

class causalpy.pymc_experiments.InterruptedTimeSeries

A wrapper around the PrePostFit class

Parameters
  • data – A pandas dataframe

  • treatment_time – The time when treatment occurred, expressed in terms of the data index

  • formula – A statistical model formula

  • model – A PyMC model

>>> import causalpy as cp
>>> import pandas as pd
>>> df = (
...     cp.load_data("its")
...     .assign(date=lambda x: pd.to_datetime(x["date"]))
...     .set_index("date")
... )
>>> treatment_time = pd.to_datetime("2017-01-01")
>>> seed = 42
>>> result = cp.pymc_experiments.InterruptedTimeSeries(
...     df,
...     treatment_time,
...     formula="y ~ 1 + t + C(month)",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False,
...         }
...     )
... )
__init__(data, treatment_time, formula, model=None, **kwargs)
Parameters
  • data (DataFrame) –

  • treatment_time (Union[int, float, Timestamp]) –

  • formula (str) –

Return type

None

expt_type = 'Interrupted Time Series'
property idata

Access to the model's InferenceData object

model = None
plot(counterfactual_label='Counterfactual', **kwargs)

Plot the results

print_coefficients()

Prints the model coefficients

>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...             sample_kwargs={
...                 "draws": 2000,
...                 "random_seed": seed,
...                 "progressbar": False
...             }),
...  )
>>> result.print_coefficients() 
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
Return type

None

summary()

Print text output summarising the results

Return type

None

class causalpy.pymc_experiments.PrePostFit

A class to analyse quasi-experiments where parameter estimation is based on just the pre-intervention data.

Parameters
  • data – A pandas dataframe

  • treatment_time – The time when treatment occurred, expressed in terms of the data index

  • formula – A statistical model formula

  • model – A PyMC model
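
The idea of estimating parameters from the pre-intervention data only can be sketched in plain Python. The sketch assumes that rows with an index below `treatment_time` count as pre-intervention and the rest as post-intervention, and it swaps the PyMC model for an ordinary least-squares line purely for illustration. The toy series and all variable names are hypothetical.

```python
# Toy series indexed by integer time; the outcome jumps by 2 from t = 5 onward.
times = list(range(10))
treatment_time = 5
observed = [1.0 + 0.5 * t + (2.0 if t >= treatment_time else 0.0) for t in times]

# Split on the index: rows before treatment_time are "pre", the rest are "post".
pre = [(t, y) for t, y in zip(times, observed) if t < treatment_time]
post = [(t, y) for t, y in zip(times, observed) if t >= treatment_time]

# Fit a straight line to the pre-intervention rows only (ordinary least squares).
n = len(pre)
t_bar = sum(t for t, _ in pre) / n
y_bar = sum(y for _, y in pre) / n
slope = sum((t - t_bar) * (y - y_bar) for t, y in pre) / sum(
    (t - t_bar) ** 2 for t, _ in pre
)
intercept = y_bar - slope * t_bar

# Extrapolate into the post period to get the counterfactual, then compare.
counterfactual = [intercept + slope * t for t, _ in post]
impact = [y - c for (_, y), c in zip(post, counterfactual)]
print([round(i, 3) for i in impact])
```

Because the line is fit without seeing any post-intervention rows, the gap between the observed values and the extrapolated counterfactual is attributed to the intervention.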

>>> import causalpy as cp
>>> sc = cp.load_data("sc")
>>> treatment_time = 70
>>> seed = 42
>>> result = cp.pymc_experiments.PrePostFit(
...     sc,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.pymc_models.WeightedSumFitter(
...         sample_kwargs={
...             "draws": 2000,
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False
...         }
...     ),
... )
>>> result.summary() 
==================================Pre-Post Fit==================================
Formula: actual ~ 0 + a + b + c + d + e + f + g
Model coefficients:
a                             0.3, 94% HDI [0.3, 0.3]
b                             0.0, 94% HDI [0.0, 0.0]
c                             0.3, 94% HDI [0.2, 0.3]
d                             0.0, 94% HDI [0.0, 0.1]
e                             0.0, 94% HDI [0.0, 0.0]
f                             0.1, 94% HDI [0.1, 0.2]
g                             0.0, 94% HDI [0.0, 0.0]
sigma                         0.2, 94% HDI [0.2, 0.3]
__init__(data, treatment_time, formula, model=None, **kwargs)
Parameters
  • data (DataFrame) –

  • treatment_time (Union[int, float, Timestamp]) –

  • formula (str) –

Return type

None

expt_type = None
property idata

Access to the model's InferenceData object

model = None
plot(counterfactual_label='Counterfactual', **kwargs)

Plot the results

print_coefficients()

Prints the model coefficients

>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...             sample_kwargs={
...                 "draws": 2000,
...                 "random_seed": seed,
...                 "progressbar": False
...             }),
...  )
>>> result.print_coefficients() 
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
Return type

None

summary()

Print text output summarising the results

Return type

None

class causalpy.pymc_experiments.PrePostNEGD

A class to analyse data from pretest/posttest designs

Parameters
  • data – A pandas dataframe

  • formula – A statistical model formula

  • group_variable_name – Name of the column in data for the group variable

  • pretreatment_variable_name – Name of the column in data for the pretreatment variable

  • model – A PyMC model

>>> import causalpy as cp
>>> df = cp.load_data("anova1")
>>> seed = 42
>>> result = cp.pymc_experiments.PrePostNEGD(
...     df,
...     formula="post ~ 1 + C(group) + pre",
...     group_variable_name="group",
...     pretreatment_variable_name="pre",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False,
...         }
...     )
... )
>>> result.summary() 
==================Pretest/posttest Nonequivalent Group Design===================
Formula: post ~ 1 + C(group) + pre

Results:
Causal impact = 1.8, $CI_{94%}$[1.6, 2.0]
Model coefficients:
Intercept                     -0.4, 94% HDI [-1.2, 0.2]
C(group)[T.1]                 1.8, 94% HDI [1.6, 2.0]
pre                           1.0, 94% HDI [0.9, 1.1]
sigma                         0.5, 94% HDI [0.4, 0.5]
__init__(data, formula, group_variable_name, pretreatment_variable_name, model=None, **kwargs)
Parameters
  • data (DataFrame) –

  • formula (str) –

  • group_variable_name (str) –

  • pretreatment_variable_name (str) –

expt_type = None
property idata

Access to the model's InferenceData object

model = None
plot()

Plot the results

print_coefficients()

Prints the model coefficients

>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...             sample_kwargs={
...                 "draws": 2000,
...                 "random_seed": seed,
...                 "progressbar": False
...             }),
...  )
>>> result.print_coefficients() 
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
Return type

None

summary()

Print text output summarising the results

Return type

None

class causalpy.pymc_experiments.RegressionDiscontinuity

A class to analyse sharp regression discontinuity experiments.

Parameters
  • data – A pandas dataframe

  • formula – A statistical model formula

  • treatment_threshold – A scalar threshold value at which the treatment is applied

  • model – A PyMC model

  • running_variable_name – The name of the predictor variable that the treatment threshold is based upon

  • epsilon – A small scalar value which determines how far above and below the treatment threshold to evaluate the causal impact.

  • bandwidth – Data outside of the bandwidth (relative to the discontinuity) is not used to fit the model.

>>> import causalpy as cp
>>> df = cp.load_data("rd")
>>> seed = 42
>>> result = cp.pymc_experiments.RegressionDiscontinuity(
...     df,
...     formula="y ~ 1 + x + treated + x:treated",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "draws": 2000,
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False,
...         },
...     ),
...     treatment_threshold=0.5,
... )
>>> result.summary() 
============================Regression Discontinuity============================
Formula: y ~ 1 + x + treated + x:treated
Running variable: x
Threshold on running variable: 0.5

Results:
Discontinuity at threshold = 0.91
Model coefficients:
Intercept                     0.0, 94% HDI [0.0, 0.1]
treated[T.True]               2.4, 94% HDI [1.6, 3.2]
x                             1.3, 94% HDI [1.1, 1.5]
x:treated[T.True]             -3.0, 94% HDI [-4.1, -2.0]
sigma                         0.3, 94% HDI [0.3, 0.4]
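
The roles of `epsilon` and `bandwidth` can be sketched with the rounded posterior means printed above. The sketch assumes that the discontinuity is evaluated as the difference between a treated prediction just above the threshold and an untreated prediction just below it, and that `bandwidth` keeps only rows within that distance of the threshold. The helpers `predict` and `within_bandwidth` are hypothetical, not CausalPy functions.

```python
# Rounded posterior means copied from the summary output above.
coef = {"Intercept": 0.0, "treated": 2.4, "x": 1.3, "x:treated": -3.0}
treatment_threshold = 0.5
epsilon = 0.001  # default value shown in the __init__ signature


def predict(x: float, treated: int) -> float:
    """Model prediction for the formula y ~ 1 + x + treated + x:treated."""
    return (
        coef["Intercept"]
        + coef["x"] * x
        + coef["treated"] * treated
        + coef["x:treated"] * x * treated
    )


# Evaluate the fitted curve a small step either side of the threshold.
below = predict(treatment_threshold - epsilon, treated=0)
above = predict(treatment_threshold + epsilon, treated=1)
discontinuity = above - below
print(round(discontinuity, 2))


def within_bandwidth(x: float, threshold: float, bandwidth: float) -> bool:
    """True if x is close enough to the threshold to be used for fitting."""
    return abs(x - threshold) <= bandwidth


# With a bandwidth of 0.2, only points in [0.3, 0.7] would be kept.
kept = [x for x in [0.1, 0.35, 0.45, 0.55, 0.9] if within_bandwidth(x, 0.5, 0.2)]
print(kept)
```

The value obtained from the rounded coefficients is close to, but not identical to, the 0.91 reported in the summary, because the printed coefficients are rounded to one decimal place.
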
__init__(data, formula, treatment_threshold, model=None, running_variable_name='x', epsilon=0.001, bandwidth=None, **kwargs)
Parameters
  • data (DataFrame) –

  • formula (str) –

  • treatment_threshold (float) –

  • running_variable_name (str) –

  • epsilon (float) –

  • bandwidth (Optional[float]) –

expt_type = None
property idata

Access to the model's InferenceData object

model = None
plot()

Plot the results

print_coefficients()

Prints the model coefficients

>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...             sample_kwargs={
...                 "draws": 2000,
...                 "random_seed": seed,
...                 "progressbar": False
...             }),
...  )
>>> result.print_coefficients() 
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
Return type

None

summary()

Print text output summarising the results

Return type

None

class causalpy.pymc_experiments.SyntheticControl

A wrapper around the PrePostFit class

Parameters
  • data – A pandas dataframe

  • treatment_time – The time when treatment occurred, expressed in terms of the data index

  • formula – A statistical model formula

  • model – A PyMC model
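
As a rough illustration of what a synthetic control is, the sketch below builds one as a weighted sum of control units and compares it with the treated unit after `treatment_time`. It assumes non-negative weights that sum to one and fixes them by hand, whereas in practice the model estimates them from the pre-treatment data. The data and variable names are hypothetical.

```python
# Two hypothetical control units observed over five time steps.
controls = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 2.0, 2.0, 2.0, 2.0],
}

# Hypothetical weights, assumed non-negative and summing to one.
weights = {"a": 0.75, "b": 0.25}

treatment_time = 3
# Treated unit: tracks the weighted sum before t = 3, then sits 1.0 above it.
actual = [1.25, 2.0, 2.75, 4.5, 5.25]

# The synthetic control is the weighted sum of the control units at each time.
synthetic = [
    sum(weights[name] * series[t] for name, series in controls.items())
    for t in range(len(actual))
]

# The estimated impact is the post-treatment gap between actual and synthetic.
impact = [actual[t] - synthetic[t] for t in range(len(actual)) if t >= treatment_time]
print(impact)
```

The formula in the example below, `actual ~ 0 + a + b + ...`, has the same shape: no intercept, and one term per control unit.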

>>> import causalpy as cp
>>> df = cp.load_data("sc")
>>> treatment_time = 70
>>> seed = 42
>>> result = cp.pymc_experiments.SyntheticControl(
...     df,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.pymc_models.WeightedSumFitter(
...         sample_kwargs={
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False,
...         }
...     ),
... )
__init__(data, treatment_time, formula, model=None, **kwargs)
Parameters
  • data (DataFrame) –

  • treatment_time (Union[int, float, Timestamp]) –

  • formula (str) –

Return type

None

expt_type = 'Synthetic Control'
property idata

Access to the model's InferenceData object

model = None
plot(plot_predictors=False, **kwargs)

Plot the results

print_coefficients()

Prints the model coefficients

>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...             sample_kwargs={
...                 "draws": 2000,
...                 "random_seed": seed,
...                 "progressbar": False
...             }),
...  )
>>> result.print_coefficients() 
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
Return type

None

summary()

Print text output summarising the results

Return type

None