causalpy.pymc_experiments
Experiment routines for PyMC models.
ExperimentalDesign base class
Pre-Post Fit
Interrupted Time Series
Synthetic Control
Difference in differences
Regression Discontinuity
Pretest/Posttest Nonequivalent Group Design
- class causalpy.pymc_experiments.DifferenceInDifferences
A class to analyse data from difference-in-differences settings.
Note
DiD makes no distinction between pre- and post-intervention data; the model is fitted to all available data.
- Parameters
data – A pandas dataframe
formula – A statistical model formula
time_variable_name – Name of the data column for the time variable
group_variable_name – Name of the data column for the group variable
model – A PyMC model for difference in differences
>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False,
...         }
...     )
... )
>>> result.summary()
===========================Difference in Differences============================
Formula: y ~ 1 + group*post_treatment
Results:
Causal impact = 0.5, $CI_{94%}$[0.4, 0.6]
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
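As a rough intuition for the `group:post_treatment` coefficient in the DifferenceInDifferences summary, the sketch below computes a difference-in-differences estimate directly from four cell means. The cell means are hypothetical values rebuilt from the rounded posterior means in that summary (Intercept 1.0, post_treatment 0.9, group 0.1, interaction 0.5). The name `did_estimate` is illustrative and not part of CausalPy, which estimates this quantity with a Bayesian regression rather than by subtracting means.

```python
# Hypothetical cell means, rebuilt from the rounded coefficients in the summary.
cell_means = {
    ("control", "pre"): 1.0,                     # Intercept
    ("control", "post"): 1.0 + 0.9,              # + post_treatment
    ("treated", "pre"): 1.0 + 0.1,               # + group
    ("treated", "post"): 1.0 + 0.9 + 0.1 + 0.5,  # + interaction
}


def did_estimate(means):
    """Change in the treated group minus change in the control group."""
    treated_change = means[("treated", "post")] - means[("treated", "pre")]
    control_change = means[("control", "post")] - means[("control", "pre")]
    return treated_change - control_change


causal_impact = did_estimate(cell_means)
print(round(causal_impact, 2))
```

In a saturated two-by-two design, this difference of differences coincides with the interaction coefficient, which is why the summary reports the same value for the causal impact and for `group:post_treatment[T.True]`.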
- __init__(data, formula, time_variable_name, group_variable_name, model=None, **kwargs)
- expt_type = None
- property idata
Access to the model's InferenceData object
- model = None
- plot()
Plot the results. Creating the combined mean + HDI legend entries is a bit involved.
- print_coefficients()
Prints the model coefficients
>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "draws": 2000,
...             "random_seed": seed,
...             "progressbar": False
...         }),
... )
>>> result.print_coefficients()
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
- Return type
None
- summary()
Print text output summarising the results
- Return type
None
- class causalpy.pymc_experiments.ExperimentalDesign
Base class for other experiment types
See subclasses for examples of most methods
- __init__(model=None, **kwargs)
- expt_type = None
- property idata
Access to the model's InferenceData object
- model = None
- print_coefficients()
Prints the model coefficients
>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "draws": 2000,
...             "random_seed": seed,
...             "progressbar": False
...         }),
... )
>>> result.print_coefficients()
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
- Return type
None
- class causalpy.pymc_experiments.InstrumentalVariable
A class to analyse instrumental variable style experiments.
- Parameters
instruments_data – A pandas dataframe of instruments for our treatment variable. Should contain instruments Z, and treatment t
data – A pandas dataframe of covariates for fitting the focal regression of interest. Should contain covariates X including treatment t and outcome y
instruments_formula – A statistical model formula for the instrumental stage regression e.g. t ~ 1 + z1 + z2 + z3
formula – A statistical model formula for the focal regression e.g. y ~ 1 + t + x1 + x2 + x3
model – A PyMC model
priors – An optional dictionary of priors for the mus and sigmas of both regressions. If priors are not specified we will substitute MLE estimates for the beta coefficients. Greater control can be achieved by specifying the priors directly e.g. priors = {"mus": [0, 0], "sigmas": [1, 1], "eta": 2, "lkj_sd": 2}
>>> import pandas as pd
>>> import causalpy as cp
>>> from causalpy.pymc_experiments import InstrumentalVariable
>>> from causalpy.pymc_models import InstrumentalVariableRegression
>>> import numpy as np
>>> N = 100
>>> e1 = np.random.normal(0, 3, N)
>>> e2 = np.random.normal(0, 1, N)
>>> Z = np.random.uniform(0, 1, N)
>>> ## Ensure the endogeneity of the treatment variable
>>> X = -1 + 4 * Z + e2 + 2 * e1
>>> y = 2 + 3 * X + 3 * e1
>>> test_data = pd.DataFrame({"y": y, "X": X, "Z": Z})
>>> sample_kwargs = {
...     "tune": 1,
...     "draws": 5,
...     "chains": 1,
...     "cores": 4,
...     "target_accept": 0.95,
...     "progressbar": False,
... }
>>> instruments_formula = "X ~ 1 + Z"
>>> formula = "y ~ 1 + X"
>>> instruments_data = test_data[["X", "Z"]]
>>> data = test_data[["y", "X"]]
>>> iv = InstrumentalVariable(
...     instruments_data=instruments_data,
...     data=data,
...     instruments_formula=instruments_formula,
...     formula=formula,
...     model=InstrumentalVariableRegression(sample_kwargs=sample_kwargs),
... )
- __init__(instruments_data, data, instruments_formula, formula, model=None, priors=None, **kwargs)
- expt_type = None
- get_2SLS_fit()
Two Stage Least Squares Fit
This function is called by the experiment; its results are used for priors if none are provided.
- get_naive_OLS_fit()
Naive Ordinary Least Squares
This function is called by the experiment.
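To make the contrast between `get_2SLS_fit()` and `get_naive_OLS_fit()` concrete, here is a minimal sketch of the textbook two-stage least squares procedure next to a naive OLS fit. It is not CausalPy's implementation, which this page does not show. The toy data, the confounder `u`, and the helper `ols_slope_intercept` are all hypothetical, and the example uses a single instrument with no extra covariates.

```python
def mean(values):
    return sum(values) / len(values)


def ols_slope_intercept(x, y):
    """Simple least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy data (hypothetical): u is an unobserved confounder, z is the instrument.
z = [0.0, 0.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]  # uncorrelated with z by construction
t = [1 + 2 * zi + ui for zi, ui in zip(z, u)]  # treatment depends on z and u
y = [3 + 0.5 * ti + 2 * ui for ti, ui in zip(t, u)]  # true effect of t is 0.5

# Naive OLS: regress y on t directly. Biased, because u drives both t and y.
naive_slope, _ = ols_slope_intercept(t, y)

# Stage 1: regress the treatment on the instrument and keep the fitted values.
stage1_slope, stage1_intercept = ols_slope_intercept(z, t)
t_hat = [stage1_intercept + stage1_slope * zi for zi in z]

# Stage 2: regress the outcome on the stage-1 fitted values.
tsls_slope, _ = ols_slope_intercept(t_hat, y)

print(naive_slope, tsls_slope)
```

On this toy data the naive slope is 1.5, while the two-stage estimate recovers the true effect of 0.5, because the fitted values from stage 1 carry only the variation in the treatment that comes from the instrument.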
- property idata
Access to the model's InferenceData object
- model = None
- print_coefficients()
Prints the model coefficients
>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "draws": 2000,
...             "random_seed": seed,
...             "progressbar": False
...         }),
... )
>>> result.print_coefficients()
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
- Return type
None
- class causalpy.pymc_experiments.InterruptedTimeSeries
A wrapper around the PrePostFit class
- Parameters
data – A pandas dataframe
treatment_time – The time when treatment occurred, should be in reference to the data index
formula – A statistical model formula
model – A PyMC model
>>> import pandas as pd
>>> import causalpy as cp
>>> df = (
...     cp.load_data("its")
...     .assign(date=lambda x: pd.to_datetime(x["date"]))
...     .set_index("date")
... )
>>> treatment_time = pd.to_datetime("2017-01-01")
>>> seed = 42
>>> result = cp.pymc_experiments.InterruptedTimeSeries(
...     df,
...     treatment_time,
...     formula="y ~ 1 + t + C(month)",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False,
...         }
...     )
... )
- __init__(data, treatment_time, formula, model=None, **kwargs)
- expt_type = 'Interrupted Time Series'
- property idata
Access to the model's InferenceData object
- model = None
- plot(counterfactual_label='Counterfactual', **kwargs)
Plot the results
- print_coefficients()
Prints the model coefficients
>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "draws": 2000,
...             "random_seed": seed,
...             "progressbar": False
...         }),
... )
>>> result.print_coefficients()
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
- Return type
None
- summary()
Print text output summarising the results
- Return type
None
- class causalpy.pymc_experiments.PrePostFit
A class to analyse quasi-experiments where parameter estimation is based on just the pre-intervention data.
- Parameters
data – A pandas dataframe
treatment_time – The time when treatment occurred, should be in reference to the data index
formula – A statistical model formula
model – A PyMC model
>>> import causalpy as cp
>>> sc = cp.load_data("sc")
>>> treatment_time = 70
>>> seed = 42
>>> result = cp.pymc_experiments.PrePostFit(
...     sc,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.pymc_models.WeightedSumFitter(
...         sample_kwargs={
...             "draws": 2000,
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False
...         }
...     ),
... )
>>> result.summary()
==================================Pre-Post Fit==================================
Formula: actual ~ 0 + a + b + c + d + e + f + g
Model coefficients:
a      0.3, 94% HDI [0.3, 0.3]
b      0.0, 94% HDI [0.0, 0.0]
c      0.3, 94% HDI [0.2, 0.3]
d      0.0, 94% HDI [0.0, 0.1]
e      0.0, 94% HDI [0.0, 0.0]
f      0.1, 94% HDI [0.1, 0.2]
g      0.0, 94% HDI [0.0, 0.0]
sigma  0.2, 94% HDI [0.2, 0.3]
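The idea of estimating parameters from pre-intervention data only, as PrePostFit does, can be sketched without PyMC. The example below is a simplified illustration under stated assumptions, not CausalPy's implementation: it fits an ordinary least-squares line instead of a Bayesian model, assumes the pre-intervention period is every index strictly before `treatment_time`, and takes the causal impact to be the observed series minus the predicted counterfactual. The data and the helper `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical series: one untreated control unit and one treated unit.
treatment_time = 4  # index of the first treated observation (assumption)
control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
actual = [2.0, 4.0, 6.0, 8.0, 11.0, 13.0]  # 2 * control, then +1 after treatment

# Fit on the pre-intervention observations only.
slope, intercept = fit_line(control[:treatment_time], actual[:treatment_time])

# Predict what the treated unit would have looked like without treatment.
counterfactual = [intercept + slope * c for c in control[treatment_time:]]

# Impact = observed post-intervention values minus the counterfactual.
impact = [a - cf for a, cf in zip(actual[treatment_time:], counterfactual)]

print(counterfactual, impact)
```

Because the post-intervention observations never enter the fit, any systematic gap between the observed values and the counterfactual is attributed to the intervention.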
- __init__(data, treatment_time, formula, model=None, **kwargs)
- expt_type = None
- property idata
Access to the model's InferenceData object
- model = None
- plot(counterfactual_label='Counterfactual', **kwargs)
Plot the results
- print_coefficients()
Prints the model coefficients
>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "draws": 2000,
...             "random_seed": seed,
...             "progressbar": False
...         }),
... )
>>> result.print_coefficients()
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
- Return type
None
- summary()
Print text output summarising the results
- Return type
None
- class causalpy.pymc_experiments.PrePostNEGD
A class to analyse data from pretest/posttest designs
- Parameters
data – A pandas dataframe
formula – A statistical model formula
group_variable_name – Name of the column in data for the group variable
pretreatment_variable_name – Name of the column in data for the pretreatment variable
model – A PyMC model
>>> import causalpy as cp
>>> df = cp.load_data("anova1")
>>> seed = 42
>>> result = cp.pymc_experiments.PrePostNEGD(
...     df,
...     formula="post ~ 1 + C(group) + pre",
...     group_variable_name="group",
...     pretreatment_variable_name="pre",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False,
...         }
...     )
... )
>>> result.summary()
==================Pretest/posttest Nonequivalent Group Design===================
Formula: post ~ 1 + C(group) + pre
Results:
Causal impact = 1.8, $CI_{94%}$[1.6, 2.0]
Model coefficients:
Intercept      -0.4, 94% HDI [-1.2, 0.2]
C(group)[T.1]  1.8, 94% HDI [1.6, 2.0]
pre            1.0, 94% HDI [0.9, 1.1]
sigma          0.5, 94% HDI [0.4, 0.5]
- __init__(data, formula, group_variable_name, pretreatment_variable_name, model=None, **kwargs)
- expt_type = None
- property idata
Access to the model's InferenceData object
- model = None
- plot()
Plot the results
- print_coefficients()
Prints the model coefficients
>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "draws": 2000,
...             "random_seed": seed,
...             "progressbar": False
...         }),
... )
>>> result.print_coefficients()
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
- Return type
None
- summary()
Print text output summarising the results
- Return type
None
- class causalpy.pymc_experiments.RegressionDiscontinuity
A class to analyse sharp regression discontinuity experiments.
- Parameters
data – A pandas dataframe
formula – A statistical model formula
treatment_threshold – A scalar threshold value at which the treatment is applied
model – A PyMC model
running_variable_name – The name of the predictor variable that the treatment threshold is based upon
epsilon – A small scalar value which determines how far above and below the treatment threshold to evaluate the causal impact.
bandwidth – Data outside of the bandwidth (relative to the discontinuity) is not used to fit the model.
>>> import causalpy as cp
>>> df = cp.load_data("rd")
>>> seed = 42
>>> result = cp.pymc_experiments.RegressionDiscontinuity(
...     df,
...     formula="y ~ 1 + x + treated + x:treated",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "draws": 2000,
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False,
...         },
...     ),
...     treatment_threshold=0.5,
... )
>>> result.summary()
============================Regression Discontinuity============================
Formula: y ~ 1 + x + treated + x:treated
Running variable: x
Threshold on running variable: 0.5
Results:
Discontinuity at threshold = 0.91
Model coefficients:
Intercept          0.0, 94% HDI [0.0, 0.1]
treated[T.True]    2.4, 94% HDI [1.6, 3.2]
x                  1.3, 94% HDI [1.1, 1.5]
x:treated[T.True]  -3.0, 94% HDI [-4.1, -2.0]
sigma              0.3, 94% HDI [0.3, 0.4]
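The roles of `epsilon` and `bandwidth` in RegressionDiscontinuity can be illustrated with a small sketch. It plugs the rounded posterior means from the summary into the formula `y ~ 1 + x + treated + x:treated` and compares predictions just above and just below the threshold. This is an assumption about how the quantities relate, not CausalPy's code: the helpers `predict` and `within_bandwidth` are hypothetical, the inclusive bandwidth comparison is a guess, and the real class works with full posterior samples rather than point estimates.

```python
# Rounded posterior means copied from the summary (illustrative only).
coef = {"Intercept": 0.0, "treated": 2.4, "x": 1.3, "x:treated": -3.0}
treatment_threshold = 0.5
epsilon = 0.001  # default value shown in the __init__ signature


def predict(x, treated):
    """Mean prediction for y ~ 1 + x + treated + x:treated."""
    d = 1.0 if treated else 0.0
    return (
        coef["Intercept"]
        + coef["x"] * x
        + coef["treated"] * d
        + coef["x:treated"] * x * d
    )


def within_bandwidth(xs, threshold, bandwidth):
    """Keep running-variable values no further than `bandwidth` from the threshold."""
    return [x for x in xs if abs(x - threshold) <= bandwidth]


# Treated prediction just above the threshold minus untreated prediction just below.
discontinuity = predict(treatment_threshold + epsilon, True) - predict(
    treatment_threshold - epsilon, False
)

# Only these points would be used for fitting with a bandwidth of 0.2.
kept = within_bandwidth([0.1, 0.4, 0.55, 0.9], treatment_threshold, 0.2)

print(round(discontinuity, 4), kept)
```

With these rounded coefficients the discontinuity comes out at about 0.90, close to the 0.91 reported in the summary; the small gap is expected because the summary is computed from unrounded posterior draws.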
- __init__(data, formula, treatment_threshold, model=None, running_variable_name='x', epsilon=0.001, bandwidth=None, **kwargs)
- expt_type = None
- property idata
Access to the model's InferenceData object
- model = None
- plot()
Plot the results
- print_coefficients()
Prints the model coefficients
>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "draws": 2000,
...             "random_seed": seed,
...             "progressbar": False
...         }),
... )
>>> result.print_coefficients()
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
- Return type
None
- summary()
Print text output summarising the results
- Return type
None
- class causalpy.pymc_experiments.SyntheticControl
A wrapper around the PrePostFit class
- Parameters
data – A pandas dataframe
treatment_time – The time when treatment occurred, should be in reference to the data index
formula – A statistical model formula
model – A PyMC model
>>> import causalpy as cp
>>> df = cp.load_data("sc")
>>> treatment_time = 70
>>> seed = 42
>>> result = cp.pymc_experiments.SyntheticControl(
...     df,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.pymc_models.WeightedSumFitter(
...         sample_kwargs={
...             "target_accept": 0.95,
...             "random_seed": seed,
...             "progressbar": False,
...         }
...     ),
... )
- __init__(data, treatment_time, formula, model=None, **kwargs)
- expt_type = 'Synthetic Control'
- property idata
Access to the model's InferenceData object
- model = None
- plot(plot_predictors=False, **kwargs)
Plot the results
- print_coefficients()
Prints the model coefficients
>>> import causalpy as cp
>>> df = cp.load_data("did")
>>> seed = 42
>>> result = cp.pymc_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "draws": 2000,
...             "random_seed": seed,
...             "progressbar": False
...         }),
... )
>>> result.print_coefficients()
Model coefficients:
Intercept                     1.0, 94% HDI [1.0, 1.1]
post_treatment[T.True]        0.9, 94% HDI [0.9, 1.0]
group                         0.1, 94% HDI [0.0, 0.2]
group:post_treatment[T.True]  0.5, 94% HDI [0.4, 0.6]
sigma                         0.0, 94% HDI [0.0, 0.1]
- Return type
None
- summary()
Print text output summarising the results
- Return type
None