causalpy.skl_experiments
Experiments for Scikit-Learn models
ExperimentalDesign: base class for scikit-learn experiments
PrePostFit: base class for synthetic control and interrupted time series
SyntheticControl
InterruptedTimeSeries
DifferenceInDifferences
RegressionDiscontinuity
- class causalpy.skl_experiments.DifferenceInDifferences
Note
DiD makes no distinction between pre- and post-intervention data; the model is fit to all available data.
- Parameters
data – A pandas data frame
formula – A statistical model formula
time_variable_name – Name of the data column for the time variable
group_variable_name – Name of the data column for the group variable
model – A scikit-learn model for difference in differences
>>> import causalpy as cp
>>> from sklearn.linear_model import LinearRegression
>>> df = cp.load_data("did")
>>> result = cp.skl_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     treated=1,
...     untreated=0,
...     model=LinearRegression(),
... )
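The quantity that a formula such as "y ~ 1 + group*post_treatment" targets can be illustrated without CausalPy. The following is a minimal sketch in plain Python, assuming a two-group, two-period design and an ordinary least squares fit; the toy rows and the helper `cell_mean` are hypothetical and are not part of the library or of the "did" dataset.

```python
from statistics import mean

# Hypothetical toy data: (group, post_treatment, y).
# Not taken from cp.load_data("did").
rows = [
    (0, 0, 1.0), (0, 0, 1.2),
    (0, 1, 2.0), (0, 1, 2.2),
    (1, 0, 1.5), (1, 0, 1.7),
    (1, 1, 3.5), (1, 1, 3.7),
]


def cell_mean(group, post):
    """Mean outcome for one group/period cell (hypothetical helper)."""
    return mean(y for g, p, y in rows if g == group and p == post)


# Change over time within each group.
control_change = cell_mean(0, 1) - cell_mean(0, 0)
treated_change = cell_mean(1, 1) - cell_mean(1, 0)

# The difference of those two changes is the DiD estimate. For an OLS fit
# of the saturated model y ~ 1 + group*post_treatment, it equals the
# coefficient on the group:post_treatment interaction term.
did_estimate = treated_change - control_change
print(round(did_estimate, 3))
```

All four cells enter the calculation at once, which is why no pre/post split of the data is needed before fitting.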
- __init__(data, formula, time_variable_name, group_variable_name, treated, untreated, model=None, **kwargs)
- plot()
Plot results
- class causalpy.skl_experiments.ExperimentalDesign
Base class for experiment designs
- __init__(model=None, **kwargs)
- model = None
- outcome_variable_name = None
- class causalpy.skl_experiments.InterruptedTimeSeries
Interrupted time series analysis, a wrapper around the PrePostFit class
- Parameters
data – A pandas data frame
treatment_time – The index or time value of when treatment begins
formula – A statistical model formula
model – An sklearn model object
>>> from sklearn.linear_model import LinearRegression
>>> import pandas as pd
>>> import causalpy as cp
>>> df = (
...     cp.load_data("its")
...     .assign(date=lambda x: pd.to_datetime(x["date"]))
...     .set_index("date")
... )
>>> treatment_time = pd.to_datetime("2017-01-01")
>>> result = cp.skl_experiments.InterruptedTimeSeries(
...     df,
...     treatment_time,
...     formula="y ~ 1 + t + C(month)",
...     model=LinearRegression(),
... )
- expt_type = 'Interrupted Time Series'
- class causalpy.skl_experiments.PrePostFit
A class to analyse quasi-experiments where parameter estimation is based on just the pre-intervention data.
- Parameters
data – A pandas data frame
treatment_time – The index or time value of when treatment begins
formula – A statistical model formula
model – A scikit-learn model object
>>> from sklearn.linear_model import LinearRegression
>>> import causalpy as cp
>>> df = cp.load_data("sc")
>>> treatment_time = 70
>>> result = cp.skl_experiments.PrePostFit(
...     df,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.skl_models.WeightedProportion(),
... )
>>> result.get_coeffs()
array(...)
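The idea of estimating parameters from the pre-intervention data only can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the toy series, the closed-form helper `fit_line`, and the choice to treat rows at or after `treatment_time` as post-intervention are all hypothetical.

```python
# Hypothetical toy series: one predictor x and an outcome y at t = 0..9.
# Treatment begins at t = 6 and adds 2.0 to the outcome from then on.
treatment_time = 6
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.0 * xi + 1.0 for xi in x]
y = [yi + 2.0 if t >= treatment_time else yi for t, yi in enumerate(y)]

# Split on treatment_time. The boundary handling (>= counts as post) is an
# assumption made for this sketch.
pre = [(xi, yi) for t, (xi, yi) in enumerate(zip(x, y)) if t < treatment_time]
post = [(xi, yi) for t, (xi, yi) in enumerate(zip(x, y)) if t >= treatment_time]


def fit_line(points):
    """Closed-form OLS for y = intercept + slope * x (hypothetical helper)."""
    n = len(points)
    mean_x = sum(px for px, _ in points) / n
    mean_y = sum(py for _, py in points) / n
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Only the pre-intervention rows are used to estimate the parameters.
intercept, slope = fit_line(pre)

# Counterfactual: what the pre-period model predicts for the post period.
counterfactual = [intercept + slope * xi for xi, _ in post]

# Estimated causal impact: observed minus counterfactual, per post-period row.
impact = [yi - ci for (_, yi), ci in zip(post, counterfactual)]
print([round(v, 3) for v in impact])
```

Because the post-intervention rows never touch the fit, any systematic gap between the observed values and the counterfactual is attributed to the intervention.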
- __init__(data, treatment_time, formula, model=None, **kwargs)
- get_coeffs()
Returns model coefficients
- plot(counterfactual_label='Counterfactual', **kwargs)
Plot experiment results
- plot_coeffs()
Plots coefficient bar plot
- class causalpy.skl_experiments.RegressionDiscontinuity
A class to analyse sharp regression discontinuity experiments.
- Parameters
data – A pandas data frame
formula – A statistical model formula
treatment_threshold – A scalar threshold value at which the treatment is applied
model – A scikit-learn model object
running_variable_name – The name of the predictor variable that the treatment threshold is based upon
epsilon – A small scalar value which determines how far above and below the treatment threshold to evaluate the causal impact.
bandwidth – Data outside of the bandwidth (relative to the discontinuity) is not used to fit the model.
>>> import causalpy as cp
>>> from sklearn.linear_model import LinearRegression
>>> data = cp.load_data("rd")
>>> result = cp.skl_experiments.RegressionDiscontinuity(
...     data,
...     formula="y ~ 1 + x + treated",
...     model=LinearRegression(),
...     treatment_threshold=0.5,
... )
>>> result.summary()
Difference in Differences experiment
Formula: y ~ 1 + x + treated
Running variable: x
Threshold on running variable: 0.5
Results:
Discontinuity at threshold = 0.19
Model coefficients:
Intercept 0.0
treated[T.True] 0.19
x 1.23
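The roles of epsilon and bandwidth described in the parameter list can be sketched as follows. This is a hedged illustration in plain Python, not the library's code: the functions `predict`, `discontinuity`, and `within_bandwidth` are hypothetical, the coefficients are the rounded values from the summary output above, and the sketch assumes that units above the threshold are the treated ones.

```python
# Coefficients copied from the summary output above (already rounded there).
INTERCEPT, COEF_TREATED, COEF_X = 0.0, 0.19, 1.23


def predict(x, treated):
    """Linear predictor for y ~ 1 + x + treated (hypothetical helper)."""
    return INTERCEPT + COEF_X * x + COEF_TREATED * (1.0 if treated else 0.0)


def discontinuity(threshold, epsilon=0.001):
    """Prediction just above the threshold (treated) minus just below (untreated)."""
    return predict(threshold + epsilon, True) - predict(threshold - epsilon, False)


def within_bandwidth(xs, threshold, bandwidth=None):
    """Keep running-variable values no further than `bandwidth` from the threshold."""
    if bandwidth is None:
        return list(xs)  # no bandwidth given: keep every observation
    return [x for x in xs if abs(x - threshold) <= bandwidth]


print(round(discontinuity(0.5), 2))
print(within_bandwidth([0.1, 0.4, 0.55, 0.9], threshold=0.5, bandwidth=0.2))
```

With a small epsilon the slope term contributes almost nothing to the difference, so the estimate is dominated by the coefficient on treated. A narrower bandwidth trades sample size for a fit that depends less on observations far from the threshold.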
- __init__(data, formula, treatment_threshold, model=None, running_variable_name='x', epsilon=0.001, bandwidth=None, **kwargs)
- plot()
Plot results
- summary()
Print text output summarising the results
- class causalpy.skl_experiments.SyntheticControl
A wrapper around the PrePostFit class
- Parameters
data – A pandas data frame
treatment_time – The index or time value of when treatment begins
formula – A statistical model formula
model – An sklearn model object
>>> from sklearn.linear_model import LinearRegression
>>> import causalpy as cp
>>> df = cp.load_data("sc")
>>> treatment_time = 70
>>> result = cp.skl_experiments.SyntheticControl(
...     df,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.skl_models.WeightedProportion(),
... )
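A synthetic control built from an intercept-free formula such as "actual ~ 0 + a + b + ..." can be pictured as a weighted blend of control units. The sketch below assumes, as the name WeightedProportion suggests, that the weights are non-negative and sum to one; it uses a simple grid search over two hypothetical control series rather than the library's own fitting procedure.

```python
# Hypothetical toy data: two control units and one treated unit ("actual").
# Before treatment, actual is an exact 30/70 blend of the two controls;
# from treatment_time onwards the intervention adds 1.0.
treatment_time = 4
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 2.5, 2.0, 2.5, 2.0, 2.5]
actual = [0.3 * ai + 0.7 * bi for ai, bi in zip(a, b)]
actual = [v + 1.0 if t >= treatment_time else v for t, v in enumerate(actual)]


def pre_period_error(w):
    """Sum of squared errors of the blend w*a + (1-w)*b before treatment."""
    return sum(
        (actual[t] - (w * a[t] + (1.0 - w) * b[t])) ** 2
        for t in range(treatment_time)
    )


# Grid search over w in [0, 1]; the pair (w, 1 - w) is non-negative and
# sums to one by construction.
candidates = [i / 100 for i in range(101)]
best_w = min(candidates, key=pre_period_error)

# Synthetic control for every period, and the post-treatment gap to the
# observed series.
synthetic = [best_w * ai + (1.0 - best_w) * bi for ai, bi in zip(a, b)]
impact = [actual[t] - synthetic[t] for t in range(treatment_time, len(actual))]
print(best_w, [round(v, 3) for v in impact])
```

Constraining the weights this way keeps the synthetic unit inside the range spanned by the control units, so the counterfactual is an interpolation of observed series rather than an extrapolation.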
- plot(plot_predictors=False, **kwargs)
Plot the results