Difference-in-differences in a repeated cross-section
Author
Ville Voutilainen
See the did repo for updated notebooks and source code.
Background
Assume we are measuring observations for a repeated cross-section of \(N=600\) individuals (e.g., customers of a marketing platform, new bank loans, etc.) at time points \(t = 0, 1, \ldots, 19\). There are two distinct groups of individuals (\(j=C,T\)), for example, from two different areas.
Imagine that in one of the areas there is a one-off intervention (e.g., a government stimulus package) taking place between time points 9 and 10 that might affect a feature of interest (“outcome”) of the individuals (e.g., time spent on the site, pricing of the loans). We are interested in the treatment effect of the intervention on the outcome.
Data-generating process
We consider a data-generating process (DGP) of the following form:
\[
Y_{ijt} = \gamma_j + \xi_t + \lambda_t X_{ijt} + \tau_{jt} D_{jt} + \varepsilon_{ijt},
\]
where \(\gamma_j\) is a time-invariant group effect, \(\xi_t\) a group-invariant time effect, \(\lambda_t X_{ijt}\) a possible covariate effect, \(D_{jt}\) an indicator for the treated group in the post-period, and \(\tau_{jt}\) is the (possibly group- and time-dependent) treatment effect. We allow for either a constant or a linearly evolving \(\tau_{jt}\). See the notebook did_simulated_datasets.ipynb in the did repo for a more in-depth explanation of the DGP.
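To make the DGP concrete, below is a minimal simulation sketch (without the covariate term \(\lambda_t X_{ijt}\)) in plain numpy/pandas. This is not the simulate_did_data helper from the repo; the parameter values and the column names D and Y are illustrative assumptions only.
Code
# Minimal DGP sketch: repeated cross-section with a constant treatment effect.
# Parameter values and column names ("D", "Y") are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

N, T = 600, 20                                 # individuals per time point, number of time points
last_pre_t = 9                                 # treatment starts at t = 10
gamma = {"control": 4.0, "treatment": 1.0}     # group effects gamma_j
tau = -2.0                                     # constant treatment effect

rows = []
for t in range(T):
    group = rng.choice(["control", "treatment"], size=N)   # new draw of individuals each period
    xi = 1.0 * t                                            # time effect xi_t
    d = ((group == "treatment") & (t > last_pre_t)).astype(int)
    y = pd.Series(group).map(gamma).to_numpy() + xi + tau * d + rng.normal(0, 1, size=N)
    rows.append(pd.DataFrame({"t": t, "treatment_group": group, "D": d, "Y": y}))

sim = pd.concat(rows, ignore_index=True)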
Notice that, in reality, the true data-generating process is unknown to the researcher! Here we use simulated data (whose properties we know) to help us understand the properties of our research design and regression models. As advocated by Gelman, Hill, and Vehtari (2020, section 5.5), fake-data simulation is a “way of life” for evaluating statistical methods.
Research design
Assume we are interested in the average treatment effect on the treated (ATT). That is, we would like to know how much the intervention affected (on average) the outcome for individuals in the treated group. We can write this estimand in mathematical form using potential outcome notation:
\[
\theta_{ATT} = E\big[\, Y_{i}^{1}(post) - Y_{i}^{0}(post) \mid D = 1 \,\big],
\]
where
\(Y_{i}^d(post)\) denotes the potential outcome of an individual \(i\) in the post-period under treatment \(D=d\);
the expectation operator is understood to average over individuals \(i\) (and over \(t\), since our estimation is done at the level of pre/post periods instead of individual time points);
the post-period consists of time points 10-19 (and the pre-period of time points 0-9).
The conda environment used is dev2023a from here. Additionally installed packages:
r-dagitty via conda/mamba: mamba install r-dagitty (this also installs its dependencies).
Imports
Code
# Python dependencies
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

np.random.seed(1337)

# Local helpers
from did_helpers import (
    simulate_did_data,
    plot_repcrossec_data,
    parallel_trends_plot,
    dynamic_did_plot,
)

# R interface
import rpy2
%load_ext rpy2.ipython
Code
%%R
library(dagitty)
Helper functions
Code
def prepare_static_regression_frame(data):
    df = data["observed"].copy()
    # Possible interaction between time and X if X present
    if "X" in df.columns:
        df["X_time"] = df["X"] * df["t"]
    # Code categories as dummies
    df["dummy_period"] = df["time_group"].map({
        "before": 0,
        "post": 1,
    })
    df["dummy_group"] = df["treatment_group"].map({
        "control": 0,
        "treatment": 1,
    })
    df["dummy_group_x_period"] = df["dummy_period"] * df["dummy_group"]
    # Time points as categorical/str
    df["t"] = df["t"].astype(str)
    return df


def prepare_dynamic_regression_frame(data):
    df = data["observed"].copy()
    param_last_pre_timepoint = data["params"]["param_last_pre_timepoint"]
    # Variable measuring periods to first treatment time
    df["time_to_treat"] = (
        df["t"]
        .sub(param_last_pre_timepoint + 1)
        .astype('int')
    )
    # We want to create "treatment" dummies from the time_to_treat column such that the kth dummy
    # obtains value 1 for a given observation if the period of the observation equals k.
    # To this end, first set time_to_treat for control observations to 0, as the treatment
    # dummy values need to be zero for them.
    df.loc[df["treatment_group"] == "control", "time_to_treat"] = 0
    # Now create the "dynamic" dummies from time_to_treat. Drop the last time point before treatment
    # manually to avoid multicollinearity problems (this sets the last time point before treatment
    # as a reference point).
    df = (
        pd.get_dummies(
            df,
            columns=["time_to_treat"],
            prefix="dummy_group_x_period",
            drop_first=False
        )
        .rename(columns=lambda x: x.replace('-', 'm'))
        .drop(columns="dummy_group_x_period_m1")
    )
    # Time points as categorical/str
    df["t"] = df["t"].astype(str)
    return df
Classical two-period DiD
We start with a specification in which the true data-generating process contains no covariate effect \(\lambda_t X_{ijt}\).
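The realized group/period means and the naive DiD estimate reported below are computed from the simulated data. As a rough sketch of that computation, assuming data_1 has been created with the simulate_did_data helper (the exact call is not shown here) and assuming the outcome column is named Y:
Code
# Group/period means and the naive 2x2 DiD estimate.
# data_1 is assumed to come from simulate_did_data; the outcome column name "Y" is an assumption.
obs = data_1["observed"]

group_means = (
    obs.groupby(["treatment_group", "time_group"])["Y"]
    .mean()
    .unstack("time_group")
)
print(group_means.round(3))

# Naive DiD: double difference of the four group/period means
did_naive = (
    (group_means.loc["treatment", "post"] - group_means.loc["treatment", "before"])
    - (group_means.loc["control", "post"] - group_means.loc["control", "before"])
)
print(f"Naive DiD-estimate {did_naive:.3f}")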
Realized control pre-period mean 8.509
Realized control post-period mean 18.498
Realized treated pre-period mean 5.497
Realized treated post-period mean 13.491
Counterfactual (unobserved) treatment post-period mean 15.491
Counterfactual (naively estimated) treated post-period mean 15.486
Naive DiD-estimate -1.995
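In other words, the naive DiD estimate is just the double difference of the four realized means above:
\[
(13.491 - 5.497) - (18.498 - 8.509) = 7.994 - 9.989 = -1.995,
\]
which recovers the true effect \(\tau = -2\) up to sampling noise, since parallel trends hold by construction in this DGP.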
Research design
Represent the research design as a directed acyclic graph (DAG). Explanation of the arrows:
Intervention affects individuals in the treated group but only in the post period. Hence, there is an arrow \(D_{it} \rightarrow Y_{i, post}\).
Time fixed effects \(\xi_t\) affect outcomes in each time point of pre and post periods (\(\xi_t \rightarrow Y_{i,pre}\) and \(\xi_t \rightarrow Y_{i,post}\)).
\(Y_{i,pre}\) and \(Y_{i,post}\) trivially affect their difference, hence the arrows \(Y_{i,pre} \rightarrow Y_{i,post} - Y_{i,pre}\) and \(Y_{i,post} \rightarrow Y_{i,post} - Y_{i,pre}\).
Group-level effect \(\gamma_j\) acts as a confounder in the cross-sectional dimension: it causes* the treatment assignment to treatment/control groups (\(\gamma_j \rightarrow D_{it}\)) as well as the outcome in each time point of the pre and post periods (\(\gamma_j \rightarrow Y_{i,pre}\) and \(\gamma_j \rightarrow Y_{i,post}\)). Due to this, we could not simply compare \(E[Y_{it} \mid j=T]\) and \(E[Y_{it} \mid j=C]\). The DiD estimator automatically controls for time-invariant, group-specific effects via the double differencing. Hence, as Zeldow & Hatfield (2021) define it, \(\gamma_j\) does not constitute a confounder in a multi-period DiD setting. In order for a time-invariant group-specific covariate to be a confounder, “the means of the covariate are different in the two groups and it has time-varying effect on the outcome” (Zeldow & Hatfield, 2021, p. 934).
*Side note: This is not entirely true in our data generating process: the treatment assignment probability is a constant, user-defined parameter. However, we make the mean value of \(\gamma_j\) differ between groups \(j\), essentially mimicking a scenario where individual observations with higher/lower value of \(\gamma_j\) are more likely to appear in the treatment group.
The most important assumption behind the DiD design is the parallel trends assumption, which requires that there are no confounding effects in the DiD setting as Zeldow & Hatfield (2021) define them. In this particular case, we have assumed a DGP with only group-invariant (\(\xi_t\)) and time-invariant (\(\gamma_j\)) effects, which means that the parallel trends assumption holds.
Sometimes the parallel trends assumption is validated by inspecting the pre-period evolution of the average \(E[Y_{it}]\) over \(t\) in the treatment and control groups. The evolution of these averages can be eyeballed from the plot above (vertical short dashed lines), but let’s draw the averages in a more convenient plot for better comparison. From the plot it becomes clear that the DGP produces the same pre-trend (on average) in both groups.
Code
parallel_trends_plot(data_1)
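For reference, roughly the same comparison can be produced with a plain pandas groupby; this is only a sketch, not the actual implementation of parallel_trends_plot, and the outcome column name Y is again an assumption.
Code
# Average outcome per time point and group (outcome column "Y" is an assumed name)
avg = (
    data_1["observed"]
    .groupby(["t", "treatment_group"])["Y"]
    .mean()
    .unstack("treatment_group")
)

ax = avg.plot(marker="o")
ax.set_xlabel("t")
ax.set_ylabel("average outcome")
ax.set_title("Group averages over time (pre-trends check)")
plt.show()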
“Static” regression
Since we have a well-defined, one-off intervention without anticipation, well-defined treatment/control groups, and the parallel trends assumption holds, we can use a classical, static difference-in-differences (DiD) estimator to estimate \(\theta\). The hope is to recover the correct estimate \(\tau = -2\).
To estimate the ATT estimand defined above, we can construct an estimator (see the flashcard), which is effectively a two-way fixed effects (TWFE) estimator. It can be employed using the following regression equation:
\[
Y_{it} = \alpha + \beta \,\text{group}_i + \delta \,\text{period}_t + \theta \,(\text{group}_i \times \text{period}_t) + \varepsilon_{it},
\]
where \(\text{group}_i\) and \(\text{period}_t\) are dummies for belonging to the treatment group and to the post-period, respectively. In the above regression, we are controlling for group fixed effects (dummy_group) and time fixed effects (dummy_period). Notice that typically in the econometric literature the TWFE regression is written using time and group fixed effects for all time points and groups:
\[
Y_{it} = \eta^j + \nu_t + \theta \, D_{it} + \varepsilon_{it},
\]
where
\(\eta^j\) denotes group fixed effects (we only have two, control and treatment);
\(\nu_t\) denotes time fixed effects.
As we will see below, this makes no difference for the DiD coefficient estimate.* However, standard errors and thus significance bounds differ a bit due to differing degrees of freedom.
*Time fixed effects will become essential for the coefficient estimate when the intervention is staggered, see for example this.
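As a sketch, both specifications can be run with the statsmodels formula API on the frame returned by prepare_static_regression_frame; the outcome column name Y is an assumption.
Code
# Static DiD, two equivalent specifications (outcome column "Y" is an assumed name)
df_static = prepare_static_regression_frame(data_1)

# 2x2 specification: group dummy, post-period dummy, and their interaction
fit_2x2 = smf.ols(
    "Y ~ dummy_group + dummy_period + dummy_group_x_period", data=df_static
).fit()

# TWFE-style specification: a full set of time fixed effects instead of a single period dummy
# (t was cast to str in the helper, so C(t) expands into time dummies)
fit_twfe = smf.ols(
    "Y ~ dummy_group + C(t) + dummy_group_x_period", data=df_static
).fit()

# The DiD coefficient is the same in both; standard errors differ slightly
print(fit_2x2.params["dummy_group_x_period"], fit_twfe.params["dummy_group_x_period"])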
Now we perform the “dynamic” version of the DiD regression, sometimes also called the “DiD event study”. Dynamic DiD is useful for evaluating treatment effects separately for the individual pre- and post-treatment time points, and the event-study plot one can get out of the dynamic DiD version can be informative. The regression equation is
\[
Y_{it} = \eta^j + \nu_t + \sum_{\substack{k = T_l \\ k \neq g-1}}^{T_h} \theta_k \, 1[t = k] \, D_i + \varepsilon_{it},
\]
where \(g\) denotes the first post-treatment time point (here \(g = 10\)), \(D_i\) indicates membership in the treated group, and \(T_l < g-1\) and \(T_h \leq T\) are the time points defining the length of the “event window” in the dynamic DiD setup. We estimate the coefficients for all periods, that is, \(T_l = 0\) and \(T_h = T\). Notice that the last pre-treatment time point has been omitted from the treatment dummies to avoid multicollinearity issues.
Below, we see that the regressions recover the correct estimates: in the pre-period, the estimated effects are approximately zero for each time point, whereas in the post-period, we obtain estimates of about -2 for each time point.
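A sketch of this regression using the dummies created by prepare_dynamic_regression_frame; the outcome column name Y is an assumption, and the pre/post averages reported below are computed along these lines.
Code
# Dynamic (event-study) DiD regression (outcome column "Y" is an assumed name)
df_dyn = prepare_dynamic_regression_frame(data_1)

# All dynamic treatment dummies created by the helper (reference period "m1" was dropped)
dyn_cols = [c for c in df_dyn.columns if c.startswith("dummy_group_x_period_")]
df_dyn[dyn_cols] = df_dyn[dyn_cols].astype(int)   # ensure numeric dummies for the formula

formula = "Y ~ C(treatment_group) + C(t) + " + " + ".join(dyn_cols)
fit_dyn = smf.ols(formula, data=df_dyn).fit()

# Average the time-point-specific estimates over the pre- and post-periods
pre_cols = [c for c in dyn_cols if "_m" in c]
post_cols = [c for c in dyn_cols if "_m" not in c]
print(f"Average treatment effect, pre: {fit_dyn.params[pre_cols].mean():.2f}")
print(f"Average treatment effect, post: {fit_dyn.params[post_cols].mean():.2f}")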
Average treatment effect, pre: -0.05 (should be zero!).
Average treatment effect, post: -2.04.
Classical two-period DiD with a trend in treatment effect
Consider the classical two-period DiD from above, but let the treatment effect linearly intensify over time.
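For illustration, such a linearly intensifying effect can be written as \(\tau_t = \tau_0 + \kappa\,(t - g)\) for post-treatment time points \(t \geq g\); the values of \(\tau_0\) and \(\kappa\) in the sketch below are purely illustrative assumptions, not the parameters used in the repo notebook.
Code
# Illustrative linearly intensifying treatment effect over the post-period t = 10..19;
# tau_0 and kappa are assumed values for this sketch only.
import numpy as np

g = 10                                  # first post-treatment time point
t_post = np.arange(g, 20)
tau_0, kappa = -1.0, -0.35
tau_t = tau_0 + kappa * (t_post - g)    # effect intensifies linearly over time
print(tau_t.mean())                     # a static DiD recovers (roughly) this average effect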
In terms of the research design, nothing changes from the classical DiD design above. Further, at the estimation level, both the “static” and the “dynamic” DiD specifications will capture the same average effect. However, the dynamic DiD can better inform us about the existence of a trend in the treatment effect.
Realized control pre-period mean 8.497
Realized control post-period mean 18.513
Realized treated pre-period mean 5.513
Realized treated post-period mean 12.743
Counterfactual (unobserved) treatment post-period mean 15.493
Counterfactual (naively estimated) treated post-period mean 15.529
Naive DiD-estimate -2.786
Similarly, run the dynamic DiD and average the obtained time-specific estimates. We see that the average post-period estimate agrees with the static DiD result (slight differences probably result from comparing against the last pre-period time point rather than against the entire pre-period, as the static DiD does).
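A sketch of this step, mirroring the dynamic regression above; the variable name data_2 for the second simulated dataset and the outcome column name Y are assumptions.
Code
# Dynamic DiD on the trend-in-effect dataset; average the post-period coefficients.
# data_2 and the outcome column "Y" are assumed names.
df_dyn2 = prepare_dynamic_regression_frame(data_2)
dyn_cols2 = [c for c in df_dyn2.columns if c.startswith("dummy_group_x_period_")]
df_dyn2[dyn_cols2] = df_dyn2[dyn_cols2].astype(int)   # ensure numeric dummies

fit_dyn2 = smf.ols(
    "Y ~ C(treatment_group) + C(t) + " + " + ".join(dyn_cols2), data=df_dyn2
).fit()

post_cols2 = [c for c in dyn_cols2 if "_m" not in c]
print(f"Average treatment effect, post: {fit_dyn2.params[post_cols2].mean():.2f}")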
Two-period DiD with time-variant, group-divergent covariate
Now we introduce a time-variant effect \(X_{ijt}\) that a) is similar to \(\gamma_j\) in that it causes treatment assignment* in the pre-period, and b) evolves differently between the treatment and control groups, while its effect on the outcome remains constant.
*Same caveat as in the classical two-period DiD research design applies.
Realized control pre-period mean 6.984
Realized control post-period mean 8.947
Realized treated pre-period mean 8.434
Realized treated post-period mean 18.430
Counterfactual (unobserved) treatment post-period mean 20.430
Counterfactual (naively estimated) treated post-period mean 10.396
Naive DiD-estimate 8.034
Research design
In this case, the covariate \(X_{it}\) is time-varying. Hence, we draw both \(X_{pre}\) and \(X_{post}\) into the DAG. Covariate values start at different levels in the two groups \(j\) at \(t=0\), hence \(X_{pre} \rightarrow D\).* Further, the evolution of the covariate means differs between the treatment and control groups. This is indicated by the arrow \(D \rightarrow X_{post}\).
In order to close the backdoor paths from \(D\) to \(Y_{post} - Y_{pre}\), we need to control for both \(X_{pre}\) and \(X_{post}\), that is, \(X_{t}\). The naive DiD estimator will not do this for us! Notice that since the effect of \(X_{it}\) on outcome is time-invariant, we do not need to interact the covariate with time (here).
*Imagine we instead assumed that the covariate starts from the same level in the treatment and control groups in the pre-period. In this case the covariate would still be a confounder in the DiD setting due to its diverging evolution, but then I have a hard time justifying the arrow \(X_{pre} \rightarrow D\). This arrow is, however, essential if we insist on a DAG to represent the confounding effect of the covariate on the outcome, which we know it will have. The article here, based on Zeldow & Hatfield (2021), sidesteps this point in the DAGs presented. In conclusion, DAGs might not always be appropriate for communicating confounding in a DiD setting.
When we adjust for the covariate, we recover almost the correct estimate; the slight deviation is most probably due to the nature of our DGP. If we simulated multiple datasets and averaged the DiD estimates across them, we should recover the correct ATT on average.
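A sketch of the covariate-adjusted static regression; the dataset variable name data_3 and the outcome column name Y are assumptions, while X is the covariate column carried through prepare_static_regression_frame. Since the effect of \(X_{it}\) on the outcome is time-invariant here, adding X itself (without the X_time interaction) suffices.
Code
# Static DiD adjusted for the time-varying covariate X.
# data_3 and the outcome column "Y" are assumed names for this third simulated dataset.
df_cov = prepare_static_regression_frame(data_3)

fit_adj = smf.ols(
    "Y ~ dummy_group + dummy_period + dummy_group_x_period + X", data=df_cov
).fit()
print(f"Adjusted DiD estimate: {fit_adj.params['dummy_group_x_period']:.3f}")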