In this lab we will be examining the following question: does participating in a 401(k) increase an individual’s net wealth? One difficulty in answering this question is that whether or not people participate in a 401(k) is probably not random. However, whether or not somebody is eligible to participate in a 401(k) is arguably random. Additionally, the only way that 401(k) eligibility affects an individual’s net wealth is by affecting whether or not they participate in the 401(k). Given this setup, we can use Instrumental Variables to estimate the Local Average Treatment Effect (average treatment effect on the compliers) of participating in a 401(k).
Let’s begin by loading the data, which will be called
pension
.
data(pension)
# Shuffle data around
pension <- pension[sample(1:nrow(pension), nrow(pension)), ]
In the data, our variables of interest are the following:
net_tfa
: This is our outcome variable showing an
individual’s net wealth. p401
: This is our treatment
variable showing whether or not an individual participated in a 401(k).
e401
: This is our instrument showing whether or not an
individual was eligible to participate in a 401(k). age
,
inc
, fsize
, educ
,
db
, marr
, twoearn
,
pira
, hown
: These are potential confounders,
but we will ignore them for right now. Remember we are assuming our
instrument is randomly assigned.
y <- "net_tfa"
a <- "p401"
z <- "e401"
X <- c("age", "inc", "fsize", "educ", "db", "marr", "twoearn", "pira", "hown")
pension <- select(pension, all_of(c(y, a, z, X)))
glimpse(pension)
## Rows: 9,915
## Columns: 12
## $ net_tfa <int> 17315, 7999, -180, 55380, 1601, 2500, -130090, -355, 63595, 16…
## $ p401 <int> 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1,…
## $ e401 <int> 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1,…
## $ age <int> 29, 37, 51, 43, 28, 38, 59, 41, 54, 31, 56, 34, 36, 61, 43, 46…
## $ inc <int> 34773, 51060, 30780, 68808, 18234, 19800, 126576, 70671, 50646…
## $ fsize <int> 4, 1, 5, 4, 3, 2, 3, 4, 3, 2, 3, 1, 2, 1, 2, 4, 5, 6, 1, 3, 2,…
## $ educ <int> 16, 15, 14, 16, 8, 8, 12, 14, 12, 18, 13, 16, 15, 5, 18, 13, 1…
## $ db <int> 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0,…
## $ marr <int> 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,…
## $ twoearn <int> 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0,…
## $ pira <int> 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,…
## $ hown <dbl> 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0,…
Finally, we will estimate the Local Average Treatment Effect (average treatment effect on the compliers) of participating in a 401(k) using the Wald Estimator and two-stage least squares.
The estimator we discussed in class is also called the Wald estimator. This happens to be the same person, Abraham Wald, whose work is the inspiration of the survivorship bias airplane meme] It proceeds by (1) estimating the effect of the instrument on the outcome, (2) estimating the effect of the instrument on the treatment, and (3) dividing (1) / (2). \[\begin{aligned} \hat{E}(Y^{a=1}-Y^{a=0}) &= \frac{\hat{E}(Y^{z=1}-Y^{z=0})}{\hat{E}(A^{z=1}-A^{z=0})}\\ &= \frac{\hat{E}(Y\mid Z = 1) - \hat{E}(Y\mid Z = 0)}{\hat{E}(A\mid Z = 1) - \hat{E}(A\mid Z = 0)} \end{aligned}\]
First let’s calculate the effect of the instrument on the outcome:
mean_outcome_by_instrument <- pension |>
group_by(e401) |>
summarise(net_tfa = mean(net_tfa))
instrument_on_outcome <- (
mean_outcome_by_instrument$net_tfa[2] - mean_outcome_by_instrument$net_tfa[1]
)
Now, calculate the effect of the instrument on the treatment:
mean_treatment_by_instrument <- pension |>
group_by(e401) |>
summarise(p401 = mean(p401))
instrument_on_treatment <- (
mean_treatment_by_instrument$p401[2] - mean_treatment_by_instrument$p401[1]
)
Calculate the LATE by dividing these two:
wald_late <- instrument_on_outcome / instrument_on_treatment
wald_late
## [1] 27763.11
We can also estimate the LATE with a procedure called two stage least
squares. To do that we will use the ivreg
function from the
AER package. When the treatment and instrument are both binary, this
should give the exact same answer as the Wald estimator above. However,
when the treatment or instrument are continuous, the Wald estimator may
not work and two stage least squares can be used instead. To see the
help file for this function, run ?ivreg
in your console and
take a look at the documentation.
You will have to create a formula that tells ivreg
what
your outcome variable, treatment variable, and instrument variable are.
NOTE: This formula should look something like
outcome ~ treatment | instrument
.
iv_formula <- net_tfa ~ p401 | e401
iv_model <- ivreg(formula = net_tfa ~ p401 | e401, data = pension)
# Print the model
iv_model
##
## Call:
## ivreg(formula = net_tfa ~ p401 | e401, data = pension)
##
## Coefficients:
## (Intercept) p401
## 10788 27763
The Local Average Treatment Effect is the p401
coefficient that is in the printed summary above!
Of course, since we want to understand how much uncertainty is in our
estimates, we will calculate the standard errors for our estimate.
NOTE The vcov. = vcovHC
argument below is
telling R to calculate robust standard errors, which we typically want
to do in our statistical analyses.
standard_err <- summary(iv_model, vcov. = vcovHC)
late <- coef(standard_err)["p401", c("Estimate", "Std. Error")]
# 95% conf. int.
ci <- c(
late[["Estimate"]] - qnorm(0.975)*late[["Std. Error"]],
late[["Estimate"]] + qnorm(0.975)*late[["Std. Error"]]
)
# Print the results and confidence interval
cat("LATE (Lower bound, Upper bound):\n",
paste0("\r", round(late[["Estimate"]], 3)),
paste0("(", round(ci[[1]], 3), ","),
paste0(round(ci[[2]], 3), ")\n"))
## LATE (Lower bound, Upper bound):
## 27763.11 (23871.736, 31654.484)