Discussion: Instrumental Variables

Install packages

Load the data

In this lab we will be examining the following question: does participating in a 401(k) increase an individual’s net wealth? One difficulty in answering this question is that whether or not people participate in a 401(k) is probably not random. However, whether or not somebody is eligible to participate in a 401(k) is arguably random. Additionally, the only way that 401(k) eligibility affects an individual’s net wealth is by affecting whether or not they participate in the 401(k). Given this setup, we can use Instrumental Variables to estimate the Local Average Treatment Effect (average treatment effect on the compliers) of participating in a 401(k).

Let’s begin by loading the data, which will be called pension.

data(pension)
# Shuffle data around
pension <- pension[sample(1:nrow(pension), nrow(pension)), ]

Set the data up for our analysis

In the data, our variables of interest are the following: net_tfa: This is our outcome variable showing an individual’s net wealth. p401: This is our treatment variable showing whether or not an individual participated in a 401(k). e401: This is our instrument showing whether or not an individual was eligible to participate in a 401(k). age, inc, fsize, educ, db, marr, twoearn, pira, hown: These are potential confounders, but we will ignore them for right now. Remember we are assuming our instrument is randomly assigned.

y <- "net_tfa"
a <- "p401"
z <- "e401"
X <- c("age", "inc", "fsize", "educ", "db", "marr", "twoearn", "pira", "hown")

pension <- select(pension, all_of(c(y, a, z, X)))

glimpse(pension)

## Rows: 9,915
## Columns: 12
## $ net_tfa <int> 17315, 7999, -180, 55380, 1601, 2500, -130090, -355, 63595, 16…
## $ p401    <int> 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1,…
## $ e401    <int> 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1,…
## $ age     <int> 29, 37, 51, 43, 28, 38, 59, 41, 54, 31, 56, 34, 36, 61, 43, 46…
## $ inc     <int> 34773, 51060, 30780, 68808, 18234, 19800, 126576, 70671, 50646…
## $ fsize   <int> 4, 1, 5, 4, 3, 2, 3, 4, 3, 2, 3, 1, 2, 1, 2, 4, 5, 6, 1, 3, 2,…
## $ educ    <int> 16, 15, 14, 16, 8, 8, 12, 14, 12, 18, 13, 16, 15, 5, 18, 13, 1…
## $ db      <int> 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0,…
## $ marr    <int> 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,…
## $ twoearn <int> 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0,…
## $ pira    <int> 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,…
## $ hown    <dbl> 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0,…

How are 401(k) eligibility and participation related

We want to see how 401(k) eligibility and participation are related to build our intuition for this exercise. In particular it’ll be helpful to see how many people that are eligible to participate in a 401(k) actually do so.

table(pension[, c(z, a)])

##     p401
## e401    0    1
##    0 6233    0
##    1 1088 2594

Estimate the Local Average Treatment Effect

Finally, we will estimate the Local Average Treatment Effect (average treatment effect on the compliers) of participating in a 401(k) using the Wald Estimator and two-stage least squares.

Wald Estimator

The estimator we discussed in class is also called the Wald estimator. This happens to be the same person, Abraham Wald, whose work is the inspiration of the survivorship bias airplane meme] It proceeds by (1) estimating the effect of the instrument on the outcome, (2) estimating the effect of the instrument on the treatment, and (3) dividing (1) / (2). \[\begin{aligned} \hat{E}(Y^{a=1}-Y^{a=0}) &= \frac{\hat{E}(Y^{z=1}-Y^{z=0})}{\hat{E}(A^{z=1}-A^{z=0})}\\ &= \frac{\hat{E}(Y\mid Z = 1) - \hat{E}(Y\mid Z = 0)}{\hat{E}(A\mid Z = 1) - \hat{E}(A\mid Z = 0)} \end{aligned}\]

First let’s calculate the effect of the instrument on the outcome:

mean_outcome_by_instrument <- pension |>
  group_by(e401) |>
  summarise(net_tfa = mean(net_tfa))

instrument_on_outcome <- (
  mean_outcome_by_instrument$net_tfa[2] - mean_outcome_by_instrument$net_tfa[1]
)

Now, calculate the effect of the instrument on the treatment:

mean_treatment_by_instrument <- pension |>
  group_by(e401) |>
  summarise(p401 = mean(p401))

instrument_on_treatment <- (
  mean_treatment_by_instrument$p401[2] - mean_treatment_by_instrument$p401[1]
)

Calculate the LATE by dividing these two:

wald_late <- instrument_on_outcome / instrument_on_treatment

wald_late

## [1] 27763.11

Two stage least squeares

We can also estimate the LATE with a procedure called two stage least squares. To do that we will use the ivreg function from the AER package. When the treatment and instrument are both binary, this should give the exact same answer as the Wald estimator above. However, when the treatment or instrument are continuous, the Wald estimator may not work and two stage least squares can be used instead. To see the help file for this function, run ?ivreg in your console and take a look at the documentation.

You will have to create a formula that tells ivreg what your outcome variable, treatment variable, and instrument variable are. NOTE: This formula should look something like outcome ~ treatment | instrument.

iv_formula <- net_tfa ~ p401 | e401
iv_model <- ivreg(formula = net_tfa ~ p401 | e401, data = pension)

# Print the model
iv_model

## 
## Call:
## ivreg(formula = net_tfa ~ p401 | e401, data = pension)
## 
## Coefficients:
## (Intercept)         p401  
##       10788        27763

The Local Average Treatment Effect is the p401 coefficient that is in the printed summary above!

Calculate robust standard errors and create 95% conf. interval

Of course, since we want to understand how much uncertainty is in our estimates, we will calculate the standard errors for our estimate. NOTE The vcov. = vcovHC argument below is telling R to calculate robust standard errors, which we typically want to do in our statistical analyses.

standard_err <- summary(iv_model, vcov. = vcovHC)
late <- coef(standard_err)["p401", c("Estimate", "Std. Error")]
# 95% conf. int.
ci <- c(
  late[["Estimate"]] - qnorm(0.975)*late[["Std. Error"]],
  late[["Estimate"]] + qnorm(0.975)*late[["Std. Error"]]
)

# Print the results and confidence interval
cat("LATE (Lower bound, Upper bound):\n",
    paste0("\r", round(late[["Estimate"]], 3)),
    paste0("(", round(ci[[1]], 3), ","),
    paste0(round(ci[[2]], 3), ")\n"))

## LATE (Lower bound, Upper bound):
##  27763.11 (23871.736, 31654.484)

Discussion 9 - IV

2024-10-23