Discussion 4. Interference and Stat Review

STSCI/INFO/ILRST 3900: Causal Inference

September 17, 2025

Interference

You can download the slides for this week’s discussion.

Stat review reminder

\(\pi^a= P(Y^a=1)\)) under treatment condition \(a\)
Let \(\hat\pi^a\) be the estimate of that unknown probability
\(\hat\pi^a = \frac{1}{n_a}\sum_{i:A_i=a} Y_i^a\), proportion of people in that treatment condition to have outcome 1
To make a confidence interval on \(\hat\pi^a\) we need the standard error of \(\hat\pi^a\)

Standard error

Let \(Y^a\) be a Bernoulli random variable
The variance of \(Y^a\) is \(V(Y^a) = \pi^a (1-\pi^a)\)
We have estimated by an average: \(\hat\pi^a\)
If we did this many times in many hypothetical samples, we would get different estimate
The estimates sampling variance is \(V(\hat\pi^a) = \frac{\pi^a(1-\pi^a)}{n_a}\)
The standard error is the square root of the sampling variance: \[SE(\hat\pi^a) = \sqrt\frac{\pi^a(1-\pi^a)}{n_a}\]
We can estimate this SE by plugging in our estimate \(\hat\pi^a\)
In R: we translated the standard error formula into code

se_binary <- function(p, n) {
  se <- sqrt( p * (1 - p) / n )
  return(se)
}

Sampling distribution

\(\hat\pi^a\) is a sample mean
By the Central Limit Theorem: as \(n \to \infty\), across hypothetical repeated samples the distribution of \(\hat\pi^a\) estimates becomes Normal
Across repeated samples, the middle 95% of estimates will fall within a known range: \[\pi^a \pm \Phi^{-1}(.975) \times SE(\hat\pi^a)\]
where \(\Phi^{-1}()\) is the quantile of the standard Normal distribution
\(\Phi^{-1}(.975) \approx 1.96\), so the number 1.96 might be familiar to you

## Warning in annotate(geom = "label", x = 0, y = 0.2 *
## dnorm(0), label = "'Middle 95% of Sample-Based
## Estimates'~hat(pi)^a", : Ignoring unknown parameters:
## `label.size`

Plug in the estimates \(\hat\pi^a\) and \(\widehat{SE}(\hat\pi^a)\) to get a 95% confidence interval
The CI is centered on the estimate \(\hat\pi^a\)
If we repeatedly made a CI using hypothetical samples from the population, the CI would contain the unknown true parameter \(\pi^a\) 95% of the time
In R: we translated the confidence interval formula into code

ci_lower <- function(estimate, standard_error) {
  estimate - qnorm(.975) * standard_error
}
ci_upper <- function(estimate, standard_error) {
  estimate + qnorm(.975) * standard_error
}

Discussion 3. Treatment effect heterogneity in an Experiment

Discussion 5. Directed Acyclic Graphs