Lecture 10 - Counts & Hidden Confounds

Author

Isabella C. Richmond

Published

March 21, 2024

Rose / Thorn

Rose: qualitative data!!

Thorn:

Generalized Linear Models

  • Expected value is some function (link function) of an additive combination of parameters

  • interpretation that changes, not computation - no longer simply an additive combination of parameters

    • uniform changes in predictor not uniform changes in prediction
  • all predictor variables interact and moderate one another

  • inlfuences predictions and uncertainty of predictions

Confounded Admissions

  • ability is the most easy to imagine confound for admissions

  • include confound in the model (u):

    • Bernoulli variable 0, 1 - 1 is extremely high achieving individuals that are 10% of the population, everyone else is average
  • gender 1 individuals anticipate gender discrimination in department 2, so if they are average they will avoid it but if they are exceptional they will still apply

# continuing from UCBadmit example
# what happens when there is a confound?

set.seed(17)
N <- 2000 # number of applicants
# even gender distribution
G <- sample( 1:2 , size=N , replace=TRUE )
# sample ability, high (1) to average (0)
# unobserved common cause 
u <- rbern(N,0.1)
# gender 1 tends to apply to department 1, 2 to 2
# and G=1 with greater ability tend to apply to 2 as well
D <- rbern( N , ifelse( G==1 , u*0.5 , 0.8 ) ) + 1
# matrix of acceptance rates [dept,gender]
accept_rate_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 )
accept_rate_u1 <- matrix( c(0.2,0.3,0.2,0.5) , nrow=2 )
# simulate acceptance
p <- sapply( 1:N , function(i) 
    ifelse( u[i]==0 , accept_rate_u0[D[i],G[i]] , accept_rate_u1[D[i],G[i]] ) )
A <- rbern( N , p )

table(G,D)
table(G,A)

dat_sim <- list( A=A , D=D , G=G )

# total effect gender
m1 <- ulam(
    alist(
        A ~ bernoulli(p),
        logit(p) <- a[G],
        a[G] ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )

post1 <- extract.samples(m1)
post1$fm_contrast <- post1$a[,1] - post1$a[,2]
precis(post1)

# direct effects - now confounded!
m2 <- ulam(
    alist(
        A ~ bernoulli(p),
        logit(p) <- a[G,D],
        matrix[G,D]:a ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )

precis(m2,3)
  • second model stratifies through department is now confounded because we’ve added a common cause, ability
# contrast
post2 <- extract.samples(m2)
post2$fm_contrast_D1 <- post2$a[,1,1] - post2$a[,2,1]
post2$fm_contrast_D2 <- post2$a[,1,2] - post2$a[,2,2]
precis(post2)

dens( post2$fm_contrast_D1 , lwd=4 , col=4 , xlab="F-M contrast in each department" )
dens( post2$fm_contrast_D2 , lwd=4 , col=2 , add=TRUE )
abline(v=0,lty=3)

dens( post2$a[,1,1] , lwd=4 , col=2 , xlim=c(-3,1) )
dens( post2$a[,2,1] , lwd=4 , col=4 , add=TRUE )
dens( post2$fm_contrast_D1 , lwd=4 , add=TRUE )

dens( post2$a[,1,2] , lwd=4 , col=2 , add=TRUE , lty=4 )
dens( post2$a[,2,2] , lwd=4 , col=4 , add=TRUE , lty=4)
dens( post2$fm_contrast_D2 , lwd=4 , add=TRUE , lty=4)
  • probability distribution of probabilities because we’re modelling probability

  • confound happens because exceptional individuals don’t apply to department 1, they apply to department 2 and so it looks like department 1 is discriminating when its not because its getting lower quality applicants

    • department 2 appears to have no discrimination but it does have it, for the opposite reason. they are only admitting higher ability applicants of a certain gender
  • masking effect - sorting can mask a lot of things

Collider Bias

  • We can estimate total causal effect of G but it isn’t what we want (not useful for policy)

  • We cannot estimate direct effect of department or gender

Citation vs Membership

  • paper one concludes that gender results in lower citation rates for women

  • paper two concludes that women get elected more controlling for citations

Citation Networks

  • very plausible that there is quality differences among members of the academy but they are often hidden

  • citations is a post-treatment variable (and stratifying by post-treatment variable is dangerous because it can activate collider bias)

    • if women are less likely to get citations because of discrimination, it is probably that women elected to the NAS with less citations have a higher quality

  • in the absence of strong causal assumptions, we can’t conclude anything

  • proxies for quality are often poor proxies (e.g., citations)

  • if you want causal inference, you must make causal assumptions

Sensitivity Analysis

  • what are the implications of what we don’t know?

  • assume confound exists, model its consequences for different strengths/kinds of influence

  • how strong must the confound be to change conclusions?

  • include quality as an unobserved variable

    • change parameter size/strength to test the effect of unobserved confound
  • \[ A_i \sim Bernoulli(p_i) \\ logit(p_i) = \alpha[G_i, D_i] +\beta_{G[i]}u_i \]

    • first model models total effect of gender on admissions with an unknown confound (quality)
  • \[ (D_i = 2) \sim Bernoulli(q_i) \\ logit(q_i) = \delta[G_i] + \gamma_{G[i]}u_i \]

    • second model investigates the effect of how people apply to departments
  • \(u_j \sim Normal(0, 1)\)

  • \(u_i\) = ability

# sensitivity

dat_sim$D2 <- ifelse( D==2 , 1 , 0 )
dat_sim$b <- c(1,1)
dat_sim$g <- c(1,0)
dat_sim$N <- length(dat_sim$D2)

m3s <- ulam(
    alist( 
        # A model
        A ~ bernoulli(p),
        logit(p) <- a[G,D] + b[G]*u[i],
        matrix[G,D]:a ~ normal(0,1),

        # D model
        D2 ~ bernoulli(q),
        logit(q) <- delta[G] + g[G]*u[i],
        delta[G] ~ normal(0,1),

        # declare unobserved u
        vector[N]:u ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )

precis(m3s,3,pars=c("a","delta"))

post3s <- extract.samples(m3s)
post3s$fm_contrast_D1 <- post3s$a[,1,1] - post3s$a[,2,1]
post3s$fm_contrast_D2 <- post3s$a[,1,2] - post3s$a[,2,2]

dens( post2$fm_contrast_D1 , lwd=1 , col=4 , xlab="F-M contrast in each department" , xlim=c(-2,1) )
dens( post2$fm_contrast_D2 , lwd=1 , col=2 , add=TRUE )
abline(v=0,lty=3)
dens( post3s$fm_contrast_D1 , lwd=4 , col=4 , add=TRUE )
dens( post3s$fm_contrast_D2 , lwd=4 , col=2 , add=TRUE )

plot( jitter(u) , apply(post3s$u,2,mean) , col=ifelse(G==1,2,4) , lwd=3 )
  • you can say the strength of the confound needed to undo the results you found

    • important thing to report - don’t pretend confounds don’t exist

    • can’t eliminate possibility of confounding

  • a lot of the most important science cannot be done experimentally so we need to be able to do these things

Poisson Counts

  • total count is not binomial: no maximum

    • Poisson distribution: very high maximum and very low probability of each success
  • Poisson distribution uses the log link - must be positive

    • exponential scaling can be surprising

    • large priors makes extremely long tails with very large values

    • want higher mean, lower variance

  • \[ Y_i \sim Poisson(\lambda_i) \\ log(\lambda_i) = \alpha + \beta x_i \\ \lambda_i = exp(\alpha + \beta x_i) \]

  • To add contact as an interaction term where C is contact:

    • \[ Y_i \sim Poisson(\lambda_i) \\ log(\lambda_i) = \alpha_{C[i]} + \beta_{C[i]} log(P_i) \\ \alpha_j \sim Normal(3, 0.5) \\ \beta_j \sim Normal(0, 0.2) \]
# model

library(rethinking)
data(Kline)
d <- Kline
d$P <- scale( log(d$population) )
d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )

dat <- list(
    T = d$total_tools ,
    P = d$P ,
    C = d$contact_id )

# intercept only
m11.9 <- ulam(
    alist(
        T ~ dpois( lambda ),
        log(lambda) <- a,
        a ~ dnorm( 3 , 0.5 )
    ), data=dat , chains=4 , log_lik=TRUE )

# interaction model
m11.10 <- ulam(
    alist(
        T ~ dpois( lambda ),
        log(lambda) <- a[C] + b[C]*P,
        a[C] ~ dnorm( 3 , 0.5 ),
        b[C] ~ dnorm( 0 , 0.2 )
    ), data=dat , chains=4 , log_lik=TRUE )

compare( m11.9 , m11.10 , func=PSIS )

k <- PSIS( m11.10 , pointwise=TRUE )$k
plot( dat$P , dat$T , xlab="log population (std)" , ylab="total tools" ,
    col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) ,
    ylim=c(0,75) , cex=1+normalize(k) )
# set up the horizontal axis values to compute predictions at
P_seq <- seq( from=-1.4 , to=3 , len=100 )

# predictions for C=1 (low contact)
lambda <- link( m11.10 , data=data.frame( P=P_seq , C=1 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( P_seq , lmu , lty=2 , lwd=1.5 )
shade( lci , P_seq , xpd=TRUE , col=col.alpha(4,0.3) )

# predictions for C=2 (high contact)
lambda <- link( m11.10 , data=data.frame( P=P_seq , C=2 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( P_seq , lmu , lty=1 , lwd=1.5 )
shade( lci , P_seq , xpd=TRUE , col=col.alpha(2,0.3))

identify( dat$P , dat$T , d$culture )

# natural scale now

plot( d$population , d$total_tools , xlab="population" , ylab="total tools" ,
    col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) ,
    ylim=c(0,75) , cex=1+normalize(k) )
P_seq <- seq( from=-5 , to=3 , length.out=100 )
# 1.53 is sd of log(population)
# 9 is mean of log(population)
pop_seq <- exp( P_seq*1.53 + 9 )
lambda <- link( m11.10 , data=data.frame( P=P_seq , C=1 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( pop_seq , lmu , lty=2 , lwd=1.5 )
shade( lci , pop_seq , xpd=TRUE , col=col.alpha(4,0.3))

lambda <- link( m11.10 , data=data.frame( P=P_seq , C=2 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( pop_seq , lmu , lty=1 , lwd=1.5 )
shade( lci , pop_seq , xpd=TRUE , col=col.alpha(2,0.3) )

identify( d$population , d$total_tools , d$culture )
  • number of effective parameters penalty shows how well the model performs after you drop individual data points

    • therefore models with more parameters often have lower effective parameters
  • gamma-Poisson is the appropriate analog to a student t-test - wider tails and more robust

  • improve scientific model

    • people produce innovations so population and contact both innovate but there is also loss

    • \[ \Delta T = \alpha_C P^{\beta_C} - \gamma T \]

    • \(\Delta T\) = change in tools

    • \(\alpha\) = innovation rate

    • \(\beta\) = diminishing returns

    • \(\gamma\) = loss

    • T = number of tools

    • C = contact

  • set equation to 0 and solve for T then insert into your model

    • \[ \hat{T} = \frac {\alpha_C P^{\beta_C}} {\gamma} \\ T_i \sim Poisson (\lambda_i) \\ \lambda_i = \hat{T} \]

    • must constrain all parameters to be positive

# innovation/loss model

dat2 <- list( T=d$total_tools, P=d$population, C=d$contact_id )
m11.11 <- ulam(
    alist(
        T ~ dpois( lambda ),
        lambda <- exp(a[C])*P^b[C]/g,
        a[C] ~ dnorm(1,1),
        b[C] ~ dexp(1),
        g ~ dexp(1)
    ), data=dat2 , chains=4 , cores=4 , log_lik=TRUE )

precis(m11.11,2)

plot( d$population , d$total_tools , xlab="population" , ylab="total tools" ,
    col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) ,
    ylim=c(0,75) , cex=1+normalize(k) )
P_seq <- seq( from=-5 , to=3 , length.out=100 )
  • still need to deal with location as confound

Count GLMs

  • distributions from constraints

  • maximum entropy priors: binomial, Poisson, and extensions

  • robust regressions: beta-binomial, gamma-Poisson

Simpson’s Paradox

  • reversal of an association when groups are combined or separated within the dataset

  • the reversal is purely statistical - no way to understand it without causal assumptions

  • non-linear haunting

    • there’s a positive effect in both groups but it cannot be estimated from this sample

    • just because your distribution overlaps zero, doesn’t mean the effect is null