Lecture 10 - Counts & Hidden Confounds
Rose / Thorn
Rose: qualitative data!!
Thorn:
Generalized Linear Models
Expected value is some function (link function) of an additive combination of parameters
it is the interpretation that changes, not the computation - the prediction is no longer simply an additive combination of parameters
- uniform changes in a predictor do not produce uniform changes in the prediction
all predictor variables interact and moderate one another
this influences predictions and the uncertainty of predictions
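For intuition, a small illustration (not lecture code): with a logit link, the same shift in a predictor produces different shifts in the predicted probability depending on the baseline.
# not lecture code: same +0.5 change on the logit scale,
# different change on the probability scale
library(rethinking)   # for inv_logit()
inv_logit(0.5) - inv_logit(0)    # about +0.12 near p = 0.5
inv_logit(3.5) - inv_logit(3)    # about +0.02 near p = 0.95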
Confounded Admissions
ability is the easiest confound to imagine for admissions
include confound in the model (u):
- a Bernoulli variable (0/1): 1 marks extremely high-achieving individuals, who are 10% of the population; everyone else is average
gender 1 individuals anticipate gender discrimination in department 2, so average ones avoid it, but exceptional ones still apply
# continuing from UCBadmit example
# what happens when there is a confound?
set.seed(17)
N <- 2000 # number of applicants
# even gender distribution
G <- sample( 1:2 , size=N , replace=TRUE )
# sample ability, high (1) to average (0)
# unobserved common cause
u <- rbern(N,0.1)
# gender 1 tends to apply to department 1, 2 to 2
# and G=1 with greater ability tend to apply to 2 as well
D <- rbern( N , ifelse( G==1 , u*0.5 , 0.8 ) ) + 1
# matrix of acceptance rates [dept,gender]
accept_rate_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 )
accept_rate_u1 <- matrix( c(0.2,0.3,0.2,0.5) , nrow=2 )
# simulate acceptance
p <- sapply( 1:N , function(i)
    ifelse( u[i]==0 , accept_rate_u0[D[i],G[i]] , accept_rate_u1[D[i],G[i]] ) )
A <- rbern( N , p )

table(G,D)
table(G,A)

dat_sim <- list( A=A , D=D , G=G )
# total effect gender
m1 <- ulam(
    alist(
        A ~ bernoulli(p),
        logit(p) <- a[G],
        a[G] ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )

post1 <- extract.samples(m1)
post1$fm_contrast <- post1$a[,1] - post1$a[,2]
precis(post1)
# direct effects - now confounded!
m2 <- ulam(
    alist(
        A ~ bernoulli(p),
        logit(p) <- a[G,D],
        matrix[G,D]:a ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )
precis(m2,3)
- the second model, which stratifies by department, is now confounded because we've added a common cause, ability
# contrast
post2 <- extract.samples(m2)
post2$fm_contrast_D1 <- post2$a[,1,1] - post2$a[,2,1]
post2$fm_contrast_D2 <- post2$a[,1,2] - post2$a[,2,2]
precis(post2)
dens( post2$fm_contrast_D1 , lwd=4 , col=4 , xlab="F-M contrast in each department" )
dens( post2$fm_contrast_D2 , lwd=4 , col=2 , add=TRUE )
abline(v=0,lty=3)
dens( post2$a[,1,1] , lwd=4 , col=2 , xlim=c(-3,1) )
dens( post2$a[,2,1] , lwd=4 , col=4 , add=TRUE )
dens( post2$fm_contrast_D1 , lwd=4 , add=TRUE )
dens( post2$a[,1,2] , lwd=4 , col=2 , add=TRUE , lty=4 )
dens( post2$a[,2,2] , lwd=4 , col=4 , add=TRUE , lty=4)
dens( post2$fm_contrast_D2 , lwd=4 , add=TRUE , lty=4)
we get a probability distribution of probabilities, because the quantity we are modelling is itself a probability
the confound arises because exceptional individuals don't apply to department 1; they apply to department 2, so department 1 looks like it is discriminating when it is not - it is simply getting lower-quality applicants
- department 2 appears to have no discrimination but it does, for the opposite reason: it is only admitting higher-ability applicants of a certain gender
masking effect - sorting can mask a lot of things
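A quick check on the simulated data makes the masking visible (a sketch; it assumes the objects A, D, G, and u created in the simulation above are still in the workspace):
# empirical admission rates by department, gender, and unobserved ability
aggregate( A , list( D=D , G=G , u=u ) , mean )
# within each level of u, department 1 admits both genders at the same rate,
# but averaging over u makes department 1 look like it discriminates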
Collider Bias
We can estimate the total causal effect of G, but it isn't what we want (not useful for policy)
We cannot estimate the direct effect of department or gender
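A sketch of checking this with dagitty; the graph below is my rendering of the lecture's admissions example, with u (ability) treated as latent:
# sketch: adjustment sets for the admissions DAG, u unobserved
library(dagitty)
dag <- dagitty( "dag{ G -> D ; G -> A ; D -> A ; u -> D ; u -> A }" )
latents(dag) <- "u"
adjustmentSets( dag , exposure="G" , outcome="A" , effect="total" )   # empty set: total effect identified
adjustmentSets( dag , exposure="G" , outcome="A" , effect="direct" )  # no valid set: direct effect not identified without u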
Citation vs Membership
paper one concludes that gender results in lower citation rates for women
paper two concludes that women get elected more controlling for citations
Citation Networks
it is very plausible that there are quality differences among members of the academy, but they are often hidden
citations is a post-treatment variable (and stratifying by post-treatment variable is dangerous because it can activate collider bias)
- if women are less likely to get citations because of discrimination, it is probable that women elected to the NAS with fewer citations are of higher quality
in the absence of strong causal assumptions, we can’t conclude anything
proxies for quality are often poor proxies (e.g., citations)
if you want causal inference, you must make causal assumptions
Sensitivity Analysis
what are the implications of what we don’t know?
assume confound exists, model its consequences for different strengths/kinds of influence
how strong must the confound be to change conclusions?
include quality as an unobserved variable
- change parameter size/strength to test the effect of unobserved confound
\[ A_i \sim Bernoulli(p_i) \\ logit(p_i) = \alpha[G_i, D_i] +\beta_{G[i]}u_i \]
- first model models total effect of gender on admissions with an unknown confound (quality)
\[ (D_i = 2) \sim Bernoulli(q_i) \\ logit(q_i) = \delta[G_i] + \gamma_{G[i]}u_i \]
- second model investigates the effect of how people apply to departments
\(u_i \sim Normal(0, 1)\)
\(u_i\) = ability
# sensitivity
dat_sim$D2 <- ifelse( D==2 , 1 , 0 )
dat_sim$b <- c(1,1)
dat_sim$g <- c(1,0)
dat_sim$N <- length(dat_sim$D2)

m3s <- ulam(
    alist(
        # A model
        A ~ bernoulli(p),
        logit(p) <- a[G,D] + b[G]*u[i],
        matrix[G,D]:a ~ normal(0,1),

        # D model
        D2 ~ bernoulli(q),
        logit(q) <- delta[G] + g[G]*u[i],
        delta[G] ~ normal(0,1),

        # declare unobserved u
        vector[N]:u ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )

precis(m3s,3,pars=c("a","delta"))

post3s <- extract.samples(m3s)
post3s$fm_contrast_D1 <- post3s$a[,1,1] - post3s$a[,2,1]
post3s$fm_contrast_D2 <- post3s$a[,1,2] - post3s$a[,2,2]
dens( post2$fm_contrast_D1 , lwd=1 , col=4 , xlab="F-M contrast in each department" , xlim=c(-2,1) )
dens( post2$fm_contrast_D2 , lwd=1 , col=2 , add=TRUE )
abline(v=0,lty=3)
dens( post3s$fm_contrast_D1 , lwd=4 , col=4 , add=TRUE )
dens( post3s$fm_contrast_D2 , lwd=4 , col=2 , add=TRUE )
plot( jitter(u) , apply(post3s$u,2,mean) , col=ifelse(G==1,2,4) , lwd=3 )
you can report how strong the confound would need to be to undo the results you found (see the sketch after these notes)
important thing to report - don’t pretend confounds don’t exist
can’t eliminate possibility of confounding
a lot of the most important science cannot be done experimentally so we need to be able to do these things
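One way to do the sweep in practice, a sketch reusing the sensitivity model above; the strength values below are arbitrary illustrations, not from the lecture:
# sketch: refit the sensitivity model at several assumed confound strengths
fit_sens <- function( strength ) {
    dat_sim$b <- c( strength , strength )   # assumed effect of u on admission (local copy of dat_sim)
    ulam(
        alist(
            A ~ bernoulli(p),
            logit(p) <- a[G,D] + b[G]*u[i],
            matrix[G,D]:a ~ normal(0,1),
            D2 ~ bernoulli(q),
            logit(q) <- delta[G] + g[G]*u[i],
            delta[G] ~ normal(0,1),
            vector[N]:u ~ normal(0,1)
        ) , data=dat_sim , chains=4 , cores=4 )
}
fits <- lapply( c(0.5,1,2) , fit_sens )
# inspect how the department 2 contrast moves with the assumed strength
for ( f in fits ) {
    post <- extract.samples(f)
    print( precis( data.frame( fm_contrast_D2 = post$a[,1,2] - post$a[,2,2] ) ) )
}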
Poisson Counts
total count is not binomial: no maximum
- Poisson distribution: very high maximum and very low probability of each success
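For example, a binomial with many trials and a tiny success probability is already essentially Poisson with \(\lambda = Np\) (a quick numerical check, not lecture code):
# binomial with large N and small p is approximately Poisson with lambda = N*p
dbinom( 0:5 , size=10000 , prob=0.0002 )
dpois( 0:5 , lambda=2 )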
the Poisson distribution uses the log link because the rate \(\lambda\) must be positive
exponential scaling can be surprising
wide priors on the log scale produce extremely long tails with implausibly large values
so you generally want a prior with a higher mean and lower variance
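A quick prior predictive check shows why (a sketch; the two priors compared here mirror the flat-versus-regularized comparison used for the Kline intercept):
# prior predictive for the intercept: lambda = exp(a), so a ~ Normal(mu,sigma) implies a log-normal rate
curve( dlnorm( x , 0 , 10 ) , from=0 , to=100 , n=200 )   # a ~ Normal(0,10): mass piled near 0 plus an absurdly long tail
curve( dlnorm( x , 3 , 0.5 ) , from=0 , to=100 , n=200 )  # a ~ Normal(3,0.5): mean expected count roughly 20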
\[ Y_i \sim Poisson(\lambda_i) \\ log(\lambda_i) = \alpha + \beta x_i \\ \lambda_i = exp(\alpha + \beta x_i) \]
To add contact as an interaction term where C is contact:
- \[ Y_i \sim Poisson(\lambda_i) \\ log(\lambda_i) = \alpha_{C[i]} + \beta_{C[i]} log(P_i) \\ \alpha_j \sim Normal(3, 0.5) \\ \beta_j \sim Normal(0, 0.2) \]
# model
library(rethinking)
data(Kline)
d <- Kline
d$P <- scale( log(d$population) )
d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )

dat <- list(
    T = d$total_tools ,
    P = d$P ,
    C = d$contact_id )
# intercept only
m11.9 <- ulam(
    alist(
        T ~ dpois( lambda ),
        log(lambda) <- a,
        a ~ dnorm( 3 , 0.5 )
    ), data=dat , chains=4 , log_lik=TRUE )

# interaction model
m11.10 <- ulam(
    alist(
        T ~ dpois( lambda ),
        log(lambda) <- a[C] + b[C]*P,
        a[C] ~ dnorm( 3 , 0.5 ),
        b[C] ~ dnorm( 0 , 0.2 )
    ), data=dat , chains=4 , log_lik=TRUE )
compare( m11.9 , m11.10 , func=PSIS )
k <- PSIS( m11.10 , pointwise=TRUE )$k
plot( dat$P , dat$T , xlab="log population (std)" , ylab="total tools" ,
    col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) ,
    ylim=c(0,75) , cex=1+normalize(k) )

# set up the horizontal axis values to compute predictions at
P_seq <- seq( from=-1.4 , to=3 , len=100 )

# predictions for C=1 (low contact)
lambda <- link( m11.10 , data=data.frame( P=P_seq , C=1 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( P_seq , lmu , lty=2 , lwd=1.5 )
shade( lci , P_seq , xpd=TRUE , col=col.alpha(4,0.3) )

# predictions for C=2 (high contact)
lambda <- link( m11.10 , data=data.frame( P=P_seq , C=2 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( P_seq , lmu , lty=1 , lwd=1.5 )
shade( lci , P_seq , xpd=TRUE , col=col.alpha(2,0.3) )
identify( dat$P , dat$T , d$culture )
# natural scale now
plot( d$population , d$total_tools , xlab="population" , ylab="total tools" ,
col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) ,
ylim=c(0,75) , cex=1+normalize(k) )
P_seq <- seq( from=-5 , to=3 , length.out=100 )
# 1.53 is sd of log(population)
# 9 is mean of log(population)
pop_seq <- exp( P_seq*1.53 + 9 )

lambda <- link( m11.10 , data=data.frame( P=P_seq , C=1 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( pop_seq , lmu , lty=2 , lwd=1.5 )
shade( lci , pop_seq , xpd=TRUE , col=col.alpha(4,0.3) )

lambda <- link( m11.10 , data=data.frame( P=P_seq , C=2 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( pop_seq , lmu , lty=1 , lwd=1.5 )
shade( lci , pop_seq , xpd=TRUE , col=col.alpha(2,0.3) )
identify( d$population , d$total_tools , d$culture )
the effective number of parameters (the PSIS/WAIC penalty) reflects how well the model predicts when individual data points are dropped
- so models with more parameters can have a lower effective number of parameters
gamma-Poisson (negative-binomial) regression is the count analog of using a Student-t distribution: wider tails, more robust to extreme observations
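A minimal sketch of a gamma-Poisson version of the interaction model (assuming dat from above; the model name and the dexp(1) prior on the dispersion parameter phi are my choices, not from the lecture):
# sketch: gamma-Poisson (negative-binomial) version of m11.10
# phi is a dispersion parameter: variance = lambda + lambda^2/phi,
# so smaller phi means fatter tails than a pure Poisson
m11.10gp <- ulam(
    alist(
        T ~ dgampois( lambda , phi ),
        log(lambda) <- a[C] + b[C]*P,
        a[C] ~ dnorm( 3 , 0.5 ),
        b[C] ~ dnorm( 0 , 0.2 ),
        phi ~ dexp(1)
    ), data=dat , chains=4 , log_lik=TRUE )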
improve scientific model
people produce innovations, so population and contact both drive innovation, but there is also loss of tools
\[ \Delta T = \alpha_C P^{\beta_C} - \gamma T \]
\(\Delta T\) = change in tools
\(\alpha\) = innovation rate
\(\beta\) = diminishing returns
\(\gamma\) = loss
T = number of tools
C = contact
set the equation to 0 and solve for T to get the equilibrium number of tools, then insert it into the model as the expected value
\[ \hat{T} = \frac {\alpha_C P^{\beta_C}} {\gamma} \\ T_i \sim Poisson (\lambda_i) \\ \lambda_i = \hat{T} \]
must constrain all parameters to be positive
# innovation/loss model
dat2 <- list( T=d$total_tools, P=d$population, C=d$contact_id )
m11.11 <- ulam(
    alist(
        T ~ dpois( lambda ),
        lambda <- exp(a[C])*P^b[C]/g,
        a[C] ~ dnorm(1,1),
        b[C] ~ dexp(1),
        g ~ dexp(1)
    ), data=dat2 , chains=4 , cores=4 , log_lik=TRUE )
precis(m11.11,2)
plot( d$population , d$total_tools , xlab="population" , ylab="total tools" ,
col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) ,
ylim=c(0,75) , cex=1+normalize(k) )
P_seq <- seq( from=-5 , to=3 , length.out=100 )
- still need to deal with location as a confound
Count GLMs
distributions from constraints
maximum entropy priors: binomial, Poisson, and extensions
robust regressions: beta-binomial, gamma-Poisson
Simpson’s Paradox
reversal of an association when groups are combined or separated within the dataset
the reversal is purely statistical - no way to understand it without causal assumptions
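A minimal simulated illustration of the reversal (not lecture code): the within-group slope is positive in both groups, but pooling the groups flips the sign.
# sketch: Simpson's paradox in simulated data (not lecture code)
set.seed(1)
grp <- rep( 1:2 , each=500 )
x <- rnorm( 1000 , mean=ifelse( grp==1 , 0 , 3 ) )
y <- ( x - ifelse( grp==1 , 0 , 3 ) ) + ifelse( grp==1 , 3 , 0 ) + rnorm( 1000 , 0 , 0.5 )
coef( lm( y ~ x ) )                  # pooled slope: negative
coef( lm( y ~ x , subset=grp==1 ) )  # group 1 slope: positive
coef( lm( y ~ x , subset=grp==2 ) )  # group 2 slope: positive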
non-linear haunting
there’s a positive effect in both groups but it cannot be estimated from this sample
just because your posterior distribution overlaps zero doesn't mean the effect is null