Lecture 13 - Multilevel Adventures
Rose / Thorn
Rose:
Thorn:
Multilevel Adventures
cluster: kinds of groups in the data
features: aspets of the model (parameters) that vary by cluster
cluster (tanks) -> features (survival)
add clusters = more index (categorical) variables, more population priors
add features = more parameters, more dimensions in each population prior
Varying Effects
a way for us to try and estimate unmeasured confounds
varying effect strategy: unmeasured features of clusters leave an imprint on the data that can be measured by 1) repeat observations of each cluster and 2) partial pooling among clusters
predictive perspective: important source of cluster-level variation, regularize
causal perspective: competing causes or unobserved confounds
- varying effects can help us address unknown/unobserved confounds because we use repeated observations of each group to capture this
interested in varying effects from a predictive (regularization) and a causal perspective (estimate unobserved confounds)
fixed effects: varying effects with variance (sd) at infinity, no pooling
Practical Difficulties
varying effects are a good default but come with difficulties
how to use more than one cluster type at the same time?
how to calculate predictions (are you predicting new clusters or new conditions within previously explored clusters)
how to sample chains efficiently
group-level confounding
Varying Districts
cluster by district
estimand: C in each district, partially pooled
varying intercept on each district
- try to get cluster set up first, then worry about causal effects
\(C_i \sim Bernoulli(p_i)\)
\(logit(p_i) = \alpha_{D[i]}\)
\(\alpha_j \sim Normal(\bar{\alpha}, \sigma)\)
\(\bar{\alpha} \sim Normal(0,1)\)
\(\sigma \sim Exponential(1)\)
# simple varying intercepts model
library(rethinking)
data(bangladesh)
<- bangladesh
d
<- list(
dat C = d$use.contraception,
D = as.integer(d$district) )
<- ulam(
mCD alist(
~ bernoulli(p),
C logit(p) <- a[D],
61]:a ~ normal(abar,sigma),
vector[~ normal(0,1),
abar ~ exponential(1)
sigma data=dat , chains=4 , cores=4 )
) ,
# plot estimates
<- link( mCD , data=list(D=1:61) )
p # blank2(w=2)
plot( NULL , xlab="district" , lwd=3 , col=2 , xlim=c(1,61), ylim=c(0,1) , ylab="prob use contraception" )
points( 1:61 , apply(p,2,mean) , xlab="district" , lwd=3 , col=2 , ylim=c(0,1) , ylab="prob use contraception" )
for ( i in 1:61 ) lines( c(i,i) , PI(p[,i]) , lwd=8 , col=col.alpha(2,0.5) )
# show raw proportions - have to skip 54
<- table(dat$D)
n <- xtabs(dat$C ~ dat$D)
Cn <- as.numeric( Cn/n )
pC <- c( pC[1:53] , NA , pC[54:60] )
pC points( pC , lwd=2 )
# only some labels via locator
<- table(dat$D)
n <- as.numeric(n)
n <- c( n[1:53] , 0 , n[54:60] )
n identify( 1:61 , pC , labels=n , cex=1 )
partial pooling shrinks districts with low sampling numbers
towards mean
- better predictions
what is the effect of urban living? District features are potential group-level confounds
each district
\(C_i \sim Bernoulli(p_i)\)
\(logit(p_i) = \alpha_{D[i]} + \beta_{D[i]}U_i\)
\(\alpha_j \sim Normal(\bar{\alpha}, \sigma)\) = regularizing prior for rural
\(\beta_j \sim Normal(\bar{\beta}, \tau)\) = regularizing prior for urban effect
\(\bar{\alpha}, \bar{\beta} \sim Normal(0,1)\) = averages
\(\sigma, \tau \sim Exponential(1)\) = standard deviations
<- list(
dat C = d$use.contraception,
D = as.integer(d$district),
U = ifelse(d$urban==1,1,0) )
# total U
<- ulam(
mCDU alist(
~ bernoulli(p),
C logit(p) <- a[D] + b[D]*U,
61]:a ~ normal(abar,sigma),
vector[61]:b ~ normal(bbar,tau),
vector[c(abar,bbar) ~ normal(0,1),
c(sigma,tau) ~ exponential(1)
data=dat , chains=4 , cores=4 )
) ,
traceplot(mCDU,pars="tau",lwd=2,n_cols=1)
trankplot(mCDU,pars="tau",lwd=3,n_cols=1)
# non-centered version
<- ulam(
mCDUnc alist(
~ bernoulli(p),
C logit(p) <- a[D] + b[D]*U,
# define effects using other parameters
> vector[61]:a <<- abar + za*sigma,
save> vector[61]:b <<- bbar + zb*tau,
save# z-scored effects
61]:za ~ normal(0,1),
vector[61]:zb ~ normal(0,1),
vector[# ye olde hyper-priors
c(abar,bbar) ~ normal(0,1),
c(sigma,tau) ~ exponential(1)
data=dat , chains=4 , cores=4 )
) ,
# plot estimates
<- 0
Uval <- ifelse(Uval==0,2,4)
xcol <- link( mCDUnc , data=list(D=1:61,U=rep(Uval,61)) )
p # blank2(w=2,h=0.8)
plot( NULL , xlab="district" , lwd=3 , col=2 , xlim=c(1,61), ylim=c(0,1) , ylab="prob use contraception" )
abline(h=0.5,lty=2,lwd=0.5)
points( 1:61 , apply(p,2,mean) , xlab="district" , lwd=3 , col=xcol , ylim=c(0,1) , ylab="prob use contraception" )
for ( i in 1:61 ) lines( c(i,i) , PI(p[,i]) , lwd=8 , col=col.alpha(xcol,0.5) )
# show raw proportions - have to skip 54
<- table(dat$D,dat$U)
n <- xtabs(dat$C ~ dat$D + dat$U)
Cn <- as.numeric( Cn[,Uval+1]/n[,Uval+1] )
pC <- c( pC[1:53] , NA , pC[54:60] )
pC points( pC , lwd=2 )
# only some labels via locator
<- as.numeric(n[,Uval+1])
nn <- c( nn[1:53] , 0 , nn[54:60] )
nn identify( 1:61 , pC , labels=nn , cex=1 )
# show standard deviations
<- extract.samples(mCDUnc)
post dens(post$sigma,xlab="posterior standard deviation",lwd=3,col=2,xlim=c(0,1.2))
dens(post$tau,lwd=3,col=4,add=TRUE,adj=0.2)
curve(dexp(x,1),from=0,to=1.3,add=TRUE,lwd=2,lty=2)
priors inside of priors is good for models but can create ill-fitting models
can use z-scores to re-parameterize the model so that the model doesn’t have any hyperparameters
- “non-centered”
the more you cut up the data because of different varying effects, the sample sizes will inevitably get smaller -> partial pooling really useful because it guards against overfitting