Lecture 13 - Multilevel Adventures

Author

Isabella C. Richmond

Published

April 17, 2024

Rose / Thorn

Rose:

Thorn:

Multilevel Adventures

  • cluster: kinds of groups in the data

  • features: aspets of the model (parameters) that vary by cluster

  • cluster (tanks) -> features (survival)

  • add clusters = more index (categorical) variables, more population priors

  • add features = more parameters, more dimensions in each population prior

Varying Effects

  • a way for us to try and estimate unmeasured confounds

  • varying effect strategy: unmeasured features of clusters leave an imprint on the data that can be measured by 1) repeat observations of each cluster and 2) partial pooling among clusters

  • predictive perspective: important source of cluster-level variation, regularize

  • causal perspective: competing causes or unobserved confounds

    • varying effects can help us address unknown/unobserved confounds because we use repeated observations of each group to capture this
  • interested in varying effects from a predictive (regularization) and a causal perspective (estimate unobserved confounds)

  • fixed effects: varying effects with variance (sd) at infinity, no pooling

Practical Difficulties

  • varying effects are a good default but come with difficulties

    • how to use more than one cluster type at the same time?

    • how to calculate predictions (are you predicting new clusters or new conditions within previously explored clusters)

    • how to sample chains efficiently

    • group-level confounding

Varying Districts

  • cluster by district

  • estimand: C in each district, partially pooled

  • varying intercept on each district

    • try to get cluster set up first, then worry about causal effects
  • \(C_i \sim Bernoulli(p_i)\)

  • \(logit(p_i) = \alpha_{D[i]}\)

  • \(\alpha_j \sim Normal(\bar{\alpha}, \sigma)\)

  • \(\bar{\alpha} \sim Normal(0,1)\)

  • \(\sigma \sim Exponential(1)\)

# simple varying intercepts model
library(rethinking)
data(bangladesh)
d <- bangladesh

dat <- list(
    C = d$use.contraception,
    D = as.integer(d$district) )

mCD <- ulam(
    alist(
        C ~ bernoulli(p),
        logit(p) <- a[D],
        vector[61]:a ~ normal(abar,sigma),
        abar ~ normal(0,1),
        sigma ~ exponential(1)
    ) , data=dat , chains=4 , cores=4 )


# plot estimates
p <- link( mCD , data=list(D=1:61) )
# blank2(w=2)
plot( NULL , xlab="district" , lwd=3 , col=2 , xlim=c(1,61), ylim=c(0,1) , ylab="prob use contraception" )

points( 1:61 , apply(p,2,mean) , xlab="district" , lwd=3 , col=2 , ylim=c(0,1) , ylab="prob use contraception" )

 for ( i in 1:61 ) lines( c(i,i) , PI(p[,i]) , lwd=8 , col=col.alpha(2,0.5) )

# show raw proportions - have to skip 54
n <- table(dat$D)
Cn <- xtabs(dat$C ~ dat$D)
pC <- as.numeric( Cn/n )
pC <- c( pC[1:53] , NA , pC[54:60] )
points( pC , lwd=2 )

# only some labels via locator
n <- table(dat$D)
n <- as.numeric(n)
n <- c( n[1:53] , 0 , n[54:60] )
identify( 1:61 , pC , labels=n , cex=1 )
  • partial pooling shrinks districts with low sampling numbers

  • towards mean

    • better predictions
  • what is the effect of urban living? District features are potential group-level confounds

  • each district

  • \(C_i \sim Bernoulli(p_i)\)

  • \(logit(p_i) = \alpha_{D[i]} + \beta_{D[i]}U_i\)

  • \(\alpha_j \sim Normal(\bar{\alpha}, \sigma)\) = regularizing prior for rural

  • \(\beta_j \sim Normal(\bar{\beta}, \tau)\) = regularizing prior for urban effect

  • \(\bar{\alpha}, \bar{\beta} \sim Normal(0,1)\) = averages

  • \(\sigma, \tau \sim Exponential(1)\) = standard deviations

dat <- list(
    C = d$use.contraception,
    D = as.integer(d$district),
    U = ifelse(d$urban==1,1,0) )

# total U
mCDU <- ulam(
    alist(
        C ~ bernoulli(p),
        logit(p) <- a[D] + b[D]*U,
        vector[61]:a ~ normal(abar,sigma),
        vector[61]:b ~ normal(bbar,tau),
        c(abar,bbar) ~ normal(0,1),
        c(sigma,tau) ~ exponential(1)
    ) , data=dat , chains=4 , cores=4 )

traceplot(mCDU,pars="tau",lwd=2,n_cols=1)
trankplot(mCDU,pars="tau",lwd=3,n_cols=1)

# non-centered version
mCDUnc <- ulam(
    alist(
        C ~ bernoulli(p),
        logit(p) <- a[D] + b[D]*U,
        # define effects using other parameters
        save> vector[61]:a <<- abar + za*sigma,
        save> vector[61]:b <<- bbar + zb*tau,
        # z-scored effects
        vector[61]:za ~ normal(0,1),
        vector[61]:zb ~ normal(0,1),
        # ye olde hyper-priors
        c(abar,bbar) ~ normal(0,1),
        c(sigma,tau) ~ exponential(1)
    ) , data=dat , chains=4 , cores=4 )

# plot estimates

Uval <- 0
xcol <- ifelse(Uval==0,2,4)
p <- link( mCDUnc , data=list(D=1:61,U=rep(Uval,61)) )
# blank2(w=2,h=0.8)
plot( NULL , xlab="district" , lwd=3 , col=2 , xlim=c(1,61), ylim=c(0,1) , ylab="prob use contraception" )
abline(h=0.5,lty=2,lwd=0.5)

points( 1:61 , apply(p,2,mean) , xlab="district" , lwd=3 , col=xcol , ylim=c(0,1) , ylab="prob use contraception" )

 for ( i in 1:61 ) lines( c(i,i) , PI(p[,i]) , lwd=8 , col=col.alpha(xcol,0.5) )

# show raw proportions - have to skip 54
n <- table(dat$D,dat$U)
Cn <- xtabs(dat$C ~ dat$D + dat$U)
pC <- as.numeric( Cn[,Uval+1]/n[,Uval+1] )
pC <- c( pC[1:53] , NA , pC[54:60] )
points( pC , lwd=2 )

# only some labels via locator
nn <- as.numeric(n[,Uval+1])
nn <- c( nn[1:53] , 0 , nn[54:60] )
identify( 1:61 , pC , labels=nn , cex=1 )

# show standard deviations
post <- extract.samples(mCDUnc)
dens(post$sigma,xlab="posterior standard deviation",lwd=3,col=2,xlim=c(0,1.2))
dens(post$tau,lwd=3,col=4,add=TRUE,adj=0.2)
curve(dexp(x,1),from=0,to=1.3,add=TRUE,lwd=2,lty=2)
  • priors inside of priors is good for models but can create ill-fitting models

  • can use z-scores to re-parameterize the model so that the model doesn’t have any hyperparameters

    • “non-centered”
  • the more you cut up the data because of different varying effects, the sample sizes will inevitably get smaller -> partial pooling really useful because it guards against overfitting