Lecture 09 - Modeling Events

Author

Isabella C. Richmond

Published

March 20, 2024

Rose / Thorn

Rose:

Thorn:

Modeling Events

NOTE: read causal foundations of bias, fairness, and …

  • observations are counts

  • unknowns are probabilities, odds

  • everything is interacting

  • “log-odds” scale

Estimand: what is the effect of gender on university admissions?

  • perception of gender by admissions officer is what truly matters here

  • total effect of discrimination is what people experience

  • direct discrimination: status-based (e.g., admissions officers are biased for or against a certain gender)

  • indirect discrimination: structural (gender influences what discipline you are interested in and that discipline is underfunded)

  • quite often, what we are able to estimate is not what we want: total effects are easier to estimate, but the sub-paths (direct effects) are what matter for understanding equity

  • many unobserved confounds in this setup (a DAG sketch follows this list)
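To make the total vs. direct distinction concrete, here is a minimal sketch of the assumed DAG using the dagitty package (my addition, not lecture code): gender G influences department choice D and admission A, and D influences A.

library(dagitty)

# assumed admissions DAG: G -> D -> A and G -> A
g <- dagitty("dag{ G -> D ; D -> A ; G -> A }")

# total effect of G on A: no adjustment needed (D is a mediator, not a confound)
adjustmentSets( g , exposure="G" , outcome="A" , effect="total" )

# direct effect of G on A: requires stratifying by the mediator D
adjustmentSets( g , exposure="G" , outcome="A" , effect="direct" )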

Generative Model:

library(rethinking) # provides rbern()

N <- 1000 # number of applicants
G <- sample(1:2, size = N, replace = TRUE) # even gender distribution
D <- rbern(N, ifelse(G == 1, 0.3, 0.8)) + 1 # gender 1 tends to apply to department 1, 2 to 2
accept_rate <- matrix(c(0.1, 0.3, 0.1, 0.3), nrow = 2) # matrix of acceptance rates [dept, gender]
A <- rbern(N, accept_rate[cbind(D, G)]) # simulate acceptance (element-wise [dept, gender] lookup)

Modeling Events

  • we observe: count of events

  • we estimate: probability (or log-odds) of events

    • like the globe tossing model, but need “proportion of water” stratified by other variables

Generalized Linear Models

  • linear models: expected value is additive “linear” combination of parameters

  • generalized linear models: expected value is some function of an additive combination of parameters

    • ordinary linear models work because the normal distribution is unbounded on both sides

    • when you have probabilities, which are bounded by 0 and 1, you make the expected value a function of the additive model

    • choosing a link function that works well is the art of generalized linear models

\[ Y_i \sim Bernoulli(p_i) \\ f(p_i) = \alpha + \beta_X X_i + \beta_Z Z_i \\ p_i = f^{-1}(\alpha + \beta_X X_i + \beta_Z Z_i) \]

  • \(p\) = probability of the event

  • \(f\) is the link function - links the distribution's parameters to the linear model

  • \(f^{-1}\) is the inverse link function

    • the probability equals the inverse link function applied to the linear model
  • distributions: relative number of ways to observe data, given assumptions about rates, probabilities, slopes, etc.

  • distributions are matched to constraints on observed variables

  • link functions are matched to distributions, essentially automatically matched to distributions

  • exponential distribution - the time to an event that happens at a constant rate. Values sampled from an exponential distribution are latencies/waiting times (continuous, non-negative data)

    • a process that produces events, where you count those events = binomial distribution

    • don't know the maximum number of counts = Poisson (a special case of the binomial)

    • a sum of exponential waiting times = gamma

    • a gamma with a large mean = approximately normal

  • distribution you want is governed by the constraints you assume about your variable

    • you cannot test if your data are normal

    • need to think about constraints and match them to distributions (see the simulation sketch below)
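These relationships are easy to check by simulation; a quick sketch in base R (my addition, assuming rethinking is loaded for dens()):

# binomial with many trials and a small probability approaches Poisson
y_binom <- rbinom( 1e4 , size=1000 , prob=0.002 )
y_pois <- rpois( 1e4 , lambda=1000*0.002 )
c( mean(y_binom) , mean(y_pois) ) # both near 2

# a sum of exponential waiting times is gamma distributed
y_sum <- replicate( 1e4 , sum( rexp(10, rate=1) ) )
dens( y_sum ) # gamma with shape 10, rate 1

# a gamma with a large mean (large shape) looks approximately normal
dens( rgamma( 1e4 , shape=100 , rate=1 ) )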

Logistic Priors

  • logit link compresses parameter distributions

  • on log-odds scale, above +4 = almost always, below -4 = almost never

    • therefore, tighter priors are better (see the prior simulation below)
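A quick prior predictive simulation (a sketch, assuming the rethinking package is loaded for inv_logit() and dens()) shows the compression:

# a "flat" Normal(0,10) prior on the log-odds scale...
a_wide <- rnorm( 1e4 , 0 , 10 )
# ...piles almost all prior mass at probabilities near 0 and 1
dens( inv_logit(a_wide) )

# a tighter Normal(0,1) prior implies mild, reasonable beliefs about the probability
a_tight <- rnorm( 1e4 , 0 , 1 )
dens( inv_logit(a_tight) )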

Statistical Model:

  • \(A_i \sim Bernoulli(p_i)\)

  • \(\text{logit}(p_i) = \alpha[G_i, D_i]\)

    • indexing by both \(G_i\) and \(D_i\) creates a matrix of intercepts
  • \(\alpha_{j,k} \sim Normal(0,1)\)

# generative model, basic mediator scenario

N <- 1000 # number of applicants
# even gender distribution
G <- sample( 1:2 , size=N , replace=TRUE )
# gender 1 tends to apply to department 1, 2 to 2
D <- rbern( N , ifelse( G==1 , 0.3 , 0.8 ) ) + 1
# matrix of acceptance rates [dept,gender]
# first scenario: no direct discrimination (rates differ by department only)
accept_rate <- matrix( c(0.1,0.3,0.1,0.3) , nrow=2 )
# second scenario (overwrites the first): adds direct discrimination against gender 1
accept_rate <- matrix( c(0.05,0.2,0.1,0.3) , nrow=2 )
# simulate acceptance
p <- sapply( 1:N , function(i) accept_rate[D[i],G[i]] )
A <- rbern( N , p )

# total effect gender
dat_sim <- list( A=A , D=D , G=G )

m1 <- ulam(
    alist(
        A ~ bernoulli(p),
        logit(p) <- a[G],
        a[G] ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )

precis(m1,depth=2)

# direct effects
m2 <- ulam(
    alist(
        A ~ bernoulli(p),
        logit(p) <- a[G,D],
        matrix[G,D]:a ~ normal(0,1)
    ), data=dat_sim , chains=4 , cores=4 )

precis(m2,depth=3)

# aggregate the dat_sim for binomial instead of Bernoulli

x <- as.data.frame(cbind( A=dat_sim$A , G=dat_sim$G , D=dat_sim$D  ))
head(x,20)

dat_sim2 <- aggregate( A ~ G + D , dat_sim , sum )
dat_sim2$N <- aggregate( A ~ G + D , dat_sim , length )$A

m2_bin <- ulam(
    alist(
        A ~ binomial(N,p),
        logit(p) <- a[G,D],
        matrix[G,D]:a ~ normal(0,1)
    ), data=dat_sim2 , chains=4 , cores=4 )

precis(m2_bin,3)

Logistic & Binomial Regression with Data

  • logistic regression: Bernoulli outcome - binary 0/1

  • binomial regression: counts from 0 to N

  • both use the logit link and are equivalent for inference (a quick equivalence check follows this list)
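As a quick check of that equivalence, the Bernoulli fit m2 and the aggregated binomial fit m2_bin above should give near-identical posteriors; one way to compare them is rethinking's coeftab (my sketch):

# posterior estimates from the two parameterizations should match closely
plot( coeftab( m2 , m2_bin ) )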

data(UCBadmit)
d <- UCBadmit

dat <- list( 
    A = d$admit,
    N = d$applications,
    G = ifelse(d$applicant.gender=="female",1,2),
    D = as.integer(d$dept)
)

# total effect gender
mG <- ulam(
    alist(
        A ~ binomial(N,p),
        logit(p) <- a[G],
        a[G] ~ normal(0,1)
    ), data=dat , chains=4 , cores=4 )

precis(mG,2)

# direct effects
mGD <- ulam(
    alist(
        A ~ binomial(N,p),
        logit(p) <- a[G,D],
        matrix[G,D]:a ~ normal(0,1)
    ), data=dat , chains=4 , cores=4 )

precis(mGD,3)

# check chains

traceplot(mGD)
trankplot(mGD)

# contrasts
# on probability scale

post1 <- extract.samples(mG)
PrA_G1 <- inv_logit( post1$a[,1] )
PrA_G2 <- inv_logit( post1$a[,2] )
diff_prob <- PrA_G1 - PrA_G2
dens(diff_prob,lwd=4,col=2,xlab="Gender contrast (probability)")

post2 <- extract.samples(mGD)
PrA <- inv_logit( post2$a ) 
diff_prob_D_ <- sapply( 1:6 , function(i) PrA[,1,i] - PrA[,2,i] )
plot(NULL,xlim=c(-0.2,0.3),ylim=c(0,25),xlab="Gender contrast (probability)",ylab="Density")
for ( i in 1:6 ) dens( diff_prob_D_[,i] , lwd=4 , col=1+i , add=TRUE )
abline(v=0,lty=3)


# marginal effect of gender perception (direct effect)

# compute department weights via simulation
# we can just compute predictions as if all applications had been perceived as men,
# and then again as if all had been perceived as women
# the difference is the marginal effect of perception, because it does not change department assignments (G -> A only, no G -> D)

# OLD WRONG CODE!
#p_G1 <- link( mGD , data=list(N=dat$N,D=dat$D,G=rep(1,12)) )
#p_G2 <- link( mGD , data=list(N=dat$N,D=dat$D,G=rep(2,12)) )

# NEW CORRECT CODE

# number of applications to simulate
total_apps <- sum(dat$N)

# number of applications per department
apps_per_dept <- sapply( 1:6 , function(i) sum(dat$N[dat$D==i]) )

# simulate as if all apps from women
p_G1 <- link(mGD,data=list(
    D=rep(1:6,times=apps_per_dept),
    N=rep(1,total_apps),
    G=rep(1,total_apps)))

# simulate as if all apps from men
p_G2 <- link(mGD,data=list(
    D=rep(1:6,times=apps_per_dept),
    N=rep(1,total_apps),
    G=rep(2,total_apps)))

# summarize
dens( p_G1 - p_G2 , lwd=4 , col=2 , xlab="effect of gender perception" )
abline(v=0,lty=3)

# show each dept density with weight as in population
w <- xtabs( dat$N ~ dat$D ) / sum(dat$N)
w <- w/max(w)
plot(NULL,xlim=c(-0.2,0.3),ylim=c(0,25),xlab="Gender contrast (probability)",ylab="Density")
for ( i in 1:6 ) dens( diff_prob_D_[,i] , lwd=2+8*w[i]^3 , col=1+i , add=TRUE )
abline(v=0,lty=3)

Post-Stratification

  • description, prediction, and causal inference often require post-stratification

  • way to predict what the intervention will do to a specific population

  • post-stratification = re-weighting estimates for target population

  • for the university example, we would reweight by the number of applications to each department (a sketch follows this list)
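A minimal sketch of that reweighting for the admissions example, reusing dat and the per-department contrasts diff_prob_D_ computed above (w_dept is my name for the weights, to avoid clobbering the earlier w):

# department weights = share of all applications going to each department
w_dept <- sapply( 1:6 , function(i) sum(dat$N[dat$D==i]) ) / sum(dat$N)

# post-stratified gender contrast: weighted average of the department contrasts
diff_poststrat <- diff_prob_D_ %*% w_dept
dens( diff_poststrat , lwd=4 , col=2 , xlab="post-stratified gender contrast (probability)" )
abline(v=0,lty=3)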

Discrimination

  • there are large structural effects

  • the distribution of applications can also be a consequence of discrimination, but these data cannot answer that question

  • confounds here are likely

Survival Analysis

  • another way of modelling events, where we care about the time until an event happens rather than the number of times it happened

  • cannot ignore censored cases (left-censored = don’t know when time started, right-censored = observation ended before event)

    • ignoring leads to inferential error
  • exponential or gamma distribution

    • exponential for single part failure

    • gamma for multiple part failure

  • inverse rate is average waiting time in this example

  • \[ D_i \sim Exponential(\lambda_i) \\ p(D_i|\lambda_i) = \lambda_i \exp(-\lambda_i D_i) \]

  • if the event happened, we use the exponential probability density at the observed time (the CDF gives the probability of the event before-or-at time x)

  • if the event didn’t happen yet (censored) use complementary cumulative distribution (CCDF) - probability not-event before-or-at time x

    • for any time on the x axis, how many cats have not yet been adopted
  • \[ D_i | A_i = 1 \sim Exponential(\lambda_i) \\ D_i | A_i = 0 \sim \text{Exponential-CCDF}(\lambda_i) \\ \lambda_i = 1/\mu_i \\ \log \mu_i = \alpha_{CID[i]} \]

  • First line is observed adoptions, second line is not-yet adoptions

  • CID = colour id

  • \(1/\mu_i\) is to get average waiting time for each cat

  • D = days to the event (days at the shelter), A = 1 if adopted (event observed), 0 if censored; a model sketch follows
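A sketch of this model in ulam, using the AustinCats data that ships with the rethinking package (this follows the book's cat-adoption example; treat the exact variable names as assumptions about that dataset):

library(rethinking)
data(AustinCats)
d <- AustinCats

dat <- list(
    days = as.numeric( d$days_to_event ),
    adopted = ifelse( d$out_event=="Adoption" , 1L , 0L ),
    CID = ifelse( d$color=="Black" , 1L , 2L ) # colour id: 1 = black, 2 = other
)

m_cats <- ulam(
    alist(
        # observed adoptions: exponential density of the waiting time
        days|adopted==1 ~ exponential( lambda ),
        # censored cats: exponential CCDF - probability of no adoption yet
        days|adopted==0 ~ custom( exponential_lccdf( !Y | lambda ) ),
        lambda <- 1.0/mu,
        log(mu) <- a[CID],
        a[CID] ~ normal(0,1)
    ), data=dat , chains=4 , cores=4 )

# a[] is the log average waiting time; exp(a) gives days to adoption by colour
precis( m_cats , 2 )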