Lecture 08 - Markov chain Monte Carlo

Author

Isabella C. Richmond

Published

March 8, 2023

Rose / Thorn

Rose:

Thorn:

Modelling Approaches

  • quadratic approximation makes a strong assumption about the shape of the posterior: that it is approximately Gaussian

  • MCMC is computationally intensive but makes fewer assumptions and is more flexible (a sketch of the contrast follows this list)
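
A minimal sketch of the contrast, assuming the rethinking package and an illustrative globe-tossing dataset (6 waters in 9 tosses) that is not from this lecture:

library(rethinking)

dat_globe <- list(W = 6, N = 9)  # illustrative data: 6 waters in 9 tosses

# quadratic approximation: assumes the posterior is Gaussian
m_quap <- quap(alist(
  W ~ dbinom(N, p),
  p ~ dunif(0, 1)),
  data = dat_globe)

# MCMC via ulam(): no Gaussian assumption, just samples
m_mcmc <- ulam(alist(
  W ~ dbinom(N, p),
  p ~ dunif(0, 1)),
  data = dat_globe, chains = 4)

precis(m_quap)
precis(m_mcmc)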

Markov chain Monte Carlo

  • chain: a sequence of draws from a distribution

  • Markov chain: history doesn’t matter, only where you are now

  • Monte Carlo: random simulation (see the sampler sketch below)
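
A minimal Metropolis sampler sketch that ties the three terms together (an assumed toy example with a standard normal target, not code from the lecture):

target <- function(x) dnorm(x, 0, 1)   # distribution we want to sample from

n_samples <- 1e4
chain <- numeric(n_samples)            # the "chain": a sequence of draws
current <- 0
for (i in 1:n_samples) {
  proposal <- current + rnorm(1, 0, 0.5)   # Monte Carlo: a random proposal
  # Markov: the accept/reject decision uses only the current position, not the history
  if (runif(1) < target(proposal) / target(current)) current <- proposal
  chain[i] <- current
}

plot(density(chain))   # approximates the standard normal target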

Hamiltonian Monte Carlo

  • probability space is essentially a skate park
  • uses random pathways that follow the distribution of the variables
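
A minimal sketch of one-dimensional Hamiltonian Monte Carlo with a leapfrog integrator, assuming a standard normal target (illustrative only, not the lecture's code):

U      <- function(q) 0.5 * q^2   # "height" of the skate park: -log density of N(0,1)
grad_U <- function(q) q           # gradient of U

hmc_step <- function(q, epsilon = 0.1, L = 20) {
  p <- rnorm(1)                   # random momentum kick (the Monte Carlo part)
  q_new <- q
  p_new <- p - epsilon * grad_U(q_new) / 2        # half step for momentum
  for (i in 1:L) {
    q_new <- q_new + epsilon * p_new              # full step for position
    if (i < L) p_new <- p_new - epsilon * grad_U(q_new)
  }
  p_new <- p_new - epsilon * grad_U(q_new) / 2    # final half step for momentum
  # accept or reject based on the change in total energy
  accept <- runif(1) < exp(U(q) - U(q_new) + 0.5 * p^2 - 0.5 * p_new^2)
  if (accept) q_new else q
}

chain <- numeric(2000)
q <- 0
for (i in 1:2000) { q <- hmc_step(q); chain[i] <- q }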

  • example: the Wines2012 judging data, with the following DAG (S = score, Q = wine quality, X = wine origin, J = judge, Z = judge origin)

library(ggdag)

dag <- dagify(
  S ~ Q + J + X,   # score depends on quality, judge, and wine origin
  Q ~ X,           # origin influences quality
  J ~ Z,           # judge origin influences the judge
  outcome = 'S',
  latent = 'Q',
  coords = list(x = c(Q = 0, X = 0, S = 1, J = 1, Z = 2),
                y = c(X = 0, J = 0, Z = 0, Q = 1, S = 1))
)

ggdag(dag) + theme_dag()

  • Estimand: association between wine quality and wine origin. Stratify by judge for efficiency.

library(rethinking)

data(Wines2012)
d <- Wines2012

dat <- list(
  S = standardize(d$score),           # standardized wine score
  J = as.numeric(d$judge),            # judge index
  W = as.numeric(d$wine),             # wine index
  X = ifelse(d$wine.amer == 1, 1, 2), # wine origin: 1 = American, 2 = French
  Z = ifelse(d$judge.amer == 1, 1, 2) # judge origin: 1 = American, 2 = French
)

# quality-only model: one average score Q per wine
mQ <- ulam(alist(
  S ~ dnorm(mu, sigma),
  mu <- Q[W],
  Q[W] ~ dnorm(0, 1),
  sigma ~ dexp(1)),
  data = dat, chains = 4, cores = 4)

precis(mQ, 2)

  • trace plots: visualization of the Markov chain (see the sketch after this list)

  • want more than one chain in order to check convergence

    • convergence: each chain explores the right distribution and the same distribution

  • R-hat is a chain convergence diagnostic

    • a variance ratio: roughly the total variance across all chains divided by the average variance within chains

    • if the chains converge, the beginning and end of each chain should be exploring the same place, i.e. the chain is stationary

    • as the total variance (among chains) approaches the average variance within chains, R-hat approaches 1

    • if the chains were exploring different regions, the total variance would be larger and R-hat would be greater than 1

    • does not guarantee convergence, but values near 1 suggest the chains are working

  • n_eff = effective number of samples

    • an approximation of how long the chain would be if each sample were completely independent of the one before it (perfectly uncorrelated, truly random)

    • when samples are autocorrelated, you have fewer effective samples

    • n_eff is typically smaller than the number of samples you actually took, because an autocorrelated chain carries less information than a perfectly independent (and therefore shorter) chain would
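
A quick sketch of inspecting these diagnostics for the model above (traceplot() and precis() are rethinking functions; exact column names may vary by package version):

traceplot(mQ)   # one panel per parameter; healthy chains overlap and look stationary
precis(mQ, 2)   # the n_eff and Rhat columns report the diagnostics described above
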
  • adding wine origin, O, to the model

mQO <- ulam(alist(
  S ~ dnorm(mu, sigma),
  mu <- Q[W] + O[X],
  Q[W] ~ dnorm(0, 1),   # wine quality
  O[X] ~ dnorm(0, 1),   # wine origin
  sigma ~ dexp(1)),
  data = dat, chains = 4, cores = 4)

plot(precis(mQO, 2))

mQOJ <- ulam(alist(
  S ~ dnorm(mu, sigma),
  mu <- (Q[W] + O[X] - H[J])*D[J],
  Q[W] ~ dnorm(0, 1),   # wine quality
  O[X] ~ dnorm(0, 1),   # wine origin
  H[J] ~ dnorm(0, 1),   # judge harshness
  D[J] ~ dexp(1),       # judge discrimination (constrained positive)
  sigma ~ dexp(1)),
  data = dat, chains = 4, cores = 4)

plot(precis(mQOJ, 2))

  • adding judges adds information to the model and allows for better estimates

  • often, if there is an issue with sampling, the problem is with your model

    • loop back to basic scientific questions and assumptions

    • divergent transitions (rejected proposals, where the simulated trajectory breaks down) are one of the signs of this