Homework - Week 03
Q1
The first two problems are based on the same data. The data in
data(foxes)
are 116 foxes from 30 different urban groups in England. These fox groups are like street gangs. Group size (groupsize
) varies from 2 to 8 individuals. Each group maintains its own (almost exclusive) urban territory. Some territories are larger than others. Thearea
variable encodes this information. Some territories also have moreavgfood
than others. And food influences theweight
of each fox. Assume this DAG:
where F is
avgfood
, G isgroupsize
, A isarea
, and W isweight
. Use the backdoor criterion and estimate the total causal influence of A on F. What effect would increasing the area of a territory have on the amount of food inside it?
To estimate the total effect of A on F, we just need to model F ~ A because there are no shared causes.
tar_load(fox)
tar_load(h03_q1_prior)
# look at priors we selected
prior_summary(h03_q1_prior)
prior class coef group resp dpar nlpar lb ub source
normal(0, 0.5) b user
normal(0, 0.5) b area_s (vectorized)
normal(0, 0.5) Intercept user
exponential(1) sigma 0 user
# prior predictive
<- add_epred_draws(fox, h03_q1_prior, ndraws = 100)
epred
ggplot(epred, aes(x = area_s, y = .epred)) +
geom_line(aes(group=.draw), alpha = 0.1) +
labs(y = "Predicted Food (standardized)", x = "Area (standardized)") +
theme_classic()
# posterior distribution
tar_load(h03_q1)
# make dataset sequencing from -2 to 2 (reasonable range of values for standardized numbers)
<- data.frame(area_s = seq(-2, 2, length.out = 1000))
df
<- h03_q1 %>%
epred_data epred_draws(df)
# plot slopes
<- ggplot(data = fox, aes(x = area_s)) +
p stat_lineribbon(aes(x = area_s, y = .epred), data = epred_data) +
geom_jitter(aes(y = avgfood_s), width = .1, height = 0, alpha = 0.7) +
scale_fill_brewer(palette = "Greys") +
theme_classic() +
scale_y_continuous(limits = c(-2, 2), breaks = c(-2, -1, 0.5, 0, 0.5, 1, 2, 5))
# extract plot breaks on x and y axes
<- c(as.numeric(na.omit(layer_scales(p)$x$break_positions())))
atx <- c(as.numeric(na.omit(layer_scales(p)$y$break_positions())))
aty
# unscale axis labels for interpretation
+
p scale_x_continuous(name = "Area",
breaks = atx,
labels = round(atx * sd(fox$area) + mean(fox$area), 1)) +
scale_y_continuous(name = "Average Food",
breaks = aty,
labels = round(aty * sd(fox$avgfood) + mean(fox$avgfood), 1)) +
labs(fill = "Credible Interval")
As area of a territory goes up, average food also increases.
Q2
Infer the total causal effect of adding food F to a territory on the weight W of foxes. Can you calculate the causal effect by simulating an intervention on food?
Total effect of food on weight is estimated by modelling W ~ F (no backdoor paths).
tar_load(fox)
tar_load(h03_q2_prior)
# look at priors we selected
prior_summary(h03_q2_prior)
prior class coef group resp dpar nlpar lb ub source
normal(0, 0.5) b user
normal(0, 0.5) b avgfood_s (vectorized)
normal(0, 0.5) Intercept user
exponential(1) sigma 0 user
# prior predictive
<- add_epred_draws(fox, h03_q2_prior, ndraws = 100)
epred
ggplot(epred, aes(x = avgfood_s, y = .epred)) +
geom_line(aes(group=.draw), alpha = 0.1) +
labs(y = "Predicted Weight (standardized)", x = "Area (standardized)") +
theme_classic()
# posterior distribution
tar_load(h03_q2)
# make dataset sequencing from -2 to 2 (reasonable range of values for standardized numbers)
<- data.frame(avgfood_s = seq(-2, 2, length.out = 1000))
df
<- h03_q2 %>%
epred_data epred_draws(df)
# plot slopes
<- ggplot(data = fox, aes(x = avgfood_s)) +
p stat_lineribbon(aes(x = avgfood_s, y = .epred), data = epred_data) +
geom_jitter(aes(y = weight_s), width = .1, height = 0, alpha = 0.7) +
scale_fill_brewer(palette = "Greys") +
theme_classic() +
scale_y_continuous(limits = c(-2, 2), breaks = c(-2, -1, 0.5, 0, 0.5, 1, 2, 5))
# extract plot breaks on x and y axes
<- c(as.numeric(na.omit(layer_scales(p)$x$break_positions())))
atx <- c(as.numeric(na.omit(layer_scales(p)$y$break_positions())))
aty
# unscale axis labels for interpretation
+
p scale_x_continuous(name = "Average Food",
breaks = atx,
labels = round(atx * sd(fox$avgfood) + mean(fox$avgfood), 1)) +
scale_y_continuous(name = "Weight",
breaks = aty,
labels = round(aty * sd(fox$weight) + mean(fox$weight), 1)) +
labs(fill = "Credible Interval")
# does slope change at different values of food?
%>%
h03_q2 emtrends(~ avgfood_s,
var = "avgfood_s",
at = list(avgfood_s = c(0, 2)),
epred = TRUE, re_formula = NA)
avgfood_s avgfood_s.trend lower.HPD upper.HPD
0 -0.0232 -0.205 0.161
2 -0.0232 -0.205 0.161
Point estimate displayed: median
HPD interval probability: 0.95
<- h03_q2 %>%
cont emtrends(~ avgfood_s,
var = "avgfood_s",
at = list(avgfood_s = c(0, 2)),
epred = TRUE, re_formula = NA) %>%
gather_emmeans_draws()
ggplot(cont, aes(x = .value, fill = factor(avgfood_s))) +
stat_halfeye(slab_alpha = 0.75) +
scale_fill_manual(values = c('darkseagreen', 'lightblue')) +
labs(y = "Density", x = "Average marginal effect of a unit increase in avgfood", fill = "avgfood (standardized)") +
theme_classic() +
theme(legend.position = "bottom")
# NO! Linear model, slope doesn't change across different values
# simulate an intervention: calculate the contrast between a lot of food and average amount of food
<- data.frame(avgfood_s = 0)
mf <- data.frame(avgfood_s = 1)
hf
<- h03_q2 %>%
med epred_draws(newdata = mf)
<- h03_q2 %>%
high epred_draws(newdata = hf)
<- data.frame(cont = med$.epred - high$.epred)
cont
ggplot(cont, aes(x = cont)) +
stat_halfeye() +
labs(x = "avgfood_s of 0 - avgfood_s of 1", y = "") +
theme_classic()
Total effect of food on weight is very small, we see very little effect of increasing food
Q3
Infer the direct causal effect of adding food F to a territory on the weight W of foxes. In light of your estimates from this problem and the previous one, what do you think is going on with these foxes?
Direct causal effect of food on weight requires stratifying by groupsize (G).
tar_load(h03_q3)
# make dataset sequencing from -2 to 2 (reasonable range of values for standardized numbers)
# hold groupsize constant (at average level)
<- data.frame(avgfood_s = seq(-2, 2, length.out = 1000),
df groupsize_s = 0)
<- h03_q3 %>%
epred_data epred_draws(df)
# plot slopes
<- ggplot(data = fox, aes(x = avgfood_s)) +
p stat_lineribbon(aes(x = avgfood_s, y = .epred), data = epred_data) +
geom_jitter(aes(y = weight_s), width = .1, height = 0, alpha = 0.7) +
scale_fill_brewer(palette = "Greys") +
theme_classic() +
scale_y_continuous(limits = c(-2, 2), breaks = c(-2, -1, 0.5, 0, 0.5, 1, 2, 5))
# extract plot breaks on x and y axes
<- c(as.numeric(na.omit(layer_scales(p)$x$break_positions())))
atx <- c(as.numeric(na.omit(layer_scales(p)$y$break_positions())))
aty
# unscale axis labels for interpretation
+
p scale_x_continuous(name = "Average Food",
breaks = atx,
labels = round(atx * sd(fox$avgfood) + mean(fox$avgfood), 1)) +
scale_y_continuous(name = "Weight",
breaks = aty,
labels = round(aty * sd(fox$weight) + mean(fox$weight), 1)) +
labs(fill = "Credible Interval", title = "Group size held constant")
When we stratify by group size, the direct effect of food on weight is strongly positive. Group size may diminish the total effect of food on weight because as avgfood increases, group size increases and the same amount of food is available for each individual, resulting in no change in weight.
Q4 (optional)
Suppose there is an unobserved confound that influences F and G, like this:
Assuming the DAG above is correct, again estimate both the total and direct causal effects of F on W. What impact does the unobserved confound have?
Total causal effect is impossible to estimate, because you can’t close the backdoor path through U. Direct causal effect is the same, estimated by stratifying on G.