Lecture 18 - Missing Data

Author

Isabella C. Richmond

Published

May 5, 2023

Rose / Thorn

Rose:

Thorn:

Missing Data

observed data is a special case - we trick ourselves into believing there is no error
missing data = some cases unobserved
not totally missing - they have constraints and relationships to other variables

Workflow

dropping cases with missing values is sometimes justifiable
right thing to do depends upon causal assumptions
imputation is often beneficial/necessary
Bayesian imputation: compute posterior probability distribution of missing values
Marginalizing unknowns: averaging over distribution of missing values

Bayesian Imputation

causal model of all variables implies strategy for imputation
sometimes imputation is unnecessary, e.g., discrete parameters
sometimes imputation is easier, e.g., censored observations

Imputing Primates

missing values already have probability distributions
express causal model for each partially-observed variable
replace each missing value with a parameter
not the same as non-Bayesian imputation
- that generates datasets and runs the model multiple times
- this estimates probability distributions using other parameters and relationships
imputation without relationships among predictors is risky