Introduction
Formal model comparison in bayesma uses predictive accuracy criteria. This vignette covers the widely applicable information criterion (WAIC), the leave-one-out information criterion (LOO-IC), and leave-one-study-out cross-validation (LOSO-CV), explaining when to use each and how to interpret the output.
For the comparison workflow and function reference, see Model Comparison & Diagnostics.
WAIC
WAIC (Watanabe, 2010) approximates the expected log predictive density for new data:

$$
\widehat{\mathrm{elpd}}_{\mathrm{WAIC}} = \sum_{i=1}^{n} \log\!\left(\frac{1}{S}\sum_{s=1}^{S} p(y_i \mid \theta^{(s)})\right) - \sum_{i=1}^{n} \mathrm{Var}_{s}\!\left[\log p(y_i \mid \theta^{(s)})\right], \qquad \mathrm{WAIC} = -2\,\widehat{\mathrm{elpd}}_{\mathrm{WAIC}}
$$

The first term is the in-sample log predictive density; the second term penalises model complexity (the effective number of parameters, $p_{\mathrm{WAIC}}$). Lower WAIC is better.
WAIC is fully Bayesian, using the entire posterior distribution rather than a point estimate. It can be computed directly from MCMC samples without refitting.
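To make the two terms concrete, here is a minimal sketch of the computation, assuming `log_lik` is an $S \times n$ matrix of pointwise log-likelihoods ($S$ posterior draws, $n$ observations) extracted from the fit; the matrix name and the extraction step are illustrative, not part of bayesma's API.

```r
# Minimal WAIC sketch from an S x n pointwise log-likelihood matrix
# (`log_lik` is a hypothetical object, not a bayesma accessor).
lppd   <- sum(log(colMeans(exp(log_lik))))  # in-sample log predictive density
p_waic <- sum(apply(log_lik, 2, var))       # effective number of parameters
waic   <- -2 * (lppd - p_waic)              # deviance scale: lower is better
```

In bayesma itself, WAIC is available directly from the fitted model: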
```r
fit$waic()
```

LOO-IC
LOO-IC approximates leave-one-observation-out cross-validation using Pareto-smoothed importance sampling (PSIS-LOO; Vehtari et al., 2017):

$$
\widehat{\mathrm{elpd}}_{\mathrm{loo}} = \sum_{i=1}^{n} \log \hat{p}(y_i \mid y_{-i}), \qquad \mathrm{LOO\text{-}IC} = -2\,\widehat{\mathrm{elpd}}_{\mathrm{loo}}
$$

where each $\hat{p}(y_i \mid y_{-i})$ is estimated by importance-weighting the posterior draws rather than refitting. PSIS-LOO is more reliable than WAIC for heavy-tailed posteriors and provides per-observation diagnostics (the Pareto $\hat{k}$ shape estimates).
```r
fit$loo()
```

Pareto $\hat{k}$ values:

- $\hat{k} < 0.5$: LOO reliable
- $0.5 \le \hat{k} \le 0.7$: LOO somewhat reliable
- $\hat{k} > 0.7$: LOO unreliable for this observation; use LOSO-CV
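If the fit exposes a pointwise log-likelihood matrix, the same diagnostics can be reproduced with the loo package; `log_lik` is again the illustrative $S \times n$ matrix from the WAIC sketch above.

```r
# PSIS-LOO via the loo package on an S x n log-likelihood matrix
# (a warning about r_eff is expected when MCMC details are not supplied).
library(loo)
loo_res  <- loo(log_lik)                  # elpd_loo estimate + diagnostics
print(loo_res)                            # includes a Pareto k summary table
pareto_k <- loo_res$diagnostics$pareto_k  # one k-hat per observation
which(pareto_k > 0.7)                     # observations where PSIS-LOO fails
```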
LOSO-CV
LOSO-CV (leave-one-study-out) refits the model $K$ times, each time withholding one study and predicting it from the remaining $K - 1$ studies. It is the gold standard for meta-analytic model comparison because:
- It is exact (no approximation).
- It is defined on the effect-size scale, enabling comparison across one-stage and two-stage models.
- It naturally handles influential studies (high-$\hat{k}$ observations in LOO).
```r
compare_models(model1, model2, criterion = "loso")
```

LOSO-CV is computationally expensive ($K$ additional fits per model). Use it for the primary model comparison after candidates have been screened with LOO-IC.
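Conceptually, the LOSO loop is straightforward. In the sketch below, `fit_model()`, `predict_study()`, and `crps()` are hypothetical stand-ins for the package's internal fitting, posterior-prediction, and scoring steps, and `studies` is assumed to be a data frame with one row per study and observed effects in `yi`.

```r
# Conceptual LOSO-CV loop (all function and column names are illustrative).
loso_scores <- vapply(seq_len(nrow(studies)), function(k) {
  fit_k <- fit_model(studies[-k, ])            # refit without study k
  pred  <- predict_study(fit_k, studies[k, ])  # posterior predictive for study k
  crps(pred, studies$yi[k])                    # score the held-out effect estimate
}, numeric(1))
mean(loso_scores)                              # mean LOSO-CRPS, lower is better
```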
Comparing models
```r
comp <- compare_models(
  "Common effect" = fit_ce,
  "RE Gaussian"   = fit_re,
  "RE Student-t"  = fit_t,
  criterion = "loso"
)
compare_table(comp)
compare_plot(comp)
```

Interpreting compare_table()
| Column | Meaning |
|---|---|
| `loso_crps` | Mean LOSO-CRPS (lower = better) |
| `delta_crps` | Difference from the best model |
| `se_delta` | SE of the CRPS difference |
| `coverage_50/80/90/95` | Empirical PI coverage at each nominal level |
A model is preferred if its `loso_crps` is lowest AND its coverage is well-calibrated (close to the nominal levels). A model with a lower CRPS but miscalibrated coverage should be investigated further.
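A quick screen for this two-part criterion might look like the following, assuming `compare_table()` returns a data frame with the columns listed above (the screening logic itself is illustrative, not a bayesma function).

```r
# Rank models by CRPS, then flag coverage miscalibration (illustrative screen).
tab     <- compare_table(comp)
nominal <- c(coverage_50 = 0.50, coverage_80 = 0.80,
             coverage_90 = 0.90, coverage_95 = 0.95)
dev     <- abs(sweep(as.matrix(tab[, names(nominal)]), 2, nominal))
tab[order(tab$loso_crps), ]  # best (lowest) CRPS first
rowMeans(dev)                # mean absolute coverage error per model
```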
Comparing non-nested models
WAIC and LOO-IC differences between non-nested models (e.g., random-effects Gaussian vs random-effects Student-$t$) can be tested using the standard error of the difference:

$$
\mathrm{SE}\!\left(\Delta\widehat{\mathrm{elpd}}\right) = \sqrt{n\,\mathrm{Var}_{i}\!\left(\mathrm{elpd}^{A}_{\mathrm{loo},i} - \mathrm{elpd}^{B}_{\mathrm{loo},i}\right)}, \qquad z = \frac{\Delta\widehat{\mathrm{elpd}}}{\mathrm{SE}\!\left(\Delta\widehat{\mathrm{elpd}}\right)}
$$

A large $|z|$ indicates the difference is unlikely to be sampling noise. However, rejecting the null hypothesis that the two models predict equally well should be interpreted cautiously: a small CRPS difference may be practically negligible even if statistically significant.
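With PSIS-LOO objects in hand, the difference and its standard error can be obtained from `loo::loo_compare()`; `loo_re` and `loo_t` below are assumed to be `loo` objects for the Gaussian and Student-$t$ random-effects fits.

```r
# Compare two non-nested fits on the elpd scale (loo package API).
comp_loo <- loo::loo_compare(loo_re, loo_t)
print(comp_loo)  # elpd_diff and se_diff, best model in the first row
z <- comp_loo[2, "elpd_diff"] / comp_loo[2, "se_diff"]
abs(z) > 2       # crude screen for a non-negligible difference
```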
When model comparison is not appropriate
- Nested models where the simpler model is the null. Use Bayes factors (via `robma()`) instead of LOO-IC for testing $H_0$ against $H_1$.
- Non-comparable likelihoods. LOO-IC values computed under different likelihoods (e.g., log-OR vs log-RR) are not comparable; use LOSO-CV on a common effect scale.
- Bias-corrected vs uncorrected models. These models answer different questions, so model comparison is misleading. Use sensitivity analysis instead.
