Selection models for publication bias • bayesma

Introduction

Selection models address publication bias by explicitly modelling the mechanism by which studies enter (or fail to enter) the published literature. Rather than adjusting the effect estimate via regression, selection models embed the publication process into the likelihood, correcting inference about $\mu$ by down-weighting observations that are over-represented due to selection.

bayesma implements two complementary families:

The Copas selection model — a latent-variable model in which selection depends on study precision.
Vevea-Hedges weight-function models — a step-function model in which selection depends on the reported p-value.

Both are available through bayesma().

Copas selection model

Mechanism

Copas & Shi (2000) model each study as having been published if a latent selection variable exceeds zero:

$Z_i = \gamma_0 + \frac{\gamma_1}{SE_i} + \delta_i, \qquad \delta_i \sim \mathcal{N}(0, 1)$

Study $i$ is observed if $Z_i > 0$ . The selection parameters $(\gamma_0, \gamma_1)$ control the shape of the selection mechanism:

$\gamma_0$ is a baseline selection intercept (more negative → more severe selection overall).
$\gamma_1$ governs how much easier large-precision (small- $SE$ ) studies are to publish. Positive $\gamma_1$ means precise studies are more likely to be published regardless of their result.

The selection variable $Z_i$ is correlated with the observed effect $y_i$ through a joint bivariate normal:

$\begin{pmatrix} y_i \\ Z_i \end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix} \theta_i \\ \gamma_0 + \gamma_1 / SE_i \end{pmatrix},\; \begin{pmatrix} s_i^2 & \rho\,s_i \\ \rho\,s_i & 1 \end{pmatrix}\right)$

The correlation $\rho$ captures the tendency for studies with large effects to be published (effect-dependent selection). Under the null $\rho = 0$ , selection depends only on precision, not on the effect magnitude.

Priors

$\mu \sim \mathcal{N}(0, 1), \qquad \tau \sim \text{Half-Cauchy}(0, 0.5)$

$\gamma_0, \gamma_1 \sim \mathcal{N}(0, 1), \qquad \rho \sim \text{Uniform}(-1, 1)$

Informative priors can be supplied via prior_copas():

#| eval: false
fit_copas <- bayesma(
  data,
  model        = "copas",
  prior_copas  = prior_copas(gamma0 = normal(0, 1), rho = uniform(-1, 0))
)

Restricting $\rho \leq 0$ encodes the expectation that publication bias inflates effects upward.

Vevea-Hedges weight-function models

Mechanism

Vevea & Hedges (1995) model selection as a step function of the two-tailed p-value. Studies are retained with probability $\omega_j$ if their p-value falls in interval $j$ :

$p(\text{study observed} \mid p_i) = \omega_j, \quad p_i \in (a_{j-1},\; a_j]$

The likelihood is reweighted so that over-represented p-value regions (e.g., $p < .05$ ) contribute proportionally less to the pooled estimate.

bayesma implements the one-sided step function by default, with cut-points at $.025,\; .05,\; .10,\; .25,\; .50,\; 1.0$ :

$\omega_1 \geq \omega_2 \geq \cdots \geq \omega_6 = 1$

The final weight $\omega_6 = 1$ is fixed as the reference (non-significant studies), and each $\omega_j$ gives the relative probability that a study in p-value category $j$ is published compared to a study with $p > .50$ .

Priors

The weight parameters are assigned a Dirichlet prior to ensure proper identification:

$(\omega_1, \ldots, \omega_5) \sim \text{Dirichlet}(\mathbf{1})$

Custom weights can be specified via prior_weight_function().

One-sided vs. two-sided step function

#| eval: false
fit_wf1 <- bayesma(
  data,
  model       = "weight_function",
  weight_type = "one_sided"   # default
)

fit_wf2 <- bayesma(
  data,
  model       = "weight_function",
  weight_type = "two_sided"
)

The two-sided version uses the absolute p-value and is more appropriate when effects can be in either direction.

Fitting and comparing selection models

#| eval: false
fit_re    <- bayesma(data)
fit_copas <- bayesma(data, model = "copas")
fit_wf    <- bayesma(data, model = "weight_function")

compare_models(fit_re, fit_copas, fit_wf,
               labels = c("Random-effects", "Copas", "Weight function"))

compare_models() overlays the posteriors for $\mu$ under each model, making it easy to see how much the publication-bias correction shifts the estimate and whether the two selection models agree.

Interpreting the output

For the Copas model, summary() returns:

Parameter	Interpretation
`mu`	Bias-corrected pooled effect
`tau`	Between-study heterogeneity
`gamma0`	Selection intercept
`gamma1`	Precision-dependent selection
`rho`	Effect-dependent selection correlation

For the weight-function model:

Parameter	Interpretation
`mu`	Bias-corrected pooled effect
`tau`	Between-study heterogeneity
`omega_j`	Relative publication probability for p-value category $j$

Limitations

Both models require sufficient studies ( $k \geq 20$ is a practical minimum for the weight-function model; Copas can work with fewer but is sensitive to the prior on $\rho$ ).
The Copas model assumes a specific functional form for the selection mechanism. If the true mechanism differs, the correction can be imprecise.
Weight-function models assume that selection operates exclusively through p-values. Selection based on effect size direction or magnitude (without reference to p-values) is not captured.
Neither model identifies publication bias if the filed studies are completely missing (i.e., the selection mechanism is so strong that no unpublished studies share observable characteristics with the published set).

For a model-averaged approach that includes both PET-PEESE and weight functions as bias components, see Robust Bayesian Meta-Analysis (RoBMA).

References

Copas J, Shi JQ (2000). Meta-analysis, funnel plots and sensitivity analysis. Biostatistics, 1(3), 247–262.

Vevea JL, Hedges LV (1995). A general linear model for estimating effect size in the presence of publication bias. Psychometrika, 60(3), 419–435.