Model description
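The Vevea-Hedges (1995) selection model adjusts for publication bias by explicitly modelling the probability that a study is published as a function of its $p$-value. Studies in low-significance bins (high $p$-values) are underrepresented in the literature. The model estimates a relative publication probability for each bin and conditions the likelihood on publication. As a result, the estimates of the overall effect and of the between-study heterogeneity refer to all studies conducted, not only to those that were published.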
Mathematical specification
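$p$-value bins:
The $p$-value range $[0, 1]$ is divided into $J$ intervals $(a_{j-1}, a_j]$, $j = 1, \dots, J$, with $0 = a_0 < a_1 < \dots < a_J = 1$. Typical bin boundaries are $(a_1, \dots, a_J) = (0.025, 0.05, 0.10, 0.25, 0.50, 1.0)$.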
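Assigning a study to a bin requires its $p$-value. The sketch below assumes one-sided $p$-values $1 - \Phi(y_i / \sigma_i)$ for a test of a positive effect, which the 0.025 boundary suggests, and uses the typical boundaries above. The helper names `one_sided_p` and `p_bin` are hypothetical and are not part of bayesma.

```python
import math
from bisect import bisect_left

# Upper bin boundaries a_1 < ... < a_J (a_0 = 0 is implicit).
CUTOFFS = [0.025, 0.05, 0.10, 0.25, 0.50, 1.0]


def one_sided_p(y: float, se: float) -> float:
    """One-sided p-value 1 - Phi(y / se), written with the complementary error function."""
    return 0.5 * math.erfc(y / (se * math.sqrt(2.0)))


def p_bin(p: float, cutoffs=CUTOFFS) -> int:
    """1-based bin j such that a_{j-1} < p <= a_j."""
    # bisect_left returns the index of the first boundary >= p
    return bisect_left(cutoffs, p) + 1


# y = 0.5 with se = 0.2 gives z = 2.5 and p of about 0.0062, so the study falls in bin 1.
p = one_sided_p(0.5, 0.2)
print(p, p_bin(p))
```

Because the intervals are closed on the right, a study with $p$ exactly equal to a boundary (for example 0.025) belongs to the lower, more significant bin.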
Selection weights:
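Each bin $j$ has a weight $w_j$, the relative probability that a study whose $p$-value falls in bin $j$ is published. The weights are constrained to $0 \le w_j \le 1$ and are typically set so that $w_1 = 1$ (studies with $p \le a_1 = 0.025$ are always published).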
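A small simulation shows how selection on $p$-values biases the published record. It is a sketch under stated assumptions. The weights in `WEIGHTS` are hypothetical, chosen only for illustration. The true effect is zero with no heterogeneity, every study has standard error 0.3, and the $p$-values are one-sided. Each simulated study is "published" with probability equal to the weight of its bin.

```python
import math
import random

CUTOFFS = [0.025, 0.05, 0.10, 0.25, 0.50, 1.0]  # upper bin boundaries a_1..a_J
WEIGHTS = [1.0, 0.8, 0.5, 0.3, 0.2, 0.2]         # hypothetical w_1..w_J with w_1 = 1


def one_sided_p(y, se):
    """One-sided p-value 1 - Phi(y / se)."""
    return 0.5 * math.erfc(y / (se * math.sqrt(2.0)))


def simulate(n_studies, mu, tau, se, seed=1):
    """Draw y_i ~ N(mu, se^2 + tau^2); publish each study with probability w of its bin."""
    rng = random.Random(seed)
    sd = math.sqrt(se ** 2 + tau ** 2)
    all_y, published = [], []
    for _ in range(n_studies):
        y = rng.gauss(mu, sd)
        p = one_sided_p(y, se)
        j = next(k for k, a in enumerate(CUTOFFS) if p <= a)  # 0-based bin index
        all_y.append(y)
        if rng.random() < WEIGHTS[j]:
            published.append(y)
    return all_y, published


all_y, published = simulate(20_000, mu=0.0, tau=0.0, se=0.3)
print(f"studies conducted: {len(all_y)}, published: {len(published)}")
print(f"mean effect, all studies:       {sum(all_y) / len(all_y):.3f}")
print(f"mean effect, published studies: {sum(published) / len(published):.3f}")
```

The true effect is zero, yet the published studies have a clearly positive mean effect. This is the bias that the weighted likelihood is designed to correct.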
Weighted likelihood:
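$$
L(\mu, \tau, w) = \prod_{i=1}^{N} \frac{w_{b_i}\, \phi\!\left(y_i;\, \mu,\, \sigma_i^2 + \tau^2\right)}{\sum_{j=1}^{J} w_j\, \pi_{ij}(\mu, \tau)}
$$
where $b_i$ is the bin of study $i$’s $p$-value and $\sigma_i$ is its standard error. $\phi(\cdot\,; m, v)$ is the normal density with mean $m$ and variance $v$. $\pi_{ij}(\mu, \tau)$ is the probability, under the model, that study $i$’s $p$-value falls in bin $j$.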
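The bin probabilities have a closed form. Assume one-sided $p$-values $p_i = 1 - \Phi(y_i / \sigma_i)$, where $\Phi$ is the standard normal CDF and $\sigma_i$ is the standard error of study $i$. Take bin boundaries $0 = a_0 < a_1 < \dots < a_J = 1$. Under the model, $y_i \sim \mathcal{N}(\mu, \sigma_i^2 + \tau^2)$; write $s_i = \sqrt{\sigma_i^2 + \tau^2}$. Then:

$$
p_i \le a \iff \frac{y_i}{\sigma_i} \ge \Phi^{-1}(1 - a),
\qquad
F_i(a) = P(p_i \le a \mid \mu, \tau) = 1 - \Phi\!\left(\frac{\sigma_i\,\Phi^{-1}(1 - a) - \mu}{s_i}\right),
$$

$$
\pi_{ij}(\mu, \tau) = F_i(a_j) - F_i(a_{j-1}), \qquad F_i(a_0) = 0, \quad F_i(a_J) = 1.
$$

At $\mu = \tau = 0$ this reduces to $F_i(a) = a$, so each $\pi_{ij}$ equals the width of bin $j$. As $\mu$ grows, probability moves into the low-$p$ bins.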
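Priors:
$$
\mu \sim \mathcal{N}(0, 1), \qquad \tau \sim \text{Half-Cauchy}(0, 0.5), \qquad w_j \sim \text{Beta}(1, 1), \quad j = 2, \dots, J
$$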
Stan code
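```stan
data {
  int<lower=1> N;                         // number of studies
  int<lower=2> J;                         // number of p-value bins
  vector[N] y;                            // observed effect sizes
  vector<lower=0>[N] se;                  // standard errors
  array[N] int<lower=1, upper=J> bin;     // p-value bin for each study
  vector<lower=0, upper=1>[J - 1] p_cut;  // interior one-sided cutoffs a_1 < ... < a_{J-1}, strictly inside (0, 1)
}
transformed data {
  // One-sided p_i <= a  <=>  y_i / se_i >= inv_Phi(1 - a)
  vector[J - 1] z_cut;
  for (j in 1:(J - 1)) {
    z_cut[j] = inv_Phi(1 - p_cut[j]);
  }
}
parameters {
  real mu;
  real<lower=0> tau;
  vector<lower=0, upper=1>[J - 1] w_rest;  // weights of bins 2..J, each in (0, 1)
}
transformed parameters {
  // Bin 1 (most significant) is the reference: w[1] = 1
  vector[J] w = append_row(1.0, w_rest);
}
model {
  target += normal_lpdf(mu | 0, 1);
  target += cauchy_lpdf(tau | 0, 0.5);    // half-Cauchy through the lower bound on tau
  target += beta_lpdf(w_rest | 1, 1);
  for (i in 1:N) {
    real s = sqrt(square(se[i]) + square(tau));
    vector[J + 1] cdf;                    // cdf[j + 1] = P(p_i <= a_j | mu, tau)
    cdf[1] = 0;
    cdf[J + 1] = 1;
    for (j in 1:(J - 1)) {
      cdf[j + 1] = 1 - Phi((se[i] * z_cut[j] - mu) / s);
    }
    // log w_b + log N(y | mu, s) - log(sum_j w_j * pi_ij(mu, tau))
    target += log(w[bin[i]]) + normal_lpdf(y[i] | mu, s)
              - log(dot_product(w, cdf[2:(J + 1)] - cdf[1:J]));
  }
}
generated quantities {
  real b_Intercept = mu;
}
```
How bayesma calls this model
```r
bayesma(
  data,
  model_type = "selection_weight",
  p_cutoffs = c(0.025, 0.05, 0.10, 0.25, 0.50, 1.0),
  selection_priors = list(w2 = beta(1, 1), w3 = beta(1, 1))
)
```
`p_cutoffs` lists the upper bin boundaries $a_1, \dots, a_J$. In the model above, all but the final boundary (1.0) are passed to Stan as `p_cut`. The bin probabilities $\pi_{ij}(\mu, \tau)$ are computed inside the model from the normal CDF evaluated at the bin boundaries. Because they depend on $\mu$ and $\tau$, they cannot be precomputed once as data (for example as a `bin_probs` matrix built by `bayesma_stan_data()`). A fixed matrix would make the denominator constant in $\mu$ and $\tau$, and the selection weights would then no longer correct the estimates of $\mu$ and $\tau$.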
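The bin-probability computation and the weighted log-likelihood can be sketched in plain Python. This is a minimal sketch under stated assumptions: one-sided $p$-values, the typical boundaries above, and `statistics.NormalDist` from the standard library for $\Phi$ and $\Phi^{-1}$. The function names `bin_probs`, `normal_logpdf` and `selection_loglik` are hypothetical and are not bayesma functions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()                        # standard normal: cdf() and inv_cdf()
CUTOFFS = [0.025, 0.05, 0.10, 0.25, 0.50, 1.0]   # upper bin boundaries a_1..a_J


def bin_probs(mu, tau, se, cutoffs=CUTOFFS):
    """pi_j(mu, tau) for one study: P(one-sided p-value in bin j), j = 1..J.

    Assumes increasing cutoffs that end at 1.0.
    """
    s = math.sqrt(se ** 2 + tau ** 2)
    cdf = [0.0]                                  # F(a_0) = P(p <= 0) = 0
    for a in cutoffs[:-1]:
        # p <= a  <=>  y >= se * Phi^{-1}(1 - a)
        threshold = se * STD_NORMAL.inv_cdf(1.0 - a)
        cdf.append(1.0 - STD_NORMAL.cdf((threshold - mu) / s))
    cdf.append(1.0)                              # F(a_J) = P(p <= 1) = 1
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - (x - mean) ** 2 / (2.0 * sd ** 2)


def selection_loglik(mu, tau, w, y, se, bins, cutoffs=CUTOFFS):
    """Weighted log-likelihood; w[0] is w_1 and bins holds 1-based bins b_i."""
    total = 0.0
    for y_i, se_i, b_i in zip(y, se, bins):
        s = math.sqrt(se_i ** 2 + tau ** 2)
        denom = sum(w_j * p_j for w_j, p_j in zip(w, bin_probs(mu, tau, se_i, cutoffs)))
        total += math.log(w[b_i - 1]) + normal_logpdf(y_i, mu, s) - math.log(denom)
    return total


# The bin probabilities shift towards the significant bins as mu grows,
# so they must be recomputed for every (mu, tau).
print([round(p, 3) for p in bin_probs(0.0, 0.1, 0.2)])
print([round(p, 3) for p in bin_probs(0.4, 0.1, 0.2)])
```

Two consequences can be checked directly. With every weight equal to 1, the denominator is 1 and the function reduces to the ordinary random-effects log-likelihood. Multiplying all weights by a constant leaves the value unchanged.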
Parameterisation notes
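The normalising denominator $\sum_{j=1}^{J} w_j\, \pi_{ij}(\mu, \tau)$ in the weighted likelihood is the key computational component. It is the probability that study $i$ would be published given $\mu$ and $\tau$. Dividing by it turns the density of all studies into the density of published studies, which are the only ones observed. For example, take two bins split at $p = 0.025$, weights $w = (1, 0.2)$ and $\mu = \tau = 0$. The denominator is then $1 \times 0.025 + 0.2 \times 0.975 = 0.22$, so only 22% of such studies would be published.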
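The parameterisation treats the first bin (most significant) as the reference, with $w_1 = 1$. The weaker-significance bins have $0 \le w_j \le 1$ by construction, so each $w_j$ is the publication probability of bin $j$ relative to bin 1. A reference is needed because multiplying every weight by the same constant $c > 0$ scales the numerator and the denominator of the likelihood alike and leaves it unchanged. Without the reference, the weights would be identified only up to scale.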
Known sampling difficulties
The normalising denominator depends on both $\mu$ and $\tau$, which makes the likelihood surface complex. With many bins and a small number of studies $N$, some bins contain few or no studies, and the posterior can be multimodal. Increasing `adapt_delta` to 0.99 and using `iter_warmup = 2000` is recommended.
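Before fitting, counting the studies in each bin shows whether the "many bins, few studies" situation applies. This is a sketch with hypothetical helper names (`studies_per_bin`, `sparse_bins`). The $p$-values are illustrative, made-up inputs, and the threshold of three studies per bin is an arbitrary assumption.

```python
from bisect import bisect_left
from collections import Counter

CUTOFFS = [0.025, 0.05, 0.10, 0.25, 0.50, 1.0]  # upper bin boundaries a_1..a_J


def studies_per_bin(p_values, cutoffs=CUTOFFS):
    """Number of studies in each bin (a_{j-1}, a_j], including empty bins."""
    counts = Counter(bisect_left(cutoffs, p) + 1 for p in p_values)
    return [counts.get(j, 0) for j in range(1, len(cutoffs) + 1)]


def sparse_bins(p_values, min_count=3, cutoffs=CUTOFFS):
    """1-based bins holding fewer than min_count studies (hypothetical threshold)."""
    counts = studies_per_bin(p_values, cutoffs)
    return [j for j, n in enumerate(counts, start=1) if n < min_count]


# Illustrative one-sided p-values for ten studies
p_values = [0.001, 0.004, 0.012, 0.02, 0.03, 0.08, 0.15, 0.2, 0.4, 0.7]
print(studies_per_bin(p_values))
print(sparse_bins(p_values))
```

The weights of flagged bins are informed mostly by the Beta(1, 1) prior. One option is a coarser `p_cutoffs` vector that merges such bins with a neighbour.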
