---
title: "Validating predictInterval() against brms"
author: "Jared Knowles"
date: "Written 2026-05-29 (last updated 2026-05-29)"
output: rmarkdown::html_vignette
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Validating predictInterval() against brms}
%\VignetteEncoding{UTF-8}
---
## Why compare to brms?
`merTools::predictInterval()` was written to put prediction intervals on
mixed-effect models that are too large or too slow to bootstrap. It does this by
simulating from the estimated distribution of the fixed and random effects --
fast, but an approximation. Since then, [brms](https://paulbuerkner.com/brms/)
has become a standard for *fully* Bayesian multilevel modeling, and its
posterior predictive distribution is about as close to a gold standard as we
have. This vignette asks a simple question: **how close are
`predictInterval()`'s intervals to the ones brms produces?**
We look at three things: a Gaussian random-slopes model, what happens for an
entirely unobserved group, and a binomial GLMM. This vignette is *precompiled* --
reproducing it requires the `brms` package and a working Stan toolchain.
## A random-slopes model
We use the bundled `hsb` data (7,185 students in 160 schools) and model math
achievement from student SES, school-mean SES, and a school-varying SES slope.
We hold out 20% of students for testing, plus six whole schools to stand in for
"new groups" later.
``` r
data(hsb)
hsb$schid <- factor(hsb$schid)
new_schools <- sample(levels(hsb$schid), 6)
hsb$.set <- "train"
hsb$.set[hsb$schid %in% new_schools] <- "test_new"
seen <- which(hsb$.set == "train")
hsb$.set[sample(seen, round(0.2 * length(seen)))] <- "test_seen"
train <- droplevels(hsb[hsb$.set == "train", ])
test_seen <- hsb[hsb$.set == "test_seen" & hsb$schid %in% levels(train$schid), ]
test_new <- hsb[hsb$.set == "test_new", ]
f1 <- mathach ~ ses + meanses + (ses | schid)
m_lme <- lmer(f1, data = train)
brms_fit_seconds <- system.time(
m_brm <- fit_brm(f1, train, gaussian(), "brms_hsb_meanses"))[["elapsed"]]
```
We draw a predictive distribution for each held-out student from each method --
`predictInterval(returnSims = TRUE)` and `brms::posterior_predict()` -- and
compute everything (point estimates, intervals, coverage) from those draws.
``` r
mt_seen <- mt_draws(m_lme, test_seen)
pp_seen <- posterior_predict(m_brm, newdata = test_seen, allow_new_levels = TRUE)
```
### Point estimates are essentially identical
Both methods condition on the same estimated fixed effects and school BLUPs, so
the conditional-mean predictions should agree almost exactly -- and they do.
``` r
lme_mean <- predict(m_lme, newdata = test_seen)
brm_mean <- colMeans(posterior_epred(m_brm, newdata = test_seen, allow_new_levels = TRUE))
c(correlation = cor(lme_mean, brm_mean),
mean_abs_diff = mean(abs(lme_mean - brm_mean)),
response_sd = sd(test_seen$mathach))
```
```
#> correlation mean_abs_diff response_sd
#> 0.99984045 0.03670134 6.71745425
```
``` r
ggplot(data.frame(merTools = lme_mean, brms = brm_mean), aes(merTools, brms)) +
geom_abline(slope = 1, intercept = 0, linetype = 2, color = "grey50") +
geom_point(alpha = 0.3, size = 0.8, color = "#1B9E77") + coord_equal() +
labs(x = "lme4 predict()", y = "brms posterior_epred()") +
theme_minimal(base_size = 12)
```
### The prediction intervals agree
``` r
b_mt <- qband(mt_seen, 0.90); b_brm <- qband(pp_seen, 0.90)
c(width_merTools = median(b_mt$upr - b_mt$lwr),
width_brms = median(b_brm$upr - b_brm$lwr),
cor_lower = cor(b_mt$lwr, b_brm$lwr),
cor_upper = cor(b_mt$upr, b_brm$upr))
```
```
#> width_merTools width_brms cor_lower cor_upper
#> 20.1789145 20.2344075 0.9901934 0.9899334
```
``` r
endpts <- rbind(data.frame(bound = "lower", merTools = b_mt$lwr, brms = b_brm$lwr),
data.frame(bound = "upper", merTools = b_mt$upr, brms = b_brm$upr))
ggplot(endpts, aes(merTools, brms, color = bound)) +
geom_abline(slope = 1, intercept = 0, linetype = 2, color = "grey50") +
geom_point(alpha = 0.3, size = 0.8) + coord_equal() +
labs(title = "90% prediction-interval endpoints", x = "merTools", y = "brms") +
theme_minimal(base_size = 12)
```
### And they are equally well calibrated
The real test is out-of-sample: do nominal intervals cover held-out scores at
the nominal rate? Both methods track the diagonal, and each other.
``` r
data.frame(nominal = NOMINAL,
merTools = coverage(mt_seen, test_seen$mathach, NOMINAL),
brms = coverage(pp_seen, test_seen$mathach, NOMINAL))
```
```
#> nominal merTools brms
#> 1 0.50 0.4569776 0.4598698
#> 2 0.80 0.7895879 0.7932032
#> 3 0.90 0.9168474 0.9168474
#> 4 0.95 0.9660159 0.9703543
```
### At a fraction of the cost
``` r
t_lme <- system.time(lmer(f1, data = train))[["elapsed"]]
t_mt <- system.time(mt_draws(m_lme, test_seen))[["elapsed"]]
t_pp <- system.time(posterior_predict(m_brm, newdata = test_seen,
allow_new_levels = TRUE))[["elapsed"]]
data.frame( # brms_fit_seconds was captured when the model was fit, above
step = c("lme4 fit", "predictInterval", "brms fit (compile+sample)", "posterior_predict"),
seconds = round(c(t_lme, t_mt, brms_fit_seconds, t_pp), 2))
```
```
#> step seconds
#> 1 lme4 fit 0.05
#> 2 predictInterval 0.53
#> 3 brms fit (compile+sample) 398.97
#> 4 posterior_predict 1.52
```
For this model, the entire lme4 + `predictInterval()` path runs in about a
second, while a single brms fit takes several minutes -- a couple of orders of
magnitude difference, before the (much larger) models `predictInterval()` was
really built for.
## What about an entirely new group?
For a school that was never seen during fitting there is a genuine modeling
choice: do you *drop* the school's effect and predict from the fixed effects
only, or *sample* a plausible effect for it from the estimated school-level
distribution? Both packages support either, and the comparison is only fair when
the choices match:
| treatment of an unseen group | `predictInterval()` | brms |
|---|---|---|
| drop the effect (population prediction) | `new.levels = "zero"` *(default)* | `re_formula = NA` |
| sample a new effect | `new.levels = "draw"` | `allow_new_levels = TRUE` *(default)* |
Matched up, the two tools agree on both the width and the coverage of the
new-school intervals:
``` r
nd <- test_new
pp_pop <- posterior_predict(m_brm, newdata = nd, re_formula = NA)
pp_draw <- posterior_predict(m_brm, newdata = nd, allow_new_levels = TRUE)
mt_zero <- mt_draws(m_lme, nd, new.levels = "zero")
mt_draw <- mt_draws(m_lme, nd, new.levels = "draw")
w <- function(d) median(qband(d, 0.90)$upr - qband(d, 0.90)$lwr)
data.frame(
unseen_group = c("drop effect", "sample effect"),
merTools_width = c(w(mt_zero), w(mt_draw)),
brms_width = c(w(pp_pop), w(pp_draw)),
merTools_cov90 = c(coverage(mt_zero, nd$mathach, 0.90), coverage(mt_draw, nd$mathach, 0.90)),
brms_cov90 = c(coverage(pp_pop, nd$mathach, 0.90), coverage(pp_draw, nd$mathach, 0.90)))
```
```
#> unseen_group merTools_width brms_width merTools_cov90 brms_cov90
#> 1 drop effect 19.96801 20.02855 0.8708487 0.8745387
#> 2 sample effect 20.67910 20.76924 0.8892989 0.8856089
```
So the interval width for an unseen group is set by the *choice*, not the method.
Earlier I compared `predictInterval()`'s default against brms's default, which is
an apples-to-oranges pairing -- the apparent "gap" was the two packages making
different default assumptions, not a deficiency in either.
How much the choice matters depends on how much group variance is left after the
fixed effects. Here `meanses` already absorbs most of the between-school
variation, so dropping vs. sampling the school effect barely changes the width.
Remove `meanses` and the school SD roughly doubles, so the two options diverge:
``` r
m0 <- lmer(mathach ~ ses + (ses | schid), data = train)
data.frame(
model = c("with meanses", "without meanses"),
school_SD = c(attr(VarCorr(m_lme)$schid, "stddev")[["(Intercept)"]],
attr(VarCorr(m0)$schid, "stddev")[["(Intercept)"]]),
width_zero = c(w(mt_draws(m_lme, nd, new.levels = "zero")),
w(mt_draws(m0, nd, new.levels = "zero"))),
width_draw = c(w(mt_draws(m_lme, nd, new.levels = "draw")),
w(mt_draws(m0, nd, new.levels = "draw"))))
```
```
#> model school_SD width_zero width_draw
#> 1 with meanses 1.598503 19.96801 20.67910
#> 2 without meanses 2.125770 19.94993 21.22576
```
The practical guidance: `predictInterval()`'s default (`new.levels = "zero"`)
answers "what is the expected outcome for a typical member of an unseen group,"
while `new.levels = "draw"` answers "what is the outcome for an unseen group
drawn from the population" -- the same quantity brms returns by default. Pick the
one that matches your question.
## A generalized linear mixed model
The story carries over to GLMMs. We fit a binomial model for whether grouse had
ticks and compare predicted probabilities -- `predictInterval(type =
"probability")` against `brms::posterior_epred()`.
``` r
data(grouseticks)
grouseticks$TICKS_BIN <- as.integer(grouseticks$TICKS >= 1)
grouseticks$cHEIGHT <- as.numeric(scale(grouseticks$HEIGHT))
gk <- grouseticks
gk$.set <- ifelse(runif(nrow(gk)) < 0.2, "test", "train")
gk_tr <- droplevels(gk[gk$.set == "train", ])
gk_te <- gk[gk$.set == "test" & gk$BROOD %in% levels(gk_tr$BROOD) &
gk$LOCATION %in% unique(gk_tr$LOCATION), ]
gf <- TICKS_BIN ~ YEAR + cHEIGHT + (1 | BROOD) + (1 | LOCATION)
g_lme <- glmer(gf, family = binomial, data = gk_tr,
control = glmerControl(optimizer = "bobyqa"))
g_brm <- fit_brm(gf, gk_tr, bernoulli(), "brms_grouseticks")
mt_pr <- mt_draws(g_lme, gk_te, type = "probability", include.resid.var = FALSE)
if (max(mt_pr) > 1 || min(mt_pr) < 0) mt_pr <- plogis(mt_pr)
brm_pr <- posterior_epred(g_brm, newdata = gk_te, allow_new_levels = TRUE)
mt_p <- apply(mt_pr, 2, median); brm_p <- apply(brm_pr, 2, median)
c(correlation = cor(mt_p, brm_p), mean_abs_diff = mean(abs(mt_p - brm_p)))
```
```
#> correlation mean_abs_diff
#> 0.99488587 0.02704796
```
``` r
ggplot(data.frame(merTools = mt_p, brms = brm_p), aes(merTools, brms)) +
geom_abline(slope = 1, intercept = 0, linetype = 2, color = "grey50") +
geom_point(alpha = 0.4, color = "#7570B3") + coord_equal(xlim = c(0, 1), ylim = c(0, 1)) +
labs(title = "Predicted P(tick): merTools vs brms",
x = "predictInterval(type = 'probability')", y = "posterior_epred()") +
theme_minimal(base_size = 12)
```
## Takeaways
- For **observed groups**, `predictInterval()` reproduces brms's point estimates
and prediction intervals almost exactly, with the same out-of-sample coverage,
at a tiny fraction of the computational cost.
- For **new groups**, you choose how to treat the unseen effect:
`new.levels = "zero"` (the default) drops it and predicts from the fixed
effects, while `new.levels = "draw"` samples it from the estimated covariance
-- matching brms's `re_formula = NA` and `allow_new_levels = TRUE`
respectively. Either way the two packages agree; pick the convention that fits
your question.
- The same agreement holds for **GLMMs** on the probability scale.
In short, when bootstrapping or full MCMC is impractical, `predictInterval()` is
a fast, well-calibrated stand-in for the cases it was designed to handle.