This vignette is a faithful reproduction of Example 1 from Schroeders and Gnambs (2025), “Sample Size Planning for Item Response Models: A Tutorial for the Quantitative Researcher”, and of its companion R code at https://ulrich-schroeders.github.io/IRT-sample-size/example_1.html. The goal is to let irtsim users compare the package's Monte Carlo output directly against the published reference.
The paper’s Example 1 asks a classic question: how many examinees are needed to recover item difficulty parameters in a linked, two-form achievement test fit with a Rasch (1PL) model?
| Decision | Paper value | irtsim mapping |
|---|---|---|
| Estimation model | 1PL (Rasch) | estimation_model = "1PL" |
| Number of items | 30 | n_items = 30 |
| Item discriminations (generation) | rnorm(30, 1, 0.1) | item_params$a |
| Item difficulties | seq(-2, 2, length.out = 30) | item_params$b |
| Two forms, common block items 13–18 | 2 × 30 linking matrix | missing = "linking" |
| Monte Carlo iterations | 438 | iterations = 438 |
| Sample sizes | seq(100, 600, 50) | sample_sizes |
| Performance criterion | MSE, threshold 0.05 | summary(res)$item_summary$mse |
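
To make the two-form row of the table concrete, the following base R sketch builds the 2 × 30 linking matrix and checks how many items each form carries. It uses the same odd/even split plus common block 13–18 as the vignette code; the helper names (common, form1, form2, items_per_form, forms_per_item, anchor_items) are ours and are not part of irtsim.

```r
# Base R sketch of the two-form layout; object names are illustrative only.
n_items <- 30L
common  <- 13:18   # common (anchor) block from the table

# Form 1 = odd items + anchors, form 2 = even items + anchors
form1 <- sort(unique(c(seq(1L, n_items, by = 2L), common)))
form2 <- sort(unique(c(seq(2L, n_items, by = 2L), common)))

linking_matrix <- matrix(0L, nrow = 2L, ncol = n_items)
linking_matrix[1L, form1] <- 1L
linking_matrix[2L, form2] <- 1L

items_per_form <- rowSums(linking_matrix)        # items administered on each form
forms_per_item <- colSums(linking_matrix)        # number of forms containing each item
anchor_items   <- which(forms_per_item == 2)     # items shared by both forms
```

Under this layout every item appears on at least one form, and only the common block appears on both, which is what lets the two forms be placed on one scale.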
Note on the data-generating model: the paper draws near-constant
discriminations (mean 1, sd 0.1) but fits a Rasch model. irtsim’s 1PL
generation fixes a = 1 exactly, so to match the paper we
generate under a 2PL with a ~ rnorm(30, 1, 0.1) and set
estimation_model = "1PL". The fitted model is therefore the one
the paper targets, while the generating model reproduces
the paper’s a draws.
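
As a rough illustration of what "generate under a 2PL, fit a 1PL" means for the data, the base R sketch below simulates responses with near-constant discriminations and compares the resulting response probabilities with their Rasch (a = 1) counterparts. It is not irtsim's generator: the a * (theta - b) parameterization, the sample size of 200, and the seed are assumptions made only for this example.

```r
set.seed(1)            # arbitrary seed for this illustration only
n_items   <- 30L
n_persons <- 200L      # illustrative N, not a value from the paper

a     <- rnorm(n_items, mean = 1, sd = 0.1)   # near-constant discriminations
b     <- seq(-2, 2, length.out = n_items)     # difficulties as in the table
theta <- rnorm(n_persons)                     # standard normal abilities

# Assumed 2PL form: P(X = 1) = plogis(a * (theta - b)); rows = persons, cols = items
eta  <- sweep(outer(theta, b, "-"), 2, a, "*")
prob <- plogis(eta)
resp <- matrix(rbinom(length(prob), 1L, prob), nrow = n_persons)

# Largest difference between the 2PL probabilities and the Rasch (a = 1) ones
rasch_prob <- plogis(outer(theta, b, "-"))
max_gap    <- max(abs(prob - rasch_prob))
```

Inspecting max_gap gives a sense of how mild the misspecification is when a Rasch model is fitted to data generated this way.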
The code below mirrors the paper. It is shown for reference; the
actual simulation is precomputed and cached in
inst/extdata/vignette_ex1_paper.rds to keep vignette build
time low.
```r
library(irtsim)

set.seed(2024)
n_items <- 30L

# Item parameters exactly as in the paper
a_vals <- rnorm(n_items, mean = 1, sd = 0.1)
b_vals <- seq(-2, 2, length.out = n_items)

# Linking matrix: form 1 = odd items + common block 13-18,
# form 2 = even items + common block 13-18
linking_matrix <- matrix(0L, nrow = 2L, ncol = n_items)
linking_matrix[1L, sort(unique(c(seq(1L, n_items, 2L), 13:18)))] <- 1L
linking_matrix[2L, sort(unique(c(seq(2L, n_items, 2L), 13:18)))] <- 1L

design <- irt_design(
  model = "2PL",
  n_items = n_items,
  item_params = list(a = a_vals, b = b_vals),
  theta_dist = "normal"
)

study <- irt_study(
  design,
  sample_sizes = seq(100L, 600L, by = 50L),
  missing = "linking",
  test_design = list(linking_matrix = linking_matrix),
  estimation_model = "1PL"
)

res <- irt_simulate(
  study,
  iterations = 438L,
  seed = 2024L,
  parallel = TRUE
)
```

We summarize the recovered item-difficulty MSE and pair it with its Monte Carlo standard error (MCSE), following Morris et al. (2019).
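
For readers who want to see what these two quantities are, here is a minimal base R sketch of the MSE and its MCSE as defined by Morris et al. (2019): the MSE is the mean squared error over replications, and its MCSE is the standard error of that mean. The helper mse_with_mcse() is hypothetical and is not an irtsim function; we assume, but have not verified here, that the package computes the same quantities. The simulated estimates are made up for illustration.

```r
# Hypothetical helper (not part of irtsim): MSE and its Monte Carlo SE
# for the estimates of a single parameter across replications.
mse_with_mcse <- function(estimates, true_value) {
  sq_err <- (estimates - true_value)^2
  n_sim  <- length(sq_err)
  mse    <- mean(sq_err)
  # MCSE of the MSE: standard error of the mean of the squared errors
  mcse   <- sqrt(sum((sq_err - mse)^2) / (n_sim * (n_sim - 1)))
  c(mse = mse, mcse_mse = mcse)
}

set.seed(1)                               # arbitrary seed for this illustration
est <- rnorm(438, mean = -2, sd = 0.5)    # made-up estimates of an item with b = -2
out <- mse_with_mcse(est, true_value = -2)
```

The package's own summary for the reproduced study follows.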
```r
s <- summary(res, criterion = c("mse", "mcse_mse"), param = "b")
head(s$item_summary, 10)
#>    sample_size item param true_value       mse    mcse_mse n_converged
#> 1          100    1     b -2.0000000 0.2503685 0.018046678         438
#> 2          100    2     b -1.8620690 0.2138392 0.017983836         438
#> 3          100    3     b -1.7241379 0.1597980 0.011247343         438
#> 4          100    4     b -1.5862069 0.1667535 0.012496766         438
#> 5          100    5     b -1.4482759 0.1862439 0.018703552         438
#> 6          100    6     b -1.3103448 0.1711703 0.012482780         438
#> 7          100    7     b -1.1724138 0.1426810 0.011621785         438
#> 8          100    8     b -1.0344828 0.1378920 0.008631983         438
#> 9          100    9     b -0.8965517 0.1351424 0.009190359         438
#> 10         100   10     b -0.7586207 0.1482162 0.010252653         438
```

The paper plots the MSE trajectory for two representative items: item 1 (difficulty ≈ −2, extreme) and item 15 (difficulty ≈ 0, central). We do the same.
```r
library(ggplot2)  # needed for ggplot(), aes(), and the geoms below

item_df <- s$item_summary
focal <- subset(item_df, item %in% c(1L, 15L))
focal$item_label <- factor(
  focal$item,
  levels = c(1L, 15L),
  labels = c("Item 1 (b \u2248 -2)", "Item 15 (b \u2248 0)")
)

ggplot(focal, aes(x = sample_size, y = mse, colour = item_label)) +
  geom_hline(yintercept = 0.05, linetype = "dashed", colour = "grey40") +
  geom_line(linewidth = 0.8) +
  geom_point(size = 2) +
  # Approximate 95% Monte Carlo interval, truncated at zero
  geom_errorbar(
    aes(
      ymin = pmax(mse - 1.96 * mcse_mse, 0),
      ymax = mse + 1.96 * mcse_mse
    ),
    width = 15
  ) +
  scale_x_continuous(breaks = seq(100, 600, 100)) +
  labs(
    title = "Example 1: MSE of b-parameter vs. sample size",
    subtitle = "Dashed line = paper's 0.05 MSE threshold",
    x = "Sample size (N)",
    y = "MSE(b)",
    colour = NULL
  ) +
  theme_minimal(base_size = 12)
```

Using irtsim’s built-in recommended_n() helper, we can
extract the smallest N that meets the paper’s MSE ≤ 0.05 threshold for
each item. The paper reports sample-size requirements in the same
sense.
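
The selection rule just described can be sketched in a few lines of base R. This is not irtsim's implementation: smallest_n_meeting() is a hypothetical helper, we assume the rule is simply "first grid point whose MSE is at or below the threshold", and the two MSE curves are made up for illustration.

```r
# Hypothetical helper (not irtsim's implementation): smallest sample size
# on the grid whose MSE is at or below the threshold, or NA if none is.
smallest_n_meeting <- function(sample_size, mse, threshold) {
  ok <- mse <= threshold
  if (!any(ok)) {
    return(NA_integer_)
  }
  min(sample_size[ok])
}

n_grid <- seq(100L, 600L, by = 50L)   # same grid as the study
mse_a  <- 24 / n_grid                 # made-up curve that crosses 0.05 on the grid
mse_b  <- 40 / n_grid                 # made-up curve that stays above 0.05

rec_a <- smallest_n_meeting(n_grid, mse_a, threshold = 0.05)
rec_b <- smallest_n_meeting(n_grid, mse_b, threshold = 0.05)
```

With estimated rather than smooth MSE curves, Monte Carlo noise can push a value just above or below the threshold, so a recommendation near the boundary is best read together with the MCSE. The package's own call and output follow.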
```r
sim_summary <- summary(res, criterion = "mse", param = "b")
rec <- recommended_n(sim_summary, criterion = "mse", threshold = 0.05, param = "b")
head(rec, 10)
#>    item param recommended_n criterion threshold
#> 1     1     b            NA       mse      0.05
#> 2     2     b           450       mse      0.05
#> 3     3     b           300       mse      0.05
#> 4     4     b           350       mse      0.05
#> 5     5     b           450       mse      0.05
#> 6     6     b           400       mse      0.05
#> 7     7     b           300       mse      0.05
#> 8     8     b           250       mse      0.05
#> 9     9     b           300       mse      0.05
#> 10   10     b           300       mse      0.05
```

Should reproduce (within MC noise):
- NA in the recommended_n column means the criterion was never met; this is informative, not an error.

Expected small numerical differences:

- The paper assigns examinees to forms at random (sample(c(1, 2), n, replace = TRUE)). irtsim’s apply_missing_structured() uses deterministic round-robin assignment. At N ≥ 100 the induced difference in per-form sample sizes is small (< 1 examinee gap) and does not materially shift the MSE trajectories.
- Parallel runs set future.seed = TRUE, which uses L’Ecuyer-CMRG substreams. The paper uses the session default Mersenne-Twister. Both are valid Monte Carlo streams; the specific numbers differ. Only trajectory shape is expected to match, not bit-for-bit numerical equality.
- irt_iterations()]; users can recompute it if they want a different MC precision.

Burton, A., Altman, D. G., Royston, P., & Holder, R. L. (2006). The design of simulation studies in medical statistics. Statistics in Medicine, 25, 4279–4292. https://doi.org/10.1002/sim.2673
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38, 2074–2102. https://doi.org/10.1002/sim.8086
Schroeders, U., & Gnambs, T. (2025). Sample size planning for item response models: A tutorial for the quantitative researcher. Companion code: https://ulrich-schroeders.github.io/IRT-sample-size/.