PSpower is a design-stage tool for planning observational and randomized studies that estimate treatment effects using propensity score (PS) weighting. It answers: how large does my study need to be?
Two functions cover the most common outcome types:
| Function | Outcome | Default test |
|---|---|---|
| power_ps() | Continuous or binary | Two-sided |
| power_cox() | Time-to-event (survival) | One-sided |
This is a prospective planning tool. Computing power after data are collected using the observed sample size (“post-hoc power”) is uninformative and should not be done.
For all outcome types, the required sample size takes the form
\[N = \left\lceil V \cdot \frac{(z_{1-\alpha/k} + z_\beta)^2}{\delta^2} \right\rceil,\]
where \(\delta\) is the standardized effect size, \(\alpha\) is the significance level, \(\beta\) is the target power, and \(k = 2\) for a two-sided test or \(k = 1\) for a one-sided test. All reported sample sizes are ceiling integers (rounded up to the nearest whole number).
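The formula can be evaluated directly once \(V\) is known. The helper below is a minimal sketch of that arithmetic; the value `V = 4` is purely illustrative (it is not a package default), and `n_required()` is a hypothetical name, not a PSpower function.

```r
# N = ceiling(V * (z_{1-alpha/k} + z_beta)^2 / delta^2)
n_required <- function(V, delta, alpha = 0.05, power = 0.80, sided = 2) {
  z_alpha <- qnorm(1 - alpha / sided)  # critical value; k = sided
  z_beta  <- qnorm(power)              # standard normal quantile at the target power
  ceiling(V * (z_alpha + z_beta)^2 / delta^2)
}
n_required(V = 4, delta = 0.2)  # 785 with these illustrative inputs
```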
The asymptotic variance \(V\) is what distinguishes the two functions:
power_ps(): \(V\) is the variance of the Hájek estimator, determined by \(r\), \(\phi\), \(\rho^2\), and the estimand.
power_cox(): \(V\) is the variance of the PS-weighted partial likelihood estimator, determined by \(r\), \(d_1\), \(d_0\), \(\tau_0 = \log(\text{HR})\), and (for observational studies) \(\phi\) and the estimand.
Common inputs
Effect size — the standardized treatment effect \(\delta = \tau / \text{SD}(Y)\) for continuous/binary outcomes, or the log hazard ratio \(\tau_0 = \log(\text{HR})\) for survival outcomes. For continuous outcomes, \(|\delta| = 0.2\) is considered small; 0.5 moderate; 0.8 large. For survival, \(\text{HR} = 0.75\) gives \(\tau_0 \approx -0.29\).
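As a quick sketch of these conversions (the 3 mmHg effect and 15 mmHg outcome SD are hypothetical numbers, not package defaults):

```r
tau   <- 3                     # treatment effect in original units (hypothetical)
sd_y  <- 15                    # outcome standard deviation (hypothetical)
delta <- tau / sd_y            # standardized effect size: 0.2, a "small" effect
tau0  <- log(0.75)             # log hazard ratio for HR = 0.75: about -0.288
c(delta = delta, tau0 = tau0)
```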
Treatment proportion r — the fraction of participants who receive treatment, \(r = \Pr(Z = 1)\). A balanced design uses \(r = 0.5\); in observational data, \(r\) is estimated from the study population.
Estimand — the scientific quantity of interest: the average treatment effect in the full population (ATE), among the treated (ATT), among the controls (ATC), or in the overlap population (ATO). All four are available in power_ps(); for power_cox(), obtain the ATC by re-labeling the groups and using estimand = "ATT". In a randomized trial all four estimands coincide.
Continuous and binary outcomes (power_ps)
Overlap coefficient φ — a number in \((0, 1)\) measuring how similar the PS distributions of the two groups are. \(\phi = 1\) in a randomized trial; \(\phi < 1\) reflects confounding, and sample size increases as \(\phi\) decreases. Rule of thumb: \(\phi \geq 0.95\) good; \([0.90, 0.95)\) moderate; \([0.85, 0.90)\) poor; \(< 0.85\) very poor.
\(\phi\) can be estimated from a
pilot study or any dataset with fitted propensity scores using
overlap_coef(). When it is not yet known, supplying a
vector of plausible values (e.g., \(\phi \in
\{0.85, 0.90, 0.95\}\)) allows \(N\) to be computed across the plausible
range.
Confounder coefficient ρ² — the squared correlation between the potential outcome \(Y(0)\) and the PS linear predictor. When \(\rho^2 = 0\) (the default), the outcome is unrelated to the confounders beyond the treatment indicator. A sensitivity analysis over \(\rho^2 \in [0, 0.05]\) is recommended; values above 0.05 are uncommon in practice.
Survival outcomes (power_cox)
Event rates d₁ and d₀ — the probability of observing an event during follow-up in the treated group (\(d_1\)) and the control group (\(d_0\)). These can be estimated from registries, pilot data, or published survival curves. If \(d_0\) is omitted it is set equal to \(d_1\).
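When only published survival curves are available, the event rates can be read off at the planned end of follow-up; under proportional hazards, \(S_1(t) = S_0(t)^{\text{HR}}\), so the treated rate follows from the control curve. A sketch with hypothetical numbers (60% control-group survival at end of follow-up, HR = 0.75):

```r
S0 <- 0.60            # control-group survival at end of follow-up (hypothetical)
HR <- 0.75            # assumed hazard ratio
d0 <- 1 - S0          # control event rate: 0.400
d1 <- 1 - S0^HR       # treated event rate under proportional hazards: ~0.318
round(c(d1 = d1, d0 = d0), 3)
```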
Study type — "rct" for a randomized trial or
"obs" for an observational study.
Overlap coefficient φ — required when
study_type = "obs"; the same quantity as in
power_ps(). See the rule of thumb and estimation guidance
under the continuous/binary section above.
Method — "robust" (default) uses the sandwich variance, which accounts for the actual hazard ratio; "schoenfeld" is the classical formula derived under a null effect and may overestimate or underestimate the required \(N\) relative to the robust method. The "schoenfeld" method is available only for randomized trials.
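As a cross-check, the classical Schoenfeld formula can be reproduced by hand: the required number of events is \((z_{1-\alpha} + z_\beta)^2 / \{r(1-r)\tau_0^2\}\), converted to a sample size by dividing by the overall event rate. With HR = 0.75, \(r = 0.5\), a 40% event rate in both groups, one-sided \(\alpha = 0.05\), and 80% power:

```r
tau0   <- log(0.75)                      # log hazard ratio
r      <- 0.5                            # treatment proportion
d_bar  <- 0.40                           # overall event rate (d1 = d0 = 0.40)
z      <- qnorm(0.95) + qnorm(0.80)      # one-sided alpha = 0.05, power = 0.80
events <- z^2 / (r * (1 - r) * tau0^2)   # required number of events (~299)
N      <- ceiling(events / d_bar)        # required sample size: 748
N
```

This matches the output of method = "schoenfeld" in the power_cox() example later in this article.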
power_ps()
Suppose we plan a study with:
res <- power_ps(
effect_size = 0.2,
r = 0.5,
phi = 0.9,
estimand = "ATE",
power = 0.80
)
res
#> PSpower: power_ps
#> --------------------------------------------------
#> Estimand: ATE
#> Weights: inverse probability weights
#> Effect size: 0.200
#> Treatment proportion: 0.500
#> Overlap (phi): 0.900
#> Confounding (rho2): 0.000 [default]
#> --------------------------------------------------
#> Required sample size: 1058
#> Target power: 0.80 | sig. level: 0.050 | two-sided test

Vectors can be supplied to any input; all combinations are computed automatically.
Effect of treatment allocation
The required sample size depends on how evenly participants are split between treatment and control. The following evaluates five values of \(r\) across three estimands (\(5 \times 3 = 15\) scenarios):
res_r <- power_ps(
effect_size = 0.2,
r = c(0.30, 0.40, 0.50, 0.60, 0.70),
phi = 0.90,
estimand = c("ATE", "ATT", "ATO"),
power = 0.80
)
plot(res_r)

\(N\) is minimized at \(r = 0.5\) for ATE; the ATT and ATC estimands are not symmetric in \(r\): ATT favors a larger treated fraction, and ATC a larger control fraction.
Effect of covariate overlap
res_phi <- power_ps(
effect_size = 0.2,
r = 0.5,
phi = c(0.85, 0.90, 0.95),
estimand = c("ATE", "ATT", "ATO"),
power = 0.80
)
plot(res_phi)
summary(res_phi)
#> PSpower: power_ps
#> -------------------------------------------------------
#> Scenarios: 9
#> Varying: phi, estimand
#> Fixed: effect_size = 0.2, r = 0.5, rho2 = 0 [default]
#> Target power: 0.80 | sig. level: 0.050 | two-sided test
#> -------------------------------------------------------
#>
#> Sample size (N, ceiling integer) distribution:
#>
#> sample_size phi estimand
#> Min: 867 0.95 ATO
#> 25%: 958 0.9 ATO
#> 50%: 1058 0.9 ATE
#> 75%: 1330 0.9 ATT
#> Max: 1978 0.85 ATT

The ATO estimand consistently requires fewer participants than ATE, especially under poor overlap.
The table below shows how \(N\) changes across \(\rho^2 \in [0, 0.05]\) for ATE with \(\phi = 0.90\).
power_ps(
effect_size = 0.2,
r = 0.5,
phi = 0.90,
rho2 = c(0, 0.01, 0.02, 0.05),
estimand = "ATE",
power = 0.80
)
#> PSpower: power_ps
#> --------------------------------------------------
#> Scenarios: 4
#> Varying: rho2
#> Fixed: effect_size = 0.2, r = 0.5, phi = 0.9, estimand = ATE
#> Target power: 0.80 | sig. level: 0.050 | two-sided test
#> --------------------------------------------------
#> rho2 sample_size
#> 0.00 1058
#> 0.01 1065
#> 0.02 1072
#> 0.05 1093

An upper bound on \(\rho^2\) can be obtained from the \(R^2\) of a regression of the outcome on the study covariates using historical data.
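A minimal sketch of that bound, using simulated data as a stand-in for a historical cohort (the variable names and coefficients are hypothetical):

```r
set.seed(1)
n  <- 400
x1 <- rnorm(n); x2 <- rnorm(n)          # study covariates (simulated)
y  <- 0.15 * x1 + 0.10 * x2 + rnorm(n)  # historical outcome (simulated)
fit <- lm(y ~ x1 + x2)
rho2_upper <- summary(fit)$r.squared    # use as an upper bound for the rho2 input
rho2_upper
```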
power_cox()
res_rct <- power_cox(
effect_size = log(0.75), # HR = 0.75
r = 0.5,
d1 = 0.40,
study_type = "rct",
power = 0.80
)
res_rct
#> PSpower: power_cox
#> -------------------------------------------------------
#> Study type: Randomized trial
#> Method: Robust sandwich variance
#> Effect size (log hazard ratio): -0.288 [hazard ratio = 0.750]
#> Treatment proportion: 0.500
#> Event rate (group 1): 0.400
#> Event rate (group 0): 0.400 [= group 1]
#> -------------------------------------------------------
#> Required sample size: 795
#> Target power: 0.80 | sig. level: 0.050 | one-sided test

The default method uses the robust sandwich variance, which accounts for the actual hazard ratio. The Schoenfeld formula (derived under a null effect) does not correct for the effect size and may give a larger or smaller \(N\):
power_cox(
effect_size = log(0.75),
r = 0.5,
d1 = 0.40,
study_type = "rct",
method = "schoenfeld",
power = 0.80
)
#> PSpower: power_cox
#> -------------------------------------------------------
#> Study type: Randomized trial
#> Method: Schoenfeld formula
#> Effect size (log hazard ratio): -0.288 [hazard ratio = 0.750]
#> Treatment proportion: 0.500
#> Event rate (group 1): 0.400
#> Event rate (group 0): 0.400 [= group 1]
#> -------------------------------------------------------
#> Required sample size: 748
#> Target power: 0.80 | sig. level: 0.050 | one-sided test

The following evaluates three hazard ratios and three treatment proportions (\(3 \times 3 = 9\) scenarios):
res_rct_grid <- power_cox(
effect_size = c(log(0.65), log(0.75), log(0.85)),
r = c(0.40, 0.50, 0.60),
d1 = 0.40,
study_type = "rct",
power = 0.80
)
plot(res_rct_grid)
summary(res_rct_grid)
#> PSpower: power_cox
#> ------------------------------------------------------------
#> Scenarios: 9
#> Varying: effect_size, r
#> Fixed: d1 = 0.4, d0 = 0.4 [= group 1], phi = NA, study_type = rct, estimand = ATE, method = robust
#> Target power: 0.80 | sig. level: 0.050 | one-sided test
#> ------------------------------------------------------------
#>
#> Sample size (N, ceiling integer) distribution:
#>
#> sample_size effect_size r
#> Min: 336 -0.431 0.6
#> 25%: 468 -0.431 0.4
#> 50%: 795 -0.288 0.5
#> 75%: 2330 -0.163 0.6
#> Max: 2651 -0.163 0.4

For observational studies, the overlap coefficient \(\phi\) must also be provided. The ATE estimand uses inverse probability weights; ATO uses overlap weights and is more efficient when overlap is limited.
res_obs <- power_cox(
effect_size = log(0.75),
r = 0.5,
d1 = 0.40,
phi = 0.90,
study_type = "obs",
estimand = "ATE",
power = 0.80
)
res_obs
#> PSpower: power_cox
#> -------------------------------------------------------
#> Study type: Observational study
#> Method: Robust sandwich variance
#> Estimand: ATE
#> Weights: inverse probability weights
#> Effect size (log hazard ratio): -0.288 [hazard ratio = 0.750]
#> Treatment proportion: 0.500
#> Event rate (group 1): 0.400
#> Event rate (group 0): 0.400 [= group 1]
#> Overlap (phi): 0.900
#> -------------------------------------------------------
#> Required sample size: 1088
#> Target power: 0.80 | sig. level: 0.050 | one-sided test

res_obs_grid <- power_cox(
effect_size = log(0.75),
r = 0.5,
d1 = 0.40,
phi = c(0.85, 0.90, 0.95),
study_type = "obs",
estimand = c("ATE", "ATO"),
power = 0.80,
n_mc = 5e4 # use the default 1e6 in practice
)
plot(res_obs_grid)
summary(res_obs_grid)
#> PSpower: power_cox
#> ------------------------------------------------------------
#> Scenarios: 6
#> Varying: phi, estimand
#> Fixed: effect_size = -0.288, r = 0.5, d1 = 0.4, d0 = 0.4 [= group 1], study_type = obs, method = robust
#> Target power: 0.80 | sig. level: 0.050 | one-sided test
#> ------------------------------------------------------------
#>
#> Sample size (N, ceiling integer) distribution:
#>
#> sample_size phi estimand
#> Min: 877 0.95 ATO
#> 25%: 898 0.95 ATE
#> 50%: 1058 0.85 ATO
#> 75%: 1088 0.9 ATE
#> Max: 1569 0.85 ATE

If a pilot dataset or a similar published cohort is available, \(\phi\) can be estimated directly from fitted propensity scores using overlap_coef().
set.seed(2026)
n <- 500
X <- rnorm(n)
ps <- plogis(0.5 * X) # fitted propensity scores from a PS model
Z <- rbinom(n, 1, ps)
overlap_coef(ps = ps, Z = Z)
#> $phi
#> [1] 0.9696848
#>
#> $r
#> [1] 0.49

When no data are available, use the rule of thumb above and compute \(N\) across the plausible range (e.g., \(\phi \in \{0.85, 0.90, 0.95\}\)), as shown in the examples above.
overlap_coef() also accepts Beta distribution parameters
directly, which is useful when the PS distribution has been summarized
in a published paper: