Getting Started with PSpower

Overview

PSpower is a design-stage tool for planning observational and randomized studies that estimate treatment effects using propensity score (PS) weighting. It answers: how large does my study need to be?

Two functions cover the most common outcome types:

Function Outcome Default test
power_ps() Continuous or binary Two-sided
power_cox() Time-to-event (survival) One-sided

This is a prospective planning tool. Computing power after data are collected using the observed sample size (“post-hoc power”) is uninformative and should not be done.


The sample size formula

For all outcome types, the required sample size takes the form

\[N = \left\lceil V \cdot \frac{(z_{1-\alpha/k} + z_\beta)^2}{\delta^2} \right\rceil,\]

where \(\delta\) is the standardized effect size, \(\alpha\) is the significance level, \(\beta\) is the target power, and \(k = 2\) for a two-sided test or \(k = 1\) for a one-sided test. All reported sample sizes are ceiling integers (rounded up to the nearest whole number).

The asymptotic variance \(V\) is what distinguishes the two functions:


Inputs at a glance

Common inputs

Effect size — the standardized treatment effect \(\delta = \tau / \text{SD}(Y)\) for continuous/binary outcomes, or the log hazard ratio \(\tau_0 = \log(\text{HR})\) for survival outcomes. For continuous outcomes, \(|\delta| = 0.2\) is considered small; 0.5 moderate; 0.8 large. For survival, \(\text{HR} = 0.75\) gives \(\tau_0 \approx -0.29\).

Treatment proportion r — the fraction of participants who receive treatment, \(r = \Pr(Z = 1)\). A balanced design uses \(r = 0.5\); in observational data, \(r\) is estimated from the study population.

Estimand — the scientific quantity of interest:

In a randomized trial all four estimands coincide.

Continuous and binary outcomes (power_ps)

Overlap coefficient φ — a number in \((0, 1)\) measuring how similar the PS distributions of the two groups are. \(\phi = 1\) in a randomized trial; \(\phi < 1\) reflects confounding, and sample size increases as \(\phi\) decreases. Rule of thumb: \(\phi \geq 0.95\) good; \([0.90, 0.95)\) moderate; \([0.85, 0.90)\) poor; \(< 0.85\) very poor.

\(\phi\) can be estimated from a pilot study or any dataset with fitted propensity scores using overlap_coef(). When it is not yet known, supplying a vector of plausible values (e.g., \(\phi \in \{0.85, 0.90, 0.95\}\)) allows \(N\) to be computed across the plausible range.

Confounder coefficient ρ² — the squared correlation between the potential outcome \(Y(0)\) and the PS linear predictor. When \(\rho^2 = 0\) (the default), the outcome is unrelated to the confounders beyond the treatment indicator. A sensitivity analysis over \(\rho^2 \in [0, 0.05]\) is recommended; values above 0.05 are uncommon in practice.

Survival outcomes (power_cox)

Event rates d₁ and d₀ — the probability of observing an event during follow-up in the treated group (\(d_1\)) and the control group (\(d_0\)). These can be estimated from registries, pilot data, or published survival curves. If \(d_0\) is omitted it is set equal to \(d_1\).

Study type"rct" for a randomized trial or "obs" for an observational study.

Overlap coefficient φ — required when study_type = "obs"; the same quantity as in power_ps(). See the rule of thumb and estimation guidance under the continuous/binary section above.

Method"robust" (default) uses the sandwich variance, which accounts for the actual hazard ratio; "schoenfeld" is the classical formula derived under a null effect and may overestimate or underestimate the required \(N\) relative to the robust method. Available only for randomized trials.


1. Continuous and binary outcomes: power_ps()

Single scenario

Suppose we plan a study with:

res <- power_ps(
  effect_size = 0.2,
  r           = 0.5,
  phi         = 0.9,
  estimand    = "ATE",
  power       = 0.80
)
res
#> PSpower: power_ps
#> -------------------------------------------------- 
#> Estimand:                  ATE
#> Weights:                   inverse probability weights
#> Effect size:               0.200
#> Treatment proportion:      0.500
#> Overlap (phi):             0.900
#> Confounding (rho2):        0.000  [default]
#> -------------------------------------------------- 
#> Required sample size: 1058
#> Target power: 0.80  |  sig. level: 0.050  |  two-sided test

Multiple configurations

Vectors can be supplied to any input; all combinations are computed automatically.

Effect of treatment allocation

The required sample size depends on how evenly participants are split between treatment and control. The following evaluates five values of \(r\) across three estimands (\(5 \times 3 = 15\) scenarios):

res_r <- power_ps(
  effect_size = 0.2,
  r           = c(0.30, 0.40, 0.50, 0.60, 0.70),
  phi         = 0.90,
  estimand    = c("ATE", "ATT", "ATO"),
  power       = 0.80
)
plot(res_r)

\(N\) is minimized at \(r = 0.5\) for ATE; the ATT and ATC estimands are not symmetric in \(r\) — ATT favors a larger treated fraction, and vice versa for ATC.

Effect of covariate overlap

res_phi <- power_ps(
  effect_size = 0.2,
  r           = 0.5,
  phi         = c(0.85, 0.90, 0.95),
  estimand    = c("ATE", "ATT", "ATO"),
  power       = 0.80
)
plot(res_phi)

summary(res_phi)
#> PSpower: power_ps
#> ------------------------------------------------------- 
#> Scenarios: 9
#> Varying:  phi, estimand 
#> Fixed:    effect_size = 0.2,  r = 0.5,  rho2 = 0 [default] 
#> Target power: 0.80  |  sig. level: 0.050  |  two-sided test
#> ------------------------------------------------------- 
#> 
#> Sample size (N, ceiling integer) distribution:
#> 
#>           sample_size   phi           estimand       
#>   Min:    867           0.95          ATO           
#>   25%:    958           0.9           ATO           
#>   50%:    1058          0.9           ATE           
#>   75%:    1330          0.9           ATT           
#>   Max:    1978          0.85          ATT

The ATO estimand consistently requires fewer participants than ATE, especially under poor overlap.

Role of the confounder coefficient ρ²

The table below shows how \(N\) changes across \(\rho^2 \in [0, 0.05]\) for ATE with \(\phi = 0.90\). A sensitivity analysis over this range is recommended; values above 0.05 are uncommon in practice.

power_ps(
  effect_size = 0.2,
  r           = 0.5,
  phi         = 0.90,
  rho2        = c(0, 0.01, 0.02, 0.05),
  estimand    = "ATE",
  power       = 0.80
)
#> PSpower: power_ps
#> -------------------------------------------------- 
#> Scenarios: 4
#> Varying:  rho2 
#> Fixed:    effect_size = 0.2,  r = 0.5,  phi = 0.9,  estimand = ATE 
#> Target power: 0.80  |  sig. level: 0.050  |  two-sided test 
#> -------------------------------------------------- 
#>  rho2 sample_size
#>  0.00        1058
#>  0.01        1065
#>  0.02        1072
#>  0.05        1093

An upper bound on \(\rho^2\) can be obtained from the \(R^2\) of a regression of the outcome on the study covariates using historical data.


2. Survival outcomes: power_cox()

Randomized trial

Single scenario

res_rct <- power_cox(
  effect_size = log(0.75),   # HR = 0.75
  r           = 0.5,
  d1          = 0.40,
  study_type  = "rct",
  power       = 0.80
)
res_rct
#> PSpower: power_cox
#> ------------------------------------------------------- 
#> Study type:                    Randomized trial
#> Method:                        Robust sandwich variance
#> Effect size (log hazard ratio): -0.288  [hazard ratio = 0.750]
#> Treatment proportion:          0.500
#> Event rate (group 1):          0.400
#> Event rate (group 0):          0.400  [= group 1]
#> ------------------------------------------------------- 
#> Required sample size: 795
#> Target power: 0.80  |  sig. level: 0.050  |  one-sided test

The default method uses the robust sandwich variance, which accounts for the actual hazard ratio. The Schoenfeld formula (derived under a null effect) does not correct for the effect size and may give a larger or smaller \(N\):

power_cox(
  effect_size = log(0.75),
  r           = 0.5,
  d1          = 0.40,
  study_type  = "rct",
  method      = "schoenfeld",
  power       = 0.80
)
#> PSpower: power_cox
#> ------------------------------------------------------- 
#> Study type:                    Randomized trial
#> Method:                        Schoenfeld formula
#> Effect size (log hazard ratio): -0.288  [hazard ratio = 0.750]
#> Treatment proportion:          0.500
#> Event rate (group 1):          0.400
#> Event rate (group 0):          0.400  [= group 1]
#> ------------------------------------------------------- 
#> Required sample size: 748
#> Target power: 0.80  |  sig. level: 0.050  |  one-sided test

Multiple configurations

The following evaluates three hazard ratios and three treatment proportions (\(3 \times 3 = 9\) scenarios):

res_rct_grid <- power_cox(
  effect_size = c(log(0.65), log(0.75), log(0.85)),
  r           = c(0.40, 0.50, 0.60),
  d1          = 0.40,
  study_type  = "rct",
  power       = 0.80
)
plot(res_rct_grid)

summary(res_rct_grid)
#> PSpower: power_cox
#> ------------------------------------------------------------ 
#> Scenarios: 9
#> Varying:  effect_size, r 
#> Fixed:    d1 = 0.4,  d0 = 0.4 [= group 1],  phi = NA,  study_type = rct,  estimand = ATE,  method = robust 
#> Target power: 0.80  |  sig. level: 0.050  |  one-sided test
#> ------------------------------------------------------------ 
#> 
#> Sample size (N, ceiling integer) distribution:
#> 
#>           sample_size   effect_size   r              
#>   Min:    336           -0.431        0.6           
#>   25%:    468           -0.431        0.4           
#>   50%:    795           -0.288        0.5           
#>   75%:    2330          -0.163        0.6           
#>   Max:    2651          -0.163        0.4

Observational study

For observational studies, the overlap coefficient \(\phi\) must also be provided. The ATE estimand uses inverse probability weights; ATO uses overlap weights and is more efficient when overlap is limited.

Single scenario

res_obs <- power_cox(
  effect_size = log(0.75),
  r           = 0.5,
  d1          = 0.40,
  phi         = 0.90,
  study_type  = "obs",
  estimand    = "ATE",
  power       = 0.80
)
res_obs
#> PSpower: power_cox
#> ------------------------------------------------------- 
#> Study type:                    Observational study
#> Method:                        Robust sandwich variance
#> Estimand:                      ATE
#> Weights:                       inverse probability weights
#> Effect size (log hazard ratio): -0.288  [hazard ratio = 0.750]
#> Treatment proportion:          0.500
#> Event rate (group 1):          0.400
#> Event rate (group 0):          0.400  [= group 1]
#> Overlap (phi):                 0.900
#> ------------------------------------------------------- 
#> Required sample size: 1088
#> Target power: 0.80  |  sig. level: 0.050  |  one-sided test

Multiple configurations

res_obs_grid <- power_cox(
  effect_size = log(0.75),
  r           = 0.5,
  d1          = 0.40,
  phi         = c(0.85, 0.90, 0.95),
  study_type  = "obs",
  estimand    = c("ATE", "ATO"),
  power       = 0.80,
  n_mc        = 5e4    # use the default 1e6 in practice
)
plot(res_obs_grid)

summary(res_obs_grid)
#> PSpower: power_cox
#> ------------------------------------------------------------ 
#> Scenarios: 6
#> Varying:  phi, estimand 
#> Fixed:    effect_size = -0.288,  r = 0.5,  d1 = 0.4,  d0 = 0.4 [= group 1],  study_type = obs,  method = robust 
#> Target power: 0.80  |  sig. level: 0.050  |  one-sided test
#> ------------------------------------------------------------ 
#> 
#> Sample size (N, ceiling integer) distribution:
#> 
#>           sample_size   phi           estimand       
#>   Min:    877           0.95          ATO           
#>   25%:    898           0.95          ATE           
#>   50%:    1058          0.85          ATO           
#>   75%:    1088          0.9           ATE           
#>   Max:    1569          0.85          ATE

3. Estimating propensity score overlap from data

If a pilot dataset or a similar published cohort is available, \(\phi\) can be estimated directly from fitted propensity scores using overlap_coef().

set.seed(2026)
n  <- 500
X  <- rnorm(n)
ps <- plogis(0.5 * X)   # fitted propensity scores from a PS model
Z  <- rbinom(n, 1, ps)

overlap_coef(ps = ps, Z = Z)
#> $phi
#> [1] 0.9696848
#> 
#> $r
#> [1] 0.49

When no data are available, use the rule of thumb above and compute \(N\) across the plausible range (e.g., \(\phi \in \{0.85, 0.90, 0.95\}\)), as shown in the examples above.

overlap_coef() also accepts Beta distribution parameters directly, which is useful when the PS distribution has been summarized in a published paper:

overlap_coef(a = 3, b = 2)   # r = a/(a+b) = 0.60
#> $phi
#> [1] 0.9017928
#> 
#> $r
#> [1] 0.6