---
title: "Package MKpower"
author: "Matthias Kohl"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
bibliography: MKpower.bib
vignette: >
  %\VignetteIndexEntry{Package MKpower}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
## Introduction
The package includes functions for power analysis and sample size calculation
for Welch and Hsu [@Hedderich2018] t-tests, including Monte-Carlo simulations
of empirical power and type-I-error. In addition, power and sample size calculations
for Wilcoxon rank sum and signed rank tests can be performed via Monte-Carlo
simulations. Moreover, the package covers the power and sample size required
for the evaluation of a diagnostic test(-system) [@Flahault2005; @Dobbin2007]
as well as for a single proportion [@Fleiss2003], for comparing two negative
binomial rates [@Zhu2014], for ANCOVA [@Shieh2020], for reference ranges
[@Jennen2005], and for multiple primary endpoints [@Sozu2015].
We first load the package.
```{r}
library(MKpower)
```
## Single Proportion
The required sample size for investigating a single proportion can be computed
via the respective test or confidence interval [@Fleiss2003].
First, we consider the asymptotic test.
```{r}
## with continuity correction
power.prop1.test(p1 = 0.4, power = 0.8)
## without continuity correction
power.prop1.test(p1 = 0.4, power = 0.8, cont.corr = FALSE)
```
Next, we compute the sample size via the respective asymptotic confidence interval.
```{r}
## with continuity correction
ssize.prop.ci(prop = 0.4, width = 0.14)
## without continuity correction
ssize.prop.ci(prop = 0.4, width = 0.14, method = "wald")
```
## Welch Two-Sample t-Test
For computing the sample size of the Welch t-test, we only consider the situation
of equal group size (balanced design).
```{r}
## identical results to power.t.test, since sd = sd1 = sd2 = 1
power.welch.t.test(n = 20, delta = 1)
power.welch.t.test(power = .90, delta = 1)
power.welch.t.test(power = .90, delta = 1, alternative = "one.sided")
## sd1 = 0.5, sd2 = 1
power.welch.t.test(delta = 1, sd1 = 0.5, sd2 = 1, power = 0.9)
```
## Hsu Two-Sample t-Test
For computing the sample size of the Hsu t-test [@Hedderich2018], we only consider
the situation of equal group size (balanced design).
```{r}
## slightly more conservative than Welch t-test
power.hsu.t.test(n = 20, delta = 1)
power.hsu.t.test(power = .90, delta = 1)
power.hsu.t.test(power = .90, delta = 1, alternative = "one.sided")
## sd1 = 0.5, sd2 = 1
power.welch.t.test(delta = 0.5, sd1 = 0.5, sd2 = 1, power = 0.9)
power.hsu.t.test(delta = 0.5, sd1 = 0.5, sd2 = 1, power = 0.9)
```
## ANCOVA
With function `power.ancova`, one can compute power and sample size for ANCOVA
designs [@Shieh2020]. The default matrix of contrasts used in the function is:
```{r}
## 3 groups
cbind(rep(1,2), -diag(2))
## 4 groups
cbind(rep(1,3), -diag(3))
```
We use the example provided in Table 9.7 of @Maxwell2004.
```{r}
## Example from Maxwell and Delaney (2004) according to Shieh (2020)
power.ancova(mu = c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8)
power.ancova(n = rep(45/3, 3), mu = c(7.5366, 11.9849, 13.9785), var = 29.0898)
power.ancova(mu = c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.9)
power.ancova(n = rep(57/3, 3), mu = c(7.5366, 11.9849, 13.9785), var = 29.0898)
```
Based on the reported adjusted group means and error variance, a (total) sample size
of 45 is required to achieve a power of at least 80%; the calculated power is
82.2%. With a sample size of 57, the power will be at least 90%; the
calculated power is 91.2%.
Introducing additional covariates (random covariate model) will increase the required
sample size.
```{r}
power.ancova(mu = c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8,
             nr.covs = 2)
power.ancova(mu = c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8,
             nr.covs = 3)
power.ancova(mu = c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8,
             nr.covs = 5)
power.ancova(mu = c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8,
             nr.covs = 10)
```
We can also calculate the per group sample sizes for an imbalanced design.
```{r}
power.ancova(mu = c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8,
             group.ratio = c(1, 1.25, 1.5))
power.ancova(n = c(13, 16, 19), mu = c(7.5366, 11.9849, 13.9785), var = 29.0898,
             group.ratio = c(1, 1.25, 1.5))
power.ancova(mu = c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8,
             group.ratio = c(1, 0.8, 2/3))
power.ancova(n = c(17, 14, 12), mu = c(7.5366, 11.9849, 13.9785), var = 29.0898,
             group.ratio = c(1, 0.8, 2/3))
```
We get a smaller total sample size with an imbalanced design (43 instead of 45).
## Wilcoxon Rank Sum and Signed Rank Tests
For computing the sample size of these tests, we offer a function that
performs Monte-Carlo simulations to determine their (empirical) power.
```{r}
rx <- function(n) rnorm(n, mean = 0, sd = 1)
ry <- function(n) rnorm(n, mean = 0.5, sd = 1)
## two-sample
sim.ssize.wilcox.test(rx = rx, ry = ry, n.max = 100, iter = 1000)
sim.ssize.wilcox.test(rx = rx, ry = ry, n.min = 65, n.max = 70,
                      step.size = 1, iter = 1000, BREAK = FALSE)
power.t.test(delta = 0.5, power = 0.8)
## one-sample
sim.ssize.wilcox.test(rx = ry, n.max = 100, iter = 1000, type = "one.sample")
sim.ssize.wilcox.test(rx = ry, type = "one.sample", n.min = 33,
                      n.max = 38, step.size = 1, iter = 1000, BREAK = FALSE)
power.t.test(delta = 0.5, power = 0.8, type = "one.sample")
## two-sample with Gamma distributions
rx <- function(n) rgamma(n, scale = 2, shape = 1.5)
ry <- function(n) rgamma(n, scale = 4, shape = 1.5) # equal shape parameters
sim.ssize.wilcox.test(rx = rx, ry = ry, n.max = 100, iter = 1000)
sim.ssize.wilcox.test(rx = rx, ry = ry, n.min = 25, n.max = 30,
                      step.size = 1, iter = 1000, BREAK = FALSE)
```
For practical applications, we recommend using a higher number of iterations.
For more detailed examples, we refer to the help page of the function.
## Two Negative Binomial Rates
When comparing two negative binomial rates [@Zhu2014], we can compute sample
size or power by applying function `power.nb.test`.
```{r}
## examples from Table III in Zhu and Lakkis (2014)
power.nb.test(mu0 = 5.0, RR = 2.0, theta = 1/0.5, duration = 1, power = 0.8, approach = 1)
power.nb.test(mu0 = 5.0, RR = 2.0, theta = 1/0.5, duration = 1, power = 0.8, approach = 2)
power.nb.test(mu0 = 5.0, RR = 2.0, theta = 1/0.5, duration = 1, power = 0.8, approach = 3)
```
## Diagnostic Test
Given an expected sensitivity and specificity, we can compute sample size,
power (assurance probability), delta, or significance level (error probability)
of a diagnostic test [@Flahault2005]. The calculation is based on a one-sided
confidence interval.
```{r}
## see n2 on page 1202 of Chu and Cole (2007)
ssize.sens.ci(sens = 0.99, delta = 0.14, power = 0.95) # 40
ssize.sens.ci(sens = 0.99, delta = 0.13, power = 0.95) # 43
ssize.spec.ci(spec = 0.99, delta = 0.12, power = 0.95) # 47
```
Given an expected AUC we can compute sample size, power (assurance probability),
delta or significance level (error probability) of the AUC based on a one-sided
confidence interval.
```{r}
ssize.auc.ci(AUC = 0.9, delta = 0.1, power = 0.95)
```
For sample size planning when developing binary classifiers for high-dimensional
data, we can apply function `ssize.pcc`, which is based on the probability of
correct classification (PCC) [@Dobbin2007].
```{r}
## see Table 2 of Dobbin et al. (2008)
g <- 0.1
fc <- 1.6
ssize.pcc(gamma = g, stdFC = fc, nrFeatures = 22000)
```
## Reference Range
We can apply function `ssize.reference.range` to determine the sample size required
for a study planned to establish a reference range. The parametric approach
assumes a normal distribution, whereas the non-parametric approach only assumes
a continuous distribution.
```{r}
ssize.reference.range(delta = 0.01, ref.prob = 0.95, conf.prob = 0.9,
                      method = "parametric", exact = TRUE)
ssize.reference.range(delta = 0.01, ref.prob = 0.95, conf.prob = 0.9,
                      method = "parametric", exact = FALSE)
ssize.reference.range(delta = 0.01, ref.prob = 0.95, conf.prob = 0.9,
                      method = "nonparametric", exact = TRUE)
ssize.reference.range(delta = 0.01, ref.prob = 0.95, conf.prob = 0.9,
                      method = "nonparametric", exact = FALSE)
```
We can also calculate the sample size for one-sided reference ranges.
```{r}
ssize.reference.range(delta = 0.015, ref.prob = 0.95, conf.prob = 0.9,
                      method = "parametric", exact = TRUE,
                      alternative = "one.sided")
```
## Multiple Primary Endpoints (MPE)
We demonstrate how to calculate the sample size for a trial with two co-primary
endpoints with known covariance.
```{r}
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
power.mpe.known.var(K = 2, delta = c(0.25, 0.4), Sigma = Sigma,
                    sig.level = 0.025, power = 0.8)
## equivalent: specify SDs and correlation rho
power.mpe.known.var(K = 2, delta = c(0.25, 0.4), SD = c(1, 1), rho = 0.8,
                    sig.level = 0.025, power = 0.8)
```
Next, we show how to calculate the sample size for a trial with two
co-primary endpoints with unknown covariance. Here, we follow three steps to
determine the sample size.
+ Step 1: As we need starting values for the algorithm that computes the sample
size in this case, we first act as if the covariance were known and compute
the sample size by applying our function `power.mpe.known.var`.
+ Step 2: The resulting value of `n` is regarded as a lower bound for the sample
size in the case of unknown covariance and is used as `n.min` in function
`power.mpe.unknown.var`. Moreover, we specify a reasonable `n.max`, which must be
larger than `n.min`.
+ Step 3: Finally, using the arguments from Step 2, we compute the
sample size for the situation with unknown covariance.
```{r}
## Step 1:
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
power.mpe.known.var(K = 2, delta = c(0.5, 0.4), Sigma = Sigma,
                    sig.level = 0.025, power = 0.8)
## Step 2 + 3:
power.mpe.unknown.var(K = 2, delta = c(0.5, 0.4), Sigma = Sigma,
                      sig.level = 0.025, power = 0.8,
                      n.min = 105, n.max = 120)
## equivalent: specify SDs and correlation rho
power.mpe.unknown.var(K = 2, delta = c(0.5, 0.4), SD = c(1, 1), rho = 0.5,
                      sig.level = 0.025, power = 0.8,
                      n.min = 105, n.max = 120)
```
We finally demonstrate how to calculate the sample size for a trial with two primary
endpoints with known covariance, where the trial is designed to find a significant
difference for at least one endpoint.
```{r}
Sigma <- matrix(c(1, 0.3, 0.3, 1), nrow = 2)
power.mpe.atleast.one(K = 2, delta = c(0.2, 0.3), Sigma = Sigma,
                      power = 0.8, sig.level = 0.025)
## equivalent: specify SDs and correlation rho
power.mpe.atleast.one(K = 2, delta = c(0.2, 0.3), SD = c(1, 1), rho = 0.3,
                      power = 0.8, sig.level = 0.025)
```
## Power and Type-I-Error Simulations
There have been quite a few discussions and various proposals about which test
is best under which circumstances when comparing the location (mean, median)
of two groups [@Fagerland2009; @Fagerland2012; @Sezer2017]. In addition, it has
been questioned whether pre-testing of assumptions is appropriate or useful, at
least from a practical point of view [@Zimmerman2004; @Rasch2011; @Rochon2012; @Hoekstra2012].
We provide functions to simulate power and type-I-error of classical [@student1908],
Welch [@Welch1947] and Hsu [@Hsu1938] t-tests as well as of Wilcoxon-Mann-Whitney
tests [@Wilcoxon1945; @Mann1947].
```{r}
## functions to simulate the data
## group 1
rx <- function(n) rnorm(n, mean = 0, sd = 1)
rx.H0 <- rx
## group 2
ry <- function(n) rnorm(n, mean = 3, sd = 3)
ry.H0 <- function(n) rnorm(n, mean = 0, sd = 3)
## theoretical results
power.welch.t.test(n = 10, delta = 3, sd1 = 1, sd2 = 3)
power.hsu.t.test(n = 10, delta = 3, sd1 = 1, sd2 = 3)
## simulation
res.t <- sim.power.t.test(nx = 10, rx = rx, rx.H0 = rx.H0,
                          ny = 10, ry = ry, ry.H0 = ry.H0)
res.t
res.w <- sim.power.wilcox.test(nx = 10, rx = rx, rx.H0 = rx.H0,
                               ny = 10, ry = ry, ry.H0 = ry.H0)
res.w
```
For further investigation of the results, we provide some diagnostic plots.
```{r, fig.width=7, fig.height=7}
## t-tests
hist(res.t)
qqunif(res.t, alpha = 0.05)
volcano(res.t, hex = TRUE)
## Wilcoxon-Mann-Whitney tests
hist(res.w)
qqunif(res.w, alpha = 0.05)
```
We can also generate a volcano plot for the Wilcoxon-Mann-Whitney test, but this
would require setting argument conf.int to TRUE, which would lead to a much
higher computation time; hence, we omit it here. Furthermore, it is also
possible to compute an approximate version of the test by setting argument
approximate to TRUE [@Hothorn2008], again at the cost of an increased
computation time.
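For reference, a sketch of the respective calls, reusing the simulation setup
from above; the chunk is not evaluated because of the long run time, and the
object name `res.w.ci` is only illustrative.
```{r, eval=FALSE}
## Wilcoxon-Mann-Whitney simulation with confidence intervals,
## which enables the volcano plot
res.w.ci <- sim.power.wilcox.test(nx = 10, rx = rx, rx.H0 = rx.H0,
                                  ny = 10, ry = ry, ry.H0 = ry.H0,
                                  conf.int = TRUE)
volcano(res.w.ci)
## approximate version of the Wilcoxon-Mann-Whitney test
sim.power.wilcox.test(nx = 10, rx = rx, rx.H0 = rx.H0,
                      ny = 10, ry = ry, ry.H0 = ry.H0,
                      approximate = TRUE)
```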
## sessionInfo
```{r}
sessionInfo()
```
## References