The highDmean package implements the high-dimensional two-sample test
proposed by Zhang and Wang (2020), “Result consistency of high
dimensional two-sample tests applied to gene ontology terms with gene
sets”. The classical solution for testing equality of two multivariate
means is Hotelling’s T-square test. When the dimensionality is greater
than the sample sizes, Hotelling’s test fails because the sample
covariance matrix is singular. In this case, the test proposed by Zhang
and Wang (2020), available in this package as zwl_test(), can still be
applied and provides a reliable and powerful test. The package also
implements the test proposed by Srivastava, Katayama, and Kano (2013),
“A two sample test in high dimensional data.”
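The singularity mentioned above can be checked numerically with base R alone. The sketch below uses illustrative sizes (n = 45, m = 60, p = 300) and standard normal data, forms the pooled sample covariance matrix used by Hotelling’s test, and inspects its rank, which cannot exceed n + m - 2.

```r
set.seed(1)                      # illustrative seed
n <- 45; m <- 60; p <- 300       # illustrative sizes with p > n + m - 2

# Rows are observations, columns are components.
X <- matrix(rnorm(n * p), nrow = n)
Y <- matrix(rnorm(m * p), nrow = m)

# Pooled sample covariance matrix (p by p)
S_pooled <- ((n - 1) * cov(X) + (m - 1) * cov(Y)) / (n + m - 2)

# Numerical rank; it is bounded by n + m - 2, which is smaller than p here,
# so S_pooled has no inverse and Hotelling's statistic cannot be formed.
rank_S <- qr(S_pooled)$rank
rank_S
rank_S < p
```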
You can install the released version of highDmean from CRAN with:
install.packages("highDmean")
The following basic example simulates one pair of samples with equal mean vectors and applies zwl_test():
library(highDmean)
data <- buildData(n = 45, m = 60, p = 300,
                  muX = rep(0, 300), muY = rep(0, 300),
                  dep = 'IND', S = 1, innov = rnorm)
zwl_test(data[[1]]$X, data[[1]]$Y, order = 2)
#> $statistic
#> [1] 0.7534648
#>
#> $pvalue
#> [1] 0.4511707
#>
#> $Tn
#> [1] 1.08859
#>
#> $var
#> [1] 0.007897337
The functions zwl_test() and SKK_test() accept n by p and m by p data
matrices with sample data from the first and second populations and
return test statistics and p-values for the null hypothesis of equal
means.
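As a minimal sketch of the expected input shapes, the snippet below builds two plain matrices with base R and passes them to both tests. The sample sizes, the dimension, and the mean shift of 0.3 are illustrative values chosen here. The two-argument call SKK_test(X, Y) is an assumption based on the description above and should be checked against the package help page.

```r
library(highDmean)

set.seed(123)                      # illustrative seed
n <- 30; m <- 40; p <- 200         # illustrative sizes, with p larger than n and m

# Rows are observations, columns are components.
X <- matrix(rnorm(n * p), nrow = n, ncol = p)               # n by p, mean 0
Y <- matrix(rnorm(m * p, mean = 0.3), nrow = m, ncol = p)   # m by p, shifted mean (assumed value)

res_zwl <- zwl_test(X, Y, order = 2)   # same call form as in the basic example
res_skk <- SKK_test(X, Y)              # assumed two-argument form

res_zwl$pvalue
res_skk
```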
The buildData() function simulates high-dimensional data in the
two-population setting with specified sample sizes, numbers of
components, covariance structure, etc., and the functions zwl_sim() and
SKK_sim() return test statistic values and p-values for lists of
simulated data sets generated by buildData().
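To illustrate what the simulation helpers automate, the sketch below generates several data sets under the null hypothesis and applies zwl_test() to each one by hand. It assumes that S sets the number of simulated data sets and that buildData() returns a list with one element per data set, each holding X and Y, as the S = 1 and data[[1]] usage in the basic example suggests. The value S = 20 and the 0.05 level are illustrative.

```r
library(highDmean)

set.seed(2020)   # illustrative seed
sims <- buildData(n = 45, m = 60, p = 300,
                  muX = rep(0, 300), muY = rep(0, 300),
                  dep = 'IND', S = 20, innov = rnorm)   # S = 20 is an assumed usage

# Apply the test to each simulated data set and collect the p-values.
pvals <- vapply(sims,
                function(d) zwl_test(d$X, d$Y, order = 2)$pvalue,
                numeric(1))

# Proportion of rejections at the 0.05 level; with equal means this
# estimates the empirical size of the test.
mean(pvals < 0.05)
```

A manual loop like this is mainly useful for checking what the simulation helpers compute; zwl_sim() and SKK_sim() are described as working on the buildData() list directly.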