---
title: "coverage_correlation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{coverage_correlation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Function `coverage_correlation` implements the **coverage correlation coefficient** introduced in the paper [*Coverage correlation: detecting singular dependencies between random variables*](https://arxiv.org/abs/2508.06402). The coverage correlation coefficient is a nonparametric measure of statistical association designed to detect dependencies concentrated on low-dimensional structures within the joint distribution of two random variables or vectors. Based on Monge–Kantorovich ranks and geometric coverage processes, this statistic quantifies the extent to which the joint distribution concentrates on a subset that is singular with respect to the product of the marginals. The coverage correlation coefficient is distribution-free, admits an analytically tractable asymptotic null distribution, and can be computed efficiently, making it well suited to uncovering complex, potentially nonlinear associations in large-scale pairwise testing.

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(covercorr)
```
### Example 1
In this example, we demonstrate how to use `coverage_correlation` with a simple simulation. We compute the coverage correlation coefficient between two one-dimensional standard normal random variables, `X` and `Y`, and then vary the strength of their relationship to observe how the statistic changes.

```{r}
set.seed(1)  # for reproducibility
n <- 100
X <- rnorm(n)
Y <- rnorm(n)  # generated independently of X
result <- coverage_correlation(Y, X, visualise = TRUE)
str(result)
```
In the example above, `X` and `Y` are independent. The argument `visualise` defaults to `FALSE`; setting it to `TRUE` produces a plot that illustrates the intuition behind the coverage correlation coefficient.

The coverage correlation coefficient first transforms `X` and `Y` into their
Monge–Kantorovich ranks, denoted by `X_rank` and `Y_rank`, which are uniformly distributed
on \([0, 1]\) (on \([0, 1]^d\) for \(d\)-dimensional inputs). The plot displays the pairs
\((X^{\text{rank}}_i, Y^{\text{rank}}_i)\), \(i = 1, \dots, n\), together with a cube (here a square)
of volume \(n^{-1}\) centered at each pair.
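
In one dimension the optimal transport map is monotone, so Monge–Kantorovich ranks reduce to ordinary ranks mapped onto reference points in \([0, 1]\). The sketch below assumes the reference grid \((i - 0.5)/n\), which may differ from the one `covercorr` uses internally; the helper `rank_1d` is hypothetical. It also computes the side length \(n^{-1/d}\) that gives each cube volume \(n^{-1}\) in \([0, 1]^d\), here with \(d = 2\).

```{r}
# Hypothetical helper: 1-D Monge-Kantorovich ranks as ordinary ranks mapped
# onto the grid (i - 0.5) / n (the grid choice is an assumption).
rank_1d <- function(x) (rank(x, ties.method = "first") - 0.5) / length(x)

set.seed(1)
n <- 100
x <- rnorm(n)
y <- rnorm(n)
ranks <- cbind(rank_1d(x), rank_1d(y))  # n rank pairs in [0, 1]^2

# A cube of volume 1/n in d dimensions has side length n^(-1/d).
d <- ncol(ranks)
side <- n^(-1 / d)
side^d              # 1 / n, up to floating-point rounding
range(ranks[, 1])   # smallest and largest grid points
```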

Inside `coverage_correlation`, we compute \(V_n\), the area of \([0, 1]^2\) that remains
uncovered by the union of these cubes (more generally, the uncovered volume of \([0, 1]^d\)).
The coverage correlation coefficient is then defined as

\[
\kappa_n^{X, Y} := \frac{V_n - e^{-1}}{1 - e^{-1}}.
\]
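
The definition can be checked numerically. If the rank points behaved like \(n\) independent uniform points on the torus, a fixed location would escape each cube of volume \(n^{-1}\) with probability \(1 - 1/n\), so \(\mathbb{E}[V_n] = (1 - 1/n)^n \approx e^{-1}\) and \(\kappa_n \approx 0\). If instead the points concentrate on a curve, the cubes overlap heavily, \(V_n\) approaches 1 as \(n\) grows, and \(\kappa_n\) approaches 1. The helper `kappa_from_V` is hypothetical and only encodes the formula above.

```{r}
# Hypothetical helper encoding the definition of kappa_n in terms of V_n.
kappa_from_V <- function(V) (V - exp(-1)) / (1 - exp(-1))

n <- 100
# Expected uncovered volume for n i.i.d. uniform centres on the torus:
# a fixed point escapes each cube of volume 1/n with probability 1 - 1/n.
V_null <- (1 - 1 / n)^n
V_null                 # close to exp(-1)
kappa_from_V(V_null)   # close to 0 (slightly negative for finite n)
kappa_from_V(1)        # the limit V_n -> 1 corresponds to kappa = 1
```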

The function returns a list with four elements:

- `$stat`: the value of the coverage correlation coefficient.
- `$pval`: the p-value of this statistic under the null hypothesis of independence.
- `$method`: the method used to compute the statistic (e.g., `"exact"` or `"approx"`).
- `$mc_se`: if method `"approx"` was used, the standard error of the Monte Carlo approximation; otherwise 0.

By default, `method = "auto"`. In this mode, the method is set to `"exact"` if the **total dimension** of `X` and `Y` (i.e., `ncol(X) + ncol(Y)`, treating vectors as one-dimensional) is at most 6, and to `"approx"` otherwise.
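
The selection rule can be written out explicitly. `choose_method` below is a hypothetical helper that mirrors the rule as described above; it is not a function exported by `covercorr`. Base R's `NCOL()` returns 1 for a plain vector, which matches treating vectors as one-dimensional.

```{r}
# Hypothetical helper mirroring the described behaviour of method = "auto".
choose_method <- function(X, Y) {
  total_dim <- NCOL(X) + NCOL(Y)  # NCOL() treats a vector as one column
  if (total_dim <= 6) "exact" else "approx"
}

choose_method(rnorm(10), rnorm(10))                # total dimension 2: "exact"
choose_method(matrix(0, 10, 2), matrix(0, 10, 2))  # total dimension 4: "exact"
choose_method(matrix(0, 10, 4), matrix(0, 10, 3))  # total dimension 7: "approx"
```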

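Next, we see how the result changes when we introduce dependence between `X` and `Y` by setting \(Y = \rho X + \sqrt{1 - \rho^2}\, Z\), where \(\rho = 0.9\) and \(Z\) is independent standard normal noise: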

```{r}
set.seed(2)  # for reproducibility
n <- 100
X <- rnorm(n)
Z <- rnorm(n)
rho <- 0.9
Y <- rho * X + sqrt(1 - rho^2) * Z  # Cor(X, Y) = rho
result <- coverage_correlation(Y, X, visualise = TRUE)
str(result)
```
You may notice parts of some cubes appearing at the corners of the plot.
This happens because we treat \([0, 1]^2\) as a **torus**: if a cube centered at one of the rank points
lies partially outside \([0, 1]^2\), the part outside is *wrapped around* to the opposite side,
and the plot reflects this topology.
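
On the torus, a point \(u\) is covered by the cube of side \(s\) centered at \(c\) when, in every coordinate \(j\), the wrapped distance \(\min(|u_j - c_j|, 1 - |u_j - c_j|)\) is at most \(s/2\). The hypothetical helper `covered_on_torus` sketches this test to illustrate the wrap-around rule; it is not the package's implementation.

```{r}
# Hypothetical helper: is point u covered by the cube of side `side`
# centered at `centre`, when [0, 1]^d is treated as a torus?
covered_on_torus <- function(u, centre, side) {
  gap <- abs(u - centre)
  wrapped <- pmin(gap, 1 - gap)  # distance measured the short way round
  all(wrapped <= side / 2)
}

side <- 0.1
centre <- c(0.98, 0.98)                        # near the top-right corner
covered_on_torus(c(0.01, 0.01), centre, side)  # TRUE: wraps to the bottom-left
covered_on_torus(c(0.50, 0.50), centre, side)  # FALSE
```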

### Example 2

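The coverage correlation coefficient can also handle multidimensional random vectors. In the example below, `X` is two-dimensional and `Y` is a deterministic nonlinear function of `X`, so the joint distribution of `(X, Y)` is concentrated on a two-dimensional surface in \(\mathbb{R}^4\).
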
```{r}
set.seed(3)  # for reproducibility
n <- 100
p <- 2
X <- matrix(rnorm(p * n), ncol = p)
Y <- matrix(0, nrow = n, ncol = p)
Y[, 1] <- X[, 1]^2         # Y is a deterministic function of X
Y[, 2] <- X[, 1] * X[, 2]
result <- coverage_correlation(Y, X)
str(result)
```
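Visualisation is not available in this case, because the plot requires `X` and `Y` to be one-dimensional. Since the total dimension here is 4, the default `method = "auto"` selects `"exact"`.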

### Example 3
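In the example below, `X` and `Y` are independent and two-dimensional. With a total dimension of 4, `method = "auto"` would select `"exact"`; here we set `method = "approx"` explicitly, so `$mc_se` reports the standard error of the Monte Carlo approximation.
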
```{r}
set.seed(4)  # for reproducibility (also fixes the Monte Carlo draws)
n <- 50
p <- 2
X <- matrix(rnorm(p * n), ncol = p)
Y <- matrix(rnorm(p * n), ncol = p)
result <- coverage_correlation(Y, X, method = "approx")
str(result)
```
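
As a generic illustration of a Monte Carlo approximation of this kind (an assumption about the approach, not a description of `covercorr`'s internals), the sketch below estimates the uncovered volume on the torus \([0, 1]^d\) by sampling \(m\) uniform test points and reports the binomial standard error \(\sqrt{\hat V (1 - \hat V) / m}\). The function `mc_uncovered` is hypothetical, and i.i.d. uniform centres stand in for real Monge–Kantorovich ranks.

```{r}
# Hypothetical Monte Carlo estimate of the uncovered volume V_n on the torus
# [0, 1]^d, with its standard error. `centres` is an n x d matrix.
mc_uncovered <- function(centres, m = 20000) {
  n <- nrow(centres)
  d <- ncol(centres)
  side <- n^(-1 / d)                   # each cube has volume 1 / n
  u <- matrix(runif(m * d), ncol = d)  # uniform test points
  covered <- logical(m)
  for (i in seq_len(n)) {
    gap <- abs(sweep(u, 2, centres[i, ]))
    wrapped <- pmin(gap, 1 - gap)      # wrap-around distance per coordinate
    covered <- covered | (rowSums(wrapped <= side / 2) == d)
  }
  V_hat <- mean(!covered)
  c(V_hat = V_hat, se = sqrt(V_hat * (1 - V_hat) / m))
}

set.seed(3)
n <- 50
d <- 4                                 # total dimension, as in Example 3
centres <- matrix(runif(n * d), ncol = d)
est <- mc_uncovered(centres)
est
(est[["V_hat"]] - exp(-1)) / (1 - exp(-1))  # plug-in value of kappa_n
```

Increasing `m` shrinks the standard error at rate \(m^{-1/2}\) at the cost of proportionally more work per cube.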