---
title: "irdc-demo"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{irdc-demo}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(FORD) # Our package
library(FOCI) # For comparison
library(ggplot2) # For visualization
```
# Introduction
We propose a new dependence measure $\nu(Y, \mathbf{X})$, introduced in [*A New Measure Of Dependence: Integrated R2*](http://arxiv.org/abs/2505.18146), to assess how much a random vector $\mathbf{X}$ explains a univariate response $Y$. Let $Y$ be a random variable and $\mathbf{X} = (X_1, \ldots, X_p)$ a random vector defined on the same probability space. Let $\mu$ be the probability law of $Y$ and let $S$ be the support of $\mu$. Define:
$$
\tilde{S} =
\begin{cases}
S \setminus \{s_{\max}\} & \text{if } S \text{ has a maximum } s_{\max} \\
S & \text{otherwise}
\end{cases}
$$
We define the measure $\tilde{\mu}$ on $S$ as:
$$
\tilde{\mu}(A) = \frac{\mu(A \cap \tilde{S})}{\mu(\tilde{S})}, \quad \text{for measurable } A \subseteq S
$$
Then the **irdc dependence coefficient** is defined as:
$$
\nu(Y, \mathbf{X}) := \int \frac{\mathrm{Var}(\mathbb{E}[\mathbf{1}\{Y > t\} \mid \mathbf{X}])}{\mathrm{Var}(\mathbf{1}\{Y > t\})} d\tilde{\mu}(t)
$$
In contrast, Azadkia and Chatterjee, in [*A Simple Measure Of Conditional Dependence*](https://www.jstor.org/stable/27170947), consider the following measure, which `FOCI::codec` estimates:
$$
T(Y, \mathbf{X}) = \frac{\int \mathrm{Var}(\mathbb{E}[\mathbf{1}\{Y \ge t\} \mid \mathbf{X}]) d\mu(t)}{\int \mathrm{Var}(\mathbf{1}\{Y \ge t\}) d\mu(t)}
$$
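The two definitions differ in where the normalization happens: $\nu$ averages a ratio over $t$, while $T$ takes a ratio of two averages. The chunk below is a minimal sketch of that difference for a predictor with only a few distinct values, where $\mathbb{E}[\,\cdot \mid \mathbf{X}]$ can be replaced by group means. The helpers `var_ratio_terms`, `nu_plugin`, and `T_plugin` are hypothetical names introduced here for illustration; they are naive plug-in calculations and are not the estimators implemented in `FORD::irdc` or `FOCI::codec`.

```{r plugin-sketch}
# Illustrative plug-in versions of nu and T (assumption: x has few distinct
# values, so group means stand in for the conditional expectation).
var_ratio_terms <- function(y, x, thr, strict) {
  ind <- if (strict) as.numeric(y > thr) else as.numeric(y >= thr)
  cond_mean <- ave(ind, x)  # group means of the indicator within each x value
  c(mean((cond_mean - mean(ind))^2),  # explained variance
    mean((ind - mean(ind))^2))        # total variance
}

nu_plugin <- function(y, x) {
  thr_all <- y[y < max(y)]  # empirical mu-tilde: drop the sample maximum
  ratios <- vapply(thr_all, function(thr) {
    v <- var_ratio_terms(y, x, thr, strict = TRUE)
    v[1] / v[2]
  }, numeric(1))
  mean(ratios)  # average of ratios
}

T_plugin <- function(y, x) {
  terms <- vapply(y, function(thr) {
    var_ratio_terms(y, x, thr, strict = FALSE)
  }, numeric(2))
  sum(terms[1, ]) / sum(terms[2, ])  # ratio of averages
}

x_demo <- rep(c(0, 1, 2), each = 4)
y_fun <- x_demo^2               # Y is a function of X
y_ind <- rep(c(0, 1), times = 6)  # Y has the same distribution in every X group
c(nu = nu_plugin(y_fun, x_demo), T = T_plugin(y_fun, x_demo))
c(nu = nu_plugin(y_ind, x_demo), T = T_plugin(y_ind, x_demo))
```

In this toy data both plug-ins equal 1 when $Y$ is a function of $X$ and 0 when the distribution of $Y$ is identical across the groups of $X$; the two measures can differ in between.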
# Continuous Case
```{r continuous}
n <- 1000
x <- matrix(runif(n * 3), nrow = n)
y <- (x[, 1] + x[, 2]) %% 1
irdc(y, x[, 1])
irdc(y, x[, 2])
irdc(y, x[, 3])
```
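In this example $Y = (X_1 + X_2) \bmod 1$ is a deterministic function of the pair $(X_1, X_2)$, yet it is uniformly distributed whatever the value of $X_1$ alone (and likewise for $X_2$), and $X_3$ plays no role. One would therefore expect all three single-predictor values above to be close to zero. The base-R sketch below checks the marginal independence empirically by comparing the distribution of $Y$ for small and large $X_1$. The helper `ks_gap` and the `_chk` variables are hypothetical names used only for this check.

```{r continuous-check}
# Hypothetical helper: largest gap between two empirical CDFs on a grid.
ks_gap <- function(a, b, grid = seq(0, 1, by = 0.01)) {
  max(abs(ecdf(a)(grid) - ecdf(b)(grid)))
}

n_chk <- 20000
x_chk <- matrix(runif(n_chk * 2), nrow = n_chk)
y_chk <- (x_chk[, 1] + x_chk[, 2]) %% 1   # same construction as above
s_chk <- (x_chk[, 1] + x_chk[, 2]) / 2    # plain sum, rescaled to [0, 1]
low_chk <- x_chk[, 1] < 0.5

# With the modulo, Y looks the same for small and large X1.
gap_y <- ks_gap(y_chk[low_chk], y_chk[!low_chk])
# Without the modulo, the sum clearly shifts with X1.
gap_sum <- ks_gap(s_chk[low_chk], s_chk[!low_chk])
c(modulo = gap_y, plain_sum = gap_sum)
```

The first gap should be small (sampling noise only), while the second should be large, which is the contrast the modulo construction is designed to produce.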
# Discrete Case
## Example 1
```{r discrete-1}
n <- 10000
s <- 0.1
x1 <- c(rep(0, n * s), runif(n * (1 - s)))
x2 <- runif(n)
y <- x1
irdc(y, x1, dist.type.X = "discrete")
irdc(y, x2)
```
## Example 2
```{r discrete-2}
n <- 10000
x1 <- runif(n)
y1 <- rbinom(n, 1, 0.5)
y2 <- as.numeric(x1 >= 0.5)
irdc(y1, x1, dist.type.X = "discrete")
irdc(y2, x1, dist.type.X = "discrete")
FOCI::codec(y1, x1)
FOCI::codec(y2, x1)
```
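Example 2 can be checked by hand. When $Y$ takes values in $\{0, 1\}$ with $0 < \mathbb{P}(Y = 1) < 1$, the support has maximum $1$, so $\tilde{S} = \{0\}$ and $\tilde{\mu}$ is a point mass at $0$. The integral defining $\nu$ then reduces to a single term:
$$
\nu(Y, \mathbf{X}) = \frac{\mathrm{Var}(\mathbb{E}[\mathbf{1}\{Y > 0\} \mid \mathbf{X}])}{\mathrm{Var}(\mathbf{1}\{Y > 0\})} = \frac{\mathrm{Var}(\mathbb{P}(Y = 1 \mid \mathbf{X}))}{\mathbb{P}(Y = 1)\,\mathbb{P}(Y = 0)}
$$
For $T$, the term at $t = 0$ vanishes in both numerator and denominator because $\mathbf{1}\{Y \ge 0\} \equiv 1$, and the term at $t = 1$ carries the same weight $\mathbb{P}(Y = 1)$ in both, so
$$
T(Y, \mathbf{X}) = \frac{\mathbb{P}(Y = 1)\,\mathrm{Var}(\mathbb{P}(Y = 1 \mid \mathbf{X}))}{\mathbb{P}(Y = 1)\,\mathrm{Var}(\mathbf{1}\{Y \ge 1\})} = \nu(Y, \mathbf{X})
$$
For `y1`, which is drawn independently of `x1`, $\mathbb{P}(Y = 1 \mid X) = 0.5$ is constant and both population values are $0$. For `y2`, $\mathbb{P}(Y = 1 \mid X) = \mathbf{1}\{X \ge 0.5\}$ has variance $0.25 = \mathbb{P}(Y = 1)\,\mathbb{P}(Y = 0)$, so both population values are $1$. The finite-sample outputs of `irdc` and `codec` need not coincide, since the estimators differ.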
## Example 3: Hurdle vs Gamma Mixture
```{r hurdle-vs-gamma}
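# Draw n samples from a hurdle Poisson: zero with probability p_zero,
# otherwise a zero-truncated Poisson(lambda) value.
r_hurdle_poisson <- function(n, p_zero = 0.3, lambda = 2) {
  is_zero <- rbinom(n, 1, p_zero)
  # Zero-truncated Poisson by rejection: redraw until the value is positive.
  rztpois <- function(m, lambda) {
    samples <- numeric(m)
    for (i in seq_len(m)) {  # seq_len() handles m = 0, unlike 1:m
      repeat {
        x <- rpois(1, lambda)
        if (x > 0) {
          samples[i] <- x
          break
        }
      }
    }
    samples
  }
  result <- numeric(n)  # entries stay 0 where is_zero == 1
  result[is_zero == 0] <- rztpois(sum(is_zero == 0), lambda)
  result
}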
set.seed(123)
n <- 1000
p_zero <- 0.4
lambda <- 10
hurdle <- r_hurdle_poisson(n, p_zero, lambda)
gamma_mix <- c(
  rep(0, round(p_zero * n)),
  rgamma(round((1 - p_zero) * n), shape = lambda, rate = 1)
)
df <- data.frame(
  value = c(hurdle, gamma_mix),
  source = rep(c("Hurdle Poisson", "Gamma Mixture"), each = n)
)
ggplot(df, aes(x = value, fill = source)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 40) +
  labs(title = "Comparison: Hurdle Poisson vs Gamma Mixture",
       x = "Value", y = "Count", fill = "Distribution") +
  theme_bw()
```
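The inner `rztpois()` helper above draws zero-truncated Poisson values one at a time by rejection. As a possible alternative, the same distribution can be sampled in one vectorized call by inverse-CDF sampling. The function name `rztpois_inverse` is hypothetical and is not part of FORD or FOCI; the chunk only defines it, so it does not consume random numbers or alter the seeded results in this vignette.

```{r rztpois-inverse}
# Hypothetical vectorized alternative to the rejection loop in rztpois().
# Inverse-CDF sampling: draw U uniformly on (P(X = 0), 1) and map it through
# the Poisson quantile function, so the result is always at least 1.
rztpois_inverse <- function(m, lambda) {
  u <- runif(m, min = dpois(0, lambda), max = 1)
  qpois(u, lambda)
}
```

Calling `rztpois_inverse(sum(is_zero == 0), lambda)` in place of the loop would target the same distribution but use a different random stream, so outputs under a fixed seed would change.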
## Example 3 Continued
```{r discrete-3}
x1 <- sort(gamma_mix)
y1 <- rbinom(n, 1, 0.5)
y2 <- sort(hurdle)
irdc(y1, x1, dist.type.X = "discrete")
irdc(y2, x1, dist.type.X = "discrete")
FOCI::codec(y1, x1)
FOCI::codec(y2, x1)
```
## Example 4
```{r discrete-4}
x1 <- sort(hurdle)
y1 <- rbinom(n, 1, 0.5)
y2 <- sort(gamma_mix)
irdc(y1, x1, dist.type.X = "discrete")
irdc(y2, x1, dist.type.X = "discrete")
FOCI::codec(y1, x1)
FOCI::codec(y2, x1)
```
# Conclusion
*irdc* provides a theoretically grounded dependence measure that, through the `dist.type.X` argument, handles both continuous and discrete predictors.
For further theoretical details, see our paper:
Azadkia and Roudaki (2025), [*A New Measure Of Dependence: Integrated R2*](http://arxiv.org/abs/2505.18146).