---
title: "Mhorseshoe"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Mhorseshoe}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(Mhorseshoe)
library(ggplot2)
library(horseshoe)
```
In this vignette, the exact_horseshoe and approx_horseshoe functions are
explained in the following sections.

1. About the horseshoe estimator in this package
2. Modified approximate algorithm

# 1. About the horseshoe estimator in this package
The horseshoe prior is a continuous shrinkage prior frequently used in
high-dimensional Bayesian linear models, and it can be applied effectively to
sparse data; its shrinkage properties are theoretically well established.
Mhorseshoe provides estimators that apply the horseshoe prior. The likelihood
of the Gaussian linear model considered in this package is as follows:
$$L(y\ |\ X, \beta, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^{N}
\exp\left\{-\frac{1}{2\sigma^2}(y-X\beta)^{T}(y-X\beta)\right\},\\ X \in
\mathbb{R}^{N \times p},\ y \in \mathbb{R}^{N},\ \beta \in \mathbb{R}^{p}.$$
The hierarchical model that applies the horseshoe prior to $\beta$ is
expressed as follows:
$$\beta_{j}\ |\ \sigma^{2}, \tau^{2}, \lambda_{j}^{2}
\overset{ind}{\sim} N\left(0, \sigma^{2}\tau^{2}\lambda_{j}^{2} \right),
\quad \lambda_{j}\overset{i.i.d.}{\sim}C^{+}(0,1),\quad j=1,2,\ldots,p, \\ \tau
\sim C^{+}(0,1),\quad \sigma^{2} \sim \mathrm{InvGamma}\left(
w/2,\ w/2\right).$$
where $C^{+}(0,1)$ is the half-Cauchy distribution,
$\lambda_{1}, \ldots, \lambda_{p}$ are local shrinkage parameters, and $\tau$
is the global shrinkage parameter. The exact_horseshoe function in this package
is a horseshoe estimator for the prior defined above, following the exact MCMC
algorithm of Johndrow et al. (2020). It adopts a blocked
Metropolis-within-Gibbs sampler that updates $\lambda$ and
$\left(\tau, \sigma, \beta \right)$ in turn. Since the algorithm is designed
for the $p \gg N$ case, the computational cost is lowered by using the fast
sampling method of Bhattacharya et al. (2016) in the step of sampling $\beta$.
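To make the $\beta$ step concrete, here is a minimal sketch of the fast
sampler described by Bhattacharya et al. (2016); it is written for this
vignette and is not the internal code of exact_horseshoe. The sampler draws
from $N(\Sigma\Phi^{T}\alpha,\ \Sigma)$ with
$\Sigma = (\Phi^{T}\Phi + D^{-1})^{-1}$ at $O(N^{2}p)$ cost instead of
$O(p^{3})$; for the $\beta$ step of the horseshoe, $\Phi = X/\sigma$,
$\alpha = y/\sigma$, and $D = \sigma^{2}\tau^{2}\Lambda$.
```{r}
# Minimal sketch of the fast sampler of Bhattacharya et al. (2016).
# Phi: N x p matrix, alpha: length-N vector, d: length-p vector of prior
# variances (the diagonal of D). Returns one draw of beta.
fast_sample <- function(Phi, alpha, d) {
  N <- nrow(Phi)
  u <- rnorm(ncol(Phi), mean = 0, sd = sqrt(d))          # u ~ N(0, D)
  v <- as.vector(Phi %*% u) + rnorm(N)                   # v = Phi u + delta
  w <- solve(Phi %*% (d * t(Phi)) + diag(N), alpha - v)  # (Phi D Phi^T + I)^{-1}(alpha - v)
  as.vector(u + d * as.vector(t(Phi) %*% w))             # u + D Phi^T w
}
```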
For simulation, the data size is set to $N = 300$, $p = 500$, and the design
matrix $X=\{x_{j} \}_{j=1}^{p}$ and the true value of $\beta$ are generated as
follows:
$$x_{j} \sim N_{N}\left(0, I_{N} \right), \\ y_{i} \sim N(x_{i}\beta,\ 4), \\
\beta_{j} = \begin{cases} 1, & \mbox{if }\ j < 51, \\ 0, &
\mbox{otherwise.}\end{cases}$$
Note that this data is intended to test the application of the algorithms; it
is not a simulation that strictly satisfies sparsity conditions.
```{r}
# making simulation data.
set.seed(123)
N <- 300
p <- 500
p_star <- 50
true_beta <- c(rep(1, p_star), rep(0, p-p_star))
# design matrix X.
X <- matrix(1, nrow = N, ncol = p)
for (i in 1:p) {
  X[, i] <- stats::rnorm(N, mean = 0, sd = 1)
}
# response variable y.
y <- vector(mode = "numeric", length = N)
e <- rnorm(N, mean = 0, sd = 2)
for (i in 1:p_star) {
  y <- y + true_beta[i] * X[, i]
}
y <- y + e
```
We compare the results on this simulated data with those of the horseshoe
function in the horseshoe package. We fit the data with both horseshoe and
exact_horseshoe and plot the results for the first 100 indices, which include
the 50 non-zero coefficients.
```{r fig.width = 6, fig.height= 4}
# horseshoe in horseshoe package.
horseshoe_result <- horseshoe::horseshoe(y, X, method.tau = "halfCauchy",
method.sigma = "Jeffreys",
burn = 0, nmc = 500)
# exact_horseshoe in Mhorseshoe package.
exact_horseshoe_result <- exact_horseshoe(y, X, burn = 0, iter = 500)
df <- data.frame(index = 1:100,
horseshoe_BetaHat = horseshoe_result$BetaHat[1:100],
horseshoe_LeftCI = horseshoe_result$LeftCI[1:100],
horseshoe_RightCI = horseshoe_result$RightCI[1:100],
exhorseshoe_BetaHat = exact_horseshoe_result$BetaHat[1:100],
exhorseshoe_LeftCI = exact_horseshoe_result$LeftCI[1:100],
exhorseshoe_RightCI = exact_horseshoe_result$RightCI[1:100],
true_beta = true_beta[1:100])
# Estimation results of the horseshoe in horseshoe package.
ggplot(data = df, aes(x = index, y = true_beta)) +
geom_point(size = 2) +
geom_point(aes(x = index, y = horseshoe_BetaHat), size = 2, col = "red") +
geom_errorbar(aes(ymin = horseshoe_LeftCI,
ymax = horseshoe_RightCI), width = .1, col = "red") +
labs(title = "95% Credible intervals of the horseshoe in horseshoe package",
y = "beta")
# Estimation results of the exact_horseshoe in Mhorseshoe package.
ggplot(data = df, aes(x = index, y = true_beta)) +
geom_point(size = 2) +
geom_point(aes(x = index, y = exhorseshoe_BetaHat),
size = 2, col = "red") +
geom_errorbar(aes(ymin = exhorseshoe_LeftCI,
ymax = exhorseshoe_RightCI), width = .1, col = "red") +
  labs(title = "95% Credible intervals of the exact_horseshoe in Mhorseshoe package",
       y = "beta")
```
The results of the two functions are essentially the same. For both, the 95%
credible intervals of all non-zero coefficients except indices 4, 36, 41, and
46 contain 1.
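This can be checked directly from the data frame built above; each line below
returns the indices among the first 50 whose 95% credible interval does not
contain 1.
```{r}
# Non-zero indices whose 95% credible interval excludes the true value 1.
which(df$horseshoe_LeftCI[1:50] > 1 | df$horseshoe_RightCI[1:50] < 1)
which(df$exhorseshoe_LeftCI[1:50] > 1 | df$exhorseshoe_RightCI[1:50] < 1)
```
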
# 2. Modified approximate algorithm
Johndrow et al. (2020) proposed a scalable approximate MCMC algorithm that
reduces computational cost by introducing a thresholding step while applying
the horseshoe prior. The columns that satisfy the condition
$\tau^{2}\lambda_{j}^{2} > \delta$ form the active set, and $S$ is defined as
the set of their indices:
$$S = \{j\ :\ \tau^{2}\lambda_{j}^{2} > \delta,\ j=1,2,\ldots,p\}.$$
If $\tau^{2}\lambda_{j}^{2}$ is very small, the conditional posterior of
$\beta_{j}$ has mean and variance close to 0. Let $s_{\delta}$ be the number
of elements of $S$ and let
$\Lambda = \mathrm{diag}(\lambda_{1}^{2}, \ldots, \lambda_{p}^{2})$ be the
diagonal matrix of local shrinkage parameters. The approximate algorithm
applies the reduced matrix
$\Lambda_{S} \in \mathbb{R}^{s_{\delta} \times s_{\delta}}$ in the main
sampling steps, which lowers the computational cost. This algorithm is
implemented in the approx_horseshoe function.
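As a minimal sketch of the thresholding step (with hypothetical shrinkage
values chosen for illustration, not the sampler's internal state), computing
the active set is a one-line operation:
```{r}
# Hypothetical shrinkage values: large for the 50 signals, tiny otherwise.
tau2    <- 0.01
lambda2 <- c(rep(100, 50), rep(0.001, p - 50))
delta   <- 1e-4
S <- which(tau2 * lambda2 > delta)  # indices in the active set
length(S)                           # s_delta
```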
Johndrow et al. (2020) use a fixed $\delta$ in the approximate algorithm. In
contrast, this package adds a new method that sets $\delta$ adaptively.
The adaptive algorithm developed in this package estimates a new threshold
$\delta^{New}$ from the shrinkage parameters updated by MCMC sampling and
adopts $\delta^{New}$ with an adaptive probability. The threshold update
applies the properties of the shrinkage weights $k_{j},\ j=1,2,\ldots,p$,
proposed by Piironen and Vehtari (2017). Under the prior
$\beta_{j} \sim N(0, \sigma^{2}\tau^{2}\lambda_{j}^{2})$, the quantity
$m_{eff}$ is defined as follows:
$$k_{j} = 1/\left(1+n\tau^{2}s_{j}^{2}\lambda_{j}^{2} \right), \quad m_{eff} = \sum_{j=1}^{p}{\left(1-k_{j} \right)}.$$
where $ns_{j}^{2},\ j=1,2,\ldots,p$, are the diagonal components of $X^{T}X$.
For zero components of $\beta$, $k_{j}$ is close to 1, while for non-zero
components $k_{j}$ is close to 0, so $m_{eff}$ is called the effective number
of non-zero coefficients. In this algorithm, $\delta$ is updated so that
$s_{\delta} = \lceil m_{eff} \rceil$. The adaptive probability is defined to
satisfy Theorem 5 (the diminishing adaptation condition) of Roberts and
Rosenthal (2007). At the $T$-th iteration,
$$p(T) = \exp\left[p_{0} + p_{1}T\right],\quad p_{1} < 0,
\quad u \sim U(0, 1), \\ \mbox{if}\ u < p(T),\ \mbox{accept}\ \delta^{New}
\mbox{ and set } s_{\delta^{New}} = \lceil m_{eff} \rceil.$$
The defaults are $p_{0} = 0$ and $p_{1} = -4.6 \times 10^{-4}$, under which
$p(10000) \approx 0.01$. We call the approximate algorithm extended with this
procedure the modified approximate algorithm.
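The sketch below (again with hypothetical shrinkage values, reusing the
illustrative tau2 and lambda2 defined above rather than the sampler's internal
state) shows how $m_{eff}$, the resulting $s_{\delta}$, and the acceptance
probability $p(T)$ could be computed.
```{r}
# Shrinkage weights k_j and the effective number of non-zero coefficients.
s2 <- colSums(X^2) / N                 # s_j^2, since diag(X^T X) = N * s_j^2
k <- 1 / (1 + N * tau2 * s2 * lambda2)
m_eff <- sum(1 - k)
ceiling(m_eff)                         # new s_delta
# Adaptive acceptance probability at iteration T with the default constants.
p0 <- 0
p1 <- -4.6e-4
exp(p0 + p1 * 10000)                   # p(10000), approximately 0.01
```
The simulation below runs both the fixed-threshold approximate algorithm and
the modified version on the same data.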
```{r fig.width = 6, fig.height= 4}
# approximate algorithm with fixed default threshold.
approx_horseshoe_result <- approx_horseshoe(y, X, burn = 0, iter = 500,
auto.threshold = FALSE)
# modified approximate algorithm.
mapprox_horseshoe_result <- approx_horseshoe(y, X, burn = 0, iter = 500,
auto.threshold = TRUE)
df2 <- data.frame(index = 1:100,
approx_BetaHat = approx_horseshoe_result$BetaHat[1:100],
approx_LeftCI = approx_horseshoe_result$LeftCI[1:100],
approx_RightCI = approx_horseshoe_result$RightCI[1:100],
mapprox_BetaHat = mapprox_horseshoe_result$BetaHat[1:100],
mapprox_LeftCI = mapprox_horseshoe_result$LeftCI[1:100],
mapprox_RightCI = mapprox_horseshoe_result$RightCI[1:100],
true_beta = true_beta[1:100])
# Estimation results of the approximate algorithm.
ggplot(data = df2, aes(x = index, y = true_beta)) +
geom_point(size = 2) +
geom_point(aes(x = index, y = approx_BetaHat), size = 2, col = "red") +
geom_errorbar(aes(ymin = approx_LeftCI,
ymax = approx_RightCI), width = .1, col = "red") +
labs(title = "95% Credible intervals of the approx_horseshoe", y = "beta")
# Estimation results of the modified approximate algorithm.
ggplot(data = df2, aes(x = index, y = true_beta)) +
geom_point(size = 2) +
geom_point(aes(x = index, y = mapprox_BetaHat),
size = 2, col = "red") +
geom_errorbar(aes(ymin = mapprox_LeftCI,
ymax = mapprox_RightCI),
width = .1, col = "red") +
labs(title = "95% Credible intervals of the modified_approx_horseshoe",
y = "beta")
```
In the case of exact_horseshoe, the full $\Lambda$ is always used, so
$length(S) = p = 500$ at every iteration. On the other hand, since the
approximate algorithm and the modified approximate algorithm set the threshold
differently, they differ in $length(S) = s_{\delta}$.
```{r fig.width = 6, fig.height= 4}
exact_activeset <- rep(p, 500)
approx_activeset <- apply(approx_horseshoe_result$ActiveSet, MARGIN = 1, sum)
mapprox_activeset <- apply(mapprox_horseshoe_result$ActiveSet, MARGIN = 1, sum)
# active set plot
ggplot(data = data.frame(X = 1:500,
exact_activeset = exact_activeset,
approx_activeset = approx_activeset,
mapprox_activeset = mapprox_activeset)) +
geom_line(mapping = aes(x = X, y = exact_activeset,
color = "exact")) +
geom_line(mapping = aes(x = X, y = approx_activeset,
color = "approx"),
alpha = 0.5) +
geom_line(mapping = aes(x = X, y = mapprox_activeset,
color = "modified_approx"),
alpha = 0.5) +
scale_color_manual(name = "algorithm",
values = c("black", "red", "blue"),
breaks = c("exact", "approx", "modified_approx"),
labels = c("exact", "approx", "modified_approx"))
```
# References
Bhattacharya, A., Chakraborty, A., & Mallick, B. K. (2016). Fast sampling with
Gaussian scale mixture priors in high-dimensional regression. Biometrika,
103(4), 985-991.

Johndrow, J., Orenstein, P., & Bhattacharya, A. (2020). Scalable approximate
MCMC algorithms for the horseshoe prior. Journal of Machine Learning Research,
21(73), 1-61.

Piironen, J., & Vehtari, A. (2017). Sparsity information and regularization in
the horseshoe and other shrinkage priors. Electronic Journal of Statistics,
11, 5018-5051.

Roberts, G. O., & Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive
Markov chain Monte Carlo algorithms. Journal of Applied Probability, 44(2),
458-475.