--- title: "Adjusting the noise distribution in the Barker proposal" output: rmarkdown::html_vignette bibliography: references.bib link-citations: true vignette: > %\VignetteIndexEntry{Adjusting the noise distribution in the Barker proposal} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The Barker proposal implementation in the `rmcmc` package in `barker_proposal()` by default uses a standard normal distribution for the auxiliary noise variables generated in the proposal, as suggested by @livingstone2022barker. The analysis in @vogrinc2023optimal however implies that alternative choices of noise distribution can give improved performance in some situations. This vignette demonstrates how to adjust the Barker proposal noise distribution. ```{r setup} library(rmcmc) ``` ## Example target distribution As a simple example of a target distribution, we consider a $D$-dimensional distribution with product form (independent dimensions) and 'hyperbolic' marginals $$\pi(x) \propto \exp\left(-\sum_{d=1}^D(\delta^2 + x_d^2)^{\frac{1}{2}}\right).$$ The gradient of the corresponding log density function can be derived as $$\nabla(\log \pi)(x)_d = -x_d / (\delta^2 + x_d^2)^{\frac{1}{2}}.$$ This can be implemented in R as ```{r} delta_sq <- 0.1 target_distribution <- list( log_density = function(x) -sum(sqrt(delta_sq + x^2)), gradient_log_density = function(x) -x / sqrt(delta_sq + x^2) ) ``` Here we use $\delta^2 = `r delta_sq`$, corresponding to Example 4 in @vogrinc2023optimal. ## Creating Barker proposal with a custom noise distribution The `barker_proposal()` function accepts an optional argument `sample_auxiliary`, which can be used to specify the distribution the auxiliary noise variables are drawn from. The argument should be passed a function accepting a single integer argument, corresponding to the target distribution dimension, and returning a vector of auxiliary noise variable samples, one per target distribution dimension. By default this function is set to `stats::rnorm()`, thus using independent standard normal variates for the auxiliary noise variables. Below we create a function `sample_bimodal` which instead samples from a bimodal normal mixture, with two equally-weighted normal components with means $\pm (1 - \sigma^2)$ and common variance $\sigma^2 = 0.01$. ```{r} sample_bimodal <- function(dimension) { sigma <- 0.1 return( sample(c(-1, 1), dimension, TRUE) * sqrt(1 - sigma^2) + stats::rnorm(dimension) * sigma ) } ``` This choice of bimodal distribution for the auxiliary noise variables is motivated by the analysis in @vogrinc2023optimal, which shows that for target distributions of product form and using a Barker proposal, the asymptotic _expect squared jump distance_ (ESJD) a measure of chain sampling performance, is maximised when the auxiliary noise distribution is chosen to be the Rademacher distribution (the distribution with half of its mass in atoms at -1 and +1). The Rademacher distribution itself does not give a practical sampling algorithm, as the resulting Markov kernel will not be irreducible. As a more practical alternative @vogrinc2023optimal therefore suggests using the above normal mixture bimodal distribution, which can be considered a relaxation of the Rademacher distribution. We now create two instances of the the Barker proposal, the first using the default of a standard normal distribution for the auxiliary noise variables, and the second using the bimodal distribution. ```{r} normal_noise_proposal <- barker_proposal() bimodal_noise_proposal <- barker_proposal(sample_auxiliary = sample_bimodal) ``` A convenience function `bimodal_barker_proposal()` is also provided in the package to simplify creating a proposal with bimodal noise distribution as above, with optional `sigma` argument specifying the standard deviation of the normal components. That is the second line above can equivalently be written `bimodal_noise_proposal <- bimodal_barker_proposal()` without the need to define the `sample_bimodal` function. ## Sampling chains with proposals Now that we have the target distribution and two proposals specified, we can sample chains to compare their performance. ```{r} set.seed(7861932L) dimension <- 100 initial_state <- rnorm(dimension) adapters <- list(scale_adapter()) n_warm_up_iteration <- 10000 n_main_iteration <- 10000 ``` Above we specify some common settings for the `sample_chain()` calls, namely the dimension of the target distribution, the initial chain state (shared for both chains), the adapters to use during the warm-up iterations (here we adapt only the proposal scale to achieve a target acceptance rate of 0.57), and the number of warm-up and main chain iterations. We can now sample a chain using the bimodal noise proposal variant ```{r} bimodal_noise_results <- sample_chain( target_distribution = target_distribution, proposal = bimodal_noise_proposal, initial_state = initial_state, n_warm_up_iteration = n_warm_up_iteration, n_main_iteration = n_main_iteration, adapters = adapters ) ``` and similarly using the default normal noise proposal ```{r} normal_noise_results <- sample_chain( target_distribution = target_distribution, proposal = normal_noise_proposal, initial_state = initial_state, n_warm_up_iteration = n_warm_up_iteration, n_main_iteration = n_main_iteration, adapters = adapters ) ``` ## Comparing performance of proposals For a realisation of a Markov chain $X_{1:N}$, the ESJD metric used to define the notion of sampling efficiency optimized over in @vogrinc2023optimal, can be estimated as $$ \widehat{\textsf{ESJD}}(X_{1:N}) = \frac{1}{N - 1} \sum_{n=1}^{N-1} \Vert X_{n+1} - X_n \Vert^2 $$ with corresponding R implementation as below ```{r} expected_square_jumping_distance <- function(traces) { n_iteration <- nrow(traces) mean(rowSums(traces[2:n_iteration, ] - traces[1:n_iteration - 1, ])^2) } ``` Applying this function to the chain traces generated using the Barker proposal with bimodal noise, ```{r} cat( sprintf( "Expected square jumping distance using normal noise is %.2f", expected_square_jumping_distance(bimodal_noise_results$traces[, 1:dimension]) ) ) ``` and Barker proposal with normal noise, ```{r} cat( sprintf( "Expected square jumping distance using bimodal noise is %.2f", expected_square_jumping_distance(normal_noise_results$traces[, 1:dimension]) ) ) ``` in both cases selecting columns `1:dimension` to exclude the traced log density values, we find that the proposal with bimodal noise gives roughly a factor two improvement in ESJD compared to using normal noise, correlating with the results in @vogrinc2023optimal. ```{r} library(posterior) ``` Using the `posterior` R package, we can all compare how the chains perform in terms of the estimated _effective sample size_ (ESS) of the estimate of the mean of the target log density. First considering the chain using the Barker proposal with bimodal noise ```{r} cat( sprintf( "Estimated ESS of mean(target_log_density) using bimodal noise is %.0f", ess_mean( extract_variable( as_draws_matrix(bimodal_noise_results$traces), "target_log_density" ) ) ) ) ``` and similarly for the Barker proposal with normal noise ```{r} cat( sprintf( "Estimated ESS of mean(target_log_density) using normal noise is %.0f", ess_mean( extract_variable( as_draws_matrix(normal_noise_results$traces), "target_log_density" ) ) ) ) ``` we again find that the bimodal noise variant appears to show significantly improved sampling efficiency. ## References