JAGS vs. Stan

library(BH)
library(rACMEMEEV)

Here we will compare and contrast the two sampling backends available to users of this library. There are far more in-depth reviews of when to use which sampler and why (Beraha et al. 2021; Hoffman & Gelman 2014), but we can nevertheless add some guidance specific to this library.

### Setting up the Workflow

I’m going to skip over most of the background on why measurement error adjustment is important in dietary exposure assessment, as there are many far more detailed tutorials on that topic. We begin by creating a custom dataset that will be used throughout this example. Dietary consumption data is typically characterized by a gamma distribution (Passarelli et al. 2022), so that is the kind of data we will generate. It is very important to note that this library standardizes the data (i.e. makes the consumption distribution more normal) by default, so please be careful when applying it to your own dataset.
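
One practical note: the draws below are random, so the numbers printed in this article will not exactly match what you get locally unless you fix the RNG seed first, for example:

``` r
# Fixing the seed makes the rgamma()/rlnorm() draws below reproducible;
# 2024 is an arbitrary choice, any integer will do.
set.seed(2024)
```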

x <- rgamma(100, shape = 1, rate = 0.01)
y <- rgamma(100, shape = 3, rate = 0.02)
z <- rgamma(100, shape = 3, rate = 0.3)

These will represent random consumption values, and accordingly, we will generate some example output data to go alongside these inputs.

output <- rlnorm(100, meanlog = 3.5, sdlog = 0.2)

Thus, the full dataset looks like:

df <- data.frame(x = x, y = y, z = z, output = output)

head(df, 10)
#>             x         y         z   output
#> 1  193.929578  90.74957  8.662289 32.25831
#> 2   18.041910 338.32089  6.448633 46.03330
#> 3   53.443286  42.33018  3.058837 45.20608
#> 4    2.396523  86.38140  6.098603 25.39488
#> 5   42.777424 380.82160  4.813437 30.38507
#> 6   14.315278 104.60287 11.283213 38.96740
#> 7  294.574967 138.23559  7.384926 33.34672
#> 8   45.663770  97.96775  6.283557 30.57872
#> 9    2.690928 184.23220  5.053584 31.23650
#> 10 233.469657 340.39325  3.089829 37.17521

To complete the setup, we also simulate a validity coefficient for each exposure variable; these are combined into `validity_coefficients` later on and feed into the attenuation matrix.

x_v_coef <- generate_coefficient(1000, 0.4, 0.7, 0.95)
y_v_coef <- generate_coefficient(1000, 0.5, 0.7, 0.95)
z_v_coef <- generate_coefficient(1000, 0.3, 0.6, 0.95)

### Stan vs. JAGS Pre-Model

With the data generated, we can now fit the pre-model to solve for the covariances
in the observed data. [Muoka et al. (2020)](#References) cleverly set things up so that the
JAGS sampler converges quite quickly: the inverse-Wishart prior on the covariance matrix
creates a semi-conjugate situation. This does not guarantee that the posterior will be
fully explored, but it gives the sampler that much better a chance of converging. From an
API perspective, the same function that fits the pre-model with the JAGS backend can be
used to fit it with the Stan backend; the only thing that needs to change is the
`stan = TRUE` argument.
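
For a bit of intuition on why this helps (a sketch, using JAGS's `dwish(R, k)` parameterization and assuming the standardized data are modelled as multivariate normal, as in the model code printed further down): with $z_i \sim \mathrm{MVN}(\mu, \Lambda^{-1})$ and a Wishart prior $\Lambda \sim \mathrm{Wishart}(R, k)$ on the precision matrix, the full conditional of $\Lambda$ remains Wishart,

$$
\Lambda \mid z, \mu \;\sim\; \mathrm{Wishart}\!\left(R + \sum_{i=1}^{n} (z_i - \mu)(z_i - \mu)^{\top},\ k + n\right),
$$

so JAGS can draw the precision matrix with a direct conjugate (Gibbs) update rather than a generic sampler, which is what gives it the head start described above.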


``` r
jags_output <- acme_model(df, c("x", "y", "z"))
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 100
#>    Unobserved stochastic nodes: 4
#>    Total graph size: 166
#> 
#> Initializing model
stan_output <- acme_model(df, c("x", "y", "z"), stan = TRUE)
#> 
#> SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 9.9e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.99 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
#> Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
#> Chain 1: Iteration: 1501 / 10000 [ 15%]  (Sampling)
#> Chain 1: Iteration: 2500 / 10000 [ 25%]  (Sampling)
#> Chain 1: Iteration: 3500 / 10000 [ 35%]  (Sampling)
#> Chain 1: Iteration: 4500 / 10000 [ 45%]  (Sampling)
#> Chain 1: Iteration: 5500 / 10000 [ 55%]  (Sampling)
#> Chain 1: Iteration: 6500 / 10000 [ 65%]  (Sampling)
#> Chain 1: Iteration: 7500 / 10000 [ 75%]  (Sampling)
#> Chain 1: Iteration: 8500 / 10000 [ 85%]  (Sampling)
#> Chain 1: Iteration: 9500 / 10000 [ 95%]  (Sampling)
#> Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.596 seconds (Warm-up)
#> Chain 1:                3.767 seconds (Sampling)
#> Chain 1:                4.363 seconds (Total)
#> Chain 1:
```

Both models can then be introspected to see the underlying JAGS / Stan model code, along with the posterior samples, to get a better understanding of how the two samplers can yield slightly different posteriors.

jags_output$model
#> JAGS model:
#> 
#> 
#>   model {
#>     for (i in 1:n) {
#>       model_data[i, 1:p] ~ dmnorm( zMu[1:p] , zInvCovMat[1:p, 1:p] )
#>     }
#> 
#>     for (varIdx in 1:p) {
#>       # -- uninformative prior for zMu -- #
#>       zMu[varIdx] ~ dnorm(0, 1000000)
#>     }
#>     zInvCovMat ~ dwish(zRmat[1:p, 1:p], zdf)
#> 
#>     # -- Convert invCovMat to SD and correlation -- #
#>     zCovMat <- inverse(zInvCovMat)
#>     for (varIdx in 1:p) {
#>       zSigma[varIdx] <- sqrt(zCovMat[varIdx,varIdx])
#>     }
#>     for (varIdx1 in 1:p) {
#>       for (varIdx2 in 1:p) {
#>         zRho[varIdx1, varIdx2] <- (
#>           zCovMat[varIdx1, varIdx2] / (zSigma[varIdx1] * zSigma[varIdx2])
#>         )
#>       }
#>     }
#> 
#>     for (varIdx in 1:p) {
#>       sigma[varIdx] <- zSigma[varIdx] * sd_orig[, varIdx]
#>       mu[varIdx] <- zMu[varIdx] * sd_orig[, varIdx] + mean_orig[, varIdx]
#>     }
#>     for (varIdx1 in 1:p) {
#>       for (varIdx2 in 1:p) {
#>         rho[varIdx1, varIdx2] <- zRho[varIdx1, varIdx2]
#>       }
#>     }
#>   }
#>   
#> Fully observed variables:
#>  mean_orig model_data n p sd_orig zRmat zdf
stan_output$model
#> S4 class stanmodel 'anon_model' coded as follows:
#>   data {
#>     int<lower=1> n;
#>     int<lower=1> p;
#>     matrix[n, p] model_data;
#>     matrix[p, p] zRmat;
#>     real<lower=0> zdf;
#>     vector[p] sd_orig;
#>     vector[p] mean_orig;
#>   }
#>   parameters {
#>     vector[p] zMu;
#>     cov_matrix[p] zInvCovMat;
#>   }
#>   transformed parameters {
#>     matrix[p, p] zCovMat;
#>     vector[p] zSigma;
#>     matrix[p, p] zRho;
#>     // -- Convert invCovMat to SD and correlation --
#>     zCovMat = inv(zInvCovMat);
#>     for (varIdx in 1:p) {
#>       zSigma[varIdx] = sqrt(zCovMat[varIdx, varIdx]);
#>     }
#>     for (varIdx1 in 1:p) {
#>       for (varIdx2 in 1:p) {
#>         zRho[varIdx1, varIdx2] = zCovMat[varIdx1, varIdx2] / (zSigma[varIdx1] * zSigma[varIdx2]);
#>       }
#>     }
#>   }
#>   model {
#>     for (i in 1:n) {
#>       model_data[i, 1:p] ~ multi_normal(zMu[1:p], zInvCovMat[1:p, 1:p]);
#>     }
#>     // --  uninformative prior for zMu --
#>     for (i in 1:p) {
#>       zMu[i] ~ normal(0, 1000000);
#>     }
#>     zInvCovMat ~ wishart(zdf, zRmat[1:p, 1:p]);
#>   }
#>   generated quantities {
#>     vector[p] mu;
#>     vector[p] sigma;
#>     matrix[p, p] rho;
#>     // -- transform sd and means back to original scale --
#>     for (varIdx in 1:p) {
#>       sigma[varIdx] = zSigma[varIdx] * sd_orig[varIdx];
#>       mu[varIdx] = zMu[varIdx] * sd_orig[varIdx] + mean_orig[varIdx];
#>     }
#>     rho = zRho;
#>   }
#> 
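
The raw posterior draws are stored alongside the model code. Assuming the `samples` element used in the traceplot section below holds a coda `mcmc.list` for the JAGS fit and a `stanfit` object for the Stan fit (an assumption based on the summaries printed in this article), the draws can be inspected directly:

``` r
# Summarize the JAGS draws with coda and pull raw Stan draws for zMu with rstan.
# The classes of the `samples` elements are assumptions, as noted above.
summary(jags_output$samples)
head(rstan::extract(stan_output$samples)$zMu)
```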

To show exactly how the samplers can produce differing results, we can look directly at the summary metrics for the solved covariance structure, as well as the resulting covariance matrix itself.

jags_output$covariance_matrix
#> 
#> Iterations = 1001:11000
#> Thinning interval = 1 
#> Number of chains = 1 
#> Sample size per chain = 10000 
#> 
#> 1. Empirical mean and standard deviation for each variable,
#>    plus standard error of the mean:
#> 
#>               Mean        SD  Naive SE Time-series SE
#> rho[1,1]   1.00000 1.409e-16 1.409e-18      0.0000000
#> rho[2,1]   0.08985 9.896e-02 9.896e-04      0.0009896
#> rho[3,1]  -0.13364 9.668e-02 9.668e-04      0.0009668
#> rho[1,2]   0.08985 9.896e-02 9.896e-04      0.0009896
#> rho[2,2]   1.00000 1.422e-16 1.422e-18      0.0000000
#> rho[3,2]  -0.13592 9.711e-02 9.711e-04      0.0009630
#> rho[1,3]  -0.13364 9.668e-02 9.668e-04      0.0009668
#> rho[2,3]  -0.13592 9.711e-02 9.711e-04      0.0009630
#> rho[3,3]   1.00000 0.000e+00 0.000e+00      0.0000000
#> sigma[1] 111.39497 7.959e+00 7.959e-02      0.0795902
#> sigma[2]  98.21803 7.045e+00 7.045e-02      0.0687691
#> sigma[3]   6.47476 4.622e-01 4.622e-03      0.0046991
#> 
#> 2. Quantiles for each variable:
#> 
#>             2.5%       25%       50%       75%     97.5%
#> rho[1,1]  1.0000   1.00000   1.00000   1.00000   1.00000
#> rho[2,1] -0.1025   0.02202   0.09061   0.15746   0.28025
#> rho[3,1] -0.3157  -0.20090  -0.13561  -0.06821   0.06043
#> rho[1,2] -0.1025   0.02202   0.09061   0.15746   0.28025
#> rho[2,2]  1.0000   1.00000   1.00000   1.00000   1.00000
#> rho[3,2] -0.3240  -0.20167  -0.13718  -0.06983   0.05588
#> rho[1,3] -0.3157  -0.20090  -0.13561  -0.06821   0.06043
#> rho[2,3] -0.3240  -0.20167  -0.13718  -0.06983   0.05588
#> rho[3,3]  1.0000   1.00000   1.00000   1.00000   1.00000
#> sigma[1] 97.3128 105.78669 110.95773 116.36249 128.58500
#> sigma[2] 85.6659  93.29242  97.70399 102.65356 113.34842
#> sigma[3]  5.6404   6.15341   6.44824   6.76547   7.43727
stan_output$covariance_matrix
#> $summary
#>                 mean      se_mean           sd       2.5%         25%
#> sigma[1]  108.821210 6.975823e-02 8.021457e+00   93.59244  103.248954
#> sigma[2]   95.714630 6.163810e-02 6.925195e+00   82.56554   91.014884
#> sigma[3]    6.313615 4.136898e-03 4.553638e-01    5.45204    6.004330
#> rho[1,1]    1.000000 2.539112e-18 1.362537e-16    1.00000    1.000000
#> rho[1,2]   32.033426 2.352995e+01 2.169541e+03  -98.56311    4.069593
#> rho[1,3]   -7.156656 3.815892e+00 3.517537e+02  -69.80735  -10.871758
#> rho[2,1]   32.033426 2.352995e+01 2.169541e+03  -98.56311    4.069593
#> rho[2,2]    1.000000 1.807418e-18 1.359980e-16    1.00000    1.000000
#> rho[2,3]    1.447703 6.796757e+00 6.268534e+02  -65.91060  -10.668964
#> rho[3,1]   -7.156656 3.815892e+00 3.517537e+02  -69.80735  -10.871758
#> rho[3,2]    1.447703 6.796757e+00 6.268534e+02  -65.91060  -10.668964
#> rho[3,3]    1.000000 1.770896e-18 1.363175e-16    1.00000    1.000000
#> lp__     -150.433843 3.878089e-02 2.191957e+00 -155.64937 -151.687541
#>                  50%         75%      97.5%     n_eff      Rhat
#> sigma[1]  108.714693  114.221931  124.65692 13222.561 0.9998995
#> sigma[2]   95.548497  100.393867  109.50405 12623.086 0.9998825
#> sigma[3]    6.305834    6.615026    7.23390 12116.226 0.9999691
#> rho[1,1]    1.000000    1.000000    1.00000  2879.603 0.9998823
#> rho[1,2]    7.101119   13.628188  106.09897  8501.464 1.0000697
#> rho[1,3]   -6.422826   -4.347302   55.68731  8497.380 0.9998825
#> rho[2,1]    7.101119   13.628188  106.09897  8501.464 1.0000697
#> rho[2,2]    1.000000    1.000000    1.00000  5661.715 0.9998823
#> rho[2,3]   -6.261771   -4.174223   62.69786  8506.061 0.9998863
#> rho[3,1]   -6.422826   -4.347302   55.68731  8497.380 0.9998825
#> rho[3,2]   -6.261771   -4.174223   62.69786  8506.061 0.9998863
#> rho[3,3]    1.000000    1.000000    1.00000  5925.399 0.9998823
#> lp__     -150.063881 -148.835957 -147.24029  3194.689 1.0002539
#> 
#> $c_summary
#> , , chains = chain:1
#> 
#>           stats
#> parameter         mean           sd       2.5%         25%         50%
#>   sigma[1]  108.821210 8.021457e+00   93.59244  103.248954  108.714693
#>   sigma[2]   95.714630 6.925195e+00   82.56554   91.014884   95.548497
#>   sigma[3]    6.313615 4.553638e-01    5.45204    6.004330    6.305834
#>   rho[1,1]    1.000000 1.362537e-16    1.00000    1.000000    1.000000
#>   rho[1,2]   32.033426 2.169541e+03  -98.56311    4.069593    7.101119
#>   rho[1,3]   -7.156656 3.517537e+02  -69.80735  -10.871758   -6.422826
#>   rho[2,1]   32.033426 2.169541e+03  -98.56311    4.069593    7.101119
#>   rho[2,2]    1.000000 1.359980e-16    1.00000    1.000000    1.000000
#>   rho[2,3]    1.447703 6.268534e+02  -65.91060  -10.668964   -6.261771
#>   rho[3,1]   -7.156656 3.517537e+02  -69.80735  -10.871758   -6.422826
#>   rho[3,2]    1.447703 6.268534e+02  -65.91060  -10.668964   -6.261771
#>   rho[3,3]    1.000000 1.363175e-16    1.00000    1.000000    1.000000
#>   lp__     -150.433843 2.191957e+00 -155.64937 -151.687541 -150.063881
#>           stats
#> parameter          75%      97.5%
#>   sigma[1]  114.221931  124.65692
#>   sigma[2]  100.393867  109.50405
#>   sigma[3]    6.615026    7.23390
#>   rho[1,1]    1.000000    1.00000
#>   rho[1,2]   13.628188  106.09897
#>   rho[1,3]   -4.347302   55.68731
#>   rho[2,1]   13.628188  106.09897
#>   rho[2,2]    1.000000    1.00000
#>   rho[2,3]   -4.174223   62.69786
#>   rho[3,1]   -4.347302   55.68731
#>   rho[3,2]   -4.174223   62.69786
#>   rho[3,3]    1.000000    1.00000
#>   lp__     -148.835957 -147.24029

As you can see, the sigma values are quite similar, but the rho values vary quite a bit. This has direct implications for the structure of the resulting coefficient adjustments. As referenced above, the two samplers explore the posterior space somewhat differently, which can lead to divergences between the results. This should not be viewed as an error, since the posterior here cannot be computed exactly, but it is important to assess which result makes more sense for your application. The research team behind this library believes the JAGS sampler performs best, so it is the default; all results from the Stan sampler should be viewed as experimental and not necessarily supported by the maintainers.
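
For a quick numeric check of that claim, the posterior mean sigmas from the two backends can be placed side by side. This sketch assumes the summary structures printed above: a coda summary (with a `statistics` matrix) for the JAGS fit and rstan's summary list (with a `summary` matrix) for the Stan fit.

``` r
# Side-by-side posterior means for the sigma parameters from each backend.
sigma_names <- c("sigma[1]", "sigma[2]", "sigma[3]")
jags_sigma <- jags_output$covariance_matrix$statistics[sigma_names, "Mean"]
stan_sigma <- stan_output$covariance_matrix$summary[sigma_names, "mean"]
round(cbind(JAGS = jags_sigma, Stan = stan_sigma), 2)
```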

### Stan vs. JAGS Error Adjustment

With the pre-model fit, we can now assess how the resulting coefficients differ depending on which sampler is used.

validity_coefficients <- c(x_v_coef, y_v_coef, z_v_coef)
jags_lambda <- attenuation_matrix(
  jags_output,
  c("x", "y", "z"),
  validity_coefficients
)
jags_model_output <- multivariate_model(
  "output ~ x + y + z",
  data = df,
  columns = c("x", "y", "z"),
  a_c_matrix = jags_lambda$matrix,
  sds = jags_lambda$sds,
  variances = jags_lambda$variances,
  univariate = TRUE
)
stan_lambda <- attenuation_matrix(
  stan_output,
  c("x", "y", "z"),
  validity_coefficients,
  stan = TRUE
)
stan_model_output <- multivariate_model(
  "output ~ x + y + z",
  data = df,
  columns = c("x", "y", "z"),
  a_c_matrix = stan_lambda$matrix,
  sds = stan_lambda$sds,
  variances = stan_lambda$variances,
  univariate = TRUE
)

Please be aware that the `stan = TRUE` argument is passed at each step of the adjustment pipeline that consumes the fitted sampler object, as the underlying libraries have slightly different APIs. This prevents unexpected errors if you accidentally pass an rjags object to a function expecting a Stan fit.

jags_plots <- plot_covariates(jags_model_output, c("x", "y", "z"))
#> No id variables; using all as measure variables
#> No id variables; using all as measure variables
#> No id variables; using all as measure variables
jags_plots$x

jags_plots$y

jags_plots$z

stan_plots <- plot_covariates(stan_model_output, c("x", "y", "z"))
#> No id variables; using all as measure variables
#> No id variables; using all as measure variables
#> No id variables; using all as measure variables
stan_plots$x

stan_plots$y

stan_plots$z

Again, the differing pre-model results yield different adjustments for our synthetic dataset. It is worth repeating that neither is necessarily more accurate, but the divergences should be treated as important data points when assessing the measurement error structure in your own dataset.
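
If you want to compare the two sets of adjusted results numerically rather than visually, a safe first step, which makes no assumptions about the internal structure of the fitted objects, is to inspect them side by side:

``` r
# str() works on any R object, so it is a convenient way to discover where the
# adjusted coefficients live in each fit before pulling them out by name.
str(jags_model_output, max.level = 1)
str(stan_model_output, max.level = 1)
```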

### Traceplots

Finally, the traceplots can be inspected to make sure that both samplers have converged.

traceplots(stan_output$samples, c("x", "y", "z"), pre_model = TRUE, stan = TRUE)

traceplots(jags_output$samples, c("x", "y", "z"), pre_model = TRUE)
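
Beyond visual inspection, a couple of quick numeric checks can back up the traceplots. This is a sketch under the same assumption as above (a coda `mcmc.list` for the JAGS samples and a `stanfit` for the Stan samples):

``` r
# Effective sample sizes for the JAGS chain (coda).
coda::effectiveSize(jags_output$samples)
# Split-Rhat per parameter for the Stan chain (rstan); values near 1 suggest convergence.
summary(stan_output$samples)$summary[, "Rhat"]
```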

### References

  1. Passarelli, S., Free, C. M., Allen, L. H., Batis, C., Beal, T., Biltoft-Jensen, A. P., Bromage, S., Cao, L., Castellanos-Gutiérrez, A., Christensen, T., Crispim, S. P., Dekkers, A., De Ridder, K., Kronsteiner-Gicevic, S., Lee, C., Li, Y., Moursi, M., Moyersoen, I., Schmidhuber, J., … Golden, C. D. (2022). Estimating national and subnational nutrient intake distributions of global diets. The American Journal of Clinical Nutrition, 116(2), 551–560. https://doi.org/10.1093/ajcn/nqac108
  2. Beraha, M., Falco, D., & Guglielmi, A. (2021). JAGS, NIMBLE, Stan: A detailed comparison among Bayesian MCMC software (No. arXiv:2107.09357). arXiv. https://doi.org/10.48550/arXiv.2107.09357
  3. Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593–1623.
  4. Muoka, A. K., Agogo, G. O., Ngesa, O. O., & Mwambi, H. G. (2020). A method to adjust for measurement error in multiple exposure variables measured with correlated errors in the absence of an internal validation study. F1000Research, 9, 1486. https://doi.org/10.12688/f1000research.27892.1
