--- title: "Introduction to bayesDiagnostics" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to bayesDiagnostics} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) ``` # bayesDiagnostics: Comprehensive Bayesian Model Diagnostics The **bayesDiagnostics** package provides a comprehensive suite of diagnostic tools for Bayesian models fitted with `brms`, `rstan`, and other popular Bayesian modeling packages. This vignette demonstrates all major functions organized by diagnostic category. ```{r setup} library(bayesDiagnostics) library(brms) library(ggplot2) ``` ## Overview of Package Functions The package contains **18 main functions** organized into 5 categories: 1. **MCMC Convergence Diagnostics** (3 functions) 2. **Posterior Predictive Checks** (5 functions) 3. **Prior Specification & Sensitivity** (3 functions) 4. **Model Comparison** (3 functions) 5. **Utilities & Reporting** (4 functions) --- ## Part 1: MCMC Convergence Diagnostics ### 1.1 Quick MCMC Diagnostics Summary The `mcmc_diagnostics_summary()` function provides a comprehensive overview of MCMC convergence, including R-hat, ESS, and NUTS-specific diagnostics. ```{r mcmc-diagnostics, eval=FALSE} # Fit a simple model data <- data.frame( y = rnorm(100, mean = 5, sd = 2), x1 = rnorm(100), x2 = rnorm(100) ) fit <- brm(y ~ x1 + x2, data = data, chains = 4, iter = 2000, refresh = 0) # Run comprehensive diagnostics diag <- mcmc_diagnostics_summary(fit, rhat_threshold = 1.01, ess_threshold = 400) print(diag) # Check results diag$converged # TRUE/FALSE diag$summary_table # Full diagnostic table diag$divergences # Number of divergent transitions ``` **What it checks:** - **R-hat** values (should be < 1.01) - **Bulk ESS** and **Tail ESS** (should be > 400) - **Divergent transitions** (should be 0) - **Tree depth saturation** (NUTS-specific) --- ### 1.2 Effective Sample Size Diagnostics The `effective_sample_size_diagnostics()` function provides detailed ESS analysis with visual diagnostics. ```{r ess-diagnostics, eval=FALSE} # Detailed ESS analysis ess_diag <- effective_sample_size_diagnostics( model = fit, parameters = c("b_x1", "b_x2", "sigma"), min_ess = 400, by_chain = TRUE, plot = TRUE ) print(ess_diag) plot(ess_diag) # Access specific components ess_diag$ess_summary # Summary statistics ess_diag$bulk_ess # Bulk ESS per parameter ess_diag$tail_ess # Tail ESS per parameter ess_diag$by_chain_ess # Per-chain ESS breakdown ess_diag$problematic_params # Parameters with low ESS ess_diag$recommendations # Actionable suggestions ``` **Interpretation:** - **Bulk ESS**: Measures precision of posterior mean/median estimates - **Tail ESS**: Measures precision of credible interval bounds - **Per-chain ESS**: Identifies which specific chains have issues --- ### 1.3 Hierarchical Model Convergence For hierarchical/multilevel models, use `hierarchical_convergence()` to check group-level and population-level parameters separately. ```{r hierarchical-convergence, eval=FALSE} # Fit hierarchical model data_hier <- data.frame( y = rnorm(200), x = rnorm(200), group = rep(1:10, each = 20) ) fit_hier <- brm( y ~ x + (1 + x | group), data = data_hier, chains = 4, refresh = 0 ) # Check hierarchical convergence hier_conv <- hierarchical_convergence( model = fit_hier, group_vars = "group", plot = TRUE ) print(hier_conv) plot(hier_conv) ``` **What it provides:** - Group-level vs. 
---

## Part 2: Posterior Predictive Checks (PPC)

### 2.1 Comprehensive Posterior Predictive Checks

The `posterior_predictive_check()` function is the workhorse for model validation.

```{r ppc-basic, eval=FALSE}
# Basic PPC with multiple test statistics
ppc_result <- posterior_predictive_check(
  model = fit,
  observed_data = data$y,
  n_samples = 1000,
  test_statistics = c("mean", "sd", "median", "min", "max", "skewness", "kurtosis"),
  plot = TRUE
)

print(ppc_result)
plot(ppc_result)

# Access results
ppc_result$observed_stats    # Observed test statistics
ppc_result$replicated_stats  # Posterior predictive statistics
ppc_result$p_values          # Bayesian p-values (should be ~0.5)
```

**Interpretation Guide:**

- **p-value ≈ 0.5**: Good fit (model captures the statistic well)
- **p-value < 0.05 or > 0.95**: Suggests model misspecification
- **p-value near 0**: Model underestimates the statistic
- **p-value near 1**: Model overestimates the statistic

---

### 2.2 Automated PPC

For quick diagnostic checks across multiple statistics, use `automated_ppc()`:

```{r automated-ppc, eval=FALSE}
# Automated checks with flagging
auto_ppc <- automated_ppc(
  model = fit,
  observed_data = data$y,
  n_samples = 1000,
  p_value_threshold = 0.05
)

print(auto_ppc)

# Check for issues
auto_ppc$diagnostics  # Table with all statistics and flags
auto_ppc$flags        # Text warnings for problematic statistics
```

**When to use:**

- Quick model validation
- Screening for gross misspecification
- Automated reporting workflows

---

### 2.3 Graphical PPC

Create publication-quality PPC visualizations:

```{r graphical-ppc, eval=FALSE}
# Density overlay
p1 <- graphical_ppc(fit, data$y, type = "density", n_draws = 50)
print(p1)

# Prediction intervals
p2 <- graphical_ppc(fit, data$y, type = "intervals", n_draws = 50)
print(p2)

# Ribbon plot (useful for ordered/time-series data)
p3 <- graphical_ppc(fit, data$y, type = "ribbon", n_draws = 50)
print(p3)
```

---

### 2.4 LOO Cross-Validation PPC

The `ppc_crossvalidation()` function performs Leave-One-Out Probability Integral Transform (LOO-PIT) checks:

```{r ppc-loo, eval=FALSE}
# LOO-PIT check
loo_ppc <- ppc_crossvalidation(
  model = fit,
  observed_y = data$y,
  n_draws = NULL  # Use all draws for accuracy
)

print(loo_ppc)
plot(loo_ppc)

# Inspect results
loo_ppc$pit_values  # Should be ~Uniform(0, 1) if well-calibrated
loo_ppc$loo_result  # LOO object
loo_ppc$weighted    # Whether weighted PIT was used
```

**Interpretation:**

- **Uniform PIT distribution**: Model is well-calibrated
- **U-shaped**: Under-dispersed predictions (predictive distributions too narrow)
- **Inverse-U (hump-shaped)**: Over-dispersed predictions (predictive distributions too wide)
- **Skewed**: Systematic bias in predictions

---

### 2.5 Custom Bayesian P-Values

For custom test statistics, use the flexible `bayesian_p_values()` utility:

```{r custom-pvalues, eval=FALSE}
# Generate posterior predictive samples
yrep <- posterior_predict(fit, ndraws = 1000)

# Define custom test statistic
custom_stat <- function(x) max(x) - min(x)  # Range

# Calculate Bayesian p-value
result <- bayesian_p_values(
  yrep = yrep,
  y = data$y,
  statistic = custom_stat
)

result$observed_value     # Observed range
result$replicated_values  # Posterior predictive ranges
result$p_value            # P(replicated ≥ observed)
```
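The same p-value can be reproduced by hand with `brms::posterior_predict()` and base R, which makes the definition p = P(T(y_rep) ≥ T(y)) concrete. A minimal sketch, reusing `fit` and the range statistic from above:

```{r manual-p-value, eval=FALSE}
# Manual Bayesian p-value: p = P( T(y_rep) >= T(y) )
yrep <- posterior_predict(fit, ndraws = 1000)         # one row per posterior draw

t_obs <- max(data$y) - min(data$y)                    # observed range
t_rep <- apply(yrep, 1, function(x) max(x) - min(x))  # replicated ranges

mean(t_rep >= t_obs)                                  # should be far from 0 and 1
```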
---

## Part 3: Prior Specification & Sensitivity Analysis

### 3.1 Prior Elicitation Helper

The `prior_elicitation_helper()` function translates expert knowledge into statistical priors:

```{r prior-elicitation, eval=FALSE}
# Define expert beliefs
expert_input <- list(
  parameter_name = "treatment_effect",
  plausible_range = c(-0.5, 1.5),  # Plausible values
  most_likely_value = 0.3,         # Best guess
  confidence = 0.8                 # How confident (0-1)
)

# Get prior recommendations
prior_rec <- prior_elicitation_helper(
  expert_beliefs = expert_input,
  parameter_type = "continuous",
  method = "quantile",
  data_sample = rnorm(100, 0.3, 0.5),  # Optional: existing data
  visualize = TRUE
)

print(prior_rec)

# Use recommended prior
prior_rec$recommended_prior  # brms::prior object
prior_rec$alternatives       # Alternative priors for sensitivity
prior_rec$sensitivity_note   # Guidance
```

**Parameter types:**

- `"continuous"`: Normal, Student-t, log-normal
- `"discrete"`: Poisson, negative binomial
- `"proportion"`: Beta distribution

---

### 3.2 Prior Sensitivity Analysis

The `prior_sensitivity()` function assesses robustness to prior choice:

```{r prior-sensitivity, eval=FALSE}
# Define alternative priors
prior_grid <- list(
  weak     = set_prior("normal(0, 10)", class = "b"),
  moderate = set_prior("normal(0, 2)", class = "b"),
  strong   = set_prior("normal(0, 0.5)", class = "b")
)

# Run sensitivity analysis
sens_result <- prior_sensitivity(
  model = fit,
  parameters = c("b_x1", "b_x2"),
  prior_grid = prior_grid,
  comparison_metric = "KL",  # or "Wasserstein", "overlap"
  plot = TRUE,
  n_draws = 2000
)

print(sens_result)
plot(sens_result)

# Check sensitivity
sens_result$sensitivity_metrics  # How much posteriors differ
```

**Metrics:**

- **KL divergence**: Information-theoretic difference between posteriors
- **Wasserstein distance**: Earth mover's distance (for one-dimensional posteriors, the L1 distance between their CDFs)
- **Overlap coefficient**: Proportion of overlapping density

**Interpretation:**

- **Low sensitivity**: Conclusions robust to prior choice ✓
- **High sensitivity**: Data is weak; the prior strongly influences results

---

### 3.3 Prior Robustness Analysis

For comprehensive multi-dimensional sensitivity assessment:

```{r prior-robustness, eval=FALSE}
robust_result <- prior_robustness(
  model = fit,
  prior_specifications = prior_grid,
  parameters = c("b_x1", "b_x2"),
  perturbation_direction = "expand",  # or "contract", "shift"
  dimensions = c(0.5, 1, 2, 4),       # Scaling factors
  comparison_metric = "KL",
  plot = TRUE
)

print(robust_result)

# Check robustness
robust_result$robustness_index       # Composite score (higher = more robust)
robust_result$concerning_parameters  # Parameters with low robustness
robust_result$recommendations        # What to do
```

---

## Part 4: Model Comparison

### 4.1 Comprehensive Model Comparison Suite

The `model_comparison_suite()` function compares multiple models using information criteria:

```{r model-comparison, eval=FALSE}
# Fit competing models
fit1 <- brm(y ~ x1, data = data, refresh = 0)
fit2 <- brm(y ~ x1 + x2, data = data, refresh = 0)
fit3 <- brm(y ~ x1 * x2, data = data, refresh = 0)

# Compare models
comparison <- model_comparison_suite(
  Model_1 = fit1,
  Model_2 = fit2,
  Model_3 = fit3,
  criterion = "all",  # LOO, WAIC, Bayes R²
  plot = TRUE,
  detailed = TRUE
)

print(comparison)
plot(comparison)

# Results
comparison$comparison_table  # All metrics
comparison$ic_differences    # ΔLOO, ΔWAIC, model weights
comparison$plots             # Visualization
```

**Metrics explained:**

- **LOO-ELPD**: Leave-one-out expected log predictive density (higher is better)
- **WAIC**: Widely Applicable Information Criterion (lower is better)
- **Bayes R²**: Bayesian coefficient of determination
- **Model weights**: Approximate probability that each model gives the best predictions
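These quantities can be cross-checked with the **loo** package, which **brms** itself wraps. A minimal sketch, reusing `fit1`–`fit3` from above:

```{r loo-crosscheck, eval=FALSE}
library(loo)

loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)

loo_compare(loo1, loo2, loo3)              # ELPD differences with standard errors
loo_model_weights(list(loo1, loo2, loo3))  # stacking weights (default method)
```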
---

### 4.2 Bayes Factor Comparison

For direct model comparison via Bayes factors:

```{r bayes-factor, eval=FALSE}
# Compare two models
bf_result <- bayes_factor_comparison(
  Model_A = fit1,
  Model_B = fit2,
  method = "bridge_sampling",  # or "waic"
  repetitions = 5,
  silent = TRUE
)

print(bf_result)

# Interpretation
bf_result$bayes_factor    # BF of Model A vs. Model B
bf_result$log_bf          # Log Bayes factor
bf_result$interpretation  # Text interpretation

# For 3+ models: pairwise comparisons
bf_multi <- bayes_factor_comparison(fit1, fit2, fit3, method = "bridge_sampling")
bf_multi$pairwise_comparisons
```

**Bayes factor interpretation (Jeffreys' scale):**

- **BF < 1**: Evidence favors Model B
- **1-3**: Weak evidence for Model A
- **3-10**: Moderate evidence
- **10-30**: Strong evidence
- **> 30**: Very strong evidence

---

### 4.3 Predictive Performance Evaluation

The `predictive_performance()` function evaluates predictive accuracy, both in-sample and on held-out data:

```{r predictive-performance, eval=FALSE}
# In-sample performance
perf_in <- predictive_performance(
  model = fit,
  newdata = NULL,       # NULL = use training data
  observed_y = data$y,
  metrics = "all",      # RMSE, MAE, Coverage, CRPS
  credible_level = 0.95,
  n_draws = NULL
)

print(perf_in)
plot(perf_in)

# Out-of-sample performance
test_data <- data.frame(x1 = rnorm(50), x2 = rnorm(50), y = rnorm(50))

perf_out <- predictive_performance(
  model = fit,
  newdata = test_data,
  observed_y = test_data$y,
  metrics = "all"
)

# Compare metrics
perf_in$point_metrics       # RMSE, MAE, Correlation
perf_in$interval_metrics    # Coverage, Interval Width
perf_in$proper_scores       # CRPS (lower is better)
perf_in$prediction_summary  # Per-observation predictions
```

**Key metrics:**

- **RMSE/MAE**: Prediction error (lower is better)
- **Coverage**: % of observations within credible intervals (should be ≈ 0.95 for 95% intervals)
- **CRPS**: Continuous Ranked Probability Score (a proper scoring rule; lower is better)

---

## Part 5: Utilities & Comprehensive Reporting

### 5.1 Extract Posterior (Unified Interface)

The `extract_posterior_unified()` function provides consistent posterior extraction across packages (a sketch of the equivalent direct `posterior` calls appears at the end of this part):

```{r extract-posterior, eval=FALSE}
# Extract as data frame
draws_df <- extract_posterior_unified(
  model = fit,
  parameters = c("b_x1", "b_x2", "sigma"),
  format = "draws_df",
  ndraws = 1000,
  include_warmup = FALSE,
  chains = NULL  # All chains
)

# Extract as matrix
draws_matrix <- extract_posterior_unified(fit, format = "draws_matrix")

# Extract as array (iterations × chains × parameters)
draws_array <- extract_posterior_unified(fit, format = "draws_array")

# Extract as named list
draws_list <- extract_posterior_unified(fit, format = "list")
```

**Supported model types:**

- `brmsfit` (brms)
- `stanfit` (rstan)
- `stanreg` (rstanarm)
- `CmdStanMCMC` (cmdstanr)
- `mcmc.list` (coda)

---

### 5.2 Diagnostic Report Generation

The `diagnostic_report()` function creates comprehensive HTML/PDF reports:

```{r diagnostic-report, eval=FALSE}
# Generate comprehensive report
diagnostic_report(
  model = fit,
  output_file = "bayesian_model_diagnostics.pdf",
  output_format = "pdf",  # or "html", "docx"
  include_sections = c(
    "model_summary",
    "convergence",
    "posterior_summary",
    "recommendations"
  ),
  rhat_threshold = 1.01,
  ess_threshold = 0.1,
  open_report = TRUE  # Open automatically
)
```

**Sections included:**

1. **Model Summary**: Formula, family, data info
2. **Convergence Diagnostics**: R-hat, ESS, divergences
3. **Posterior Summary**: Parameter estimates with credible intervals
4. **Recommendations**: Actionable suggestions for model improvement
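For `brmsfit`, `stanfit`, and `CmdStanMCMC` objects, the **posterior** package already offers a common draws interface, which `extract_posterior_unified()` presumably builds on. A minimal sketch of the direct route, using only **posterior**:

```{r posterior-direct, eval=FALSE}
library(posterior)

# Draws as a data frame, restricted to a few parameters
draws_df <- subset_draws(as_draws_df(fit), variable = c("b_x1", "b_x2", "sigma"))

# The same draws as a matrix and as an iterations × chains × variables array
draws_mat <- as_draws_matrix(draws_df)
draws_arr <- as_draws_array(fit)
```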
---

## Complete Workflow Example

Here's a complete Bayesian workflow using all major functions:

```{r complete-workflow, eval=FALSE}
# ===== STEP 1: FIT MODEL =====
library(bayesDiagnostics)
library(brms)

# Simulate data
set.seed(123)
N <- 200
x1 <- rnorm(N)
x2 <- rnorm(N)
data <- data.frame(
  x1 = x1,
  x2 = x2,
  y = rnorm(N, mean = 3 + 2 * x1 - 0.5 * x2, sd = 1.5)
)

# Fit Bayesian model
fit <- brm(
  y ~ x1 + x2,
  data = data,
  prior = c(
    prior(normal(0, 5), class = "b"),
    prior(student_t(3, 0, 2.5), class = "sigma")
  ),
  chains = 4,
  iter = 2000,
  warmup = 1000,
  refresh = 0
)

# ===== STEP 2: CONVERGENCE DIAGNOSTICS =====
# Quick check
diag <- mcmc_diagnostics_summary(fit)
print(diag)

# Detailed ESS analysis
ess_diag <- effective_sample_size_diagnostics(fit, plot = TRUE)
plot(ess_diag)

# ===== STEP 3: POSTERIOR PREDICTIVE CHECKS =====
# Comprehensive PPC
ppc <- posterior_predictive_check(fit, observed_data = data$y, plot = TRUE)
print(ppc)

# Automated screening
auto_ppc <- automated_ppc(fit, data$y)
print(auto_ppc)

# LOO cross-validation
loo_ppc <- ppc_crossvalidation(fit, data$y)
plot(loo_ppc)

# ===== STEP 4: PRIOR SENSITIVITY =====
# Define alternative priors
prior_grid <- list(
  weak     = set_prior("normal(0, 10)", class = "b"),
  moderate = set_prior("normal(0, 5)", class = "b"),
  strong   = set_prior("normal(0, 1)", class = "b")
)

# Test sensitivity
sens <- prior_sensitivity(
  fit,
  parameters = c("b_x1", "b_x2"),
  prior_grid = prior_grid,
  plot = TRUE
)
print(sens)

# ===== STEP 5: MODEL COMPARISON =====
# Fit alternative models
fit_x1 <- brm(y ~ x1, data = data, refresh = 0)
fit_x1x2 <- fit
fit_interaction <- brm(y ~ x1 * x2, data = data, refresh = 0)

# Compare
comp <- model_comparison_suite(
  Linear = fit_x1,
  Additive = fit_x1x2,
  Interaction = fit_interaction,
  criterion = "all",
  plot = TRUE
)
print(comp)

# ===== STEP 6: PREDICTIVE PERFORMANCE =====
# Hold-out validation
test_idx <- sample(1:N, 40)
test_data <- data[test_idx, ]
train_data <- data[-test_idx, ]

fit_train <- update(fit, newdata = train_data, refresh = 0)

perf <- predictive_performance(
  fit_train,
  newdata = test_data,
  observed_y = test_data$y,
  metrics = "all"
)
print(perf)
plot(perf)

# ===== STEP 7: GENERATE REPORT =====
diagnostic_report(
  fit,
  output_file = "full_diagnostics.html",
  output_format = "html",
  include_sections = c("model_summary", "convergence",
                       "posterior_summary", "recommendations")
)
```

---

## Interpretation Guidelines

### Convergence (MCMC)

- **R-hat < 1.01**: Chains have converged
- **ESS > 400**: Sufficient effective samples
- **No divergences**: Sampler is behaving well
- **R-hat > 1.01**: Run longer chains or reparameterize
- **Divergences > 0**: Increase `adapt_delta` or reparameterize

### Posterior Predictive Checks

- **p-values ≈ 0.5**: Model captures the test statistic
- **LOO-PIT ~ Uniform**: Well-calibrated predictions
- **Extreme p-values**: Model misspecification

### Prior Sensitivity

- **Low KL/Wasserstein**: Robust to prior choice
- **High sensitivity**: Need more data or carefully justified informative priors

### Model Comparison

- **ΔLOO < 4**: Models are similar
- **ΔLOO > 10**: Strong preference for the best model
- **Coverage ≈ 0.95**: Credible intervals well-calibrated (see the sketch below)
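As a concrete example of the coverage guideline, the empirical coverage of 95% posterior predictive intervals can be computed in a few lines with `brms::posterior_predict()`. A minimal sketch, reusing `fit_train` and `test_data` from the workflow above:

```{r coverage-check, eval=FALSE}
# Empirical coverage of 95% predictive intervals on held-out data
yrep <- posterior_predict(fit_train, newdata = test_data)

lower <- apply(yrep, 2, quantile, probs = 0.025)
upper <- apply(yrep, 2, quantile, probs = 0.975)

mean(test_data$y >= lower & test_data$y <= upper)  # should be close to 0.95
```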
---

## Summary Table: All Functions

| Function | Category | Purpose |
|----------|----------|---------|
| `mcmc_diagnostics_summary()` | Convergence | Quick MCMC diagnostic overview |
| `effective_sample_size_diagnostics()` | Convergence | Detailed ESS analysis with plots |
| `hierarchical_convergence()` | Convergence | Hierarchical model convergence |
| `posterior_predictive_check()` | PPC | Comprehensive PPC with test statistics |
| `automated_ppc()` | PPC | Automated screening across statistics |
| `graphical_ppc()` | PPC | Publication-quality PPC plots |
| `ppc_crossvalidation()` | PPC | LOO-PIT cross-validation checks |
| `bayesian_p_values()` | PPC | Custom test statistic p-values |
| `prior_elicitation_helper()` | Prior | Translate expert knowledge to priors |
| `prior_sensitivity()` | Prior | Assess robustness to prior choice |
| `prior_robustness()` | Prior | Multi-dimensional sensitivity analysis |
| `model_comparison_suite()` | Comparison | Compare models via LOO/WAIC/R² |
| `bayes_factor_comparison()` | Comparison | Bayes factor model comparison |
| `predictive_performance()` | Comparison | Evaluate predictive accuracy |
| `extract_posterior_unified()` | Utility | Unified posterior extraction |
| `diagnostic_report()` | Utility | Generate comprehensive reports |

---

## Further Reading

### Books

- Gelman et al. (2013). *Bayesian Data Analysis*, 3rd ed.
- McElreath (2020). *Statistical Rethinking*, 2nd ed.
- Kruschke (2015). *Doing Bayesian Data Analysis*, 2nd ed.

### Papers

- Vehtari et al. (2021). "Rank-Normalization, Folding, and Localization: An Improved R-hat for Assessing Convergence of MCMC." *Bayesian Analysis*.
- Gabry et al. (2019). "Visualization in Bayesian workflow." *Journal of the Royal Statistical Society, Series A*.
- Vehtari et al. (2017). "Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC." *Statistics and Computing*.

### Packages

- **brms**: Bayesian Regression Models using Stan
- **bayesplot**: Plotting for Bayesian models
- **posterior**: Tools for working with posterior draws
- **loo**: Efficient leave-one-out cross-validation

---

## Getting Help

```{r help, eval=FALSE}
# Function documentation
?mcmc_diagnostics_summary
?posterior_predictive_check
?prior_sensitivity

# Package vignettes
browseVignettes("bayesDiagnostics")

# Report issues
# https://github.com/yourusername/bayesDiagnostics/issues
```

---

**End of Vignette**