--- title: "Introduction to pvEBayes" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{pvEBayes-introduction} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Disproportionality Analysis Contemporary drug postmarketing surveillance largely relies on the collection of spontaneous reports of suspected adverse drug events from pharmaceutical companies, healthcare professionals, and patients. These reports are curated and stored in spontaneous reporting systems (SRS), usually organized as a large frequency table for downstream data analysis. We consider an SRS dataset cataloging AE reports on $I$ AE rows across $J$ drug columns. Let $N_{ij}$ denote the number of reported cases for the $i$-th AE and the $j$-th drug, where $i = 1,..., I$ and $j = 1,..., J$. Therefore, AE-drug pairwise occurrences from the AE-reports are summarized into an $I \times J$ contingency table, where the $(i,j)$-th cell catalogs the observed count $N_{ij}$ indicating the number of cases involving $i$-th AE and the $j$-th drug. Methods implemented in this package assumes the observed count $N_{ij}$ conditional on $E_{ij}$ is that \begin{equation} \label{eqn:model-poisson-likelihood} N_{ij} \mid E_{ij} \sim \operatorname{Poisson}(E_{ij}\lambda_{ij}), \end{equation} where the parameter $\lambda_{ij} \geq 0$ is the relative reporting ratio, the signal strength, for the $(i, j)$-th pair measuring the ratio of the actual expected count arising due to dependence to the null baseline expected count. Therefore, $\{\lambda_{ij}\}$ are our key parameters of interest. A large $\lambda_{ij}$ indicates a strong association between a drug and an AE. Let $g$ be a prior density function for signal strength parameters for all AE-drug pairs: $\lambda_{ij} \sim g$. Then, in the context of the Poisson model, the marginal probability mass function of $N_{ij}$ is given by: \[ p(N_{ij}) = \int_0^{\infty} g(\lambda_{ij}) \ f_{\text{pois}}(N_{ij} \mid \lambda_{ij}E_{ij}) \ d\lambda_{ij}, \] where $f_{\text{pois}}(N \mid \lambda)$ is the probability mass function of a Poisson random variable with mean $\lambda$ evaluated at $N$. Under the empirical Bayes framework, the prior distribution is consequently estimated from the data by maximizing the log marginal likelihood: \[ \hat g = \mathop{\arg \max}\limits_{g} \sum_{i=1}^I \sum_{j=1}^J \log p(N_{ij}). \] Then, the estimated posterior density of $\lambda$ given $N_{ij}$ is: \[ \hat{\text{p}}(\lambda \mid N_{ij}) = \frac{\hat g(\lambda) f_{\text{pois}}(N_{ij} \mid \lambda E_{ij})}{\hat{\text{p}}(N_{ij})}, \] where $\hat{\text{p}}(N_{ij}) = \int_0^{\infty} \hat g(\lambda_{ij})f_{\text{pois}}(N_{ij}\mid \lambda_{ij}E_{ij}) \ d\lambda_{ij}$. All empirical Bayes models implemented in 'pvEBayes' share the structure described above; they differ in their assumptions on the prior distribution. The Gamma-Poisson Shrinker (GPS) adopts a two-component gamma mixture prior. The K-gamma model generalizes GPS by allowing a K-component gamma mixture, where K is user-specified. The Koenker-Mizera (KM) and Efron models use a nonparametric discrete mixture prior, with point masses on a pre-specified grid. The general-gamma model employs a nonparametric sparse gamma mixture distribution. Package 'pvEBayes' provides implementations for the empirical Bayes methods for pharmacovigilance mentioned above. It provides tools for effectively fitting these models to the spontaneous reporting system (SRS) frequency tables, extracting summaries, performing hyperparameter tuning, and generating graphical summaries (eye plots and heatmaps) for signal detection and estimation. In the following we provide an example (borrowed from Tan et al.) of SRS data analyzing with 'pvEBayes'. ## Analyzing FDA statin SRS data with pvEBayes ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(pvEBayes) library(ggplot2) # load the SRS data data("statin2025_44") # show the first 6 rows head(statin2025_44) ``` ### Fit the general-gamma model Our interest lies in finding the most important adverse events and estimating the corresponding signal strength of these 6 statin drugs. These are achieved by fitting empirical Bayes models with pvEBayes() to the SRS data. We begin by fitting the general-gamma model to this dataset. Other models mentioned above could be used by modifying the 'model' argument. For illustration, we fit the model with a fixed hyperparameter value of $ \alpha = 0.5$ using the pvEBayes() function. ```{r, include=FALSE} ``` ```{r fit-gg, include=TRUE} gg_given_alpha <- pvEBayes(statin2025_44, model = "general-gamma", alpha = 0.5 ) gg_given_alpha_detected_signal <- summary(gg_given_alpha, return = "detected signal" ) sum(gg_given_alpha_detected_signal) ``` The return argument specifies which component the summary function should return. Valid options include: “prior parameters”, “likelihood”, “detected signal”, and “posterior draws”. If it is set to NULL (default), all components will be returned in a list. In this example, we show the detected signals. In this package, we suggest tuning the general-gamma model by AIC or BIC, which can be accessed through 'AIC()' or 'BIC()' functions, as shown below: ```{r tune, include=TRUE} AIC(gg_given_alpha) BIC(gg_given_alpha) ``` In practice, one can specify a list of candidate $\alpha$ values, fit the general-gamma model for each, compute the corresponding AIC or BIC, and select the model with the lowest AIC or BIC. Instead of manually doing so, one can use the 'pvEBayes_tune()' function, which implements these steps. The relevant code is given below: ```{r auto-tune, include=TRUE, eval=TRUE} gg_tune_statin44 <- pvEBayes_tune(statin2025_44, model = "general-gamma", alpha_vec = c(0, 0.1, 0.3, 0.5, 0.7, 0.9), use_AIC = TRUE ) ``` ## Visualization In addition, 'pvEBayes' has implemented visual summary methods for both signal detection and estimation, which are heatmap and eyeplot. These plot functions can be accessed through the 'plot()' with argument type = “heatmap” or type = “eyeplot”. ```{r heatmap, include = TRUE, fig.height=6, fig.width=8} heatmap_gg_tune_statin44 <- plot(gg_tune_statin44, type = "heatmap", num_top_AEs = 10, cutoff_signal = 1.001 ) heatmap_gg_tune_statin44 + theme( legend.position = "top" ) ``` The above heatmap visualizes the signal detection result for the fitted general-gamma model. For each AE-drug combination, the number of reports (N), the estimated null value (E), and the estimated posterior probability of being a signal are given. A deeper blue color indicates stronger evidence for a signal. ```{r eyeplot, include = TRUE, fig.height=6, fig.width=8} eyeplot_gg_tune_statin44 <- plot(gg_tune_statin44, type = "eyeplot", num_top_AEs = 8, N_threshold = 1, log_scale = FALSE, text_shift = 2.3, text_size = 3, x_lim_scalar = 1.2 ) eyeplot_gg_tune_statin44 + theme( legend.position = "top" ) ``` The above eyeplot visualizes empirical posterior inferences on 10 prominent AEs across 6 statin drugs through computed empirical Bayesian posterior distributions of signal strengths $\{\lambda_{ij}\}$ obtained from the general-gamma model fitted on the statin2025_44 dataset. The points and bars represent the posterior medians and 90\% equi-tailed credible intervals for the corresponding AE-drug pair specific $\{\lambda_{ij}\}$, with different colors indicating the results from different statin drugs. The red dotted vertical line represents the value '1'. The texts on the right provide the number of observations as well as the null baseline expected counts under independence for an AE-drug pair. # References Tan Y, Markatou M and Chakraborty S. Flexible Empirical Bayesian Approaches to Pharmacovigilance for Simultaneous Signal Detection and Signal Strength Estimation in Spontaneous Reporting Systems Data. *Statistics in Medicine*. 2025; 44: 18-19, https://doi.org/10.1002/sim.70195. Tan Y, Markatou M and Chakraborty S. pvEBayes: An R Package for Empirical Bayes Methods in Pharmacovigilance. *arXiv*:2512.01057 (stat.AP). https://doi.org/10.48550/arXiv.2512.01057 Koenker R, Mizera I. Convex Optimization, Shape Constraints, Compound Decisions, and Empirical Bayes Rules. *Journal of the American Statistical Association* 2014; 109(506): 674–685, https://doi.org/10.1080/01621459.2013.869224 Efron B. Empirical Bayes Deconvolution Estimates. *Biometrika* 2016; 103(1); 1-20, https://doi.org/10.1093/biomet/asv068 DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. *The American Statistician*. 1999; 1;53(3):177-90.