--- title: "Tweedie likelihood computations" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Tweedie likelihood computations} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: references.bib --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Tweedie distributions Tweedie distributions are exponential dispersion models, with a mean $\mu$ and a variance $\phi \mu^\xi$, for some dispersion parameter $\phi > 0$ and a variance-power index $\xi$ [@tweedie1984], that is sometimes called $p$, that uniquely defines the distribution within the Tweedie family (for all real values of $\xi$ **not** between 0 and 1). Special cases of the Tweedie distributions are: * the *normal* distribution, with $\xi = 0$ (i.e., the variance is $\phi$ and not related to the mean); * the *Poisson* distribution, with $\xi = 1$ and $\phi = 1$ (i.e., the variance is the same as the mean); * the *gamma* distribution, with $\xi = 2$; and * the *inverse Gaussian* distribution, with $\xi = 3$. For all other values of $\xi$, the probability functions and distribution functions have no closed forms. | Variance power, $\xi$ | Distribution | Support | |:---------------------------:|:----------------:|:-----------------------------:| | $\xi = 0$ | Normal | $(-\infty, \infty)$ | | $0 < \xi < 1$ | Do not exist | | | $\xi = 1$ (with $\phi = 1)$ | Poisson | Discrete: $(0, 1, 2, \dots)$ | | $1 < \xi < 2$ | Poisson--gamma | $[ 0, \infty)$ | | $\xi = 2$ | Gamma | $(0, \infty)$ | | $\xi > 2$ | Skewed right | $(0, \infty$) | | $\xi = 3$ | inverse Gaussian | $(0, \infty)$ | For $\xi < 1$, applications are limited (non-existent so far?), but have support on the entire real line and $\mu > 0$. For $1 < \xi < 2$, Tweedie distributions can be represented as a Poisson sum of gamma distributions. These distributions are continuous for $Y > 0$ but have a discrete mass at $Y = 0$. ## Use cases Likelihood computations are not need to fit a Tweedie generalized linear model, using the `tweedie()` family from the [**statmod** package](https://CRAN.R-project.org/package=statmod) (@smythStatmodStatisticalModeling2025, @dunnGeneralizedLinearModels2018): ```{r TWexample} library(tweedie) library(statmod) # For the tweedie() family function, for use in glm() set.seed(96) N <- 25 # Mean of the Poisson (lambda) and Gamma (shape/scale) lambda <- 1.5 # Generating Compound Poisson-Gamma data manually y <- replicate(N, { n_events <- rpois(1, lambda = lambda) if (n_events == 0) 0 else sum(rgamma(n_events, shape = 2, scale = 1)) }) mod.tw <- glm(y ~ 1, family = statmod::tweedie(var.power = 1.5, link.power = 0) ) # link.power = 0 means the log-link ``` However, likelihood computations are necessary in other situations. Since likelihoods cannot be computed analytically apart from special cases, the **tweedie** package computes Tweedie densities using numerical methods (@dunnSeriesEvaluationTweedie2005, @dunnEvaluationTweedieExponential2008). ### Generating random numbers Random numbers from a Tweedie distribution are generated using **rtweedie()**: ```{r TWrandom} tweedie::rtweedie(10, xi = 1.1, mu = 2, phi = 1) ``` ### Plotting density and probability functions Accurate density functions are generated using **dtweedie()**: ```{r TWplotsPDF} y <- seq(0, 2, length = 100) xi <- 1.1 mu <- 0.5 phi <- 0.4 twden <- tweedie::dtweedie(y, xi = xi, mu = mu, phi = phi) twdtn <- tweedie::ptweedie(y, xi = xi, mu = mu, phi = phi) plot( twden[y > 0] ~ y[y > 0], xlab = expression(italic(y)), ylab = "Density function", type ="l", lwd = 2, las = 1) points(twden[y==0] ~ y[y == 0], lwd = 2, pch = 19) ``` Accurate probability functions are generated using **ptweedie()**: ```{r TWplotsCDF} plot(twdtn ~ y, xlab = expression(italic(y)), ylab = "Distribution function", type = "l", las = 1, lwd = 2, las = 1, ylim = c(0, 1) ) points(twdtn[y==0] ~ y[y == 0], lwd = 2, pch = 19) ``` However, the function **tweedie_plot()** is sometimes more convenient (especially for $1 < \xi \ < 2$, when a probability mass at $Y = 0$ is present): ```{r TWplotsPDF2} tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi, ylab = "Density function", xlab = expression(italic(y)), las = 1, lwd = 2) ``` ```{r TWplotsCDF2} tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi, ylab = "Distribution function", xlab = expression(italic(y)), las = 1, lwd = 2, ylim = c(0, 1), type = "cdf") ``` ### Computing the quantile residuals Quantile residuals [@dunnRandomizedQuantileResiduals1996] can be computed to assess the fit of a glm. The function **qresid()** is in the [**statmod** package](https://CRAN.R-project.org/package=statmod) (@smythStatmodStatisticalModeling2025, @dunnGeneralizedLinearModels2018): ```{r TWqqplot} library(tweedie) qqnorm( qr <- statmod::qresid(mod.tw), las = 1) qqline(qr, col = "grey") ``` The quantile residuals suggest the model may not be adequate. ### Estimating $\xi$ To use a Tweedie distribution in a glm, the value of $\xi$ is needed. To find the most suitable value of $\xi$, **tweedie_profile()** can be used [@dunnGeneralizedLinearModels2018]: ```{r TWprofile} out <- tweedie::tweedie_profile(y ~ 1, xi.vec = seq(1.2, 1.8, by = 0.05), do.plot = TRUE) # The estimated power index: out$xi.max ``` ## Example Consider the `poison` data in the `GLMsData` package [@dunnGLMsData], which records the survival times of animals under various treatments and poisons (used in @dunnGeneralizedLinearModels2018, Sect. 12.4.2). For convenience, a copy is included with this package: ```{r PSNLoad} poison <- read.csv(system.file("extdata", "poison.csv", package = "tweedie")) # Convert to factors: poison$Psn <- factor(poison$Psn) poison$Trmt <- factor(poison$Trmt) head(poison) ``` The survival time is likely related to the poison type and type of treatment: ```{r PSNPlotPsn} plot(Time ~ Psn, data = poison, xlab = "Poison type", ylab = "Survival time (tens of hours)", las = 1) ``` ```{r PSNPlotTmt} plot(Time ~ Trmt, data = poison, xlab = "Treatment type", ylab = "Survival time (tens of hours)", las = 1) ``` In both cases, the variance of the survival times seems to increase with the mean survival time. Perhaps a Tweedie glm would be appropriate. To fit a Tweedie glm, the value of $\xi$ is needed; since no exact zeros are present (or possible) for the survival times, very likely $\xi \ge 2$: ```{r PSNProfile} PSNPrf <- tweedie_profile(Time ~ Trmt + Psn, data = poison, do.plot = TRUE, xi.vec = seq(2.5, 5.5, length = 11) ) ``` Then, $\xi\approx 4$: ```{r PSNXi} PSNPrf$xi.max ``` So then the Tweedie glm can be fitted: ```{r PSNModel} PSNMod <- glm(Time ~ Trmt + Psn, data = poison, family=statmod::tweedie(var = 4, link.power = 0)) anova(PSNMod, test = "F") ``` Quantile residuals can be used to assess the model: ```{r PSNRes} qr <- statmod::qresid(PSNMod) qqnorm(qr, las = 1) qqline(qr, col = "grey") ``` The quantile residuals suggest that this model looks fine. ## References