--- title: "Residual diagnostics, Tukey style" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Residual diagnostics, Tukey style} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 7) ``` ```{r setup, message = FALSE} library(shewhartr) ``` Every Shewhart chart silently makes assumptions about its residuals: that they are independent (not autocorrelated), approximately normal (or at least symmetric and unimodal), with constant variance over time. When the assumptions hold, the chart's nominal false-alarm rates apply. When they don't, the chart can be either too lenient or too strict, and either way you cannot tell from the chart itself. `shewhart_diagnostics()` produces a five-panel residual diagnostic display. It is a working tool, not a polished figure: the point is to surface the assumptions, not to print them. ## What the panels show ```{r, eval = FALSE} fit <- shewhart_i_mr(bottle_fill, value = ml, index = observation) shewhart_diagnostics(fit) ``` | Panel | Reads as | |----------------------|-----------------------------------------------| | Residuals vs. fitted | Trend or non-constant variance in residuals? | | Normal Q-Q | Heavy tails, skew, gross departures from N | | ACF | Are residuals correlated in time? | | Moving range | Drift in dispersion? | | Histogram | Symmetry and unimodality | The Q-Q and histogram address normality. The ACF plot and the moving-range trace address independence and stationarity. The residuals-vs-fitted plot catches non-linearity and heteroscedasticity. This panel is what John Tukey called the *exploratory* phase of any analysis: before you trust an answer, look at the working. Box's remark applies — "all models are wrong" — but the residual panel tells you *how* wrong, and whether the wrongness matters. ## A diagnosis-driven fix: Box-Cox If the histogram or Q-Q reveals strong skew, the chart's nominal limits won't give the advertised coverage. The classical fix is a Box-Cox transformation: ```{r, eval = FALSE} bc <- shewhart_box_cox(bottle_fill, value = ml) print(bc) ggplot2::autoplot(bc) ``` The maximum-likelihood lambda and its 95% CI are returned. If 1 falls inside the CI, no transformation is needed; if 0 does, take logs; otherwise apply $y^\lambda$ to the data and re-fit the chart. `shewhart_regression(model = "auto")` calls `shewhart_box_cox()` internally and picks among `linear`, `log`, `loglog` based on the profile-likelihood maximiser. For full control, run the diagnostic yourself and pass `model` (or `formula`) explicitly. ## When non-normality is a feature, not a bug Counts and proportions are non-normal *by construction*; that is why attributes charts use Binomial / Poisson limits, not transforms. Diagnostics on a c-chart's residuals will show discrete jumps and non-normal tails by construction; that is fine. The statistical honesty here is to use the right distribution from the start, not to transform until residuals look normal. ## References - Tukey, J. W. (1977). *Exploratory Data Analysis*. Addison-Wesley. - Box, G. E. P., Hunter, W. G., & Hunter, J. S. (2005). *Statistics for Experimenters* (2nd ed.). Wiley. - Box, G. E. P., & Cox, D. R. (1964). An Analysis of Transformations. *Journal of the Royal Statistical Society B*, 26(2), 211-252. - Atkinson, A. C. (1985). *Plots, Transformations and Regression*. Oxford.