---
title: "Variables charts: I-MR, Xbar-R, Xbar-S"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Variables charts: I-MR, Xbar-R, Xbar-S}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.2
)
```

```{r setup, message = FALSE}
library(shewhartr)
library(dplyr)
library(ggplot2)
```

This vignette covers the three classical Shewhart charts for continuous
measurements (variables): I-MR for individual observations, Xbar-R for small
rational subgroups, and Xbar-S for larger or unequal-sized subgroups. The
choice between them depends entirely on the structure of your data.

## When to use which

| Situation                                              | Chart                |
|--------------------------------------------------------|----------------------|
| Single measurements, no rational subgroup              | `shewhart_i_mr()`    |
| Subgroups of size 2-10, equal sizes                    | `shewhart_xbar_r()`  |
| Subgroups of size > 10, or unequal sizes               | `shewhart_xbar_s()`  |

The reason for the cutoff at 10 is that for small subgroup sizes the range
estimator of sigma is nearly as efficient as the standard deviation estimator
and simpler to compute. Its relative efficiency falls as $n$ grows, and from
about $n = 10$ onwards the standard deviation is clearly preferable
(Montgomery 2019, Section 6.4.1).

## I-MR

`bottle_fill` is a 100-point series of fill volumes. The default estimator of
sigma is based on the mean moving range:

$$
\hat{\sigma} = \frac{\overline{\mathrm{MR}}}{d_2(2)} = \frac{\overline{\mathrm{MR}}}{1.128}.
$$

```{r}
fit <- shewhart_i_mr(bottle_fill, value = ml, index = observation)
broom::glance(fit)
```

```{r, eval = FALSE}
autoplot(fit)
```

### Robust alternatives

The mean of the moving range is sensitive to outliers.
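
As a rough illustration of that sensitivity, the base-R sketch below computes a mean-based and a median-based moving-range estimate of sigma on a toy series, with and without one gross outlier. The toy data and the helper `mr_sigma()` are hypothetical and not part of shewhartr. The median constant 0.954 is the usual normal-theory value for moving ranges of span 2, assumed here for illustration rather than taken from the package's `median_mr` implementation.

```{r}
# Hypothetical toy series (not bottle_fill): ten stable values
x_clean <- c(500.1, 499.8, 500.3, 499.9, 500.0,
             500.2, 499.7, 500.1, 499.9, 500.2)

# Same series with a single gross outlier injected at position 5
x_dirty <- x_clean
x_dirty[5] <- 510

# Illustrative helper: two moving-range estimates of sigma
mr_sigma <- function(x) {
  mr <- abs(diff(x))                  # moving ranges of span 2
  c(mean_mr   = mean(mr) / 1.128,     # mean MR divided by d2(2)
    median_mr = median(mr) / 0.954)   # median MR, normal-theory constant
}

rbind(clean = mr_sigma(x_clean), dirty = mr_sigma(x_dirty))
```

With the outlier present, the mean-based estimate is inflated several-fold, while the median-based estimate moves far less.
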
Three alternative estimators are built in:

```{r}
fit_med <- shewhart_i_mr(bottle_fill, value = ml, sigma_method = "median_mr")
fit_bw  <- shewhart_i_mr(bottle_fill, value = ml, sigma_method = "biweight")
fit_sd  <- shewhart_i_mr(bottle_fill, value = ml, sigma_method = "sd")

dplyr::bind_rows(
  broom::glance(fit)     |> mutate(method = "mr (default)"),
  broom::glance(fit_med) |> mutate(method = "median_mr"),
  broom::glance(fit_bw)  |> mutate(method = "biweight"),
  broom::glance(fit_sd)  |> mutate(method = "sd")
) |>
  select(method, sigma_hat, n_violations)
```

The biweight is Tukey's M-estimator with a redescending influence function
(Tukey 1977; Mosteller & Tukey 1977). Use it when you suspect a small fraction
of contaminated observations and don't want them to inflate sigma.

## Xbar-R

`tablet_weight` records 25 batches of 5 tablets each.

```{r}
fit_xbar <- shewhart_xbar_r(tablet_weight, value = weight, subgroup = subgroup)
broom::tidy(fit_xbar)
```

The Xbar limits use the constant $A_2(5) = 0.577$; the R limits use
$D_3(5) = 0$ and $D_4(5) = 2.114$. These constants come from Montgomery (2019),
Appendix VI, and are exposed via `shewhart_constants()`:

```{r}
shewhart_constants(c(2, 5, 10, 25))
```

`shewhart_xbar_r()` requires equal subgroup sizes, because the constants depend
on $n$ and would not apply uniformly otherwise. For unequal sizes use
`shewhart_xbar_s()` with `sigma_method = "pooled_sd"`.

## Xbar-S

The S chart plots the subgroup sample standard deviation. Sigma is estimated
from the mean subgroup standard deviation, corrected for bias by $c_4(n)$:

$$
\hat{\sigma} = \frac{\bar{S}}{c_4(n)}, \qquad c_4(n) = \sqrt{\frac{2}{n-1}} \cdot \frac{\Gamma(n/2)}{\Gamma((n-1)/2)}.
$$

```{r}
# Same data, but using S instead of R (S is preferred for n > 10)
n_per <- 5
df <- tablet_weight |>
  group_by(subgroup) |>
  filter(n() == n_per) |>   # in case of any incomplete batches
  ungroup()

fit_xbs <- shewhart_xbar_s(df, value = weight, subgroup = subgroup)
broom::glance(fit_xbs)
```

For variable subgroup sizes:

```{r, eval = FALSE}
fit_xbs_pooled <- shewhart_xbar_s(df, value = weight, subgroup = subgroup,
                                  sigma_method = "pooled_sd")
```

## Reading the results

Every variables chart returns a `shewhart_chart` object with the following
per-observation columns (`broom::augment()`):

* `.value`: the plotted statistic (individual, Xbar, etc.)
* `.center`, `.sigma`, `.upper`, `.lower`: control-line series
* For two-panel charts, additional `.mr*`, `.r*`, `.s*` columns for the lower
  panel
* `.flag_*` and `.flag_any`: TRUE/FALSE for each runs rule configured

The `rules` argument controls which rules are flagged. By default, Nelson 1
(a point beyond 3 sigma) and Nelson 2 (nine consecutive points on the same side
of the center line) are checked, and we recommend keeping at least these two.
Add more with:

```{r}
fit_full <- shewhart_i_mr(bottle_fill, value = ml,
                          rules = c("nelson_1_beyond_3s",
                                    "nelson_2_nine_same",
                                    "nelson_3_six_trend",
                                    "nelson_5_two_of_three",
                                    "nelson_6_four_of_five"))
fit_full$violations
```

Each additional rule increases the chart's sensitivity at the cost of a shorter
in-control average run length (ARL), that is, more frequent false alarms. See
`vignette("arl-simulation")` for how to quantify this trade-off.

## References

* Montgomery, D. C. (2019). *Introduction to Statistical Quality Control*
  (8th ed.). Wiley. Chapter 6.
* Tukey, J. W. (1977). *Exploratory Data Analysis*. Addison-Wesley.
* Mosteller, F., & Tukey, J. W. (1977). *Data Analysis and Regression*.
  Addison-Wesley.
* Wheeler, D. J., & Chambers, D. S. (1992). *Understanding Statistical Process
  Control* (2nd ed.). SPC Press.
* Cryer, J. D., & Ryan, T. P. (1990). The Estimation of Sigma for an X Chart:
  $\overline{\mathrm{MR}}/d_2$ or $S/c_4$? *Journal of Quality Technology*,
  22(3), 187-192.