--- title: "Comparing changepoint methods with ggchangepoint" output: rmarkdown::html_vignette bibliography: vignette_reference.bib vignette: > %\VignetteIndexEntry{Comparing changepoint methods with ggchangepoint} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 6, fig.alt = "ggchangepoint plot comparing changepoint detection methods on a time series" ) library(ggchangepoint) library(ggplot2) theme_set(theme_light()) # `fpop` lives in Suggests; fall back to core engines when it is unavailable # so the vignette builds in any environment. has_fpop <- requireNamespace("fpop", quietly = TRUE) cmp_methods <- if (has_fpop) c("pelt", "binseg", "fpop") else c("pelt", "binseg", "amoc") ``` ## Introduction The companion vignette `vignette("introduction", package = "ggchangepoint")` provides a comprehensive overview of ggchangepoint's design, API, and supported methods. This vignette focuses on **method comparison**---the problem of running multiple detectors on the same data, comparing their outputs, and measuring their accuracy when ground truth is known. ## Problem setup Let $y_{1:n}$ be an ordered sequence. A segmentation with $m$ changepoints is an ordered set $\tau_{0:m+1}$ with $0=\tau_0<\tau_1<\dots<\tau_m<\tau_{m+1}=n$, partitioning the data into $m+1$ contiguous segments. Within each segment the data are governed by a parameter $\theta_j$ (e.g. mean, variance, or distribution) that changes at each $\tau_j$ [@truong2020selective]. Most offline methods minimise a penalised cost $$ \sum_{j=1}^{m+1} \mathcal{C}(y_{(\tau_{j-1}+1):\tau_j}) + \beta f(m), $$ where $\mathcal{C}(\cdot)$ is a segment cost (typically $-2 \times$ maximised log-likelihood) and $\beta f(m)$ guards against over-segmentation [@yao1988estimating]. The choice of penalty $\beta$ is the central challenge of the field. ## Taxonomy of methods in v0.2.0 The methods available through `cpt_detect()` in this release fall into three broad families: | Family | Methods | Approach | |--------|---------|----------| | Exact / pruned optimal partitioning | PELT, BinSeg, SegNeigh, AMOC, FPOP | Dynamic programming with pruning [@killick2012pelt; @maidstone2017optimal] | | Randomised and multiscale search | WBS, WBS2, NOT, MOSUM, IDetect, TGUH | Random intervals or moving windows [@fryzlewicz2014wild; @fryzlewicz2020detecting; @baranowski2019narrowest; @eichinger2018mosum; @anastasiou2022idetect; @fryzlewicz2022tail] | | Nonparametric / distribution-free | NP (changepoint.np), ECP (ecp) | Empirical distribution or energy distance [@haynes2017computationally; @james2014ecp] | The `ggcpt_compare()` function runs multiple methods on the same series and arranges the results. It respects the `future::plan()` for optional parallel execution when many methods are compared. ## Comparing methods on a simulated series Generate a three-regime series with changes in mean and variance: ```{r} set.seed(2022) x <- c(rnorm(100, 0, 1), rnorm(100, 10, 1), rnorm(100, 5, 2)) ``` The faceted layout shows each method in its own panel, with x-axes aligned: ```{r fig-1} ggcpt_compare(x, methods = cmp_methods, layout = "facet") ``` The overlay layout draws all changepoints on a single panel, colour-coded: ```{r fig-2} ggcpt_compare(x, methods = cmp_methods, layout = "overlay") ``` For a numeric summary, `ggcpt_compare_table()` returns a tidy tibble: ```{r} ggcpt_compare_table(x, methods = cmp_methods) ``` ## Accuracy metrics When true changepoint locations are known, `cpt_metrics()` provides a standard suite of accuracy measures [@truong2020selective; @van2020evaluation]: ```{r} truth <- c(100, 200) # PELT res_pelt <- cpt_detect(x, method = "pelt", change_in = "meanvar") generics::tidy(res_pelt)$cp # Metrics for BinSeg res_binseg <- cpt_detect(x, method = "binseg", change_in = "meanvar") cpt_metrics(generics::tidy(res_binseg)$cp, truth, n = length(x)) ``` ### Visual evaluation `ggcpt_eval()` overlays predictions and ground truth, colouring true positives, false positives, and misses within a tolerance margin: ```{r fig-3} ggcpt_eval(pred = c(100), truth = c(100, 200), data_vec = x) ``` ## Evaluation study on canonical signals We now conduct a systematic comparison across the five built-in test signals [@donoho1994ideal], using a tolerance margin of 5 observations. ```{r eval-study} set.seed(42) signals <- list( blocks = signal_blocks(512), fms = signal_fms(512), mix = signal_mix(512), teeth = signal_teeth(512), stairs = signal_stairs(512) ) methods <- c("pelt", "binseg", "fpop", "wbs", "not") margin <- 5 results <- do.call(rbind, lapply(names(signals), function(nm) { sig <- signals[[nm]] truth <- attr(sig, "true_changepoints") do.call(rbind, lapply(methods, function(m) { res <- tryCatch( cpt_detect(sig$value, method = m, change_in = "mean"), error = function(e) NULL ) if (is.null(res)) return(NULL) pred <- generics::tidy(res)$cp met <- cpt_metrics(pred, truth, n = length(sig$value), margin = margin) data.frame(signal = nm, method = m, met) })) })) results[, c("signal", "method", "precision", "recall", "f1", "covering", "hausdorff")] ``` The table shows how different methods perform across signal types. PELT and FPOP, as exact methods, generally have strong performance, while WBS and NOT can detect small or short features that the exact methods miss at the default penalty setting---but may also over-segment. ### Interpreting the metrics - **Precision / Recall / F1**: standard classification measures with a margin of tolerance. - **Covering**: the average Jaccard overlap between true and predicted segments [@van2020evaluation]. - **Hausdorff distance**: worst-case localisation error between predicted and true changepoint sets. ## Full workflow: simulate, detect, evaluate A complete reproducible workflow combining the simulator, detection, and evaluation: ```{r} set.seed(1) dat <- cpt_simulate(200, changepoints = c(60, 120), change_in = "mean", params = c(0, 8, 3), sd = 1) truth <- attr(dat, "true_changepoints") res <- cpt_detect(dat$value, method = "pelt", change_in = "mean") pred <- generics::tidy(res)$cp cpt_metrics(pred, truth, n = length(dat$value)) ``` ## Multi-annotator evaluation When multiple ground-truth annotations are available---as in the Turing Change Point Dataset [@van2020evaluation]---use `cpt_metrics_annotated()` to average metrics across annotators: ```{r} cpt_metrics_annotated(pred = c(100, 200), annotations = list(c(100, 200), c(105, 198)), n = 300) ``` ## Parallel comparison (optional) If the `future` and `future.apply` packages are installed, `ggcpt_compare()` and the evaluation loop can run methods in parallel by setting a `future::plan()`: ```{r eval=FALSE} future::plan(future::multisession, workers = 2) ggcpt_compare(x, methods = c("pelt", "binseg", "wbs", "fpop", "not"), seed = 1) ``` Parallel execution is reproducible: when a `seed` is supplied, the detectors are fanned out over parallel-safe L'Ecuyer-CMRG random streams, so the result is identical regardless of worker count or backend. ## See also - `vignette("introduction", package = "ggchangepoint")` for a comprehensive walkthrough of the package API and supported methods. - The R package references for the individual detection engines: **changepoint** [@killick2014changepoint], **wbs** [@fryzlewicz2014wild], **not** [@baranowski2019narrowest], **mosum** [@eichinger2018mosum], **fpop** [@maidstone2017optimal], **IDetect** [@anastasiou2022idetect], **breakfast** [@fryzlewicz2020detecting; @fryzlewicz2022tail], **changepoint.np** [@haynes2017computationally], and **ecp** [@james2014ecp]. ## References