---
title: "Delay-Adjusted Nowcasting"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Delay-Adjusted Nowcasting}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
  fig.width = 7, fig.height = 4, dev = "png")
```

## The right-truncation problem

Sequence upload typically lags sample collection by 1--4 weeks.
This means that when you look at the latest data, the most recent
weeks are always incomplete: not because fewer people were infected,
but because results have not arrived yet.

If you ignore this and plot raw counts, you see a false decline in
the most recent weeks. This is called **right-truncation bias**.

## Estimating the delay distribution

`survinger` fits a parametric delay distribution while accounting for
the fact that we can only observe delays shorter than the time elapsed
since collection (right-truncation correction).

```{r delay-fit}
library(survinger)
data(sarscov2_surveillance)

# Build the survey design from the bundled example data
design <- surv_design(
  data = sarscov2_surveillance$sequences,
  strata = ~ region,
  sequencing_rate = sarscov2_surveillance$population[c("region", "seq_rate")],
  population = sarscov2_surveillance$population
)

# Fit a negative binomial delay distribution and inspect it
delay_fit <- surv_estimate_delay(design, distribution = "negbin")
print(delay_fit)
plot(delay_fit)
```

## Reporting probability

Given the fitted delay, we can ask: what fraction of sequences
collected *d* days ago have been reported by now?

```{r report-prob}
days <- c(7, 14, 21, 28)
probs <- surv_reporting_probability(delay_fit, delta = days)
data.frame(days_ago = days, prob_reported = round(probs, 3))
```

Sequences collected 7 days ago may be only partially reported,
while those from 28 days ago are nearly complete.

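
To see where these probabilities come from, note that the fraction reported
within *d* days is the cumulative distribution function of the delay evaluated
at *d*. The sketch below illustrates this with base R only. The mean and
dispersion are hypothetical values chosen for illustration, not parameters
fitted by `survinger`, and the package's own calculation may differ in detail.

```{r report-prob-sketch}
# Hypothetical negative binomial delay in days; these parameters are
# illustrative assumptions, not values estimated by survinger.
mu <- 12    # assumed mean delay (days)
size <- 3   # assumed dispersion (smaller values = more overdispersion)

days <- c(7, 14, 21, 28)

# Reporting probability = P(delay <= d), i.e. the delay CDF at d
prob_reported <- pnbinom(days, size = size, mu = mu)

data.frame(days_ago = days, prob_reported = round(prob_reported, 3))
```

Because a CDF is non-decreasing, the reporting probability can only grow as
the collection date moves further into the past.
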
## Nowcasting

Nowcasting inflates observed counts by dividing them by the reporting
probability, giving a better estimate of the true number:

```{r nowcast, fig.cap = "Observed (grey bars) vs nowcasted (orange line) counts for BA.2.86"}
nowcast <- surv_nowcast_lineage(design, delay_fit, "BA.2.86")
plot(nowcast)
```

The grey bars show what has been observed; the orange line shows the
delay-corrected estimate. The gap is largest in the most recent weeks.

## Combined design + delay correction

The main inference function applies both corrections simultaneously:

```{r adjusted}
adjusted <- surv_adjusted_prevalence(design, delay_fit, "BA.2.86")
print(adjusted)
```

The `mean_report_prob` column shows how complete each week's data is.
Low values indicate that the estimate for that week leans heavily on
the delay correction.

## Choosing a delay distribution

- **`negbin`** (default): Handles overdispersion well. Recommended for most settings.
- **`poisson`**: Use when delays are very regular (no overdispersion), which is rare.
- **`lognormal`**: Use when delays have a heavy right tail.
- **`nonparametric`**: No distributional assumption. Use when you have enough
  data and suspect the parametric forms do not fit.
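
The choice matters because the reporting probability sits in the denominator
of the nowcast. As a rough illustration, the base R sketch below compares a
Poisson and a negative binomial delay with the same mean and shows the nowcast
each would imply for the same observed count. The mean, dispersion, and
observed count are all hypothetical, and the code does not call `survinger`.

```{r distribution-sketch}
# Hypothetical setup: both delay models share an assumed mean of 12 days.
mu <- 12
days <- c(7, 14, 21, 28)

# Poisson: variance equals the mean, so delays cluster tightly around mu
p_pois <- ppois(days, lambda = mu)

# Negative binomial: variance = mu + mu^2 / size, so size = 3 adds overdispersion
p_nb <- pnbinom(days, size = 3, mu = mu)

# Nowcast for an illustrative 40 observed sequences at each lag:
# observed count divided by the reporting probability
observed <- 40
data.frame(
  days_ago = days,
  prob_poisson = round(p_pois, 3),
  prob_negbin = round(p_nb, 3),
  nowcast_poisson = round(observed / p_pois, 1),
  nowcast_negbin = round(observed / p_nb, 1)
)
```

With these assumed values, the two models disagree most at short lags, which
is also where the nowcast is most sensitive to the reporting probability.
Comparing fits on your own data before settling on a distribution is a
reasonable precaution.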