---
title: "Optimizing Sequencing Resource Allocation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Optimizing Sequencing Resource Allocation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>",
  fig.width = 7, fig.height = 4, dev = "png"
)
```

## Why allocation matters

In genomic surveillance, the total number of sequences you can generate each week is fixed by lab capacity and budget. The question is not *how many* to sequence (phylosamp answers that), but *how to distribute* a fixed number across regions, institutions, and sample sources.

Poor allocation wastes resources. If you sequence proportionally to submissions (which is what most systems do by default), you over-represent regions that send more samples, which are often the same regions that already have the highest sequencing rates.

## The three objectives

survinger supports three optimization objectives:

- **`min_mse`**: Minimize the mean squared error of lineage prevalence estimates. This is a Neyman-type allocation that gives more sequences to high-variance strata.
- **`max_detection`**: Maximize the probability of detecting a rare variant. This spreads sequences to maximize geographic coverage.
- **`min_imbalance`**: Minimize the deviation from population-proportional representation. This ensures each region is fairly represented.

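To make the idea behind `min_mse` concrete, here is a minimal base-R sketch of a textbook Neyman allocation for a binary outcome. It is an illustration only and is not taken from survinger's internals, which may add constraints or use a different variance model. The function name `neyman_allocation()`, the three regions, and all numbers are hypothetical.

```r
# Textbook Neyman allocation for a binary (lineage present / absent) outcome.
# Hypothetical helper for illustration; not part of survinger.
neyman_allocation <- function(pop_size, prevalence, total_capacity) {
  # Within-stratum standard deviation of a 0/1 indicator
  stratum_sd <- sqrt(prevalence * (1 - prevalence))
  # Neyman weights: stratum size times stratum standard deviation
  weights <- pop_size * stratum_sd
  # Real-valued allocation that sums to total_capacity
  total_capacity * weights / sum(weights)
}

# Made-up inputs for three regions
pop_size   <- c(north = 1000, south = 1000, east = 2000)
prevalence <- c(north = 0.50, south = 0.10, east = 0.50)

alloc <- neyman_allocation(pop_size, prevalence, total_capacity = 500)
round(alloc, 1)
```

In this toy example, `north` and `south` have the same population, but `south` receives fewer sequences because a prevalence near 0.10 has lower variance than one near 0.50. A real allocation also has to round to whole sequences and respect per-stratum limits.
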

## Example

```{r example}
library(survinger)
data(sarscov2_surveillance)

design <- surv_design(
  data = sarscov2_surveillance$sequences,
  strata = ~ region,
  sequencing_rate = sarscov2_surveillance$population[c("region", "seq_rate")],
  population = sarscov2_surveillance$population,
  source_type = "source_type"
)
```

### Optimize for minimum MSE

```{r min-mse}
alloc_mse <- surv_optimize_allocation(design, "min_mse", total_capacity = 500)
print(alloc_mse)
plot(alloc_mse)
```

### Compare all strategies

```{r compare}
comparison <- surv_compare_allocations(design, total_capacity = 500)
print(comparison)
```

The table shows the trade-off: minimizing MSE may increase imbalance, while proportional allocation sacrifices detection power.

### With minimum coverage constraints

```{r floor}
alloc_floor <- surv_optimize_allocation(
  design, "min_mse",
  total_capacity = 500,
  min_per_stratum = 20
)
print(alloc_floor)
```

Setting `min_per_stratum = 20` ensures every region gets at least 20 sequences, preventing any region from being invisible.

## Choosing an objective

- Use **`min_mse`** when your primary goal is accurate prevalence tracking.
- Use **`max_detection`** when hunting for rare emerging variants.
- Use **`min_imbalance`** when equity and representativeness are priorities.

In practice, reviewing the `surv_compare_allocations()` output helps stakeholders understand the trade-offs and choose based on their mandate.
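
When weighing `max_detection` against the other objectives, it can help to see how detection probability grows with the number of sequences in a single region. The sketch below assumes simple independent sampling, so the chance of seeing at least one variant sequence among `n` is `1 - (1 - p)^n`. This is an assumption made for illustration and may not be how survinger computes detection. The helper `detection_prob()` and the sequence counts are hypothetical.

```r
# Probability of observing at least one variant sequence among n sequences,
# assuming independent draws at a fixed prevalence (illustrative assumption).
detection_prob <- function(n, prevalence) {
  1 - (1 - prevalence)^n
}

# Made-up per-region sequence counts and a 1% variant prevalence
n_seq <- c(20, 50, 100, 300)
probs <- detection_prob(n_seq, prevalence = 0.01)

data.frame(n_seq = n_seq, detection_prob = round(probs, 3))
```

Under this model the curve is concave: each additional sequence in the same region adds less detection probability than the one before it. That is one way to see why a detection-oriented allocation tends to spread sequences across regions rather than concentrate them.
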