---
title: "MSM Identification and Recovery in tidyILD"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MSM Identification and Recovery in tidyILD}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## Why this vignette exists

This vignette documents the **identification assumptions** behind the MSM/IPW workflow and shows how to run the causal recovery harness, which was added for regression testing and simulation-based checks.

## Identification assumptions

In this workflow, a causal interpretation of weighted outcome contrasts depends on:

1. **Sequential exchangeability**: at each occasion \(t\), the history set used for IPTW captures all confounders of treatment assignment.
2. **Positivity / overlap**: treatment probabilities are bounded away from 0 and 1 in all relevant strata.
3. **Consistency**: the observed outcome under the observed treatment history equals the potential outcome under that same history.
4. **Correct weight models**: the treatment and censoring models are correctly specified.

Use these diagnostics to stress-test the assumptions:

- `ild_msm_balance()` for weighted standardized mean difference (SMD) checks;
- `ild_ipw_ess()` for effective sample size;
- `ild_msm_overlap_plot()` for propensity overlap;
- `ild_diagnose(..., balance = TRUE, ...)` for integrated causal diagnostics and guardrails.
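
To see where these assumptions enter the weights, the base-R sketch below builds stabilized IPTW weights for a binary, time-varying treatment and then computes a positivity check and an effective sample size. It is an illustration under stated assumptions, not tidyILD's implementation: the columns `id`, `time`, `stress`, and `trt` are hypothetical, rows are assumed sorted by person and occasion, censoring weights are omitted, and the ESS shown is the Kish formula, which may differ from what `ild_ipw_ess()` reports.

```r
# Minimal base-R sketch, not tidyILD's implementation: stabilized inverse
# probability of treatment weights (IPTW) for a binary, time-varying treatment,
# with a positivity check and a Kish effective sample size (ESS).
# Columns id, time, stress, trt are hypothetical stand-ins for ILD variables.
set.seed(101)
n_id <- 100
n_obs <- 12
d <- data.frame(
  id   = rep(seq_len(n_id), each = n_obs),   # rows sorted by id, then time
  time = rep(seq_len(n_obs), times = n_id)
)
d$stress <- rnorm(nrow(d))
d$trt    <- rbinom(nrow(d), 1, plogis(-0.2 + 0.8 * d$stress))

# Within-person lag; the first occasion is set to 0 as a simple convention
lag1 <- function(x, id) ave(x, id, FUN = function(v) c(0, head(v, -1)))
d$stress_lag1 <- lag1(d$stress, d$id)
d$trt_lag1    <- lag1(d$trt, d$id)

# Denominator: treatment given the confounder history (sequential
# exchangeability assumes this set is sufficient). Numerator: treatment
# given past treatment only, which stabilizes the weights.
den <- glm(trt ~ stress + stress_lag1 + trt_lag1, family = binomial, data = d)
num <- glm(trt ~ trt_lag1, family = binomial, data = d)
p_den <- fitted(den)
p_num <- fitted(num)

# Probability of the treatment actually received at each occasion
pr_den <- ifelse(d$trt == 1, p_den, 1 - p_den)
pr_num <- ifelse(d$trt == 1, p_num, 1 - p_num)

# Stabilized weight: cumulative product of ratios over each person's history
d$sw <- ave(pr_num / pr_den, d$id, FUN = cumprod)

# Positivity: share of occasions with propensities outside [0.05, 0.95]
extreme_share <- mean(p_den < 0.05 | p_den > 0.95)

# Kish ESS: equals nrow(d) when all weights are equal, shrinks as they spread
ess <- sum(d$sw)^2 / sum(d$sw^2)

c(extreme_share = extreme_share, ess = ess, max_sw = max(d$sw))
```

If the denominator model omits a confounder, the weights no longer remove that confounding (exchangeability and correct specification fail together); if propensities pile up near 0 or 1, a few occasions receive very large weights and the ESS drops well below `nrow(d)`.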

## Estimand-first + history-builder workflow (v1)

```{r eval = FALSE}
library(tidyILD)

# Simulate a scenario with a known average treatment effect
d <- ild_msm_simulate_scenario(n_id = 100, n_obs_per = 12, true_ate = 0.5, seed = 101)

# Split stress into between-person (stress_bp) and within-person (stress_wp) parts
d <- ild_center(d, stress)

# Declare the lagged history variables, then build them
hist_spec <- ild_msm_history_spec(vars = c("stress", "trt"), lags = 1:2)
d <- ild_build_msm_history(d, hist_spec)

# State the estimand before fitting anything
estimand <- ild_msm_estimand(type = "ate", regime = "static", treatment = "trt")

fit_obj <- ild_msm_fit(
  estimand = estimand,
  data = d,
  outcome_formula = y ~ stress_bp + stress_wp + trt + (1 | id),
  history = ~ stress_lag1 + trt_lag1,   # treatment-model history (lag-1 terms here)
  predictors_censor = "stress",
  inference = "bootstrap",
  n_boot = 200,
  strict_inference = FALSE
)

fit_obj
fit_obj$inference$status
fit_obj$inference$reason
```

## Recovery harness

```{r eval = FALSE}
rec <- ild_msm_recovery(
  n_sim = 100,
  n_id = 120,
  n_obs_per = 12,
  true_ate = 0.5,
  n_boot = 200,
  inference = "bootstrap",
  seed = 1001,
  censoring = TRUE
)

rec$summary
rec$summary_by_scenario
```

Scenario-grid validation (positivity stress and treatment-model misspecification):

```{r eval = FALSE}
grid <- tibble::tibble(
  scenario_id = c("baseline", "positivity_stress", "misspecified_treatment"),
  positivity_stress = c(1, 1.8, 1),
  misspec_treatment_model = c(FALSE, FALSE, TRUE)
)

rec_grid <- ild_msm_recovery(
  n_sim = 50,
  n_id = 120,
  n_obs_per = 12,
  true_ate = 0.5,
  n_boot = 200,
  inference = "bootstrap",
  scenario_grid = grid,
  seed = 1101
)

rec_grid$summary_by_scenario
```

Interpretation:

- `bias` and `rmse` measure point-estimate recovery;
- `coverage` measures interval calibration under the chosen inference mode;
- `ess_mean` / `ess_min` and `weight_ratio_median` summarize positivity stress.

## Inference caveats and strict mode

- `inference = "robust"` can degrade on weighted `lmer` fits, where robust variance is not supported.
- `ild_msm_fit()` records this explicitly in:
  - `fit_obj$inference$status` (`"ok"`, `"degraded"`, or `"unsupported"`),
  - `fit_obj$inference$reason` (a machine-readable reason code),
  - `fit_obj$inference$message` (a user-facing explanation).
- Set `strict_inference = TRUE` to raise an error instead of degrading.
- Use `ild_msm_bootstrap(..., weight_policy = "reestimate_weights")` when you want first-stage weight uncertainty reflected in the intervals.

## Notes on v1 scope

- The v1.1 estimand schema accepts both static and dynamic regime specifications, but dynamic weighting is still scaffold-only in `ild_msm_fit()`: it reports a degraded status, or raises an error when `strict_inference = TRUE`.
- Joint Bayesian MSM estimation is out of scope in v1 (see `?ild_msm_inference`).
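
The `weight_policy = "reestimate_weights"` option listed under *Inference caveats and strict mode* matters because weights estimated once and then held fixed treat the treatment model as known. The base-R sketch below illustrates the idea with a person-level (cluster) bootstrap that refits the treatment model inside every replicate. It is not the `ild_msm_bootstrap()` implementation; it assumes simulated data with hypothetical columns `id`, `stress`, `trt`, and `y`, single-occasion (non-cumulative) stabilized weights, and a weighted `lm()` outcome model in place of `lmer`.

```r
# Minimal base-R sketch, not ild_msm_bootstrap(): a person-level (cluster)
# bootstrap that refits the treatment model inside every replicate, so the
# interval reflects first-stage weight uncertainty. Simplifications (assumed):
# single-occasion stabilized weights and a weighted lm() outcome model.
set.seed(1001)
n_id <- 80
n_obs <- 6
d <- data.frame(id = rep(seq_len(n_id), each = n_obs))
d$stress <- rnorm(nrow(d))
d$trt    <- rbinom(nrow(d), 1, plogis(0.8 * d$stress))
d$y      <- 0.5 * d$trt + 0.7 * d$stress + rnorm(nrow(d))   # true effect 0.5

# Fit the treatment model, build stabilized weights, fit the weighted outcome
estimate_ate <- function(dat) {
  ps <- fitted(glm(trt ~ stress, family = binomial, data = dat))
  p1 <- mean(dat$trt)
  w  <- ifelse(dat$trt == 1, p1 / ps, (1 - p1) / (1 - ps))
  coef(lm(y ~ trt, data = dat, weights = w))[["trt"]]
}

# Resample whole persons; weights are re-estimated in each replicate
boot_ate <- function(dat, n_boot = 200) {
  ids <- unique(dat$id)
  replicate(n_boot, {
    draw <- sample(ids, length(ids), replace = TRUE)
    bd <- do.call(rbind, lapply(draw, function(i) dat[dat$id == i, ]))
    estimate_ate(bd)
  })
}

est  <- estimate_ate(d)
boot <- boot_ate(d, n_boot = 200)
ci   <- quantile(boot, c(0.025, 0.975))   # percentile interval
c(estimate = est, ci)
```

Computing the weights once on the full data and carrying them along as a column would correspond to holding weights fixed; comparing that interval with this one shows how much the first-stage fit contributes. Resampling persons rather than rows keeps each person's repeated occasions together, which respects within-person dependence.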