---
title: "Getting Started with sssvcqr"
author: "Houjian Hou"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Getting Started with sssvcqr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
                      fig.width = 7, fig.height = 5, fig.align = "center")
options(prompt = "R> ", continue = "+ ", width = 80, useFancyQuotes = FALSE)
```

## What this vignette covers

This vignette walks through the core workflow of `sssvcqr` on a small synthetic example: simulating a global--local quantile regression problem, fitting the SS-SVCQR estimator, inspecting the global-versus-local selection, querying the fitted object through the standard R model methods, predicting at new locations, tuning the penalties with spatially blocked cross-validation, and reading the first-order KKT diagnostics.

**The model.** SS-SVCQR fits the conditional quantile

$$
Q_\tau(y_i \mid z_i, x_i, u_i) = z_i^\top \alpha + \sum_{j=1}^p x_{ij}\{\beta_{G,j} + \delta_j(u_i)\},
$$

with global controls `Z`, candidate spatially varying covariates `X`, coordinates `u`, a group penalty that may shrink an entire deviation vector $\delta_j$ exactly to zero, and a graph-Laplacian smoothness penalty over a $k$-nearest-neighbor proximity graph. The companion JSS software paper covers the algorithm and case study in detail; this vignette is the short interactive walkthrough.

## Simulating data with known structure

`simulate_sssvcqr_data()` returns synthetic data on the unit square with a known true active set, true global-baseline coefficients, and known deviation fields. The default `p = 3` design has `x1` and `x3` active and `x2` globally constant.
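Before simulating anything, it helps to keep the full estimation problem in view. Schematically (this is a sketch, not the package's literal internal parameterization; it omits the adaptive group weights and scaling conventions documented in `?ss_svcqr`), the estimator minimizes

$$
\min_{\alpha,\;\beta_G,\;\delta_1,\dots,\delta_p}\;
\sum_{i=1}^{n} \rho_\tau\!\left(y_i - z_i^\top \alpha - \sum_{j=1}^{p} x_{ij}\{\beta_{G,j} + \delta_j(u_i)\}\right)
\;+\; \lambda_1 \sum_{j=1}^{p} \lVert \delta_j \rVert_2
\;+\; \lambda_2 \sum_{j=1}^{p} \delta_j^\top L\, \delta_j,
$$

where $\rho_\tau$ is the quantile check loss and $L$ is the graph Laplacian of the $k$-nearest-neighbor proximity graph. The group term attached to $\lambda_1$ is what can zero out an entire $\delta_j$; the quadratic form attached to $\lambda_2$ is what smooths the surviving fields.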
```{r simulate}
library("sssvcqr")
dat <- simulate_sssvcqr_data(n = 120, q = 2, p = 3, seed = 20260505)
str(dat[c("y", "Z", "X", "u", "active", "alpha_true", "beta_G_true")],
    max.level = 1)
dat$active
```

## Fitting the model

`ss_svcqr()` takes an explicit matrix interface so that the analyst's `Z`/`X` partition is unambiguous. The penalties `lambda1` and `lambda2` control sparsity (group selection) and graph smoothness, respectively.

```{r fit}
fit <- ss_svcqr(
  y = dat$y, Z = dat$Z, X = dat$X, u = dat$u,
  tau = 0.5, lambda1 = 5, lambda2 = 0.1, k_nn = 8,
  control = list(max_iter = 180, warn_nonconvergence = FALSE)
)
fit
```

## Selection and global--local summary

The fitted summary prints the model size, tuning parameters, ADMM convergence status, the global block, and the L2 norms of each estimated deviation field. The deviation norms are the visual signature of global-versus-local selection: the second deviation field is exactly zero because the group penalty has correctly identified `x2` as global.

```{r summary}
summary(fit)
selection_recovery_table(fit, dat)
```

## Inspecting fitted values, residuals, and coefficients

Standard R methods are wired through the `sssvcqr` class so that fitted values and residuals can be extracted directly, and `coef()` returns a list with the global block and the deviation matrix.

```{r methods}
round(unlist(coef(fit)[1:2]), 4)
head(round(fitted(fit), 4))
head(round(residuals(fit), 4))
```

## Visualizing the fit

The `plot()` method renders the fitted deviation field, the local total coefficient surface, residuals, or ADMM convergence traces. By default the spatial plots use an inverse-distance-weighted gridded surface over the first two coordinate columns, with a diverging color scale.
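The gridded surface follows standard inverse-distance weighting. The sketch below is illustrative only and independent of the package internals (the grid size and power are assumptions, and it weights over all training points, not a nearest-neighbor subset):

```{r idw-sketch, eval = FALSE}
# Not run: minimal base-R inverse-distance weighting on a regular grid.
idw_surface <- function(u, z, grid_n = 40, power = 2, eps = 1e-8) {
  gx <- seq(min(u[, 1]), max(u[, 1]), length.out = grid_n)
  gy <- seq(min(u[, 2]), max(u[, 2]), length.out = grid_n)
  grid <- as.matrix(expand.grid(x = gx, y = gy))
  vals <- apply(grid, 1, function(g) {
    d <- sqrt(colSums((t(u) - g)^2))  # distances to all training points
    w <- 1 / (d^power + eps)          # inverse-distance weights
    sum(w * z) / sum(w)               # weighted average at the grid node
  })
  list(x = gx, y = gy, z = matrix(vals, nrow = grid_n))
}
```

Here `u` plays the role of the coordinate matrix and `z` the field being gridded (a deviation or total coefficient evaluated at the training locations).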
```{r plots, fig.width = 7, fig.height = 6}
plot(fit, type = "deviation", index = 1)
plot(fit, type = "coefficient", index = 1)
plot(fit, type = "convergence")
```

## Predicting at new locations

`predict()` either returns fitted conditional quantiles (the default) or local coefficient surfaces (`type = "coefficients"`). For new locations the deviation is extrapolated by inverse-distance-weighted averaging of the $k$ nearest training deviations.

```{r predict}
unew <- dat$u[1:3, , drop = FALSE] + 0.01
round(predict(fit, Znew = dat$Z[1:3, , drop = FALSE],
              Xnew = dat$X[1:3, , drop = FALSE], unew = unew), 4)
round(predict(fit, type = "coefficients")[1:3, ], 4)
```

## Tuning by spatially blocked cross-validation

`cv_ss_svcqr()` evaluates a $(\lambda_1,\lambda_2)$ grid using spatial folds (k-means clusters on the coordinate matrix by default), so that nearby observations are not split between training and validation. The example below uses a small grid and three folds for speed; empirical applications should use a broader grid and more iterations.

```{r cv}
dat_cv <- simulate_sssvcqr_data(n = 80, q = 2, p = 3, seed = 20260505)
cv <- cv_ss_svcqr(
  y = dat_cv$y, Z = dat_cv$Z, X = dat_cv$X, u = dat_cv$u, tau = 0.5,
  lambda1_seq = c(1, 2), lambda2_seq = c(0.5, 1), K_folds = 3,
  adaptive_weights = FALSE,
  control = list(max_iter = 25, warn_nonconvergence = FALSE)
)
cv
cv$best
```

## First-order KKT diagnostics

`kkt_sssvcqr()` reports the first-order conditions of the SS-SVCQR objective: gradient norms for the unpenalized global block, stationarity residuals for nonzero deviation groups, zero-group margins for groups shrunk to zero, and the maximum violation of the per-component degree-weighted centering constraints. Centering violations should be at the level of solver round-off when the fit has converged.
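As a reading aid for the stationarity residuals: they are built from the subdifferential of the quantile check loss,

$$
\rho_\tau(r) = r\,\{\tau - \mathbf{1}(r < 0)\},
\qquad
\partial \rho_\tau(r) =
\begin{cases}
\tau - \mathbf{1}(r < 0), & r \neq 0,\\[2pt]
[\,\tau - 1,\ \tau\,], & r = 0,
\end{cases}
$$

so a small gradient norm means the residuals' subgradients (nearly) cancel against the penalty terms, as the first-order conditions require.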
```{r kkt}
kkt <- kkt_sssvcqr(dat$y, dat$Z, dat$X, fit)
signif(kkt$max_violation, 3)
signif(kkt$max_centering_violation, 3)
```

## Where to look next

- The companion vignette *Lucas County housing example* mirrors the case study in the JSS software paper.
- The replication directory at the project root reproduces every numerical table and figure in the JSS manuscript.
- The function help pages (`?ss_svcqr`, `?cv_ss_svcqr`, `?build_graph_laplacian`, `?kkt_sssvcqr`) document each argument and return value in full.