---
title: "Lucas County Housing Example"
author: "Houjian Hou"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Lucas County Housing Example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.align = "center"
)
options(prompt = "R> ", continue = "+ ", width = 80, useFancyQuotes = FALSE)
```

## Scope

This vignette walks through the full SS-SVCQR workflow on the small Lucas County housing subset shipped with the package (`lucas_housing_sample`, $n = 150$). The aim is to make every step of the case study reproducible without having to download external data: the model is the same one used in the JSS software paper but on a subset rather than the full $n = 25{,}357$ panel. The replication script at the project root reproduces the full-sample case study from `data/lucas_housing_clean.csv`.

The hedonic interpretation is conventional: log sale price is regressed on global controls (log total living area, log lot size, sale-year indicators) and on candidate spatially varying age effects. Because the Lucas County region is geographically compact, longitude and latitude serve as a pragmatic spatial index for graph construction; for larger study regions, projected coordinates should be used.

## Loading the package subset

```{r data}
library("sssvcqr")
data("lucas_housing_sample")
housing <- lucas_housing_sample
str(housing)
summary(housing[, c("log_price", "log_TLA", "log_lotsize", "age_scaled")])
```

## Assembling `y`, `Z`, `X`, and coordinates

The matrix interface separates always-global controls (`Z`) from candidate spatially varying covariates (`X`). The coordinates (`u`) drive both the proximity graph and the location-indexed deviation fields.

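As a toy illustration of this split, the base-R sketch below builds the three pieces on a hypothetical five-row data frame. The column names mirror the housing subset, but the values are made up, `sale_year` is assumed to be a factor, and `age2_scaled` is assumed here to be the square of `age_scaled`; none of this is taken from the package data.

```r
# Hypothetical toy data: names mirror the vignette, values are invented.
toy <- data.frame(
  log_TLA     = c(7.1, 7.4, 6.9, 7.8, 7.2),
  log_lotsize = c(8.9, 9.3, 8.7, 9.6, 9.0),
  sale_year   = factor(c(1993, 1994, 1994, 1995, 1993)),  # assumed factor
  age_scaled  = c(0.2, 0.5, 0.9, 0.1, 0.4),
  longitude   = c(-83.60, -83.55, -83.70, -83.50, -83.65),
  latitude    = c(41.60, 41.65, 41.70, 41.55, 41.62)
)
toy$age2_scaled <- toy$age_scaled^2  # assumption for illustration only

# Global controls: intercept, two continuous terms, sale-year indicators.
Z_toy <- model.matrix(~ log_TLA + log_lotsize + sale_year, data = toy)

# Candidate spatially varying covariates.
X_toy <- as.matrix(toy[, c("age_scaled", "age2_scaled")])

# Coordinates, centered and scaled column by column.
u_toy <- scale(as.matrix(toy[, c("longitude", "latitude")]))

colnames(Z_toy)
dim(X_toy)
round(colMeans(u_toy), 10)
```

`model.matrix()` expands the factor into one indicator column per non-reference year, and `scale()` gives each coordinate column mean zero and unit standard deviation, so neither axis dominates distance calculations. The chunk that follows applies the same three calls to the housing subset.
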
```{r assemble}
y <- housing$log_price
Z <- model.matrix(~ log_TLA + log_lotsize + sale_year, data = housing)
X <- as.matrix(housing[, c("age_scaled", "age2_scaled")])
u <- scale(as.matrix(housing[, c("longitude", "latitude")]))
c(n = nrow(housing), q = ncol(Z), p = ncol(X))
```

## Inspecting graph choices

It is good practice to query the graph before fitting, both to check that the proximity structure is connected and to inspect the degree distribution. `build_graph_laplacian()` returns the sparse adjacency, the degree vector, the chosen Laplacian, and component membership.

```{r graph}
graph <- build_graph_laplacian(u, k = 8L)
length(graph$components_list)
range(graph$D_vec)
```

## Fitting SS-SVCQR

The fit below uses fixed penalties for speed; the JSS software paper tunes $(\lambda_1, \lambda_2)$ by spatially blocked cross-validation on the full sample. Tighter ADMM tolerances are appropriate for the moderate sample sizes encountered in this vignette.

```{r fit}
fit <- ss_svcqr(
  y = y, Z = Z, X = X, u = u,
  tau = 0.5,
  lambda1 = 3,
  lambda2 = 1,
  k_nn = 8,
  control = list(max_iter = 100, warn_nonconvergence = FALSE)
)
summary(fit)
```

The deviation L2 norms make the global-versus-local decision explicit: a norm at exact zero means the group penalty has classified the corresponding candidate as global.

## Visualizing the fitted local coefficient

For a single candidate effect, the package's `plot()` method renders the local total coefficient surface over the first two coordinate columns. Inverse-distance-weighted interpolation gives a smooth visual summary; the observed locations are overlaid as small reference marks.

```{r plot, fig.width = 7, fig.height = 6}
plot(fit, type = "coefficient", index = 1)
```

## Spatially blocked cross-validation

For real analyses, the penalties should be tuned rather than fixed by hand. The example below evaluates a small grid with three spatial folds on this subset; empirical applications should use a broader grid and more iterations.

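To make the idea of a spatial fold concrete, the base-R sketch below partitions hypothetical coordinates into three blocks with k-means and holds one block out. This is only one simple way to form spatial blocks and is an assumption for illustration; `cv_ss_svcqr()` may construct its folds differently.

```r
set.seed(1)

# Hypothetical coordinates: 150 points scattered over the unit square.
coords <- cbind(x = runif(150), y = runif(150))

# Cluster the locations into 3 spatially compact blocks; each block is a fold.
km <- kmeans(coords, centers = 3, nstart = 10)
fold_id <- km$cluster

table(fold_id)

# Fold 1: train on the other two blocks, validate on the held-out block.
train_idx <- which(fold_id != 1)
test_idx  <- which(fold_id == 1)
```

Holding out whole blocks, rather than randomly sampled points, keeps validation locations away from their nearest training neighbors, so the validation loss is less flattered by spatial dependence.
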
```{r cv}
cv <- cv_ss_svcqr(
  y = y, Z = Z, X = X, u = u,
  tau = 0.5,
  lambda1_seq = c(2, 3),
  lambda2_seq = c(0.5, 1),
  K_folds = 3,
  adaptive_weights = FALSE,
  control = list(max_iter = 25, warn_nonconvergence = FALSE)
)
cv
```

## KKT diagnostics

`kkt_sssvcqr()` returns first-order optimality summaries and the maximum violation of the per-component degree-weighted centering constraints. Both should be small after a converged fit.

```{r kkt}
kkt <- kkt_sssvcqr(y, Z, X, fit)
signif(kkt$max_violation, 3)
signif(kkt$max_centering_violation, 3)
```

## Predicting at new locations

The `predict()` method returns fitted conditional quantiles when given new $Z$, $X$, and $u$ (`type = "response"`, the default), or local coefficient surfaces when called with `type = "coefficients"`. New-location deviations are extrapolated by inverse-distance-weighted averaging of the $k$ nearest training deviations.

```{r predict}
unew <- u[1:3, , drop = FALSE]
round(
  predict(
    fit,
    Znew = Z[1:3, , drop = FALSE],
    Xnew = X[1:3, , drop = FALSE],
    unew = unew
  ),
  3
)
round(predict(fit, type = "coefficients")[1:3, ], 3)
```

## Where to look next

- The companion vignette *Getting started with sssvcqr* covers the synthetic example with full prediction, plotting, CV, and diagnostic calls.
- The replication directory at the project root reproduces the full-sample Lucas County case study used in the JSS software paper, including the all-six-coefficients map and the multi-quantile age-coefficient comparison.
- Function help pages (`?ss_svcqr`, `?cv_ss_svcqr`, `?build_graph_laplacian`, `?kkt_sssvcqr`, `?predict.sssvcqr`) document each argument and return value.