---
title: "Lucas County Housing Example"
author: "Houjian Hou"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Lucas County Housing Example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
  fig.width = 7, fig.height = 5, fig.align = "center")
options(prompt = "R> ", continue = "+  ", width = 80, useFancyQuotes = FALSE)
```

## Scope

This vignette walks through the full SS-SVCQR workflow on the small Lucas
County housing subset shipped with the package (`lucas_housing_sample`,
$n = 150$). The aim is to make every step of the case study reproducible
without having to download external data: the model is the same one used
in the JSS software paper but on a subset rather than the full $n =
25{,}357$ panel. The replication script at the project root reproduces
the full-sample case study from `data/lucas_housing_clean.csv`.

The hedonic interpretation is conventional: log sale price is regressed
on global controls (log total living area, log lot size, sale-year
indicators) and on candidate spatially varying age effects. Because the
Lucas County region is geographically compact, longitude and latitude
serve as a pragmatic spatial index for graph construction; for larger
study regions, projected coordinates should be used.

## Loading the package subset

```{r data}
library("sssvcqr")
data("lucas_housing_sample")
housing <- lucas_housing_sample
str(housing)
summary(housing[, c("log_price", "log_TLA", "log_lotsize", "age_scaled")])
```

## Assembling `y`, `Z`, `X`, and coordinates

The matrix interface separates always-global controls (`Z`) from
candidate spatially varying covariates (`X`). The coordinates (`u`) drive
both the proximity graph and the location-indexed deviation fields.

```{r assemble}
y <- housing$log_price
Z <- model.matrix(~ log_TLA + log_lotsize + sale_year, data = housing)
X <- as.matrix(housing[, c("age_scaled", "age2_scaled")])
u <- scale(as.matrix(housing[, c("longitude", "latitude")]))
c(n = nrow(housing), q = ncol(Z), p = ncol(X))
```

## Inspecting graph choices

It is good practice to query the graph before fitting, both to check
that the proximity structure is connected and to inspect the degree
distribution. `build_graph_laplacian()` returns the sparse adjacency,
the degree vector, the chosen Laplacian, and component membership.

```{r graph}
graph <- build_graph_laplacian(u, k = 8L)
length(graph$components_list)
range(graph$D_vec)
```

## Fitting SS-SVCQR

The fit below uses fixed penalties for speed; the JSS software paper
tunes $(\lambda_1, \lambda_2)$ by spatially blocked cross-validation on
the full sample. Tighter ADMM tolerances are appropriate for the
moderate sample sizes encountered in this vignette.

```{r fit}
fit <- ss_svcqr(
  y = y, Z = Z, X = X, u = u,
  tau = 0.5,
  lambda1 = 3, lambda2 = 1, k_nn = 8,
  control = list(max_iter = 100, warn_nonconvergence = FALSE)
)
summary(fit)
```

The deviation L2 norms make the global-versus-local decision explicit: a
norm at exact zero means the group penalty has classified the
corresponding candidate as global.

## Visualizing the fitted local coefficient

For a single candidate effect, the package's `plot()` method renders the
local total coefficient surface over the first two coordinate columns.
Inverse-distance-weighted interpolation gives a smooth visual summary;
the observed locations are overlaid as small reference marks.

```{r plot, fig.width = 7, fig.height = 6}
plot(fit, type = "coefficient", index = 1)
```

## Spatially blocked cross-validation

For real analyses, the penalties should be tuned rather than fixed by
hand. The example below evaluates a small grid with three spatial folds
on this subset; empirical applications should use a broader grid and
more iterations.

```{r cv}
cv <- cv_ss_svcqr(
  y = y, Z = Z, X = X, u = u,
  tau = 0.5,
  lambda1_seq = c(2, 3),
  lambda2_seq = c(0.5, 1),
  K_folds = 3, adaptive_weights = FALSE,
  control = list(max_iter = 25, warn_nonconvergence = FALSE)
)
cv
```

## KKT diagnostics

`kkt_sssvcqr()` returns first-order optimality summaries and the maximum
violation of the per-component degree-weighted centering constraints.
Both should be small after a converged fit.

```{r kkt}
kkt <- kkt_sssvcqr(y, Z, X, fit)
signif(kkt$max_violation, 3)
signif(kkt$max_centering_violation, 3)
```

## Predicting at new locations

The `predict()` method returns fitted conditional quantiles when given
new $Z$, $X$, and $u$ (`type = "response"`, the default), or local
coefficient surfaces when called with `type = "coefficients"`.
New-location deviations are extrapolated by inverse-distance-weighted
averaging of the $k$ nearest training deviations.

```{r predict}
unew <- u[1:3, , drop = FALSE]
round(predict(fit,
  Znew = Z[1:3, , drop = FALSE],
  Xnew = X[1:3, , drop = FALSE],
  unew = unew), 3)
round(predict(fit, type = "coefficients")[1:3, ], 3)
```

## Where to look next

- The companion vignette *Getting started with sssvcqr* covers the
  synthetic example with full prediction, plotting, CV, and diagnostic
  calls.
- The replication directory at the project root reproduces the
  full-sample Lucas County case study used in the JSS software paper,
  including the all-six-coefficients map and the multi-quantile
  age-coefficient comparison.
- Function help pages (`?ss_svcqr`, `?cv_ss_svcqr`,
  `?build_graph_laplacian`, `?kkt_sssvcqr`, `?predict.sssvcqr`) document
  each argument and return value.