---
title: "Worked Example: msPCA on mtcars"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Worked Example: msPCA on mtcars}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## Overview

This vignette walks through the basic `msPCA` workflow on the built-in `mtcars`
dataset. We compute sparse principal components from a correlation matrix and
from the raw data, inspect the solution with the `print()` and `summary()` S3
methods, compare the two feasibility constraints, and contrast the sparse
result with dense PCA.

## Install and load

Install the package directly from CRAN.

```{r install, eval=FALSE}
install.packages("msPCA")
```

```{r load}
library(msPCA)
```

## Fit two sparse PCs

We work with the correlation matrix of `mtcars` and ask for two principal
components (`r = 2`) with at most four non-zero loadings each
(`ks = c(4, 4)`), under the default orthogonality constraint.

```{r fit}
Sigma <- cor(mtcars)

set.seed(42)
res <- mspca(Sigma, r = 2, ks = c(4, 4), verbose = FALSE)
```

`print()` shows the sparse loading matrix restricted to the union of all
active variables, together with the percentage of variance explained and
the number of non-zero loadings per component.

```{r print}
print(res, Sigma)
```

`summary()` gives a fuller breakdown: a per-PC table of variance explained
and sparsity, followed by the pairwise feasibility violation matrix so you
can see at a glance how well the orthogonality constraint is satisfied
between each pair of components.

```{r summary}
summary(res, Sigma)
```

## Working from the raw data matrix

By default (`type = "Sigma"`) the first argument is a covariance or correlation
matrix. Set `type = "X"` to pass the raw data matrix instead (rows are
observations, columns are variables). With `type = "X"`, msPCA applies the
algorithm to the data directly: each matrix–vector product
$\Sigma \beta = X^\top (X \beta) / (n - 1)$ is computed without ever forming
the $p \times p$ matrix, where $X$ denotes the data after the preprocessing
described below and $n$ is the number of observations. The result is
mathematically equivalent to working with $\Sigma$, but each product costs on
the order of $np$ operations rather than $p^2$, which scales better when
$p \gg n$.
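
The identity above can be checked with base R alone. The sketch below is not
msPCA's internal code; it only illustrates, on a centered copy of `mtcars` and
an arbitrary test vector `beta` (both chosen here for illustration), that the
two-step product matches multiplication by the explicit covariance matrix.

```{r sketch_matvec}
# Illustration only: base R, independent of msPCA internals.
Xc <- scale(as.matrix(mtcars), center = TRUE, scale = FALSE)  # centered data
n <- nrow(Xc)

set.seed(1)
beta <- rnorm(ncol(Xc))  # arbitrary test vector

# Two matrix-vector products; the p x p matrix is never formed.
implicit <- crossprod(Xc, Xc %*% beta) / (n - 1)

# Reference: form the covariance matrix explicitly, then multiply.
explicit <- cov(mtcars) %*% beta

all.equal(as.numeric(implicit), as.numeric(explicit))
```

On `mtcars` the saving is negligible, since $p = 11$; the implicit form matters
when storing a $p \times p$ matrix is itself the bottleneck.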

The preprocessing arguments control which matrix is implicitly used:

- `center` (default `TRUE`) subtracts column means.
- `scale` (default `FALSE`) divides by column standard deviations; set to
  `TRUE` to operate on the correlation matrix.
- `divisor` selects the normalization: `"n-1"` (default, matching `cov`/`cor`)
  or `"n"`.
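
To see which matrix each combination of settings corresponds to, the following
base-R sketch builds the implied matrix by hand. The helper `implied_matrix()`
is hypothetical and written only for this illustration; it is not part of
msPCA and simply mirrors the description of `center`, `scale`, and `divisor`
above. It assumes that scaling uses the usual $n - 1$ standard deviation, as in
`sd()`.

```{r sketch_preprocess}
# Hypothetical helper for illustration; not an msPCA function.
implied_matrix <- function(X, center = TRUE, scale = FALSE, divisor = "n-1") {
  X <- as.matrix(X)
  n <- nrow(X)
  if (center) X <- sweep(X, 2, colMeans(X))
  # Assumption: standard deviations use the n - 1 convention, as in sd().
  if (scale) X <- sweep(X, 2, apply(X, 2, sd), "/")
  denom <- if (divisor == "n-1") n - 1 else n
  crossprod(X) / denom
}

# Defaults (center only, divisor "n-1") give the covariance matrix.
all.equal(implied_matrix(mtcars), cov(mtcars), check.attributes = FALSE)

# Adding scale = TRUE gives the correlation matrix.
all.equal(implied_matrix(mtcars, scale = TRUE), cor(mtcars),
          check.attributes = FALSE)

# divisor = "n" rescales the covariance by (n - 1) / n.
n <- nrow(mtcars)
all.equal(implied_matrix(mtcars, divisor = "n"), cov(mtcars) * (n - 1) / n,
          check.attributes = FALSE)
```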

With `scale = TRUE` and `divisor = "n-1"`, the raw-data call reproduces the
correlation-matrix result above. When `type = "X"`, the variance figures are
stored inside the fitted object, so the covariance or correlation matrix (the
second argument, `Sigma`, in the calls above) need not be passed to `print()`
or `summary()`.

```{r fit_X}
X <- as.matrix(mtcars)

set.seed(42)
res_X <- mspca(X, r = 2, ks = c(4, 4), type = "X", scale = TRUE, verbose = FALSE)
print(res_X)
```

The same dual interface is available for the single-component `tpm()`.

## Orthogonality versus zero correlation

Sparse PCA requires a constraint to prevent redundancy between components. The
default (`feasibilityConstraintType = 0`) enforces orthogonality of the loading
vectors, $u_i^\top u_j = 0$ for $i \neq j$. Setting
`feasibilityConstraintType = 1` instead enforces zero pairwise correlation
between the resulting scores, $u_i^\top \Sigma u_j = 0$. The two choices can
lead to different solutions when the variables are strongly correlated.

```{r fit_corr}
set.seed(42)
res_corr <- mspca(Sigma, r = 2, ks = c(4, 4),
                  feasibilityConstraintType = 1, verbose = FALSE)
print(res_corr, Sigma)
summary(res_corr, Sigma, feasibilityConstraintType = 1)
```
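
The difference between the two constraints can be illustrated without msPCA.
In the sketch below, `u` and `v` are hand-picked sparse unit vectors with
disjoint supports (an assumption made purely for illustration, not a fitted
solution). They are orthogonal by construction, yet the scores they generate
are correlated because the underlying variables are.

```{r sketch_constraints}
# Illustration only: hand-picked loadings, not the output of mspca().
Sigma_demo <- cor(mtcars)
p <- ncol(Sigma_demo)

u <- setNames(numeric(p), colnames(Sigma_demo))
v <- u
u[c("mpg", "cyl")] <- c(1, -1) / sqrt(2)  # supported on mpg and cyl
v[c("disp", "hp")] <- c(1, 1) / sqrt(2)   # supported on disp and hp

# Orthogonality of the loading vectors: u'v
sum(u * v)

# Covariance of the two score vectors: u' Sigma v
drop(t(u) %*% Sigma_demo %*% v)
```

For dense PCA the two notions coincide: if $v$ is an eigenvector of $\Sigma$
with eigenvalue $\lambda$, then $u^\top \Sigma v = \lambda\, u^\top v$, so
orthogonal eigenvectors also yield uncorrelated scores. Sparse loadings are
generally not eigenvectors, which is why the choice of constraint matters.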

## Diagnostics

The utility functions `feasibility_violation_off()`,
`fraction_variance_explained()`, and `fraction_variance_explained_perPC()` can
be called directly for custom reporting or for comparing solutions across
methods.

```{r diagnostics}
# Orthogonality and zero-correlation violations for the default solution
feasibility_violation_off(Sigma, res$x_best, feasibilityConstraintType = 0)
feasibility_violation_off(Sigma, res$x_best, feasibilityConstraintType = 1)

# Total and per-PC fraction of variance explained
fraction_variance_explained(Sigma, res$x_best)
fraction_variance_explained_perPC(Sigma, res$x_best)
```

## Comparison with dense PCA

The first two dense principal components explain more variance, but all
variables receive non-zero loadings.

```{r dense_pca}
pca_res <- prcomp(mtcars, scale. = TRUE)
fraction_variance_explained(Sigma, pca_res$rotation[, 1:2])
```
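
As a cross-check that does not depend on msPCA, the dense figures can be
recomputed from the `prcomp` object itself. The sketch below refits the dense
PCA so that it is self-contained; `dense` is a name introduced here. For
orthonormal eigenvectors, the ratio of component variances to total variance
is the standard definition of variance explained.

```{r sketch_dense_check}
dense <- prcomp(mtcars, scale. = TRUE)

# Fraction of total variance captured by the first two dense PCs,
# from the component variances (squared standard deviations).
sum(dense$sdev[1:2]^2) / sum(dense$sdev^2)

# Number of non-zero loadings in each of the first two dense PCs.
colSums(dense$rotation[, 1:2] != 0)
```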

Sparse PCA trades a small reduction in explained variance for a much more
interpretable loading pattern: here each sparse component involves at most four
of the eleven variables.