---
title: "Worked Example: msPCA on mtcars"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Worked Example: msPCA on mtcars}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## Overview

This vignette shows the basic workflow of `msPCA` on the built-in `mtcars` dataset. We compute sparse principal components, inspect the solution with the `print()` and `summary()` S3 methods, and compare the sparse result with dense PCA.

## Install and load

Install the package directly from CRAN.

```{r install, eval=FALSE}
install.packages("msPCA")
```

```{r load}
library(msPCA)
```

## Fit two sparse PCs

We work with the correlation matrix of `mtcars` and ask for two 4-sparse principal components under the default orthogonality constraint.

```{r fit}
Sigma <- cor(mtcars)
set.seed(42)
res <- mspca(Sigma, r = 2, ks = c(4, 4), verbose = FALSE)
```

`print()` shows the sparse loading matrix restricted to the union of all active variables, together with the percentage of variance explained and the number of non-zero loadings per component.

```{r print}
print(res, Sigma)
```

`summary()` gives a fuller breakdown: a per-PC table of variance explained and sparsity, followed by the pairwise feasibility violation matrix so you can see at a glance how well the orthogonality constraint is satisfied between each pair of components.

```{r summary}
summary(res, Sigma)
```

## Working from the raw data matrix

By default (`type = "Sigma"`) the first argument is a covariance/correlation matrix. Set `type = "X"` to pass the raw data matrix instead (rows are observations, columns are variables). With `type = "X"`, msPCA applies the algorithm to the data directly: each matrix–vector product $\Sigma \beta = X^\top (X \beta) / (n - 1)$ is computed without ever forming the $p \times p$ matrix. This is mathematically equivalent but more scalable when $p \gg n$.
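
The identity behind the raw-data interface can be checked in base R. The chunk below is a minimal sketch, not msPCA's internal code: `X_demo`, `Xc`, `beta`, `explicit`, and `implicit` are names introduced here for illustration, and the sketch covers only the centered, unscaled case with the `n - 1` divisor.

```{r matvec_sketch}
# Illustrative check that Sigma %*% beta equals t(Xc) %*% (Xc %*% beta) / (n - 1).
# Not msPCA internals; every object name below is hypothetical.
X_demo <- as.matrix(mtcars)
n      <- nrow(X_demo)
Xc     <- scale(X_demo, center = TRUE, scale = FALSE)  # column-centered data

set.seed(1)
beta <- rnorm(ncol(X_demo))                            # arbitrary test vector

explicit <- cov(X_demo) %*% beta                       # forms the p x p matrix first
implicit <- crossprod(Xc, Xc %*% beta) / (n - 1)       # two matrix-vector products

all.equal(as.numeric(explicit), as.numeric(implicit))
```

The explicit route stores a $p \times p$ matrix, whereas the implicit route only touches the $n \times p$ data and two vectors, which is where the savings come from when $p \gg n$.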

The preprocessing arguments control which matrix is implicitly used:

- `center` (default `TRUE`) subtracts column means.
- `scale` (default `FALSE`) divides by column standard deviations; set to `TRUE` to operate on the correlation matrix.
- `divisor` selects the normalization: `"n-1"` (default, matching `cov`/`cor`) or `"n"`.

With `scale = TRUE` and `divisor = "n-1"`, the raw-data call reproduces the correlation-matrix result above. When `type = "X"`, the variance figures are stored inside the object, so `C` need not be passed to `print()` or `summary()`.

```{r fit_X}
X <- as.matrix(mtcars)
set.seed(42)
res_X <- mspca(X, r = 2, ks = c(4, 4), type = "X", scale = TRUE, verbose = FALSE)
print(res_X)
```

The same dual interface is available for the single-component `tpm()`.

## Orthogonality versus zero correlation

Sparse PCA requires a constraint to prevent redundancy between components. The default (`feasibilityConstraintType = 0`) enforces orthogonality of the loading vectors. Setting `feasibilityConstraintType = 1` instead enforces zero pairwise correlation between the resulting scores. The two choices can lead to different solutions when the variables are strongly correlated.

```{r fit_corr}
set.seed(42)
res_corr <- mspca(Sigma, r = 2, ks = c(4, 4), feasibilityConstraintType = 1, verbose = FALSE)
print(res_corr, Sigma)
summary(res_corr, Sigma, feasibilityConstraintType = 1)
```

## Diagnostics

The utility functions `feasibility_violation_off()`, `fraction_variance_explained()`, and `fraction_variance_explained_perPC()` can be called directly for custom reporting or for comparing solutions across methods.

```{r diagnostics}
# Orthogonality and zero-correlation violations for the default solution
feasibility_violation_off(Sigma, res$x_best, feasibilityConstraintType = 0)
feasibility_violation_off(Sigma, res$x_best, feasibilityConstraintType = 1)

# Total and per-PC fraction of variance explained
fraction_variance_explained(Sigma, res$x_best)
fraction_variance_explained_perPC(Sigma, res$x_best)
```

## Comparison with dense PCA

The first two dense principal components explain more variance, but all variables receive non-zero loadings.

```{r dense_pca}
pca_res <- prcomp(mtcars, scale. = TRUE)
fraction_variance_explained(Sigma, pca_res$rotation[, 1:2])
```

Sparse PCA trades a small reduction in explained variance for a much more interpretable loading pattern.
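
For the dense case, the explained-variance figure can be cross-checked with base R alone. The sketch below assumes the fraction is defined as $\mathrm{tr}(U^\top \Sigma U) / \mathrm{tr}(\Sigma)$ for orthonormal loadings $U$; that definition is an assumption made here for illustration, not a statement about how msPCA computes it. The names `S`, `U`, `frac_trace`, and `frac_eigen` are hypothetical.

```{r dense_check}
# Base-R cross-check for dense PCA only; the trace formula is an assumed
# definition of explained variance, and all object names are hypothetical.
S <- cor(mtcars)
U <- prcomp(mtcars, scale. = TRUE)$rotation[, 1:2]   # first two dense loadings

# Variance captured by the two components, relative to total variance
frac_trace <- sum(diag(t(U) %*% S %*% U)) / sum(diag(S))

# Same quantity from the two leading eigenvalues of the correlation matrix
frac_eigen <- sum(eigen(S, symmetric = TRUE)$values[1:2]) / ncol(S)

c(trace = frac_trace, eigen = frac_eigen)

# Number of non-zero loadings per dense component
colSums(U != 0)
```

The two fractions agree because the dense loadings are eigenvectors of the correlation matrix, and the non-zero counts make the contrast with the 4-sparse components concrete.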