---
title: "PC Directions in dppca"
description: >
Principal component direction estimation used in dppca, including
non-private sample PCA directions and differentially private g-DPPCA directions.
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{PC Directions in dppca}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.align = "center"
)
```
In ordinary PCA, the principal component directions are obtained from the
eigenvectors of the sample covariance matrix. In `dppca`, these directions can be
computed in two different ways.
1. **Non-private PC directions**: eigenvectors of the sample covariance matrix.
2. **Differentially private PC directions**: private principal component directions obtained through the g-DPPCA procedure.
## Notation
Let
\[
X =
\begin{bmatrix}
X_1^\top \\
X_2^\top \\
\vdots \\
X_n^\top
\end{bmatrix}
\in \mathbb{R}^{n \times p}
\]
be the data matrix used for PCA, where $X_i \in \mathbb{R}^p$is the $i$-th observation.
We assume that $X$ has been centered, and optionally standardized.
The principal component direction matrix is denoted by
\[
V_k = [v_1,\ldots,v_k] \in \mathbb{R}^{p \times k},
\]
where each column $v_\ell$ is a unit vector representing the $\ell$-th pc direction.
The corresponding score matrix is $Z = X V_k$.
## 1. Non-private PC directions
The classical sample covariance matrix is
\[
\hat\Sigma
=
\frac{1}{n-1}X^\top X.
\]
The non-private PCA directions are obtained from the eigenvalue decomposition
\[
\hat\Sigma
=
\hat V \hat\Lambda \hat V^\top,
\]
where
\[
\hat V = [\hat v_1,\ldots,\hat v_p],
\quad
\hat\Lambda
=
\operatorname{diag}(\hat\lambda_1,\ldots,\hat\lambda_p)
\quad \text{with} \quad
\hat\lambda_1 \geq \hat\lambda_2 \geq \cdots \geq \hat\lambda_p \geq 0.
\]
The $\ell$-th sample principal component direction is $\hat v_\ell$.
Equivalently,
\[
\hat v_\ell
=
\arg\max_{\|v\|_2 = 1}
v^\top \hat\Sigma v
\quad
\text{subject to}
\quad
v^\top \hat v_j = 0,
\qquad j = 1,\ldots,\ell-1.
\]
In the non-private option of `dppca`, the direction matrix used for projection is
\[
\hat V_k = [\hat v_1,\ldots,\hat v_k].
\]
## 2. DP PC directions
[Kim and Jung (2025)](#ref-Kim2025) proposed `g-DPPCA` by adding matrix Gaussian mechanism on the generalized multivariate Kendall's tau matrix which based on the robust data transformation called generalized spatial sign proposed by [Raymakers and Rousseeuw (2019)](#ref-Raymaekers2019).
For a positive valued scale function $\xi: (0, \infty) \to (0, \infty)$, consider a map $g_\xi: \mathbb{R}^d \to \mathbb{R}^d$ defined as
\[
g_\xi(t) = \xi(\|t\|_2)\cdot \frac{t}{\|t\|_2}.
\]
$g_{\xi}$ is called as a *generalized spatial sign* with respect to $\xi$.
The *generalized multivariate Kendall's tau* matrix with respect to $g_\xi$ is defined as
\[
K_{g_\xi} = \mathbb{E}_{X, X'}\left[ g_\xi\left( \frac{X - X'}{\sqrt{2}}\right)
g_\xi\left( \frac{X - X'}{\sqrt{2}}\right)^\top ~ \right],
\]
where $X'$ is an independent copy of $X$. Importantly, if $X$ follows an elliptical distribution (which including Gaussian and multivariate $t$-distributions), $K_{g_\xi}$ shares the same eigenvectors with same order to the $\mbox{cov}(X)$. So, one can conduct a PCA by estimating $K_{g_\xi}$ and then get eigenvectors of it.
For a convenience, we write $g$ as the given sign function. For a random sample $S = (X_1, \dots, X_n)$, the second order *U*-statistic of $K_{g}$ can be written as
\[
\widehat{K}_g(S) = \frac{2}{n(n-1)} \sum_{i < j} g\left(\frac{X_j - X_i}{\sqrt{2}}\right)
g\left(\frac{X_j - X_i}{\sqrt{2}}\right)^\top.
\]
Note that the sensitivity of $\widehat{K}_g$ with respect to the Frobenius norm can be upper bounded by
\[
\Delta_F(\widehat{K}_g)
= \sup_{S \sim S'} \|\widehat{K}_g(S) - \widehat{K}_g(S')\|_F
\le \frac{4\|g\|_\infty^2}{n}.
\]
So, for a dataset $S = (x_1, \dots, x_n)$ the randomized mechanism $\bar{K}_g$ defined as
\[
\bar{K}_g(S) :=
\frac{2}{n(n-1)} \sum_{i < j} g\left(\frac{x_j-x_i}{\sqrt{2}}\right)g\left(\frac{x_j-x_i}{\sqrt{2}}\right)^\top + \mbox{vecd}^{-1}(\xi),
\]
where $\xi \sim N_{d(d+1)/2}(0, \sigma_{\varepsilon, \delta}^2 I_{d(d+1)/2})$ and
$\sigma_{\varepsilon, \delta} = \frac{4\|g\|_{\infty}^2 \sqrt{2 \ln(1.25/\delta)}}{n\varepsilon}$, satisfies $(\varepsilon, \delta)$-DP.
Define $\bar{V}_{g, m}(S) \in \mathcal{O}(d, m)$ as the matrix of the first $m$ eigenvectors of $\bar{K}_g(S)$. Then, $\bar{V}_{g, m}(S)$ satisfies $(\varepsilon, \delta)$-DP due to the post-processing property, and it can be served as a DP principal components. Kim and Jung (2025) calls these process as a `g-DPPCA`.
In the implementation of the function `dp_pc_dir` with option `g_dppca=TRUE`, we use the spherical transformation $g_{sph}(t) = t/\|t\|_2$ to output differentially private PC directions $\bar{V}_{sph,m}$. In this case, it holds that $\|g_{sph}\|_{\infty} = 1$, and thus the variance of additive Gaussian noise is set as $\sigma_{\varepsilon, \delta} = \frac{4\sqrt{2 \ln(1.25/\delta)}}{n\varepsilon}$.
## Summary
The principal component direction step in `dppca` can be summarized as follows.
1. Start with a preprocessed data matrix $X$.
2. Choose a direction estimation method.
3. Obtain a direction matrix $V_k$.
4. Compute projected scores $Y = X V_k$.
5. Use the scores for private scree estimation or private score visualization.
The main distinction is whether $V_k$ is obtained from the ordinary sample covariance matrix or from a differentially private robust PC direction estimator.
## References
Minwoo Kim and Sungkyu Jung (2025), "Robust and differentially private principal component analysis," *Statistical Analysis and Data Mining*, 18(6), https://doi.org/10.1002/sam.70053
Jakob Raymaekers and Peter Rousseeuw (2019), "A generalized spatial sign covariance matrix," *Journal of Multivariate Analysis*, 171:94–111, https://doi.org/10.1016/j.jmva.2018.11.010