In ordinary PCA, the principal component directions are obtained from
the eigenvectors of the sample covariance matrix. In dppca,
these directions can be computed in two different ways.
Let
\[ X = \begin{bmatrix} X_1^\top \\ X_2^\top \\ \vdots \\ X_n^\top \end{bmatrix} \in \mathbb{R}^{n \times p} \]
be the data matrix used for PCA, where \(X_i \in \mathbb{R}^p\) is the \(i\)-th observation. We assume that \(X\) has been centered, and optionally standardized.
The principal component direction matrix is denoted by
\[ V_k = [v_1,\ldots,v_k] \in \mathbb{R}^{p \times k}, \]
where each column \(v_\ell\) is a unit vector representing the \(\ell\)-th PC direction.
The corresponding score matrix is \(Z = X V_k\).
The classical sample covariance matrix is
\[ \hat\Sigma = \frac{1}{n-1}X^\top X. \]
The non-private PCA directions are obtained from the eigenvalue decomposition
\[ \hat\Sigma = \hat V \hat\Lambda \hat V^\top, \]
where
\[ \hat V = [\hat v_1,\ldots,\hat v_p], \quad \hat\Lambda = \operatorname{diag}(\hat\lambda_1,\ldots,\hat\lambda_p) \quad \text{with} \quad \hat\lambda_1 \geq \hat\lambda_2 \geq \cdots \geq \hat\lambda_p \geq 0. \]
The \(\ell\)-th sample principal component direction is \(\hat v_\ell\).
Equivalently,
\[ \hat v_\ell = \arg\max_{\|v\|_2 = 1} v^\top \hat\Sigma v \quad \text{subject to} \quad v^\top \hat v_j = 0, \qquad j = 1,\ldots,\ell-1. \]
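The covariance, eigendecomposition and variational characterization above can be checked numerically. The sketch below uses Python with NumPy, which is an assumption (dppca is an R package and this is not its code); the names sample_cov_eigen and rayleigh are hypothetical. Because numpy.linalg.eigh returns eigenvalues in ascending order, they are re-sorted to match \(\hat\lambda_1 \geq \cdots \geq \hat\lambda_p\).

```python
import numpy as np

def sample_cov_eigen(X):
    """Eigendecomposition of the sample covariance of X (n x p),
    with the eigenvalues sorted in decreasing order."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center the columns
    Sigma_hat = Xc.T @ Xc / (X.shape[0] - 1)   # (1/(n-1)) X^T X
    lam, V = np.linalg.eigh(Sigma_hat)         # ascending order for symmetric input
    order = np.argsort(lam)[::-1]              # switch to decreasing order
    return Sigma_hat, lam[order], V[:, order]

def rayleigh(v, Sigma):
    """Rayleigh quotient v^T Sigma v / v^T v."""
    return float(v @ Sigma @ v / (v @ v))

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5)) * np.array([4.0, 3.0, 2.0, 1.0, 0.5])
Sigma_hat, lam_hat, V_hat = sample_cov_eigen(X)
v1 = V_hat[:, 0]

# Random directions orthogonal to v1: the constraint set for l = 2.
U = rng.normal(size=(1000, 5))
U -= np.outer(U @ v1, v1)                      # remove the component along v1
best_constrained = max(rayleigh(u, Sigma_hat) for u in U)
```

Here \(\hat v_1\) attains \(\hat\lambda_1\), and no direction orthogonal to \(\hat v_1\) exceeds \(\hat\lambda_2\), which is the \(\ell = 2\) case of the constrained maximization.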
In the non-private option of dppca, the direction matrix
used for projection is
\[ \hat V_k = [\hat v_1,\ldots,\hat v_k]. \]
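A minimal sketch of the non-private option, again in Python with NumPy as an assumption and with a hypothetical function name nonprivate_directions: it keeps the eigenvectors of the \(k\) largest eigenvalues of \(\hat\Sigma\) and forms the scores \(Z = X \hat V_k\).

```python
import numpy as np

def nonprivate_directions(X, k):
    """Return (V_k, Z): the first k sample PC directions of X
    and the score matrix Z = X V_k (X is centered first)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    Sigma_hat = X.T @ X / (X.shape[0] - 1)
    lam, V = np.linalg.eigh(Sigma_hat)
    top = np.argsort(lam)[::-1][:k]            # indices of the k largest eigenvalues
    V_k = V[:, top]
    return V_k, X @ V_k

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 6)) @ rng.normal(size=(6, 6))
V_k, Z = nonprivate_directions(X, k=2)
```

The columns of \(Z\) are uncorrelated with sample variances \(\hat\lambda_1 \geq \hat\lambda_2\), and \(\hat V_k\) coincides, up to column signs, with the leading right singular vectors of the centered \(X\).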
Kim and Jung (2025) proposed g-DPPCA, which applies the matrix Gaussian mechanism to the
generalized multivariate Kendall’s tau matrix. This matrix is built on a robust
data transformation, the generalized spatial sign, proposed by Raymaekers and Rousseeuw (2019).
Throughout this part, \(d = p\) denotes the dimension of the observations. For a positive-valued scale function \(\xi: (0, \infty) \to (0, \infty)\), consider the map \(g_\xi: \mathbb{R}^d \to \mathbb{R}^d\) defined by
\[ g_\xi(t) = \xi(\|t\|_2)\cdot \frac{t}{\|t\|_2}. \]
The map \(g_{\xi}\) is called the generalized spatial sign with respect to \(\xi\).
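The definition of \(g_\xi\) translates directly into code. In this Python/NumPy sketch (an assumption, not dppca code), the names generalized_spatial_sign, xi_sph and xi_capped are hypothetical. Since \(g_\xi\) is undefined at \(t = 0\), the sketch uses the convention \(g_\xi(0) = 0\), which is also an assumption. The capped scale function \(\xi(r) = \min(r, c)\) is only an illustration of a bounded choice, not one prescribed by the source.

```python
import numpy as np

def generalized_spatial_sign(t, xi):
    """g_xi(t) = xi(||t||_2) * t / ||t||_2.

    Assumed convention: g_xi(0) = 0, since the formula is undefined at 0.
    """
    t = np.asarray(t, dtype=float)
    r = np.linalg.norm(t)
    if r == 0.0:
        return np.zeros_like(t)
    return xi(r) * t / r

def xi_sph(r):
    """Spherical sign: xi(r) = 1, so g(t) = t / ||t||_2."""
    return 1.0

def xi_capped(r, c=2.0):
    """Hypothetical bounded scale function that caps the radius at c."""
    return min(r, c)

g1 = generalized_spatial_sign([3.0, 4.0], xi_sph)      # unit vector (0.6, 0.8)
g2 = generalized_spatial_sign([3.0, 4.0], xi_capped)   # radius capped at 2: (1.2, 1.6)
```

Both choices keep the direction of \(t\) and bound its length, so \(\sup_t \|g_\xi(t)\|_2\) equals 1 for the spherical sign and \(c\) for the capped version.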
The generalized multivariate Kendall’s tau matrix with respect to \(g_\xi\) is defined as
\[ K_{g_\xi} = \mathbb{E}_{X, X'}\left[ g_\xi\left( \frac{X - X'}{\sqrt{2}}\right) g_\xi\left( \frac{X - X'}{\sqrt{2}}\right)^\top \right], \]
where \(X'\) is an independent copy of \(X\). Importantly, if \(X\) follows an elliptical distribution (a family that includes the multivariate Gaussian and multivariate \(t\)-distributions), \(K_{g_\xi}\) has the same eigenvectors as \(\mbox{cov}(X)\), in the same order. Consequently, PCA can be carried out by estimating \(K_{g_\xi}\) and computing its eigenvectors.
For convenience, we write \(g\) for the chosen sign function \(g_\xi\). For a random sample \(S = (X_1, \dots, X_n)\), the second-order U-statistic estimator of \(K_{g}\) is
\[ \widehat{K}_g(S) = \frac{2}{n(n-1)} \sum_{i < j} g\left(\frac{X_j - X_i}{\sqrt{2}}\right) g\left(\frac{X_j - X_i}{\sqrt{2}}\right)^\top. \]
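A vectorized sketch of \(\widehat{K}_g\) for the spherical sign, written in Python with NumPy as an assumption (the name kendall_tau_sph is hypothetical). It enumerates all pairs \(i < j\) with numpy.triu_indices, so memory grows like \(n^2\). Ties are mapped to \(g(0) = 0\) by assumption. On Gaussian data with a diagonal covariance, the leading eigenvector of \(\widehat{K}_g\) should align with the leading eigenvector of \(\mbox{cov}(X)\), as the elliptical-distribution property suggests.

```python
import numpy as np

def kendall_tau_sph(X):
    """U-statistic estimate of the multivariate Kendall's tau matrix
    with the spherical sign g(t) = t / ||t||_2."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    i, j = np.triu_indices(n, k=1)            # all pairs with i < j
    D = (X[j] - X[i]) / np.sqrt(2.0)          # kept for fidelity; cancels for g_sph
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    G = np.divide(D, norms, out=np.zeros_like(D), where=norms > 0)  # g(0) = 0
    return 2.0 / (n * (n - 1)) * (G.T @ G)    # average of the outer products

rng = np.random.default_rng(3)
Sigma = np.diag([9.0, 4.0, 1.0])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=400)
K_hat = kendall_tau_sph(X)
v_K = np.linalg.eigh(K_hat)[1][:, -1]        # eigenvector of the largest eigenvalue
```

For the spherical sign each outer product has trace 1, so \(\operatorname{tr}(\widehat{K}_g) = 1\) whenever there are no ties.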
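The Frobenius-norm sensitivity of \(\widehat{K}_g\) can be upper bounded as
\[ \Delta_F(\widehat{K}_g) = \sup_{S \sim S'} \|\widehat{K}_g(S) - \widehat{K}_g(S')\|_F \le \frac{4\|g\|_\infty^2}{n}, \]
where \(S \sim S'\) means that the two datasets differ in a single observation and \(\|g\|_\infty = \sup_t \|g(t)\|_2\). Indeed, replacing one observation changes only the \(n-1\) of the \(n(n-1)/2\) summands that involve it, and each such summand changes by at most \(2\|g\|_\infty^2\) in Frobenius norm, since \(\|g(t)g(t)^\top\|_F = \|g(t)\|_2^2\).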
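The sensitivity bound can be probed empirically for the spherical sign, where \(\|g_{sph}\|_\infty = 1\). The Python/NumPy sketch below (an assumption, with a hypothetical helper kendall_tau_sph) replaces one observation by a gross outlier many times and records the largest Frobenius change. This is a sanity check, not a proof.

```python
import numpy as np

def kendall_tau_sph(X):
    """U-statistic estimate of K_g with the spherical sign g(t) = t / ||t||_2."""
    n = X.shape[0]
    i, j = np.triu_indices(n, k=1)
    D = X[j] - X[i]                     # the 1/sqrt(2) scaling cancels for g_sph
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    G = np.divide(D, norms, out=np.zeros_like(D), where=norms > 0)
    return 2.0 / (n * (n - 1)) * (G.T @ G)

rng = np.random.default_rng(4)
n, d = 50, 4
S = rng.normal(size=(n, d))
K_S = kendall_tau_sph(S)

worst = 0.0
for _ in range(200):
    S_prime = S.copy()
    # Neighboring dataset: replace one observation, here by a gross outlier.
    S_prime[rng.integers(n)] = 100.0 * rng.standard_cauchy(size=d)
    diff = np.linalg.norm(K_S - kendall_tau_sph(S_prime), ord="fro")
    worst = max(worst, diff)

bound = 4.0 / n                         # 4 ||g||_inf^2 / n with ||g_sph||_inf = 1
```

However extreme the replacement is, the change stays below \(4/n\). The bound is not tight: for unit vectors \(\|aa^\top - bb^\top\|_F \le \sqrt{2}\), so the change is in fact at most \(2\sqrt{2}/n\).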
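Calibrating the Gaussian mechanism to this sensitivity, for a dataset \(S = (x_1, \dots, x_n)\) the randomized mechanism \(\bar{K}_g\) defined as
\[ \bar{K}_g(S) := \frac{2}{n(n-1)} \sum_{i < j} g\left(\frac{x_j-x_i}{\sqrt{2}}\right)g\left(\frac{x_j-x_i}{\sqrt{2}}\right)^\top + \mbox{vecd}^{-1}(\eta), \]
where \(\eta \sim N_{d(d+1)/2}(0, \sigma_{\varepsilon, \delta}^2 I_{d(d+1)/2})\), \(\mbox{vecd}^{-1}\) maps a vector of length \(d(d+1)/2\) to the symmetric \(d \times d\) matrix whose upper-triangular entries (including the diagonal) are the entries of that vector, and \(\sigma_{\varepsilon, \delta} = \frac{4\|g\|_{\infty}^2 \sqrt{2 \ln(1.25/\delta)}}{n\varepsilon}\), satisfies \((\varepsilon, \delta)\)-DP for \(\varepsilon \in (0, 1)\), the range covered by the classical Gaussian mechanism. Since \(\|\mbox{vecd}(A)\|_2 \le \|A\|_F\) for symmetric \(A\), \(\Delta_F(\widehat{K}_g)\) also bounds the \(\ell_2\)-sensitivity of the \(d(d+1)/2\) free entries of \(\widehat{K}_g\).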
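A sketch of the randomized mechanism in Python with NumPy (an assumption; the names vecd_inverse, gaussian_sigma and privatize are hypothetical). The fill order used by vecd_inverse is also an assumption. Because the noise entries are i.i.d., any fixed order yields the same distribution of \(\mbox{vecd}^{-1}(\eta)\).

```python
import numpy as np

def vecd_inverse(v, d):
    """Map a length-d(d+1)/2 vector to a symmetric d x d matrix by filling the
    upper triangle (including the diagonal) row by row and mirroring it."""
    A = np.zeros((d, d))
    A[np.triu_indices(d)] = v
    return A + np.triu(A, k=1).T          # mirror the strict upper triangle

def gaussian_sigma(n, eps, delta, g_inf=1.0):
    """sigma = 4 ||g||_inf^2 sqrt(2 ln(1.25/delta)) / (n eps)."""
    return 4.0 * g_inf**2 * np.sqrt(2.0 * np.log(1.25 / delta)) / (n * eps)

def privatize(K_hat, n, eps, delta, g_inf=1.0, rng=None):
    """Return K_hat + vecd^{-1}(eta) with eta ~ N(0, sigma^2 I_{d(d+1)/2})."""
    rng = np.random.default_rng() if rng is None else rng
    d = K_hat.shape[0]
    sigma = gaussian_sigma(n, eps, delta, g_inf)
    eta = rng.normal(0.0, sigma, size=d * (d + 1) // 2)
    return K_hat + vecd_inverse(eta, d)

# Usage with a stand-in for a non-private estimate.
K_bar = privatize(np.eye(3) / 3, n=500, eps=0.5, delta=1e-5, rng=np.random.default_rng(5))
```

Only \(d(d+1)/2\) independent noise values are drawn, so the released matrix stays exactly symmetric.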
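Define \(\bar{V}_{g, m}(S) \in
\mathcal{O}(d, m)\), where \(\mathcal{O}(d, m)\) denotes the set of \(d \times m\) matrices with orthonormal columns, as the matrix of the first \(m\) eigenvectors of \(\bar{K}_g(S)\) (so \(m\) plays the role of \(k\)). By the post-processing property of differential privacy, \(\bar{V}_{g, m}(S)\) also satisfies \((\varepsilon, \delta)\)-DP, so it can serve as a matrix of DP principal
component directions. Kim and Jung (2025) call this procedure
g-DPPCA.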
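Extracting \(\bar{V}_{g, m}\) is ordinary post-processing of the released matrix. A Python/NumPy sketch (an assumption; top_eigvecs is a hypothetical name, and the input below is a synthetic stand-in for \(\bar{K}_g(S)\) rather than real output):

```python
import numpy as np

def top_eigvecs(K_bar, m):
    """d x m matrix of eigenvectors for the m largest eigenvalues of K_bar.

    Uses only the already-private K_bar, so no additional privacy cost arises.
    """
    K_sym = (K_bar + K_bar.T) / 2.0       # guard against asymmetric round-off
    lam, V = np.linalg.eigh(K_sym)
    order = np.argsort(lam)[::-1][:m]     # indices of the m largest eigenvalues
    return V[:, order]

rng = np.random.default_rng(6)
d = 5
noise = rng.normal(scale=0.01, size=(d, d))
K_bar = np.diag([0.4, 0.25, 0.2, 0.1, 0.05]) + (noise + noise.T) / 2   # synthetic stand-in
V_bar = top_eigvecs(K_bar, m=2)
```

The result has orthonormal columns, i.e. it lies in \(\mathcal{O}(d, m)\).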
In the implementation of the function dp_pc_dir with
option g_dppca=TRUE, we use the spherical transformation
\(g_{sph}(t) = t/\|t\|_2\), that is, \(\xi \equiv 1\), to output the
differentially private PC directions \(\bar{V}_{sph,m}\). In this case
\(\|g_{sph}\|_{\infty} = 1\), so the standard deviation of the additive Gaussian noise is set to
\(\sigma_{\varepsilon, \delta} = \frac{4\sqrt{2\ln(1.25/\delta)}}{n\varepsilon}\).
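The noise scale for the spherical sign is simple to evaluate. A plain-Python sketch (an assumption; sigma_sph is a hypothetical name, not a dppca function):

```python
import math

def sigma_sph(n, eps, delta):
    """Noise standard deviation for g_sph (||g_sph||_inf = 1):
    sigma = 4 * sqrt(2 * ln(1.25 / delta)) / (n * eps)."""
    return 4.0 * math.sqrt(2.0 * math.log(1.25 / delta)) / (n * eps)

s = sigma_sph(n=100, eps=1.0, delta=1e-5)   # approximately 0.1938
```

The noise standard deviation shrinks like \(1/n\): doubling the sample size halves \(\sigma_{\varepsilon, \delta}\), and the noise variance is \(\sigma_{\varepsilon, \delta}^2\).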
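In summary, the principal component direction step in dppca returns one of two direction matrices. In the non-private option, \(V_k = \hat V_k\), the first \(k\) eigenvectors of \(\hat\Sigma\). With g_dppca=TRUE, \(V_k = \bar V_{sph,k}\), the first \(k\) eigenvectors of the noisy spherical Kendall’s tau matrix \(\bar K_{g_{sph}}(S)\), which satisfies \((\varepsilon, \delta)\)-DP.
The main distinction is whether \(V_k\) is obtained from the ordinary sample covariance matrix or from a differentially private robust PC direction estimator.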
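The two options can be put side by side in one function. This Python/NumPy sketch is an assumption throughout: pc_directions and its private flag are hypothetical and do not reproduce the signature of dp_pc_dir. The private branch follows the g-DPPCA construction with \(g = g_{sph}\).

```python
import numpy as np

def pc_directions(X, k, private=False, eps=1.0, delta=1e-5, rng=None):
    """Return a p x k direction matrix V_k.

    private=False: first k eigenvectors of the sample covariance matrix.
    private=True:  first k eigenvectors of the noisy spherical Kendall's tau
                   matrix (g-DPPCA with g = g_sph). Hypothetical helper.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                  # pairwise differences are unaffected
    n, p = X.shape
    if not private:
        M = X.T @ X / (n - 1)               # sample covariance
    else:
        rng = np.random.default_rng() if rng is None else rng
        i, j = np.triu_indices(n, k=1)
        D = X[j] - X[i]
        norms = np.linalg.norm(D, axis=1, keepdims=True)
        G = np.divide(D, norms, out=np.zeros_like(D), where=norms > 0)
        K_hat = 2.0 / (n * (n - 1)) * (G.T @ G)
        sigma = 4.0 * np.sqrt(2.0 * np.log(1.25 / delta)) / (n * eps)
        E = np.zeros((p, p))
        E[np.triu_indices(p)] = rng.normal(0.0, sigma, size=p * (p + 1) // 2)
        M = K_hat + E + np.triu(E, k=1).T   # add symmetric noise vecd^{-1}(eta)
    lam, V = np.linalg.eigh(M)
    return V[:, np.argsort(lam)[::-1][:k]]

rng = np.random.default_rng(7)
X = rng.multivariate_normal(np.zeros(3), np.diag([9.0, 4.0, 1.0]), size=1000)
V_np = pc_directions(X, k=2)
V_dp = pc_directions(X, k=2, private=True, eps=1.0, delta=1e-5, rng=rng)
```

On elliptical data such as this Gaussian sample, both options should point in nearly the same leading direction. Only the private branch touches the data through a bounded transformation, which is what makes its sensitivity finite.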
Minwoo Kim and Sungkyu Jung (2025), “Robust and differentially private principal component analysis,” Statistical Analysis and Data Mining, 18(6), https://doi.org/10.1002/sam.70053
Jakob Raymaekers and Peter Rousseeuw (2019), “A generalized spatial sign covariance matrix,” Journal of Multivariate Analysis, 171, 94–111, https://doi.org/10.1016/j.jmva.2018.11.010