dppca

dppca provides tools for differentially private principal component analysis (PCA) visualization in R. It supports private PC direction estimation, private scree/PVE (proportion of variance explained) plots, private score plots, grouped score visualizations, and an interactive Shiny app.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("yejinjo0220/dppca")

Basic workflow

The main workflow is:

  1. Estimate private PC directions with dp_pc_dir().
  2. Estimate and plot private scree/PVE summaries with dp_scree() and dp_scree_plot().
  3. Compute and plot private PCA score summaries with dp_score() and dp_score_plot().
  4. Optionally, use the grouped score visualizations or the Shiny app.

The examples below use the synthetic Gaussian cluster dataset included in the package.

library(dppca)

data(gau, package = "dppca")
X <- gau

1. Private PC directions

dp_pc_dir() estimates leading principal component directions under differential privacy.

set.seed(123)

V <- dp_pc_dir(
  X,
  k = 5,
  g_dppca = TRUE,
  eps = 3,
  delta = 1e-4
)

V 

The returned object contains private principal component directions that can be used for PCA summaries and visualizations.
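
To give a feel for what "private PC directions" can mean, here is a minimal base-R sketch of one well-known construction: perturb the second-moment matrix with symmetric Gaussian noise and take its leading eigenvectors. This is not necessarily what dp_pc_dir() does internally. The function name private_pc_dir_sketch, the row-norm scaling, the add/remove-one-row notion of neighboring datasets, and the classical noise calibration (which assumes eps < 1) are all assumptions made for illustration.

```r
# Illustrative only: a Gaussian-mechanism PCA sketch, not dppca's implementation.
private_pc_dir_sketch <- function(X, k, eps, delta) {
  X <- as.matrix(X)
  p <- ncol(X)

  # Assumption: shrink every row to Euclidean norm at most 1, so adding or
  # removing one row changes crossprod(X) by at most 1 in Frobenius norm.
  row_norm <- sqrt(rowSums(X^2))
  X <- X / pmax(1, row_norm)

  # Classical Gaussian-mechanism noise level (assumes eps < 1).
  sigma <- sqrt(2 * log(1.25 / delta)) / eps

  # Symmetric noise: draw the upper triangle (with diagonal) and mirror it.
  E <- matrix(0, p, p)
  upper <- upper.tri(E, diag = TRUE)
  E[upper] <- rnorm(sum(upper), sd = sigma)
  E <- E + t(E) - diag(diag(E), nrow = p)

  # Leading eigenvectors of the noisy second-moment matrix.
  S_noisy <- crossprod(X) + E
  eigen(S_noisy, symmetric = TRUE)$vectors[, seq_len(k), drop = FALSE]
}

# Toy data (hypothetical, not the gau dataset).
set.seed(1)
X_toy <- matrix(rnorm(500 * 6), ncol = 6) %*% diag(c(3, 2, 1, 1, 1, 1))
V_toy <- private_pc_dir_sketch(X_toy, k = 2, eps = 0.5, delta = 1e-4)
dim(V_toy)
```

The columns of V_toy are orthonormal because they are eigenvectors of a symmetric matrix; how close they are to the non-private directions depends on the sample size and on eps.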

2. Private scree values

dp_scree() estimates private scree values or proportions of variance explained. The method is chosen by the method argument.

set.seed(123)

scree_clipped <- dp_scree(
  X,
  k = 5,
  method = "clipped",
  control = clipped_control(C_clip = 3),
  eps = 3,
  delta = 1e-4
)

scree_clipped
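
As a rough illustration of what a clipping constant such as C_clip can control, the sketch below clips scores along fixed directions, averages their squares, and adds Laplace noise. It is a generic construction, not a description of the package's "clipped" method. The helper names rlaplace_sketch and private_scree_sketch, the even split of eps across directions, the treatment of V as already private, and the treatment of the sample size as public are assumptions.

```r
# Illustrative only: clipped, Laplace-noised variances along fixed directions.
rlaplace_sketch <- function(n, scale) {
  # The difference of two independent exponentials is Laplace(0, scale).
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

private_scree_sketch <- function(X, V, C_clip, eps) {
  X <- as.matrix(X)
  n <- nrow(X)
  k <- ncol(V)

  # Scores along each direction, clipped to [-C_clip, C_clip].
  scores <- X %*% V
  scores <- pmin(pmax(scores, -C_clip), C_clip)

  # Each squared clipped score lies in [0, C_clip^2], so replacing one row
  # changes each column mean by at most C_clip^2 / n. Splitting eps evenly
  # over the k directions gives Laplace scale k * C_clip^2 / (n * eps).
  scree <- colMeans(scores^2) + rlaplace_sketch(k, k * C_clip^2 / (n * eps))
  scree <- pmax(scree, 0)

  # Share among the k retained components only (an assumption; a full PVE
  # would divide by the total variance instead).
  list(scree = scree, pve = scree / sum(scree))
}

# Toy data (hypothetical); V_toy stands in for already-private directions.
set.seed(2)
X_toy <- matrix(rnorm(1000 * 4), ncol = 4) %*% diag(c(2, 1.5, 1, 0.5))
V_toy <- diag(4)[, 1:3]
out <- private_scree_sketch(X_toy, V_toy, C_clip = 3, eps = 3)
out$pve
```

A smaller C_clip lowers the noise scale but biases the variance estimates downward, which is the usual trade-off when tuning a clipping constant.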

The package currently supports three scree estimation methods: "clipped", "pmwm", and "huber".

Method-specific tuning parameters are specified using the control helper functions clipped_control(), pmwm_control(), and huber_control().

For example, multiple scree methods can be requested by passing a vector to method and a named list to control.

set.seed(123)

scree_all <- dp_scree(
  X,
  k = 5,
  method = c("clipped", "pmwm", "huber"),
  control = list(
    clipped = clipped_control(C_clip = 3),
    pmwm = pmwm_control(a = 0, b = 50, trim_const = 10, eta = 0.01),
    huber = huber_control(k_min_m2 = -10, k_max_m2 = 10, m2_frac = 1 / 4)
  ),
  eps = 3,
  delta = 1e-4
)

scree_all

Private scree plots

dp_scree_plot() visualizes private scree values or private proportions of variance explained.

set.seed(123)

scree_plot_all <- dp_scree_plot(
  X,
  k = 5,
  method = c("clipped", "pmwm", "huber"),
  control = list(
    clipped = clipped_control(C_clip = 3),
    pmwm = pmwm_control(a = 0, b = 50, trim_const = 10, eta = 0.01),
    huber = huber_control(k_min_m2 = -10, k_max_m2 = 10, m2_frac = 1 / 4)
  ),
  eps = 3,
  delta = 1e-4
)
scree_plot_all

Private scree plot produced by dppca

3. Private PCA score

dp_score() computes differentially private summaries of two-dimensional PCA scores using histogram-based methods.

set.seed(123)

score_result <- dp_score(
  X,
  eps = 3,
  delta = 1e-4,
  bins = c(8, 8),
  method = "add"
)

score_result 

Available score methods include "add" (additive histogram) and "sparse" (sparse histogram).

Use method = "add" or method = "sparse" to run one histogram method, or method = c("add", "sparse") to compute both.
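
The two histogram ideas can be sketched in base R as follows. This is a generic illustration under stated assumptions, not the code behind dp_score(): the function names, the fixed data-independent score range, the replace-one-row notion of neighboring datasets, and the specific threshold used in the sparse variant are all assumptions. The scores are taken as given, so any privacy cost of estimating the directions is ignored here.

```r
# Illustrative only: noisy 2D histograms of PCA scores on a fixed grid.
rlaplace_sketch <- function(n, scale) {
  # The difference of two independent exponentials is Laplace(0, scale).
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

bin_scores_sketch <- function(scores, bins, range) {
  # Data-independent bin edges; scores outside `range` are clamped to the edge.
  edges_x <- seq(range[1], range[2], length.out = bins[1] + 1)
  edges_y <- seq(range[1], range[2], length.out = bins[2] + 1)
  s <- pmin(pmax(scores, range[1]), range[2])
  ix <- cut(s[, 1], edges_x, include.lowest = TRUE, labels = FALSE)
  iy <- cut(s[, 2], edges_y, include.lowest = TRUE, labels = FALSE)
  counts <- table(factor(ix, levels = seq_len(bins[1])),
                  factor(iy, levels = seq_len(bins[2])))
  matrix(as.integer(counts), nrow = bins[1])
}

hist_add_sketch <- function(counts, eps) {
  # Replacing one row moves one unit between two cells (L1 sensitivity 2),
  # so every cell receives Laplace noise with scale 2 / eps.
  noisy <- counts + rlaplace_sketch(length(counts), 2 / eps)
  pmax(noisy, 0)
}

hist_sparse_sketch <- function(counts, eps, delta) {
  # Stability-based variant: only occupied cells can be released, and only
  # if their noisy count clears a threshold that depends on delta.
  noisy <- counts + rlaplace_sketch(length(counts), 2 / eps)
  threshold <- 1 + 2 * log(2 / delta) / eps
  noisy[counts == 0 | noisy < threshold] <- 0
  noisy
}

# Toy scores (hypothetical, not computed from the gau dataset).
set.seed(3)
scores_toy <- matrix(rnorm(2000 * 2), ncol = 2)
counts <- bin_scores_sketch(scores_toy, bins = c(8, 8), range = c(-4, 4))
h_add <- hist_add_sketch(counts, eps = 3)
h_sparse <- hist_sparse_sketch(counts, eps = 3, delta = 1e-4)
```

In this sketch the additive version perturbs every cell, including empty ones, while the sparse version leaves empty cells at zero and suppresses small counts, which is why it needs delta.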

Private score plot

dp_score_plot() draws private score plots based on the histogram summaries returned by dp_score().

If method is omitted, both additive and sparse histogram methods are used.

set.seed(123)

score_plot <- dp_score_plot(
  X,
  eps = 3,
  delta = 1e-4,
  bins = c(15, 15)
)

score_plot$plot$all

Private score plot produced by dppca

Grouped score plot

For data with group labels, dp_score_group() and dp_score_plot_group() provide grouped versions of the private score summaries and plots.

data(gau_g, package = "dppca")
X_g <- gau_g

Compute the grouped private score summaries.

set.seed(123)
score_group <- dp_score_group(
  X_g,
  group = "group",
  eps = 3,
  delta = 1e-4,
  bins = c(8, 8),
  method = "add"
)

score_group
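
One way to think about a grouped score summary is as a single histogram over (x bin, y bin, group) cells. The sketch below follows that idea in base R. It is an assumption-laden illustration, not the implementation of dp_score_group(): the function name, the fixed score range, the replace-one-row notion of neighboring datasets, and the treatment of the set of group labels as public are all assumptions.

```r
# Illustrative only: one noisy score histogram per group on a shared grid.
rlaplace_sketch <- function(n, scale) {
  # The difference of two independent exponentials is Laplace(0, scale).
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

group_hist_sketch <- function(scores, group, bins, range, eps) {
  edges_x <- seq(range[1], range[2], length.out = bins[1] + 1)
  edges_y <- seq(range[1], range[2], length.out = bins[2] + 1)
  s <- pmin(pmax(scores, range[1]), range[2])
  ix <- factor(cut(s[, 1], edges_x, include.lowest = TRUE, labels = FALSE),
               levels = seq_len(bins[1]))
  iy <- factor(cut(s[, 2], edges_y, include.lowest = TRUE, labels = FALSE),
               levels = seq_len(bins[2]))
  g <- factor(group)

  # A three-way table over (x bin, y bin, group). Replacing one row still
  # changes at most two cells by one each, so the Laplace scale 2 / eps used
  # for an ungrouped histogram carries over.
  counts <- table(ix, iy, g)
  noisy <- array(as.numeric(counts), dim = dim(counts))
  noisy <- noisy + rlaplace_sketch(length(noisy), 2 / eps)
  pmax(noisy, 0)
}

# Toy scores and labels (hypothetical, not the gau_g dataset).
set.seed(4)
scores_toy <- matrix(rnorm(900 * 2), ncol = 2)
group_toy <- rep(c("a", "b", "c"), each = 300)
h_group <- group_hist_sketch(scores_toy, group_toy, bins = c(8, 8),
                             range = c(-4, 4), eps = 3)
dim(h_group)
```

Each slice h_group[, , j] is then the noisy histogram for the j-th group level and can be drawn with the same grid as the others.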

Draw a grouped private score plot.

set.seed(123)

score_group_plot <- dp_score_plot_group(
  X_g,
  group = "group",
  eps = 3,
  delta = 1e-4,
  bins = c(15, 15)
)

score_group_plot$plot$all

Private group score plot produced by dppca


Shiny app

dppca_app() launches a Shiny app for exploring private scree and score plots through a graphical interface.

dppca_app()

You can also launch the app with a user-supplied dataset.

data(gau_g, package = "dppca")
dppca_app(gau_g, group = "group")

Data

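dppca includes three datasets for examples and demonstrations: the synthetic Gaussian cluster datasets gau and gau_g, and a numerical subset of the Adult dataset.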

Data sources

The package includes a numerical subset of the Adult dataset from the UCI Machine Learning Repository. The Adult dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This package retains five numerical variables: age, education_num, capital_gain, capital_loss, and hours_per_week.

The package also includes synthetic Gaussian cluster datasets generated by the package authors for reproducible examples.


References

The methods and examples in dppca are related to the following references.