A Quick Introduction to the psqn Package

\[\renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\diag{\text{diag}} \def\argmin{\text{arg}\,\text{min}} \def\Expe{\text{E}}\]

This is a quick introduction to the psqn package. A more detailed description can be found in the psqn vignette (call vignette("psqn", package = "psqn")). The main function in the package is the psqn function. The psqn function minimizes functions which can be written like:

\[f(\vec x) = \sum_{i = 1}^n f_i(\vec x_{\mathcal I_i})\]

where \(\vec x\in \mathbb R^l\),

\[\vec x_{\mathcal I_i} = (\vec e_{j_{i1}}^\top, \dots ,\vec e_{j_{im_i}}^\top)\vec x, \qquad \mathcal I_i = (j_{i1}, \dots, \mathcal j_{im_i}) \subseteq \{1, \dots, l\},\]

and \(\vec e_k\) is the \(k\)’th column of the \(l\) dimensional identity matrix. We call the \(f_i\)s element functions and assume that each of them only depend on a small number of variables. Furthermore, we assume that each index set \(\mathcal I_i\) is of the form:

\[\begin{align*} \mathcal I_i &= \{1,\dots, p\} \cup \mathcal J_i \\ \mathcal J_i \cap \mathcal J_j &= \emptyset \qquad j\neq i \\ \mathcal J_i \cap \{1,\dots, p\} &= \emptyset \qquad \forall i = 1,\dots, n \end{align*}.\]

That is, each index set contains \(p\) global parameters and \(q_i = \lvert\mathcal J_i\rvert\) private parameters which are particular for each element function, \(f_i\). For implementation reason, we let:

\[\begin{align*} \overleftarrow q_i &= \begin{cases} p & i = 0 \\ p + \sum_{k = 1}^i q_k & i > 0 \end{cases} \\ \mathcal J_i &= \{1 + \overleftarrow q_{i - 1}, \dots , q_i + \overleftarrow q_{i - 1}\} \end{align*}\]

such that the element functions’ private parameters lies in consecutive parts of \(\vec x\). There is also a less restricted optimizer called optimizer_generic were the parameters can overlap in an arbitrary way. The R interface for this function is implemented in the psqn_generic function. See vignette("psqn", package = "psqn") for further details on both the psqn and the psqn_generic function.

The R Interface

As a simple example, we consider the element functions:

\[ f_i((\vec\beta^\top, \vec u^\top_i)^\top) = -\vec y_i(\mat X_i\vec\beta + \mat Z_i\vec u_i) + \sum_{k = 1}^{t_i} \log(1 + \exp(\vec x_{ik}^\top\vec\beta + \vec z_{ik}^\top\vec u_i)) + \frac 12 \vec u^\top_i\mat\Sigma^{-1} \vec u_i. \]

\(\vec\beta\) is the \(p\) dimensional global parameter and \(\vec u_i\) is the \(q_i = q\) dimensional private parameters for the \(i\)th element function. \(\vec y_i\in \{0, 1\}^{t_i}\), \(\mat X_i\in\mathbb R^{t_i\times p}\), and \(\mat Z_i\in\mathbb R^{t_i\times q}\) are particular to each element function. We simulate some data below to use:

# assign global parameters, number of private parameters, etc.
q <- 4 # number of private parameters per cluster
p <- 5 # number of global parameters
beta <- sqrt((1:p) / sum(1:p))
Sigma <- diag(q)

# simulate a data set
n_ele_func <- 1000L # number of element functions
set.seed(80919915)

sim_dat <- replicate(n_ele_func, {
  t_i <- sample.int(40L, 1L) + 2L
  X <- matrix(runif(p * t_i, -sqrt(6 / 2), sqrt(6 / 2)), 
              p)
  u <- drop(rnorm(q) %*% chol(Sigma))
  Z <- matrix(runif(q * t_i, -sqrt(6 / 2 / q), sqrt(6 / 2 / q)), 
              q)
  eta <- drop(beta %*% X + u %*% Z)
  y <- as.numeric((1 + exp(-eta))^(-1) > runif(t_i))
  
  list(X = X, Z = Z, y = y, u = u, Sigma_inv = solve(Sigma))
}, simplify = FALSE)

# data for the first element function
sim_dat[[1L]]
#> $X
#>        [,1]   [,2]   [,3]   [,4]   [,5]    [,6]   [,7]   [,8]   [,9]  [,10]
#> [1,]  0.147 -1.246  0.840 -1.083  0.701 -1.6532  0.549  1.522 -0.449  0.254
#> [2,] -0.241  1.391 -0.873 -0.877  1.005  1.5401 -0.207 -0.999  1.249  1.379
#> [3,] -1.713  0.698  1.550 -1.335  0.687  0.0999  0.688  0.493 -0.992  0.780
#> [4,]  1.116  1.687  0.557 -1.380  0.294  1.2391 -1.331 -0.459  0.262 -1.351
#> [5,] -0.391  1.310 -1.477 -0.836 -1.542  1.3278 -0.788 -0.675 -1.184  0.208
#>        [,11]  [,12]   [,13]  [,14]  [,15]  [,16]   [,17]  [,18]    [,19]  [,20]
#> [1,]  0.3633 -0.578  1.5378  1.488  0.653  1.707 -1.4558 -1.396  0.58917  1.473
#> [2,] -0.4122 -0.616 -1.2711  0.256 -1.494  0.615 -0.4410  0.114 -0.56704 -0.261
#> [3,]  0.0818 -0.272 -1.4706  1.060 -0.959 -1.141  0.0916 -0.928  1.68352 -0.155
#> [4,] -1.4245  1.716 -0.9433  0.428  1.670 -0.254 -0.1064 -0.245  0.00692  0.161
#> [5,]  1.6009  1.628  0.0971 -0.818  0.402 -0.497 -1.3034  0.636  0.72653 -0.425
#>       [,21]   [,22]   [,23]  [,24]  [,25]   [,26]  [,27]    [,28]  [,29]  [,30]
#> [1,] -1.086 -0.8711  1.2213  0.698  0.721  0.9319 -0.326 -0.00238 -1.164  0.203
#> [2,]  0.397  1.1903 -0.3113 -0.837  1.501 -0.0304  1.509 -0.17466  0.547 -0.667
#> [3,]  0.440  0.0235 -0.7929  0.305 -0.809  0.0949 -0.946 -0.44998 -0.761 -0.724
#> [4,]  0.222  1.2529 -0.0905 -0.879 -0.274  1.0152  0.492 -1.48076 -0.213  1.332
#> [5,]  0.872 -1.2783  1.0110 -1.225  0.904  1.0819 -1.243  0.34144  0.919  0.404
#>       [,31]  [,32]   [,33]  [,34]  [,35]  [,36]  [,37]  [,38]
#> [1,] -0.333 -0.842 -0.3760  1.529  0.439 -1.227 -0.235 -0.562
#> [2,]  0.649  1.103 -1.1518 -0.277 -1.369 -0.951  1.702  1.685
#> [3,]  1.370 -1.343  1.5827  0.355  0.457 -1.509 -1.427  0.779
#> [4,]  0.179  1.544 -0.0281 -0.199 -0.923 -0.524  0.406  0.515
#> [5,]  0.138 -0.470  1.4224  0.271 -0.424  1.090  0.290  1.585
#> 
#> $Z
#>        [,1]   [,2]    [,3]   [,4]    [,5]   [,6]    [,7]     [,8]    [,9]
#> [1,] -0.178  0.838 -0.6811 -0.752 -0.0552 -0.076 -0.0308 -0.19838  0.7404
#> [2,]  0.843  0.372 -0.0235 -0.652 -0.1140 -0.531  0.7292  0.74028  0.0757
#> [3,]  0.432  0.231  0.8555  0.624  0.2864  0.583  0.5691  0.00112 -0.6749
#> [4,] -0.288 -0.297  0.6028  0.536 -0.8658  0.142 -0.0209  0.53635  0.8298
#>       [,10]  [,11]  [,12]  [,13]   [,14]   [,15]  [,16]   [,17]  [,18]  [,19]
#> [1,] 0.0646 -0.657 -0.566  0.806 -0.1562  0.0875 -0.039 -0.7200  0.835  0.220
#> [2,] 0.6481 -0.232  0.215 -0.425  0.0812 -0.3133  0.378 -0.0546 -0.553 -0.365
#> [3,] 0.1781 -0.331  0.691  0.683 -0.8539 -0.8270 -0.257 -0.4874 -0.753  0.747
#> [4,] 0.7046  0.643  0.141 -0.308 -0.7163  0.7274 -0.711 -0.0990 -0.400  0.642
#>       [,20]  [,21]   [,22]   [,23]   [,24]  [,25]  [,26]  [,27]  [,28]   [,29]
#> [1,] -0.315 -0.395 -0.4401 -0.1950  0.0695 -0.402 -0.297 -0.862 -0.448  0.8003
#> [2,] -0.053 -0.498 -0.6587  0.0463  0.4262 -0.440 -0.551  0.852  0.764  0.6065
#> [3,] -0.323  0.307  0.3877  0.5228 -0.5408 -0.532 -0.758  0.346 -0.847 -0.6973
#> [4,] -0.779 -0.779  0.0879  0.8470 -0.1618 -0.320 -0.858  0.527 -0.784  0.0282
#>        [,30]    [,31]  [,32]  [,33]  [,34]   [,35]  [,36]  [,37]  [,38]
#> [1,] -0.4176 -0.60760  0.502 -0.816 -0.568 -0.3325 -0.085 -0.656 -0.828
#> [2,] -0.6609  0.00662  0.296  0.480 -0.538 -0.3001  0.750  0.375 -0.358
#> [3,]  0.1750  0.25447  0.124 -0.367  0.129 -0.0108  0.711  0.751 -0.159
#> [4,]  0.0145  0.08370 -0.802  0.500  0.263 -0.0894 -0.637  0.416 -0.677
#> 
#> $y
#>  [1] 1 1 0 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1
#> 
#> $u
#> [1] -0.170  0.427  0.918 -2.080
#> 
#> $Sigma_inv
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    0    0    0
#> [2,]    0    1    0    0
#> [3,]    0    0    1    0
#> [4,]    0    0    0    1

We work with \(\mat X_i^\top\) and \(\mat Z_i^\top\) for computational reasons. The function we need to pass to psqn needs to take three arguments:

The function should return the element function value (potentially with the gradient as an attribute) or \(p\) and \(q_i\). Thus, an example in our case will be:

r_func <- function(i, par, comp_grad){
  dat <- sim_dat[[i]]
  X <- dat$X
  Z <- dat$Z
  
  if(length(par) < 1)
    # requested the dimension of the parameter
    return(c(global_dim = NROW(dat$X), private_dim = NROW(dat$Z)))
  
  y <- dat$y
  Sigma_inv <- dat$Sigma_inv
  
  beta <- par[1:p]
  u_i <- par[1:q + p]
  eta <- drop(beta %*% X + u_i %*% Z)
  exp_eta <- exp(eta)
  
  # compute the element function
  out <- -sum(y * eta) + sum(log(1 + exp_eta)) + 
    sum(u_i * (Sigma_inv %*% u_i)) / 2
  
  if(comp_grad){
    # we also need to compute the gradient
    d_eta <- -y + exp_eta / (1 + exp_eta)
    grad <- c(X %*% d_eta, 
              Z %*% d_eta + dat$Sigma_inv %*% u_i)
    attr(out, "grad") <- grad
  }
  
  out
}

Then we can optimize the function as follows:

library(psqn)
start_val <- numeric(p + n_ele_func * q) # the starting value
opt_res <- psqn(par = start_val, fn = r_func, n_ele_func = n_ele_func)

# check the minimum
opt_res$value
#> [1] 11969

# check the estimated global parameters
head(opt_res$par, length(beta))
#> [1] 0.254 0.356 0.410 0.520 0.564

# should be close to 
beta
#> [1] 0.258 0.365 0.447 0.516 0.577

The R Interface for optimizer_generic

We can also use the psqn_generic function although it will be slower because of some additional computational overhead because the function is more general. The function we need to pass to psqn_generic needs to take three arguments:

We assign the function we need to pass to psqn_generic for the example in this vignette:

r_func_generic <- function(i, par, comp_grad){
  dat <- sim_dat[[i]]
  X <- dat$X
  Z <- dat$Z
  
  if(length(par) < 1)
    # return the index set. This is one-based like in R
    return(c(1:NROW(dat$X), 
             seq_len(NROW(dat$Z)) + NROW(dat$X) + (i - 1L) * NROW(dat$Z)))
  
  y <- dat$y
  Sigma_inv <- dat$Sigma_inv
  
  beta <- par[1:p]
  u_i <- par[1:q + p]
  eta <- drop(beta %*% X + u_i %*% Z)
  exp_eta <- exp(eta)
  
  # compute the element function
  out <- -sum(y * eta) + sum(log(1 + exp_eta)) + 
    sum(u_i * (Sigma_inv %*% u_i)) / 2
  
  if(comp_grad){
    # we also need to compute the gradient
    d_eta <- -y + exp_eta / (1 + exp_eta)
    grad <- c(X %*% d_eta, 
              Z %*% d_eta + dat$Sigma_inv %*% u_i)
    attr(out, "grad") <- grad
  }
  
  out
}

Then we can optimize the function as follows:

opt_res_generic <- psqn_generic(
  par = start_val, fn = r_func_generic, n_ele_func = n_ele_func)

# we get the same
all.equal(opt_res_generic$value, opt_res$value)
#> [1] TRUE
all.equal(opt_res_generic$par  , opt_res$par)
#> [1] TRUE

# the generic version is slower
bench::mark(
  psqn = psqn(par = start_val, fn = r_func, n_ele_func = n_ele_func), 
  psqn_generic = psqn_generic(
  par = start_val, fn = r_func_generic, n_ele_func = n_ele_func), 
  min_iterations = 5)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 psqn            287ms    290ms      3.42    30.9MB     13.7
#> 2 psqn_generic    296ms    301ms      3.32    30.9MB     13.3

Ending Remarks

The package can also be used as a header-only library in C++. This can yield a very large reduction in computation time and be easy to implement with the Rcpp package. Two examples are shown in the psqn vignette (see vignette("psqn", package = "psqn")).

There is also a BFGS implementation in the package. This can be used in R with the psqn_bfgs function. The BFGS implementation can also be used in C++ using the psqn-bfgs.h file.