\[\renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\diag{\text{diag}} \def\argmin{\text{arg}\,\text{min}} \def\Expe{\text{E}}\]
This is a quick introduction to the psqn package. A more detailed description can be found in the psqn vignette (call vignette("psqn", package = "psqn")
). The main function in the package is the psqn
function. The psqn
function minimizes functions which can be written like:
\[f(\vec x) = \sum_{i = 1}^n f_i(\vec x_{\mathcal I_i})\]
where \(\vec x\in \mathbb R^l\),
\[\vec x_{\mathcal I_i} = (\vec e_{j_{i1}}^\top, \dots ,\vec e_{j_{im_i}}^\top)\vec x, \qquad \mathcal I_i = (j_{i1}, \dots, \mathcal j_{im_i}) \subseteq \{1, \dots, l\},\]
and \(\vec e_k\) is the \(k\)’th column of the \(l\) dimensional identity matrix. We call the \(f_i\)s element functions and assume that each of them only depend on a small number of variables. Furthermore, we assume that each index set \(\mathcal I_i\) is of the form:
\[\begin{align*} \mathcal I_i &= \{1,\dots, p\} \cup \mathcal J_i \\ \mathcal J_i \cap \mathcal J_j &= \emptyset \qquad j\neq i \\ \mathcal J_i \cap \{1,\dots, p\} &= \emptyset \qquad \forall i = 1,\dots, n \end{align*}.\]
That is, each index set contains \(p\) global parameters and \(q_i = \lvert\mathcal J_i\rvert\) private parameters which are particular for each element function, \(f_i\). For implementation reason, we let:
\[\begin{align*} \overleftarrow q_i &= \begin{cases} p & i = 0 \\ p + \sum_{k = 1}^i q_k & i > 0 \end{cases} \\ \mathcal J_i &= \{1 + \overleftarrow q_{i - 1}, \dots , q_i + \overleftarrow q_{i - 1}\} \end{align*}\]
such that the element functions’ private parameters lies in consecutive parts of \(\vec x\). There is also a less restricted optimizer called optimizer_generic
were the parameters can overlap in an arbitrary way. The R interface for this function is implemented in the psqn_generic
function. See vignette("psqn", package = "psqn")
for further details on both the psqn
and the psqn_generic
function.
As a simple example, we consider the element functions:
\[ f_i((\vec\beta^\top, \vec u^\top_i)^\top) = -\vec y_i(\mat X_i\vec\beta + \mat Z_i\vec u_i) + \sum_{k = 1}^{t_i} \log(1 + \exp(\vec x_{ik}^\top\vec\beta + \vec z_{ik}^\top\vec u_i)) + \frac 12 \vec u^\top_i\mat\Sigma^{-1} \vec u_i. \]
\(\vec\beta\) is the \(p\) dimensional global parameter and \(\vec u_i\) is the \(q_i = q\) dimensional private parameters for the \(i\)th element function. \(\vec y_i\in \{0, 1\}^{t_i}\), \(\mat X_i\in\mathbb R^{t_i\times p}\), and \(\mat Z_i\in\mathbb R^{t_i\times q}\) are particular to each element function. We simulate some data below to use:
# assign global parameters, number of private parameters, etc.
q <- 4 # number of private parameters per cluster
p <- 5 # number of global parameters
beta <- sqrt((1:p) / sum(1:p))
Sigma <- diag(q)
# simulate a data set
n_ele_func <- 1000L # number of element functions
set.seed(80919915)
sim_dat <- replicate(n_ele_func, {
t_i <- sample.int(40L, 1L) + 2L
X <- matrix(runif(p * t_i, -sqrt(6 / 2), sqrt(6 / 2)),
p)
u <- drop(rnorm(q) %*% chol(Sigma))
Z <- matrix(runif(q * t_i, -sqrt(6 / 2 / q), sqrt(6 / 2 / q)),
q)
eta <- drop(beta %*% X + u %*% Z)
y <- as.numeric((1 + exp(-eta))^(-1) > runif(t_i))
list(X = X, Z = Z, y = y, u = u, Sigma_inv = solve(Sigma))
}, simplify = FALSE)
# data for the first element function
sim_dat[[1L]]
#> $X
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 0.147 -1.246 0.840 -1.083 0.701 -1.6532 0.549 1.522 -0.449 0.254
#> [2,] -0.241 1.391 -0.873 -0.877 1.005 1.5401 -0.207 -0.999 1.249 1.379
#> [3,] -1.713 0.698 1.550 -1.335 0.687 0.0999 0.688 0.493 -0.992 0.780
#> [4,] 1.116 1.687 0.557 -1.380 0.294 1.2391 -1.331 -0.459 0.262 -1.351
#> [5,] -0.391 1.310 -1.477 -0.836 -1.542 1.3278 -0.788 -0.675 -1.184 0.208
#> [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,] 0.3633 -0.578 1.5378 1.488 0.653 1.707 -1.4558 -1.396 0.58917 1.473
#> [2,] -0.4122 -0.616 -1.2711 0.256 -1.494 0.615 -0.4410 0.114 -0.56704 -0.261
#> [3,] 0.0818 -0.272 -1.4706 1.060 -0.959 -1.141 0.0916 -0.928 1.68352 -0.155
#> [4,] -1.4245 1.716 -0.9433 0.428 1.670 -0.254 -0.1064 -0.245 0.00692 0.161
#> [5,] 1.6009 1.628 0.0971 -0.818 0.402 -0.497 -1.3034 0.636 0.72653 -0.425
#> [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
#> [1,] -1.086 -0.8711 1.2213 0.698 0.721 0.9319 -0.326 -0.00238 -1.164 0.203
#> [2,] 0.397 1.1903 -0.3113 -0.837 1.501 -0.0304 1.509 -0.17466 0.547 -0.667
#> [3,] 0.440 0.0235 -0.7929 0.305 -0.809 0.0949 -0.946 -0.44998 -0.761 -0.724
#> [4,] 0.222 1.2529 -0.0905 -0.879 -0.274 1.0152 0.492 -1.48076 -0.213 1.332
#> [5,] 0.872 -1.2783 1.0110 -1.225 0.904 1.0819 -1.243 0.34144 0.919 0.404
#> [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
#> [1,] -0.333 -0.842 -0.3760 1.529 0.439 -1.227 -0.235 -0.562
#> [2,] 0.649 1.103 -1.1518 -0.277 -1.369 -0.951 1.702 1.685
#> [3,] 1.370 -1.343 1.5827 0.355 0.457 -1.509 -1.427 0.779
#> [4,] 0.179 1.544 -0.0281 -0.199 -0.923 -0.524 0.406 0.515
#> [5,] 0.138 -0.470 1.4224 0.271 -0.424 1.090 0.290 1.585
#>
#> $Z
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> [1,] -0.178 0.838 -0.6811 -0.752 -0.0552 -0.076 -0.0308 -0.19838 0.7404
#> [2,] 0.843 0.372 -0.0235 -0.652 -0.1140 -0.531 0.7292 0.74028 0.0757
#> [3,] 0.432 0.231 0.8555 0.624 0.2864 0.583 0.5691 0.00112 -0.6749
#> [4,] -0.288 -0.297 0.6028 0.536 -0.8658 0.142 -0.0209 0.53635 0.8298
#> [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
#> [1,] 0.0646 -0.657 -0.566 0.806 -0.1562 0.0875 -0.039 -0.7200 0.835 0.220
#> [2,] 0.6481 -0.232 0.215 -0.425 0.0812 -0.3133 0.378 -0.0546 -0.553 -0.365
#> [3,] 0.1781 -0.331 0.691 0.683 -0.8539 -0.8270 -0.257 -0.4874 -0.753 0.747
#> [4,] 0.7046 0.643 0.141 -0.308 -0.7163 0.7274 -0.711 -0.0990 -0.400 0.642
#> [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29]
#> [1,] -0.315 -0.395 -0.4401 -0.1950 0.0695 -0.402 -0.297 -0.862 -0.448 0.8003
#> [2,] -0.053 -0.498 -0.6587 0.0463 0.4262 -0.440 -0.551 0.852 0.764 0.6065
#> [3,] -0.323 0.307 0.3877 0.5228 -0.5408 -0.532 -0.758 0.346 -0.847 -0.6973
#> [4,] -0.779 -0.779 0.0879 0.8470 -0.1618 -0.320 -0.858 0.527 -0.784 0.0282
#> [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
#> [1,] -0.4176 -0.60760 0.502 -0.816 -0.568 -0.3325 -0.085 -0.656 -0.828
#> [2,] -0.6609 0.00662 0.296 0.480 -0.538 -0.3001 0.750 0.375 -0.358
#> [3,] 0.1750 0.25447 0.124 -0.367 0.129 -0.0108 0.711 0.751 -0.159
#> [4,] 0.0145 0.08370 -0.802 0.500 0.263 -0.0894 -0.637 0.416 -0.677
#>
#> $y
#> [1] 1 1 0 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1
#>
#> $u
#> [1] -0.170 0.427 0.918 -2.080
#>
#> $Sigma_inv
#> [,1] [,2] [,3] [,4]
#> [1,] 1 0 0 0
#> [2,] 0 1 0 0
#> [3,] 0 0 1 0
#> [4,] 0 0 0 1
We work with \(\mat X_i^\top\) and \(\mat Z_i^\top\) for computational reasons. The function we need to pass to psqn
needs to take three arguments:
TRUE
if the function should return an attribute with the gradient with respect to \(\vec x_{\mathcal I_i}\).The function should return the element function value (potentially with the gradient as an attribute) or \(p\) and \(q_i\). Thus, an example in our case will be:
r_func <- function(i, par, comp_grad){
dat <- sim_dat[[i]]
X <- dat$X
Z <- dat$Z
if(length(par) < 1)
# requested the dimension of the parameter
return(c(global_dim = NROW(dat$X), private_dim = NROW(dat$Z)))
y <- dat$y
Sigma_inv <- dat$Sigma_inv
beta <- par[1:p]
u_i <- par[1:q + p]
eta <- drop(beta %*% X + u_i %*% Z)
exp_eta <- exp(eta)
# compute the element function
out <- -sum(y * eta) + sum(log(1 + exp_eta)) +
sum(u_i * (Sigma_inv %*% u_i)) / 2
if(comp_grad){
# we also need to compute the gradient
d_eta <- -y + exp_eta / (1 + exp_eta)
grad <- c(X %*% d_eta,
Z %*% d_eta + dat$Sigma_inv %*% u_i)
attr(out, "grad") <- grad
}
out
}
Then we can optimize the function as follows:
library(psqn)
start_val <- numeric(p + n_ele_func * q) # the starting value
opt_res <- psqn(par = start_val, fn = r_func, n_ele_func = n_ele_func)
# check the minimum
opt_res$value
#> [1] 11969
# check the estimated global parameters
head(opt_res$par, length(beta))
#> [1] 0.254 0.356 0.410 0.520 0.564
# should be close to
beta
#> [1] 0.258 0.365 0.447 0.516 0.577
We can also use the psqn_generic
function although it will be slower because of some additional computational overhead because the function is more general. The function we need to pass to psqn_generic
needs to take three arguments:
TRUE
if the function should return an attribute with the gradient with respect to \(\vec x_{\mathcal I_i}\).We assign the function we need to pass to psqn_generic
for the example in this vignette:
r_func_generic <- function(i, par, comp_grad){
dat <- sim_dat[[i]]
X <- dat$X
Z <- dat$Z
if(length(par) < 1)
# return the index set. This is one-based like in R
return(c(1:NROW(dat$X),
seq_len(NROW(dat$Z)) + NROW(dat$X) + (i - 1L) * NROW(dat$Z)))
y <- dat$y
Sigma_inv <- dat$Sigma_inv
beta <- par[1:p]
u_i <- par[1:q + p]
eta <- drop(beta %*% X + u_i %*% Z)
exp_eta <- exp(eta)
# compute the element function
out <- -sum(y * eta) + sum(log(1 + exp_eta)) +
sum(u_i * (Sigma_inv %*% u_i)) / 2
if(comp_grad){
# we also need to compute the gradient
d_eta <- -y + exp_eta / (1 + exp_eta)
grad <- c(X %*% d_eta,
Z %*% d_eta + dat$Sigma_inv %*% u_i)
attr(out, "grad") <- grad
}
out
}
Then we can optimize the function as follows:
opt_res_generic <- psqn_generic(
par = start_val, fn = r_func_generic, n_ele_func = n_ele_func)
# we get the same
all.equal(opt_res_generic$value, opt_res$value)
#> [1] TRUE
all.equal(opt_res_generic$par , opt_res$par)
#> [1] TRUE
# the generic version is slower
bench::mark(
psqn = psqn(par = start_val, fn = r_func, n_ele_func = n_ele_func),
psqn_generic = psqn_generic(
par = start_val, fn = r_func_generic, n_ele_func = n_ele_func),
min_iterations = 5)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 psqn 287ms 290ms 3.42 30.9MB 13.7
#> 2 psqn_generic 296ms 301ms 3.32 30.9MB 13.3
The package can also be used as a header-only library in C++. This can yield a very large reduction in computation time and be easy to implement with the Rcpp package. Two examples are shown in the psqn vignette (see vignette("psqn", package = "psqn")
).
There is also a BFGS implementation in the package. This can be used in R with the psqn_bfgs
function. The BFGS implementation can also be used in C++ using the psqn-bfgs.h file.