---
title: "Models in SLOPE"
author: "Johan Larsson"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: SLOPE.bib
vignette: >
  %\VignetteIndexEntry{Models in SLOPE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center"
)
```

## Models

Sorted L-One Penalized Estimation (SLOPE) is the procedure of minimizing objectives of the type
$$
\mathrm{minimize}\left\{F(\beta_0, \beta; X, y) + J(\beta; \alpha, \lambda)\right\},
$$
where
$$
J(\beta; \alpha, \lambda) = \alpha \sum_{j=1}^p \lambda_j \lvert \beta \rvert_{(j)},
$$
with $\alpha \in \mathbb{R}_{+}$, $\lambda \in \mathbb{R}_+^p$, and $\lvert \beta \rvert_{(j)}$ denoting the $j$th largest magnitude of $\beta$. $\lambda$ controls the shape of the penalty sequence, which needs to be non-increasing, and $\alpha$ controls the scale of that sequence. $X$ and $Y$ are the design matrix and response matrix, of dimensions $n \times p$ and $n \times m$, respectively. Except for multinomial logistic regression, $m = 1$.

We assume that $F$ takes the following form:
$$
F(\beta_0, \beta) = \frac{1}{n} \sum_{i=1}^n f(\beta_0 + x_i^\intercal \beta, y_i),
$$
where $f$ is a smooth and convex loss function from the family of generalized linear models (GLMs), $x_i$ is the $i$th row of the design matrix $X$, and $y_i$ is the $i$th row of the response matrix $Y$.

SLOPE currently supports four different GLMs:

- Least-squares (Gaussian) regression,
- Logistic regression,
- Multinomial logistic regression, and
- Poisson regression.

Due to the way GLMs are formulated, each model is defined by three components: a loss function \(f(\eta, y)\), a link function \(g(\mu)\), and an inverse link function \(g^{-1}(\eta)\). Here, \(\eta\) is the linear predictor, and \(\mu\) is the mean of the response variable.

| Model       | \(f(\eta, y)\)                                                                      | \(g(\mu)\)                                                   | \(g^{-1}(\eta)\)                                       |
|:------------|:-----------------------------------------------------------------------------------:|:------------------------------------------------------------:|:-------------------------------------------------------:|
| Gaussian    | \(\frac{1}{2}(y - \eta)^2\)                                                          | \(\mu\)                                                       | \(\eta\)                                                |
| Binomial    | \(\log(1 + e^\eta) - \eta y\)                                                        | \(\log \left(\frac{\mu}{1 - \mu}\right)\)                     | \(\frac{e^\eta}{1 + e^\eta}\)                           |
| Poisson     | \(e^\eta - \eta y\)                                                                  | \(\log(\mu)\)                                                 | \(e^\eta\)                                              |
| Multinomial | \(\log \left( 1 + \sum_{j=1}^{m-1} e^{\eta_j}\right) - \sum_{k=1}^{m-1} y_k \eta_k\) | \(\log\left(\frac{\mu}{1 - \sum_{j=1}^{m-1} \mu_j}\right)\)   | \(\frac{\exp(\eta)}{1 + \sum_{j=1}^{m-1} e^{\eta_j}}\)  |

: Loss functions, link functions, and inverse link functions for the generalized linear models in the SLOPE package.

Note that in the case of multinomial logistic regression, the input is vector-valued, and we allow \(\log\) and \(\exp\) to be overloaded to apply element-wise in these cases.
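To make the penalty concrete, here is a minimal base R sketch that evaluates $J(\beta; \alpha, \lambda)$ for a toy coefficient vector (the values are arbitrary and chosen purely for illustration):

```{r}
beta <- c(0.5, -2, 1) # toy coefficients
lambda <- c(1.5, 1, 0.5) # non-increasing penalty sequence
alpha <- 0.2 # scale of the penalty sequence

# Pair the largest |beta| with the largest lambda, then sum.
alpha * sum(lambda * sort(abs(beta), decreasing = TRUE))
```

The sorting pairs the largest penalty weight with the largest coefficient magnitude, so which $\lambda_j$ applies to which coefficient depends on $\beta$ itself. With a constant $\lambda$ sequence, the penalty reduces to the ordinary lasso penalty.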
## Gaussian Regression

Gaussian regression, also known as least-squares regression, is used for continuous response variables and takes the following form:
$$
F(\beta_0, \beta; X, y) = \frac{1}{2n} \sum_{i=1}^n \left(y_i - x_i^\intercal \beta - \beta_0\right)^2.
$$
You select it by setting `family = "gaussian"` in the `SLOPE()` function. In the following example, we fit a Gaussian SLOPE model to the `bodyfat` data set, which is included in the package:

```{r}
library(SLOPE)

fit_gaussian <- SLOPE(
  x = bodyfat$x,
  y = bodyfat$y,
  family = "gaussian"
)
```

Often it's instructive to look at the solution path of the fitted model, which you can do by simply plotting the fitted object:

```{r}
#| fig-width: 4.5
#| fig-height: 5
plot(fit_gaussian)
```

We can also print a summary of the fitted model:

```{r}
summary(fit_gaussian)
```

SLOPE also contains standard methods for returning coefficients and making predictions. We can use `coef()` with a specified level of regularization, `alpha`, to obtain the coefficients at that level (or omit `alpha` to get the coefficients for the full path):

```{r}
coef(fit_gaussian, alpha = 0.05)
```

To make predictions on new data, we can use the `predict()` function, specifying the fitted model, new design matrix `x`, and the level of regularization `alpha`:

```{r}
y_hat <- predict(fit_gaussian, x = bodyfat$x[1:5, ], alpha = 0.01)
plot(bodyfat$y[1:5], y_hat)
```

## Logistic Regression

Logistic regression is used for binary classification problems and takes the following form:
$$
F(\beta_0, \beta; X, y) = \frac{1}{n} \sum_{i=1}^n \left\{ \log\left(1 + \exp\left(x_i^\intercal \beta + \beta_0\right)\right) - y_i \left(x_i^\intercal \beta + \beta_0\right) \right\},
$$
where $x_i$ is the $i$th row of the design matrix $X$. You select it by setting `family = "binomial"` in the `SLOPE()` function.

```{r}
fit_logistic <- SLOPE(
  x = heart$x,
  y = heart$y,
  family = "binomial"
)
```

You can plot the solution path, print a summary, extract coefficients, and make predictions in the same way as for Gaussian regression:

```{r}
#| fig-width: 4.5
#| fig-height: 5
plot(fit_logistic)
```

## Poisson Regression

Poisson regression is used for count data and takes the following form:
$$
F(\beta_0, \beta; X, y) = \frac{1}{n} \sum_{i=1}^n \left\{ \exp\left(x_i^\intercal \beta + \beta_0\right) - y_i \left(x_i^\intercal \beta + \beta_0\right) \right\}.
$$
You select it by setting `family = "poisson"` in the `SLOPE()` function.

Note that the Poisson regression problem is notoriously difficult to optimize since its objective is not Lipschitz-smooth. As such, convergence may be slow. SLOPE features safeguards that ensure convergence by modifying step sizes, but this may lead to long computation times in some cases.

In the next example, we fit a Poisson SLOPE model to the `abalone` data set, which consists of observations of abalones.

```{r}
#| fig-width: 4.5
#| fig-height: 5
fit_poisson <- SLOPE(
  x = abalone$x,
  y = abalone$y,
  family = "poisson"
)

plot(fit_poisson)
```

### Numerical Considerations

Sometimes it may be beneficial to pick a gradient descent solver instead of the default coordinate descent solver, as sketched below.
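As an illustration of how this might look, you can request a different solver through the `solver` argument of `SLOPE()`. The set of available solvers depends on the version of the package, and the value `"fista"` below is an assumption, so check `?SLOPE` before relying on it:

```{r, eval = FALSE}
# Sketch: request a FISTA-type proximal gradient solver instead of the
# default coordinate descent solver. The value "fista" is an assumption;
# see ?SLOPE for the solvers supported by your version of the package.
fit_poisson_pgd <- SLOPE(
  x = abalone$x,
  y = abalone$y,
  family = "poisson",
  solver = "fista"
)
```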
## Multinomial Logistic Regression

Multinomial logistic regression is used for multi-class classification problems and takes the following form:
$$
F(\beta_0, \beta; X, Y) = \frac{1}{n} \sum_{i=1}^n \left\{ \log\left(1 + \sum_{k=1}^{m-1} \exp\left(x_i^\intercal \beta_k + \beta_{0k}\right)\right) - \sum_{k=1}^{m-1} y_{ik} \left(x_i^\intercal \beta_k + \beta_{0k}\right) \right\},
$$
where $x_i$ is the $i$th row of the design matrix $X$, and $Y$ is the response matrix of dimension $n \times m$, where $m$ is the number of classes.

In our package, we use the non-redundant formulation, in which the last class is treated as the baseline class. Thus, only $m - 1$ sets of coefficients are estimated. This is not the case in other packages, such as `glmnet` [@friedman2010].

To select multinomial logistic regression, set `family = "multinomial"` in the `SLOPE()` function. In the following example, we fit a multinomial logistic SLOPE model to the `wine` data set, which consists of 178 observations of wines from three different cultivars. The task is to classify the cultivar based on 13 different chemical properties.

```{r}
fit_multinomial <- SLOPE(
  x = wine$x,
  y = wine$y,
  family = "multinomial"
)
```

Note that the coefficients are now a matrix. To retain sparsity, we have chosen to return the full set of coefficients along the path as a list of sparse matrices. Here we extract the coefficients for the second step in the solution path: a single matrix with one column for each of the first two classes (the third class is the baseline class):

```{r}
fit_multinomial$coefficients[[2]]
```

By default, `coef()` simplifies the output and returns lists of coefficient vectors, one for each class except the baseline class.

For similar reasons, plotting the paths is also a bit different: SLOPE plots one path for each class (excluding the baseline class), which resembles the result of calling `coef()`. Because we use the base R plotting system, you need to set up a multi-panel layout to see all the paths:

```{r}
#| fig-width: 6
#| fig-height: 4
par(mfrow = c(1, 2))
plot(fit_multinomial)
```

## References