% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/SVEMnet.R
\name{SVEMnet}
\alias{SVEMnet}
\title{Fit an SVEMnet Model (with optional relaxed elastic net)}
\usage{
SVEMnet(
  formula,
  data,
  nBoot = 200,
  glmnet_alpha = c(0.25, 0.5, 0.75, 1),
  weight_scheme = c("SVEM", "FWR", "Identity"),
  objective = c("auto", "wAIC", "wBIC", "wSSE"),
  auto_ratio_cutoff = 1.3,
  relaxed = TRUE,
  response = NULL,
  unseen = c("warn_na", "error"),
  ...
)
}
\arguments{
\item{formula}{A formula specifying the model to be fitted, or a \code{bigexp_spec}
object created by \code{bigexp_terms()}.}

\item{data}{A data frame containing the variables in the model.}

\item{nBoot}{Number of bootstrap iterations (default 200).}

\item{glmnet_alpha}{Elastic net mixing parameter(s). A numeric vector with
entries in [0, 1], where \code{alpha = 1} is the lasso and \code{alpha = 0} is
ridge. Defaults to \code{c(0.25, 0.5, 0.75, 1)}.}

\item{weight_scheme}{Weighting scheme for SVEM (default \code{"SVEM"}).
One of \code{"SVEM"}, \code{"FWR"}, or \code{"Identity"}.}

\item{objective}{Objective used to pick lambda on each bootstrap path
(default \code{"auto"}). One of \code{"auto"}, \code{"wAIC"}, \code{"wBIC"}, or \code{"wSSE"}.}

\item{auto_ratio_cutoff}{Single cutoff for the automatic rule when
\code{objective = "auto"} (default 1.3). Let \code{r = n_X / p_X}, where \code{n_X} is the
number of training rows and \code{p_X} is the number of predictors in the model
matrix after dropping the intercept column. If \code{r >= auto_ratio_cutoff},
SVEMnet uses wAIC; otherwise it uses wBIC.}

\item{relaxed}{Logical (default \code{TRUE}). When \code{TRUE}, use glmnet's
relaxed elastic net path and select both lambda and relaxed gamma on each bootstrap.
When \code{FALSE}, fit the standard glmnet path. Note: if \code{relaxed = TRUE} and
\code{glmnet_alpha} includes 0 (ridge), \code{alpha = 0} is dropped from the search; this is
recorded in \code{dropped_alpha0_for_relaxed}.}

\item{response}{Optional character. When \code{formula} is a \code{bigexp_spec}, this names
the response column to use on the LHS; defaults to the response stored in the spec.}

\item{unseen}{How to treat unseen factor levels when \code{formula} is a \code{bigexp_spec}:
\code{"warn_na"} (default; convert to NA with a warning) or \code{"error"} (stop).}

\item{...}{Additional arguments passed to \code{glmnet()} (e.g., \code{penalty.factor},
\code{lower.limits}, \code{upper.limits}, \code{offset}, \code{standardize.response}).
Any user-supplied \code{weights} are ignored so that SVEM can supply its own bootstrap weights.
Any user-supplied \code{standardize} is ignored; SVEMnet always uses \code{standardize = TRUE}.}
}
\value{
An object of class \code{svem_model} with elements:
\itemize{
\item \code{parms}: averaged coefficients (including intercept).
\item \code{parms_debiased}: averaged coefficients adjusted by the calibration fit.
\item \code{debias_fit}: \code{lm(y ~ y_pred)} calibration model used for debiasing (or \code{NULL}).
\item \code{coef_matrix}: per-bootstrap coefficient matrix.
\item \code{nBoot}, \code{glmnet_alpha}, \code{best_alphas}, \code{best_lambdas}, \code{weight_scheme}, \code{relaxed}.
\item \code{best_relax_gammas}: per-bootstrap relaxed gamma chosen (NA if \code{relaxed = FALSE}).
\item \code{objective_input}, \code{objective_used}, \code{objective} (same as \code{objective_used}),
\code{auto_used}, \code{auto_decision}, \code{auto_rule}.
\item \code{dropped_alpha0_for_relaxed}: whether alpha = 0 was removed because \code{relaxed = TRUE}.
\item \code{schema}: list(\code{feature_names}, \code{terms_str}, \code{xlevels}, \code{contrasts}, \code{terms_hash}) for safe predict.
\item \code{sampling_schema}: list(
\code{predictor_vars}, \code{var_classes},
\code{num_ranges} = rbind(min=..., max=...) for numeric raw predictors,
\code{factor_levels} = list(...) for factor/character raw predictors).
\item \code{diagnostics}: list with \code{k_summary} (median and IQR of selected size),
\code{fallback_rate}, \code{n_eff_summary}, \code{alpha_freq}, \code{relax_gamma_freq}.
\item \code{actual_y}, \code{training_X}, \code{y_pred}, \code{y_pred_debiased}, \code{nobs}, \code{nparm}, \code{formula}, \code{terms},
\code{xlevels}, \code{contrasts}.
\item \code{used_bigexp_spec}: logical flag indicating whether a \code{bigexp_spec} was used.
}
}
\description{
Wrapper for glmnet (Friedman et al. 2010) to fit an ensemble of Elastic Net
models using the Self-Validated Ensemble Model method (SVEM; Lemkus et al. 2021),
with an option to use glmnet's built-in relaxed elastic net (Meinshausen 2007).
Supports searching over multiple alpha values in the Elastic Net penalty.
}
\details{
You can pass either:
\itemize{
\item a standard model formula, e.g. \code{y ~ X1 + X2 + X3 + I(X1^2) + (X1 + X2 + X3)^2}, or
\item a \code{bigexp_spec} created by \code{bigexp_terms()}, in which case SVEMnet prepares
the data deterministically (locked types/levels) and, if \code{response} is supplied, swaps
the response so that multiple independent responses can be fit over the same expansion.
}

SVEM applies fractional bootstrap weights to training data and anti-correlated
weights for validation when \code{weight_scheme = "SVEM"}. For each bootstrap, glmnet
paths are fit for each alpha in \code{glmnet_alpha}, and the lambda (and, if \code{relaxed = TRUE},
relaxed gamma) minimizing a weighted validation criterion is selected.
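
As an illustration only, the weight schemes can be sketched in base R. This
assumes the constructions described by Lemkus et al. (2021) and Xu et al. (2020)
and is not SVEMnet's internal code; any rescaling of the weights is omitted, and
the names \code{u}, \code{w_train}, \code{w_valid}, \code{w_fwr}, and \code{w_id} are
hypothetical:

\preformatted{
set.seed(1)
n <- 8
u <- runif(n)                 # one uniform draw per training row

# "SVEM": a row weighted heavily for training is weighted lightly for validation
w_train <- -log(u)
w_valid <- -log(1 - u)
cor(w_train, w_valid)         # negative: the two weight vectors are anti-correlated

# "FWR" (assumed): the same fractional random weights serve both roles
w_fwr <- -log(u)

# "Identity" (assumed): every row has unit weight
w_id <- rep(1, n)
}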

Predictors are always standardized internally via \code{glmnet::glmnet(..., standardize = TRUE)}.

When \code{relaxed = TRUE}, \code{coef(fit, s = lambda, gamma = g)} is used to obtain the
coefficient path at each relaxed gamma in the internal grid. Selection metrics are computed
from validation-weighted errors, and model size is taken as the number of nonzero
coefficients including the intercept (support size), which keeps selection consistent
between the standard and relaxed paths.

Automatic objective rule (\code{objective = "auto"}): a single ratio cutoff,
\code{auto_ratio_cutoff}, is applied to \code{r = n_X / p_X}, where \code{n_X} is the number of
training rows and \code{p_X} is computed from the model matrix with the intercept column
removed. If \code{r >= auto_ratio_cutoff}, the selection criterion is wAIC; otherwise it is wBIC.

Implementation notes for safety:
\itemize{
\item The training terms object is stored with environment set to \code{baseenv()} to avoid
accidental lookups in the calling environment.
\item A compact schema (feature names, xlevels, contrasts) is stored to let \code{predict()}
reconstruct the design matrix deterministically.
\item A lightweight sampling schema (numeric ranges and factor levels for raw predictors)
is cached to enable random-table generation without needing the original data.
}
}
\section{Acknowledgments}{

Development of this package was assisted by GPT o1-preview for structuring parts of the code
and documentation.
}

\examples{
set.seed(42)

n  <- 30
X1 <- rnorm(n)
X2 <- rnorm(n)
X3 <- rnorm(n)
eps <- rnorm(n, sd = 0.5)
y <- 1 + 2*X1 - 1.5*X2 + 0.5*X3 + 1.2*(X1*X2) + 0.8*(X1^2) + eps
dat <- data.frame(y, X1, X2, X3)

# Minimal hand-written expansion
mod_std <- SVEMnet(
  y ~ (X1 + X2 + X3)^2 + I(X1^2) + I(X2^2),
  data          = dat,
  glmnet_alpha  = c(1, 0.5),
  nBoot         = 75,
  objective     = "auto",
  weight_scheme = "SVEM",
  relaxed       = FALSE
)

pred_in_raw <- predict(mod_std, dat, debias = FALSE)
pred_in_db  <- predict(mod_std, dat, debias = TRUE)
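
# Illustrative only: reproduce the "auto" rule from Details by hand. This is a
# sketch, not SVEMnet's internal code; X_chk, p_X, r_ratio, and rule_by_hand
# are names introduced here. p_X counts model-matrix columns other than the
# intercept, and 1.3 is the default auto_ratio_cutoff.
X_chk <- model.matrix(y ~ (X1 + X2 + X3)^2 + I(X1^2) + I(X2^2), data = dat)
p_X <- sum(colnames(X_chk) != "(Intercept)")
r_ratio <- nrow(X_chk) / p_X
rule_by_hand <- if (r_ratio >= 1.3) "wAIC" else "wBIC"
c(n_X = nrow(X_chk), p_X = p_X, r = r_ratio)
rule_by_hand  # compare with the objective_used element of the fitted object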

\donttest{
# ---------------------------------------------------------------------------
# Big expansion (full factorial + response surface + partial cubic)
# Build once, reuse for one or more responses
# ---------------------------------------------------------------------------
spec <- bigexp_terms(
  y ~ X1 + X2 + X3, data = dat,
  factorial_order    = 3,      # allow 3-way factorials
  include_pc_3way    = FALSE,  # set TRUE to add I(X^2):Z:W
  include_pure_cubic = FALSE
)

# Fit using the spec (auto-prepares data)
fit_y <- SVEMnet(
  spec, dat,
  glmnet_alpha  = c(1, 0.5),
  nBoot         = 50,
  objective     = "auto",
  weight_scheme = "SVEM",
  relaxed       = FALSE
)

# A second, independent response over the same expansion
set.seed(99)
dat$y2 <- 0.5 + 1.4*X1 - 0.6*X2 + 0.2*X3 + rnorm(n, 0, 0.4)
fit_y2 <- SVEMnet(
  spec, dat, response = "y2",
  glmnet_alpha  = c(1, 0.5),
  nBoot         = 50,
  objective     = "auto",
  weight_scheme = "SVEM",
  relaxed       = FALSE
)

p1  <- predict(fit_y,  dat)
p2  <- predict(fit_y2, dat, debias = TRUE)
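
# Inspect what the ensemble selected, using elements documented in Value.
# The comments assume best_alphas holds one selected alpha per bootstrap.
fit_y$objective_used         # criterion picked by the "auto" rule
table(fit_y$best_alphas)     # how often each alpha was selected
fit_y$diagnostics$k_summary  # median and IQR of selected model size
head(fit_y$parms)            # averaged coefficients (including intercept)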

# Show that a new batch expands identically under the same spec
newdat <- data.frame(
  y  = y,
  X1 = X1 + rnorm(n, 0, 0.05),
  X2 = X2 + rnorm(n, 0, 0.05),
  X3 = X3 + rnorm(n, 0, 0.05)
)
prep_new <- bigexp_prepare(spec, newdat)
stopifnot(identical(
  colnames(model.matrix(spec$formula, bigexp_prepare(spec, dat)$data)),
  colnames(model.matrix(spec$formula, prep_new$data))
))
preds_new <- predict(fit_y, prep_new$data)
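
# Illustrative only: the coef(fit, s = lambda, gamma = g) call mentioned in
# Details, run directly on a relaxed glmnet fit (not on an SVEMnet object).
# X_raw, fit_rlx, and lam_mid are names introduced here.
X_raw <- as.matrix(dat[, c("X1", "X2", "X3")])
fit_rlx <- glmnet::glmnet(X_raw, dat$y, alpha = 1, relax = TRUE)
lam_mid <- fit_rlx$lambda[ceiling(length(fit_rlx$lambda) / 2)]
coef(fit_rlx, s = lam_mid, gamma = 1)  # gamma = 1: penalized coefficients
coef(fit_rlx, s = lam_mid, gamma = 0)  # gamma = 0: unpenalized refit on the active set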
}

}
\references{
Gotwalt, C., & Ramsey, P. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. \emph{JMP Discovery Conference}. \url{https://community.jmp.com/t5/Abstracts/Model-Validation-Strategies-for-Designed-Experiments-Using/ev-p/849873/redirect_from_archived_page/true}

Karl, A. T. (2024). A randomized permutation whole-model test heuristic for Self-Validated Ensemble Models (SVEM). \emph{Chemometrics and Intelligent Laboratory Systems}, \emph{249}, 105122. \doi{10.1016/j.chemolab.2024.105122}

Karl, A., Wisnowski, J., & Rushing, H. (2022). JMP Pro 17 Remedies for Practical Struggles with Mixture Experiments. \emph{JMP Discovery Conference}. \doi{10.13140/RG.2.2.34598.40003/1}

Lemkus, T., Gotwalt, C., Ramsey, P., & Weese, M. L. (2021). Self-Validated Ensemble Models for Design of Experiments. \emph{Chemometrics and Intelligent Laboratory Systems}, \emph{219}, 104439. \doi{10.1016/j.chemolab.2021.104439}

Xu, L., Gotwalt, C., Hong, Y., King, C. B., & Meeker, W. Q. (2020). Applications of the Fractional-Random-Weight Bootstrap. \emph{The American Statistician}, \emph{74}(4), 345–358. \doi{10.1080/00031305.2020.1731599}

Ramsey, P., Gaudard, M., & Levin, W. (2021). Accelerating Innovation with Space Filling Mixture Designs, Neural Networks and SVEM. \emph{JMP Discovery Conference}. \url{https://community.jmp.com/t5/Abstracts/Accelerating-Innovation-with-Space-Filling-Mixture-Designs/ev-p/756841}

Ramsey, P., & Gotwalt, C. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. \emph{JMP Discovery Conference - Europe}. \url{https://community.jmp.com/t5/Abstracts/Model-Validation-Strategies-for-Designed-Experiments-Using/ev-p/849647/redirect_from_archived_page/true}

Ramsey, P., Levin, W., Lemkus, T., & Gotwalt, C. (2021). SVEM: A Paradigm Shift in Design and Analysis of Experiments. \emph{JMP Discovery Conference - Europe}. \url{https://community.jmp.com/t5/Abstracts/SVEM-A-Paradigm-Shift-in-Design-and-Analysis-of-Experiments-2021/ev-p/756634}

Ramsey, P., & McNeill, P. (2023). CMC, SVEM, Neural Networks, DOE, and Complexity: It's All About Prediction. \emph{JMP Discovery Conference}.

Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. \emph{Journal of Statistical Software}, \emph{33}(1), 1-22.

Meinshausen, N. (2007). Relaxed Lasso. \emph{Computational Statistics & Data Analysis}, \emph{52}(1), 374-393.
}
