MLModel
is a function supplied by the
MachineShop package. It allows for the integration of
statistical and machine learning models supplied by other R packages
with the MachineShop model fitting, prediction, and
performance assessment tools.
The following are guidelines for writing model constructor
functions that are wrappers around the MLModel
function.
In this context, the term “constructor” refers to the wrapper function and “source package” to the package supplying the original model implementation.
The constructor should produce a valid model if called without any arguments; i.e., not have any required arguments.
The source package defaults will be used for parameters with
NULL
values.
Model formula, data, and weights are separate from model parameters and should not be defined as constructor arguments.
Include all external packages whose functions are called directly from within the constructor.
Use :: to reference source package functions.
"binary"
,
"factor"
, "matrix"
, "numeric"
,
"ordered"
, and/or "Surv"
) that can be analyzed
with the model.new_params(environment())
if all arguments
are to be passed to the source package fit function as supplied.
Additional steps may be needed to pass the constructor arguments to the
source package in a different format; e.g., when some model parameters
must be passed in a control structure, as in C50Model
and
CForestModel
.The first three arguments should be formula
,
data
, and weights
followed by an ellipsis
(...
).
If weights are not supported, the following, or equivalent, should be included in the function:
Only add elements to the resulting fit object if they are needed
and will be used in the predict
or varimp
functions.
Return the fit object.
The arguments are a model fit object
,
newdata
frame, optionally times
for prediction
at survival time points, and an ellipsis.
The predict function should return a vector or column matrix of
probabilities for the second level of binary factors, a matrix whose
columns contain the probabilities for factors with more than two levels,
a matrix of predicted responses if matrix, a vector or column matrix of
predicted responses if numeric, a matrix whose columns contain survival
probabilities at times
if supplied, or a vector of
predicted survival means if times
are not
supplied.
Should have a single model fit object
argument
followed by an ellipsis.
Variable importance results should generally be returned as a
vector with elements named after the corresponding predictor variables.
The package will handle conversions to a data frame and
VariableImportance
object. If there is more than one set of
relevant variable importance measures, they can be returned as a matrix
or data frame with predictor variable names as the row names.
Include the first sentences from the source package.
Start sentences with the parameter value type (logical, numeric, character, etc.).
Start sentences with lowercase.
Omit indefinite articles (a, an, etc.) from the starting sentences.
Include response types (binary, factor, matrix, numeric, ordered, and/or Surv).
Include the following sentence:
Default values for the arguments and further model details can be found in the source link below.
MLModel class object.
\code{\link[<source package>]{<fit function>}}, \code{\link{fit}},
\code{\link{resample}}
If adding a new model to the package, save its source code in a
file whose name begins with “ML_” followed by the model name, and ending
with a .R extension; e.g., "R/ML_CustomModel.R"
.
Export the model in NAMESPACE
.
Add any required packages to the “Suggests” section of
DESCRIPTION
.
Add the model to R/models.R
.
Add the model to R/modelinfo.R
.
Add a unit testing file to tests/testthat
.