The optRF package provides tools for optimizing the number of trees in a random forest to improve model stability and reproducibility. Since random forest is a non-deterministic method, variable importance and prediction results can vary between runs. The optRF package estimates the stability of random forest based on the number of trees and helps users determine the optimal number of trees required for reliable predictions and variable selection.
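The run-to-run variability described above can be seen without optRF itself. The following is a minimal sketch, assuming the ranger package is installed and using a small synthetic data set invented purely for illustration (the helper names `fit_importance` and `stability` are hypothetical). It fits the same forest twice with different seeds and correlates the two sets of variable importance estimates, once with few trees and once with many.

```r
library(ranger)

# Synthetic data, for illustration only: 100 samples, 50 predictors,
# of which only the first two influence the response
set.seed(1)
n = 100
p = 50
X = as.data.frame(matrix(rnorm(n * p), nrow = n))
y = X[, 1] + 0.5 * X[, 2] + rnorm(n)

# Fit a forest with a given seed and return its permutation importance
fit_importance = function(seed, num_trees) {
  ranger(x = X, y = y, num.trees = num_trees,
         importance = "permutation", seed = seed)$variable.importance
}

# Correlation of the importance estimates from two runs that differ only in seed
stability = function(num_trees) {
  cor(fit_importance(1, num_trees), fit_importance(2, num_trees))
}

stability_small = stability(50)
stability_large = stability(2000)
print(c(small = stability_small, large = stability_large))
```

A correlation close to 1 means the two runs largely agree on the importance estimates. How quickly agreement improves with more trees depends on the data, which is the question optRF is designed to answer.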
To install the optRF R package from CRAN, just run
install.packages("optRF")
R version >= 3.6 is required.
You can install the development version of optRF from GitHub using devtools with:
devtools::install_github("tmlange/optRF")
The optRF package includes the SNPdata data set for demonstration purposes. The two main functions are:
opt_prediction – Finds the optimal number of trees for stable predictions.
opt_importance – Finds the optimal number of trees for stable variable importance estimates.
library(optRF)
# Load example data set
data(SNPdata)
# Optimise random forest for predicting the first column in SNPdata
result_optpred = opt_prediction(y = SNPdata[,1], X=SNPdata[,-1])
summary(result_optpred)
# Optimise random forest for calculating variable importance
result_optimp = opt_importance(y = SNPdata[,1], X=SNPdata[,-1])
summary(result_optimp)
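A natural next step is to fit a random forest with the recommended number of trees. The sketch below makes two assumptions that are not stated above: that the object returned by opt_prediction stores the recommendation in a field named `recommendation`, and that the forest is fitted with the ranger package. Check the package vignettes for the exact field names before relying on this.

```r
library(optRF)
library(ranger)

# Load example data set
data(SNPdata)

# Fix the seed so the optimisation itself is repeatable
set.seed(123)
result_optpred = opt_prediction(y = SNPdata[,1], X = SNPdata[,-1])

# Assumption: the recommended number of trees is stored in `$recommendation`
num_trees = result_optpred$recommendation

# Fit the final random forest with the recommended number of trees
rf_model = ranger(y = SNPdata[,1], x = SNPdata[,-1], num.trees = num_trees)
print(rf_model$num.trees)
```

Fixing the seed before calling opt_prediction keeps the recommendation repeatable. The final model will still vary between runs unless its own seed is fixed as well.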
For detailed examples and explanations, refer to the package vignettes:
optRF – General package overview
opt_prediction – Optimizing random forest predictions
opt_importance – Optimizing random forest variable importance estimation
If you use optRF in your research, please cite:
Lange, T. M., Heinrich, F., Gültas, M. and Schmitt, A. O. (2024). optRF: Optimising random forest stability by determining the optimal number of trees. PREPRINT (Version 1) available at Research Square.