---
title: "CRE"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{CRE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
# Installation
Installing from CRAN.
```{r, eval=FALSE}
install.packages("CRE")
```
Installing the latest development version.
```{r, eval=FALSE}
library(devtools)
install_github("NSAPH-Software/CRE", ref = "develop")
```
Load the package.
```{r, eval=FALSE}
library("CRE")
```
# Arguments
__Data (required)__

**`y`** The observed response/outcome vector (binary or continuous).

**`z`** The treatment/exposure/policy vector (binary).

**`X`** The covariate matrix (binary or continuous).

__Parameters (not required)__

**`method_params`** The list of parameters that define the models used, including:
- **`ratio_dis`** The ratio of data delegated to the discovery sub-sample (default: 0.5).
- **`ite_method`** The method to estimate the individual treatment effect (default: "aipw") [1].
- **`learner_ps`** The [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) model for propensity score estimation (default: "SL.xgboost"; used only by the "aipw", "bart", and "cf" ITE estimators).
- **`learner_y`** The [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) model for outcome estimation (default: "SL.xgboost"; used only by the "aipw", "slearner", "tlearner", and "xlearner" ITE estimators).

**`hyper_params`** The list of hyperparameters used to fine-tune the method, including:
- **`intervention_vars`** Variables that can be intervened on, used for rule generation (default: `NULL`).
- **`ntrees`** The number of decision trees in the random forest (default: 20).
- **`node_size`** Minimum size of the trees' terminal nodes (default: 20).
- **`max_rules`** Maximum number of candidate decision rules (default: 50).
- **`max_depth`** Maximum rule length (default: 3).
- **`t_decay`** The decay threshold for rules pruning (default: 0.025).
- **`t_ext`** The threshold to define too generic or too specific (extreme) rules (default: 0.01).
- **`t_corr`** The threshold to define correlated rules (default: 1).
- **`stability_selection`** The stability selection method used to select the rules: `vanilla` for plain stability selection, `error_control` for stability selection with error control, and `no` for no stability selection (default: `vanilla`).
- **`B`** Number of bootstrap samples for stability selection in rule selection and for uncertainty quantification in estimation (default: 20).
- **`subsample`** Subsampling ratio of each bootstrap sample, used for stability selection in rule selection and for uncertainty quantification in estimation (default: 0.5).
- **`offset`** Name of the covariate to use as offset (e.g., "x1") for T-Poisson ITE estimation, or `NULL` if not used (default: `NULL`).
- **`cutoff`** Threshold defining the minimum cutoff value for the stability scores in Stability Selection (default: 0.9).
- **`pfer`** Upper bound for the per-family error rate (tolerated amount of falsely selected rules) in Error Control Stability Selection (default: 1).

__Additional Estimates (not required)__

**`ite`** The estimated ITE vector. If given, the ITE estimation steps in both Discovery and Inference are skipped (default: `NULL`).
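
To make two of these settings concrete, the sketch below shows in base R how a `ratio_dis`-style split and a `t_ext`-style filter could act on data. It is illustrative only: the helpers `split_indices()` and `filter_extreme()` and the toy rules matrix are hypothetical, and the package's internal implementation may differ.

```R
set.seed(1)
n <- 200
ratio_dis <- 0.5
t_ext <- 0.01

# Hypothetical helper: randomly assign a fraction `ratio_dis` of the
# observations to the discovery sub-sample and the rest to inference.
split_indices <- function(n, ratio_dis) {
  dis <- sample(n, size = floor(n * ratio_dis))
  list(discovery = dis, inference = setdiff(seq_len(n), dis))
}
idx <- split_indices(n, ratio_dis)

# Toy binary rules matrix: one column per candidate rule, with 1 where
# the rule applies to the observation.
rules_matrix <- cbind(
  common  = rbinom(n, 1, 0.5),
  generic = rep(1, n),            # covers every unit: too generic
  rare    = c(1, rep(0, n - 1))   # covers 1 unit out of 200: too specific
)

# Hypothetical helper: keep only rules whose support (share of covered
# units) lies within [t_ext, 1 - t_ext].
filter_extreme <- function(rules_matrix, t_ext) {
  support <- colMeans(rules_matrix)
  rules_matrix[, support >= t_ext & support <= 1 - t_ext, drop = FALSE]
}
kept <- filter_extreme(rules_matrix, t_ext)
colnames(kept)
```

With these toy inputs only the `common` rule survives, since `generic` has support 1 and `rare` has support 0.005, both outside the interval set by `t_ext = 0.01`.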
## Notes
### Options for the ITE estimation
**[1]** Options for the ITE estimation are as follows:
- [S-Learner](https://CRAN.R-project.org/package=SuperLearner) (`slearner`)
- [T-Learner](https://CRAN.R-project.org/package=SuperLearner) (`tlearner`)
- T-Poisson (`tpoisson`)
- [X-Learner](https://CRAN.R-project.org/package=SuperLearner) (`xlearner`)
- [Augmented Inverse Probability Weighting](https://CRAN.R-project.org/package=SuperLearner) (`aipw`)
- [Causal Forests](https://CRAN.R-project.org/package=grf) (`cf`)
- [Causal Bayesian Additive Regression Trees](https://CRAN.R-project.org/package=bartCause) (`bart`)

If other ITE estimates are supplied through the additional `ite` argument, the ITE estimation steps in both discovery and inference are skipped and the supplied estimates are used instead.

The ITE estimator also requires an outcome learner and/or a propensity score learner from the [SuperLearner](https://CRAN.R-project.org/package=SuperLearner) package (e.g., "SL.lm", "SL.svm"). Both models are simple classifiers or regressors. By default, the XGBoost algorithm is used for both steps.
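
To illustrate how a propensity score learner and an outcome learner combine into an ITE estimate, here is a minimal base R sketch of the textbook AIPW pseudo-outcome. It is not the package's implementation: `glm()` and `lm()` stand in for the SuperLearner models, and the simulated data and variable names are assumptions made for the example.

```R
set.seed(42)
n <- 2000
x <- rnorm(n)
# Treatment assignment depends on x (confounding).
z <- rbinom(n, 1, plogis(0.5 * x))
# True effect: 2 when x > 0, and 0 otherwise.
tau <- 2 * (x > 0)
y <- x + tau * z + rnorm(n)
dat <- data.frame(y = y, z = z, x = x, pos = as.numeric(x > 0))

# Propensity score model (plays the role of `learner_ps`).
ps_fit <- glm(z ~ x, family = binomial(), data = dat)
e_hat <- predict(ps_fit, type = "response")

# One outcome model per treatment arm (plays the role of `learner_y`).
fit1 <- lm(y ~ x + pos, data = dat[dat$z == 1, ])
fit0 <- lm(y ~ x + pos, data = dat[dat$z == 0, ])
mu1 <- predict(fit1, newdata = dat)
mu0 <- predict(fit0, newdata = dat)

# AIPW pseudo-outcome: outcome-model contrast plus inverse-probability
# weighted residual corrections for each arm.
ite_aipw <- (mu1 - mu0) +
  dat$z * (dat$y - mu1) / e_hat -
  (1 - dat$z) * (dat$y - mu0) / (1 - e_hat)

# The average of the pseudo-outcomes estimates the average treatment
# effect; compare it with the true average effect in the simulation.
c(estimate = mean(ite_aipw), truth = mean(tau))
```

Individual pseudo-outcomes are noisy, which is one reason their averages over groups of units are more informative than single values.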
### Customized wrapper for SuperLearner
One can create a customized wrapper for the packages that SuperLearner calls internally. The following example sets the number of cores (e.g., 12) used by the xgboost package on a shared-memory system.
```R
m_xgboost <- function(nthread = 12, ...) {
  SuperLearner::SL.xgboost(nthread = nthread, ...)
}
```
Then use "m_xgboost" instead of "SL.xgboost" as the learner name.
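
As a sketch of how the wrapper name could be wired in, the block below repeats the wrapper and builds a `method_params` list that refers to it. It assumes the wrapper is defined in the global environment so that it can be found by name; the `cre()` call is left commented out because it needs the data objects `y`, `z`, and `X`.

```R
# Wrapper from the example above. SuperLearner is only required once the
# wrapper is actually called.
m_xgboost <- function(nthread = 12, ...) {
  SuperLearner::SL.xgboost(nthread = nthread, ...)
}

# Refer to the wrapper by name wherever "SL.xgboost" would be used.
method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "m_xgboost",
                      learner_y = "m_xgboost")

# cre_results <- cre(y, z, X, method_params)
```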
# Examples
Example 1 (*default parameters*)
```R
set.seed(9687)
# Simulate a synthetic dataset
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

# Run CRE with the default parameters
cre_results <- cre(y, z, X)
summary(cre_results)
plot(cre_results)
# Predict the ITE with the fitted model
ite_pred <- predict(cre_results, X)
```
Example 2 (*personalized ITE estimation*)
```R
set.seed(9687)
# Simulate a synthetic dataset
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

ite_pred <- ... # personalized ite estimation

# Supplying `ite` skips the internal ITE estimation steps
cre_results <- cre(y, z, X, ite = ite_pred)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```
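
The placeholder in Example 2 can be filled by any estimator that returns one value per observation. As one hypothetical possibility, the base R sketch below builds such a vector with a simple T-learner based on `lm()`. The toy data and model choice are assumptions made for illustration, not a recommendation from the package.

```R
set.seed(9687)
n <- 1000
# Toy data: two binary covariates and a randomized binary treatment.
X <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rbinom(n, 1, 0.5))
z <- rbinom(n, 1, 0.5)
# Toy outcome: the treatment effect is 2 when x1 == 1, and 0 otherwise.
y <- 2 * X$x1 * z + X$x2 + rnorm(n)

# T-learner: fit one outcome model per treatment arm ...
dat <- cbind(y = y, X)
fit_treated <- lm(y ~ ., data = dat[z == 1, ])
fit_control <- lm(y ~ ., data = dat[z == 0, ])

# ... and take the difference of the two predictions as the ITE estimate.
ite_pred <- predict(fit_treated, newdata = X) -
  predict(fit_control, newdata = X)

# One estimate per observation, as expected for the `ite` argument.
length(ite_pred) == length(y)
```

A vector built this way could then be passed as `cre(y, z, X, ite = ite_pred)`, as in Example 2.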
Example 3 (*setting parameters*)
```R
set.seed(9687)
# Simulate a synthetic dataset
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

# Model settings (all equal to the documented defaults)
method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "SL.xgboost",
                      learner_y = "SL.xgboost")

# Hyperparameters: intervention_vars, t_ext, cutoff, and B differ from
# the documented defaults
hyper_params <- list(intervention_vars = c("x1", "x2", "x3", "x4"),
                     offset = NULL,
                     ntrees = 20,
                     node_size = 20,
                     max_rules = 50,
                     max_depth = 3,
                     t_decay = 0.025,
                     t_ext = 0.025,
                     t_corr = 1,
                     stability_selection = "vanilla",
                     cutoff = 0.8,
                     pfer = 1,
                     B = 10,
                     subsample = 0.5)

cre_results <- cre(y, z, X, method_params, hyper_params)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```
More synthetic data sets can be generated using `generate_cre_dataset()`.