---
title: "Logistic report template"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Logistic report template}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

# Statement of the problem from the customer's perspective

# History of the problem, previous results

# Exploratory data analysis

-   Head of the data

    -   Discuss the characteristics of each feature.

-   Bar chart of target (0 or 1) vs each feature, by percent (%)

    -   Discussion of target vs each feature

-   Boxplots of the numeric data (insert plot here)

    -   Discussion of boxplots of the numeric data

-   Histograms of each numeric column (insert plot here)

    -   Discussion of histograms of each numeric column

-   Data summary (insert table here)

    -   Discussion of the data summary

-   Outliers in the data (insert outliers data here)

    -   Discussion of outliers in the data

-   Correlation of the data (table)

-   Correlation plot of the numeric data as circles and colors

-   Correlation of the ensemble

-   Variance Inflation Factor

-   The stories in the exploratory data analysis
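
The checklist above can be sketched in base R. This is a minimal illustration rather than the package's own implementation: the data frame `df`, its columns `x1` and `x2`, and the 1.5 \* IQR outlier rule are all assumptions made for the example; only the target name `y` comes from the model calls later in this template.

```r
# Hypothetical example data: two numeric features and a 0/1 target named y.
set.seed(42)
n <- 200
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n, mean = 5, sd = 2)
)
df$y <- rbinom(n, size = 1, prob = stats::plogis(0.8 * df$x1 - 0.3 * (df$x2 - 5)))

# Head of the data and data summary
head(df)
summary(df)

# Percent of rows in each target class
round(100 * prop.table(table(df$y)), 1)

# Boxplots and histograms of the numeric features
features <- setdiff(names(df), "y")
boxplot(df[, features], main = "Boxplots of the numeric features")
for (f in features) hist(df[[f]], main = paste("Histogram of", f), xlab = f)

# Outliers by the 1.5 * IQR rule (an assumed rule): row indices per feature
outliers <- lapply(df[, features], function(v) {
  q <- stats::quantile(v, c(0.25, 0.75))
  fence <- 1.5 * diff(q)
  which(v < q[1] - fence | v > q[2] + fence)
})

# Correlation table of all columns
cor_table <- stats::cor(df)

# Variance inflation factor: 1 / (1 - R^2) from regressing each feature on the others
vif <- sapply(features, function(f) {
  others <- setdiff(features, f)
  fit <- stats::lm(stats::reformulate(others, response = f), data = df)
  1 / (1 - summary(fit)$r.squared)
})
```

Each object (`outliers`, `cor_table`, `vif`) maps to one bullet above, so the discussion text can refer to them directly.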

# 24 logistic models (Individual models then ensembles, in alphabetical order)

One-paragraph summary of the statistical modeling goes here

-   Cubist

    cubist_train_fit \<- Cubist::cubist(x = as.data.frame(train), y = train\$y)

-   Flexible Discriminant Analysis

    fda_train_fit \<- MachineShop::fit(as.factor(y) \~ ., data = train01, model = "FDAModel")

-   GAM (Generalized Additive Models) (uses smoothing splines)

    f2 \<- stats::as.formula(paste0("y \~", paste0("gam::s(", names_df, ")", collapse = "+")))

    gam_train_fit \<- gam::gam(f2, data = train1)

-   Generalized Linear Models

    glm_train_fit \<- stats::glm(y \~ ., data = train, family = binomial)

-   Lasso (uses best model)

    best_lasso_lambda \<- lasso_cv\$lambda.min

    best_lasso_model \<- glmnet::glmnet(x, y, alpha = 1, lambda = best_lasso_lambda)

-   Linear (tuned)

    linear_train_fit \<- e1071::tune.rpart(formula = y \~ ., data = train)

-   Linear Discriminant Analysis

    lda_train_fit \<- MASS::lda(as.factor(y) \~ ., data = train01)

-   Penalized Discriminant Analysis

    pda_train_fit \<- MachineShop::fit(as.factor(y) \~ ., data = train01, model = "PDAModel")

-   Quadratic Discriminant Analysis

    qda_train_fit \<- MASS::qda(as.factor(y) \~ ., data = train01)

-   Random Forest

    rf_train_fit \<- randomForest::randomForest(x = train, y = as.factor(y_train))

-   Ridge

    best_ridge_lambda \<- ridge_cv\$lambda.min

    best_ridge_model \<- glmnet::glmnet(x, y, alpha = 0, lambda = best_ridge_lambda)

-   RPart

    rpart_train_fit \<- rpart::rpart(y \~ ., data = train)

-   SVM (Support Vector Machines) (tuned)

    svm_train_fit \<- e1071::tune.svm(y \~ ., data = train)

-   Tree

    tree_train_fit \<- tree::tree(y \~ ., data = train)

**Ensemble models start here**

-   Ensemble Gradient Boosted

    ensemble_gb_train_fit \<- gbm::gbm(y_ensemble \~ ., data = ensemble_train, distribution = "gaussian", n.trees = 100, shrinkage = 0.1, interaction.depth = 10)

-   Ensemble Lasso (uses best model)

    ensemble_best_lasso_lambda \<- ensemble_lasso_cv\$lambda.min

    ensemble_best_lasso_model \<- glmnet::glmnet(ensemble_x, ensemble_y, alpha = 1, lambda = ensemble_best_lasso_lambda)

-   Ensemble Partial Least Squares

    ensemble_pls_train_fit \<- MachineShop::fit(as.factor(y) \~ ., data = ensemble_train, model = "PLSModel")

-   Ensemble Penalized Discriminant Analysis

    ensemble_pda_train_fit \<- MachineShop::fit(as.factor(y) \~ ., data = ensemble_train, model = "PDAModel")

-   Ensemble Ridge

    x \<- model.matrix(y \~ ., data = ensemble_train)[, -1]

    y \<- ensemble_train\$y

    ensemble_ridge_train_fit \<- glmnet::glmnet(x, y, alpha = 0)

-   Ensemble RPart

    ensemble_rpart_train_fit \<- MachineShop::fit(as.factor(y) \~ ., data = ensemble_train, model = "RPartModel")

-   Ensemble Support Vector Machines (SVM)

    ensemble_svm_train_fit \<- e1071::svm(as.factor(y) \~ ., data = ensemble_train, kernel = "radial", gamma = 1, cost = 1)

-   Ensemble Trees

    ensemble_tree_train_fit \<- tree::tree(y \~ ., data = ensemble_train)

-   **The stories in the models (fill in here)**
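
To show how a fitted model from the list above turns into numbers for the report, here is a minimal sketch using the Generalized Linear Models call. The simulated data, the 70/30 split, the `holdout` name, the helper `predict_class()`, and the 0.5 cutoff are all assumptions for illustration; the package may split and score differently.

```r
# Hypothetical data: two numeric features and a 0/1 target named y.
set.seed(123)
n <- 300
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, size = 1, prob = stats::plogis(1.5 * df$x1 - df$x2))

# Random 70/30 split into train and holdout rows (assumed proportions)
idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- df[idx, ]
holdout <- df[-idx, ]

# Fit on the train rows only, with the same call shape as the GLM entry above
glm_train_fit <- stats::glm(y ~ ., data = train, family = binomial)

# Hypothetical helper: predicted probabilities thresholded at an assumed cutoff
predict_class <- function(fit, newdata, cutoff = 0.5) {
  as.integer(stats::predict(fit, newdata = newdata, type = "response") > cutoff)
}

# Accuracy on the rows used for fitting and on the unseen rows
train_accuracy <- mean(predict_class(glm_train_fit, train) == train$y)
holdout_accuracy <- mean(predict_class(glm_train_fit, holdout) == holdout$y)

# Confusion table on the holdout rows
holdout_table <- table(
  predicted = factor(predict_class(glm_train_fit, holdout), levels = c(0, 1)),
  actual = factor(holdout$y, levels = c(0, 1))
)
```

Comparing `train_accuracy` with `holdout_accuracy` gives a first impression of over- or underfitting before the plots in the next section.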

# Ensembles and individual model plots

-   Negative predictive value (fixed scales)

-   Negative predictive value (free scales)

-   Positive predictive value (fixed scales)

-   Positive predictive value (free scales)

-   F1 Score (fixed scales)

-   F1 Score (free scales)

-   False negative rate (fixed scales)

-   False negative rate (free scales)

-   False positive rate (fixed scales)

-   False positive rate (free scales)

-   True negative rate (fixed scales)

-   True negative rate (free scales)

-   True positive rate (fixed scales)

-   True positive rate (free scales)

-   ROC Curves for each of the 24 models

-   Overfitting or underfitting bar chart (closer to 1 is better)

-   Mean duration by model bar chart

-   Overfitting by model and resample, fixed scales

-   Overfitting by model and resample, free scales

-   Model accuracy bar chart

-   Accuracy by model and resample, including train and holdout by each resample, fixed scales

-   Accuracy by model and resample, including train and holdout by each resample, free scales

-   **Summary report**

    -   Accuracy (mean)

    -   Accuracy (standard deviation)

    -   True positive rate (also known as sensitivity)

    -   True negative rate (also known as specificity)

    -   False positive rate (also known as Type I error)

    -   False negative rate (also known as Type II error)

    -   Positive predictive value

    -   Negative predictive value

    -   F1 score

    -   Area under the curve (AUC)

    -   Overfitting (mean)

    -   Overfitting (standard deviation)

    -   Duration (mean)

    -   Duration (standard deviation)

-   Function call

-   Warnings or errors

-   The stories in the plots
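
The rate-based entries in the summary report all derive from the four cells of a confusion matrix. The sketch below uses the standard definitions; the function name `classification_metrics()` and the example counts are hypothetical and are not taken from the package.

```r
# Standard confusion-matrix metrics from counts of
# true positives (tp), false positives (fp), true negatives (tn), false negatives (fn).
classification_metrics <- function(tp, fp, tn, fn) {
  tpr <- tp / (tp + fn)  # sensitivity
  tnr <- tn / (tn + fp)  # specificity
  ppv <- tp / (tp + fp)  # precision
  npv <- tn / (tn + fn)
  c(
    accuracy = (tp + tn) / (tp + fp + tn + fn),
    true_positive_rate = tpr,
    true_negative_rate = tnr,
    false_positive_rate = 1 - tnr,  # Type I error rate
    false_negative_rate = 1 - tpr,  # Type II error rate
    positive_predictive_value = ppv,
    negative_predictive_value = npv,
    f1_score = 2 * ppv * tpr / (ppv + tpr)
  )
}

# Hypothetical counts, for illustration only
m <- classification_metrics(tp = 40, fp = 10, tn = 45, fn = 5)
round(m, 3)
```

With these made-up counts, accuracy is 0.85, the positive predictive value is 0.8, and the negative predictive value is 0.9.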

# Strongest evidence-based results

-   Most accurate models with error ranges

-   Strongest predictor with error ranges

-   The stories of the strongest evidence-based data
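
One possible way to report "most accurate models with error ranges" is the mean holdout accuracy per model plus or minus one standard deviation across resamples. This is an assumed convention, not necessarily the one the package uses, and the `results` table below contains made-up numbers purely to show the calculation.

```r
# Hypothetical holdout accuracy per model and resample (made-up numbers)
results <- data.frame(
  model = rep(c("GLM", "Ridge", "Tree"), each = 4),
  resample = rep(1:4, times = 3),
  holdout_accuracy = c(0.84, 0.86, 0.85, 0.83,
                       0.85, 0.87, 0.84, 0.86,
                       0.78, 0.82, 0.75, 0.80)
)

# Mean and standard deviation of accuracy for each model
acc_mean <- tapply(results$holdout_accuracy, results$model, mean)
acc_sd <- tapply(results$holdout_accuracy, results$model, stats::sd)

summary_table <- data.frame(
  model = names(acc_mean),
  mean = as.numeric(acc_mean),
  sd = as.numeric(acc_sd)
)

# Error range as mean +/- one standard deviation (an assumed convention)
summary_table$lower <- summary_table$mean - summary_table$sd
summary_table$upper <- summary_table$mean + summary_table$sd

# Most accurate model first
summary_table <- summary_table[order(-summary_table$mean), ]
summary_table
```

When the ranges of two models overlap, the report should say so rather than declare a single winner.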

# Five strongest evidence-based recommendations

# Conclusions

# References