---
title: "DDESONN vs Keras — 1000-Seed Summary — Heart Failure"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{DDESONN vs Keras — 1000-Seed Summary — Heart Failure}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

## Overview

This vignette summarizes DDESONN results across **1000 randomized seeds**
(two separate 500-seed runs) and compares them against a Keras benchmark
summary stored in an Excel workbook bundled with the package.

The purpose of this benchmark is not to showcase a single favorable run.
Instead, it evaluates **distributional behavior across many random
initializations**, with emphasis on:

- mean performance
- variance and standard deviation
- worst-case behavior
- reproducibility under repeated stress

Stronger stability across seeds matters here because it indicates that the
training procedure is less sensitive to random initialization and is
therefore more dependable at scale.

## Where the demo data lives

The four RDS artifacts included with the package are stored under:

```text
inst/extdata/heart_failure_runs/
├─ run1/
│  ├─ SingleRun_Train_Acc_Val_Metrics_500_seeds_20251025.rds
│  └─ SingleRun_Test_Metrics_500_seeds_20251025.rds
└─ run2/
   ├─ SingleRun_Train_Acc_Val_Metrics_500_seeds_20251026.rds
   └─ SingleRun_Test_Metrics_500_seeds_20251026.rds
```

Each folder represents one 500-seed run performed locally; together they
form the 1000-seed composite.

## Motivation and comparison philosophy

This benchmark addresses a focused research question:

> Can a fully R-native, from-first-principles neural network implementation
> achieve competitive statistical stability relative to an established
> deep-learning framework under repeated randomized initialization?

The Keras comparison is included as a **reference benchmark**, not as an
implementation template. DDESONN was built independently from scratch and
was not derived from Keras source code.
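Before loading the runs, the layout above can be verified with a quick
listing (a sketch; `system.file()` returns an empty string when DDESONN is
not installed, in which case no paths are found):

```r
# Enumerate the bundled RDS artifacts (sketch; yields character(0)
# when the DDESONN package is not installed in the current library).
root <- system.file("extdata", "heart_failure_runs", package = "DDESONN")
rds_paths <- if (nzchar(root)) {
  list.files(root, pattern = "\\.rds$", recursive = TRUE, full.names = TRUE)
} else {
  character(0)
}
length(rds_paths)  # 4 when the package (and its demo data) is installed
```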
## Load DDESONN runs and build the summary

```{r setup, message=FALSE, warning=FALSE}
suppressPackageStartupMessages({
  library(dplyr)
  library(tibble)
  library(knitr)
})

if (!requireNamespace("DDESONN", quietly = TRUE)) {
  message("DDESONN not installed in this build session; skipping evaluation.")
  knitr::opts_chunk$set(eval = FALSE)
}
```

```{r helpers, message=FALSE, warning=FALSE}
.render_tbl <- function(x, title = NULL, digits = 4) {
  if (requireNamespace("DDESONN", quietly = TRUE) &&
      exists("ddesonn_viewTables", envir = asNamespace("DDESONN"),
             inherits = FALSE)) {
    get("ddesonn_viewTables", envir = asNamespace("DDESONN"))(x, title = title)
  } else {
    if (!is.null(title)) cat("\n\n###", title, "\n\n")
    knitr::kable(x, digits = digits, format = "html")
  }
}
```

```{r ddesonn-summary, message=FALSE, warning=FALSE, results='asis'}
heart_failure_root <- system.file("extdata", "heart_failure_runs",
                                  package = "DDESONN")
if (!nzchar(heart_failure_root)) {
  # Fallback when building from source before installation
  heart_failure_root <- file.path("..", "inst", "extdata", "heart_failure_runs")
}
stopifnot(dir.exists(heart_failure_root))

train_run1_path <- file.path(
  heart_failure_root, "run1",
  "SingleRun_Train_Acc_Val_Metrics_500_seeds_20251025.rds"
)
test_run1_path <- file.path(
  heart_failure_root, "run1",
  "SingleRun_Test_Metrics_500_seeds_20251025.rds"
)
train_run2_path <- file.path(
  heart_failure_root, "run2",
  "SingleRun_Train_Acc_Val_Metrics_500_seeds_20251026.rds"
)
test_run2_path <- file.path(
  heart_failure_root, "run2",
  "SingleRun_Test_Metrics_500_seeds_20251026.rds"
)

stopifnot(
  file.exists(train_run1_path),
  file.exists(test_run1_path),
  file.exists(train_run2_path),
  file.exists(test_run2_path)
)

train_run1 <- readRDS(train_run1_path)
test_run1  <- readRDS(test_run1_path)
train_run2 <- readRDS(train_run2_path)
test_run2  <- readRDS(test_run2_path)

train_all <- dplyr::bind_rows(train_run1, train_run2)
test_all  <- dplyr::bind_rows(test_run1, test_run2)

# Keep each seed's best epoch by validation accuracy
train_seed <- train_all %>%
  group_by(seed) %>%
  slice_max(order_by = best_val_acc, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  transmute(
    seed,
    train_acc = best_train_acc,
    val_acc   = best_val_acc
  )

test_seed <- test_all %>%
  group_by(seed) %>%
  slice_max(order_by = accuracy, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  transmute(
    seed,
    test_acc = accuracy
  )

merged <- inner_join(train_seed, test_seed, by = "seed") %>%
  arrange(seed)

summarize_column <- function(x) {
  pct <- function(p) stats::quantile(x, probs = p, names = FALSE, type = 7)
  data.frame(
    count = length(x),
    mean  = mean(x),
    std   = sd(x),
    min   = min(x),
    `25%` = pct(0.25),
    `50%` = pct(0.50),
    `75%` = pct(0.75),
    max   = max(x),
    check.names = FALSE
  )
}

summary_train <- summarize_column(merged$train_acc)
summary_val   <- summarize_column(merged$val_acc)
summary_test  <- summarize_column(merged$test_acc)

summary_all <- data.frame(
  stat      = c("count", "mean", "std", "min", "25%", "50%", "75%", "max"),
  train_acc = unlist(summary_train[1, ]),
  val_acc   = unlist(summary_val[1, ]),
  test_acc  = unlist(summary_test[1, ]),
  check.names = FALSE
)

round4 <- function(x) if (is.numeric(x)) round(x, 4) else x
pretty_summary <- as.data.frame(lapply(summary_all, round4))

.render_tbl(
  pretty_summary,
  title = "DDESONN — 1000-seed summary (train/val/test)"
)
```

## Keras parity (Excel, Sheet 2)

Keras parity results are stored in an Excel workbook included with the
package under:

```text
inst/scripts/vsKeras/1000SEEDSRESULTSvsKeras/1000seedsKeras.xlsx
```

The file is accessed programmatically using `system.file()` so the path
remains CRAN-safe and cross-platform.
```{r keras-summary, message=FALSE, warning=FALSE, results='asis'}
if (!requireNamespace("readxl", quietly = TRUE)) {
  message("Skipping keras-summary chunk: 'readxl' not installed.")
} else {
  keras_path <- system.file(
    "scripts", "vsKeras", "1000SEEDSRESULTSvsKeras", "1000seedsKeras.xlsx",
    package = "DDESONN"
  )
  if (nzchar(keras_path) && file.exists(keras_path)) {
    keras_stats <- readxl::read_excel(keras_path, sheet = 2)
    .render_tbl(
      keras_stats,
      title = "Keras — 1000-seed summary (Sheet 2)"
    )
  } else {
    cat("Keras Excel not found in installed package.\n")
  }
}
```

## 🔬 Benchmark results across 1000 seeds

Across **1000 random neural network initializations**, DDESONN demonstrated
stronger stability than the Keras benchmark model on this heart-failure task.

```{r benchmark-comparison, message=FALSE, warning=FALSE, results='asis'}
benchmark_results <- data.frame(
  Metric = c(
    "Mean Test Accuracy",
    "Standard Deviation",
    "Minimum Test Accuracy",
    "Maximum Test Accuracy"
  ),
  DDESONN = c("≈ 99.92%", "≈ 0.0013", "≈ 99.20%", "100%"),
  Keras   = c("≈ 99.69%", "≈ 0.0036", "≈ 97.82%", "100%"),
  check.names = FALSE
)

.render_tbl(
  benchmark_results,
  title = "Benchmark results across 1000 seeds"
)
```

These results suggest that DDESONN achieved:

- **higher average test accuracy**
- **materially lower variance across seeds**
- **stronger minimum-case performance**
- **equal best-case ceiling performance**

This is important because lower variance implies the model is less sensitive
to randomized initialization and more dependable across repeated training
runs.

## Why this matters for large-scale projects

### Enterprise machine learning pipelines

In large corporate environments, teams may train hundreds or thousands of
models across changing datasets, validation windows, and deployment cycles.
A lower-variance model reduces the need for repeated retraining simply to
obtain a “good seed,” which lowers compute cost and improves operational
predictability.
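As a back-of-envelope illustration of that retraining cost, a normal
approximation to the two summaries above estimates how often a seed would
fall below a fixed 99% test-accuracy bar. This is only a rough sketch:
accuracy near the 100% ceiling is not truly normal, so treat the numbers as
order-of-magnitude indications, not measurements.

```r
# Rough normal-approximation sketch built from the summary statistics
# above (mean and sd of per-seed test accuracy); illustrative only.
p_below <- function(mean_acc, sd_acc, threshold = 0.99) {
  pnorm(threshold, mean = mean_acc, sd = sd_acc)
}
p_below(0.9992, 0.0013)  # DDESONN: essentially zero
p_below(0.9969, 0.0036)  # Keras: a few percent of seeds
```

Under these assumptions, a lower seed-to-seed standard deviation translates
directly into fewer retrains needed to clear a fixed acceptance threshold.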
### Trading and financial systems

In trading, portfolio analytics, execution modeling, or risk forecasting,
model instability can create inconsistent outputs across retrains. A model
that is more stable across seeds can improve confidence in:

- signal consistency,
- scenario analysis,
- risk-control calibration,
- production retraining pipelines.

This does not guarantee trading profitability, but it does support stronger
engineering reliability and more reproducible model behavior.

### Healthcare and regulated environments

In healthcare and other regulated domains, reproducibility matters because
stakeholders need confidence that retraining the same workflow will not
produce materially unstable outcomes. Lower dispersion across seeds can help
support validation, governance, and auditability.

### Aerospace and autonomous systems

In mission-critical environments such as autonomous control or space-related
analytics, reproducibility and reliability are essential. More stable
training behavior can be valuable when models need to be trusted under
constrained or high-stakes deployment settings.

## Reproducibility notes

These results aggregate **two independent 500-seed runs** performed locally.
A master seed was **not** set for those original runs. Since then:

- DDESONN benchmarking has been updated to use a master seed in
  `TestDDESONN_1000seeds.R`
- Keras parity benchmarking has been updated to use a synchronized master
  seed in `TestKeras_1000seeds.py`

Keras raw and summary outputs are compiled in:

```text
inst/scripts/vsKeras/1000SEEDSRESULTSvsKeras/1000seedsKeras.xlsx
```

## Distributed execution (scaling note)

The results shown here were computed locally. For large-scale experiments
involving hundreds or thousands of seeds, DDESONN can be executed in
distributed environments to reduce wall-clock time significantly.
Distributed orchestration and development-stage scaling scripts are maintained in the GitHub repository and are intentionally excluded from the CRAN package so this vignette remains focused on validated results and benchmark methodology.
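For local experimentation, the idea can be sketched with base R's
`parallel` package. This is a minimal illustration only; `fit_one_seed`
below is a hypothetical placeholder for a single seeded training run, not
part of the DDESONN API, and the real orchestration scripts live in the
GitHub repository as noted above.

```r
library(parallel)

# Hypothetical stand-in for one seeded training run; a real version
# would train the model for that seed and return its test accuracy.
fit_one_seed <- function(seed) {
  set.seed(seed)
  mean(runif(10))  # placeholder "accuracy" in [0, 1]
}

# Fan a handful of seeds out across two worker processes.
cl <- makeCluster(2)
accs <- unlist(parLapply(cl, 101:104, fit_one_seed))
stopCluster(cl)
accs  # one placeholder accuracy per seed
```

Because each seed's run is independent, this pattern scales from a laptop's
worker processes to multi-node setups without changing the per-seed logic.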