---
title: "Getting Started"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include = FALSE}
library(mighty.metadata)
```

This vignette walks you through the key workflows for defining and working with
ADaM specifications: loading and editing a single domain, assembling a
multi-domain study, propagating metadata across domains, applying conditional
includes, and producing flat output for downstream tooling.

## The YAML Specification Format

Each ADaM domain is defined in a YAML file. The simplest case is a
subject-level dataset like ADSL, which has only columns (no parameters or rows):

```yaml
id: ADSL
label: Subject-Level Analysis Dataset
class: SUBJECT LEVEL ANALYSIS DATASET
structure: One record per subject
keys: USUBJID

population:
  base:
    - domain: DM
      depends: [USUBJID]
      filter: USUBJID != ""

columns:
  - id: STUDYID
    label: Study Identifier
    method: DM.STUDYID
    core: Req
  - id: USUBJID
    label: Unique Subject Identifier
    method: DM.USUBJID
    core: Req
  # ... more columns ...
```

Top-level keys define the domain identity (`id`, `label`, `class`, `structure`,
`keys`). The `population` block declares which source domains supply raw data
and what row-level filters apply. The mighty code generator reads this to build
the initial population step; mighty.metadata stores it but does not execute it.

Each column has an `id`, `label`, and `core` conformance level: `Req`
(Required), `Cond` (Conditionally Required), or `Perm` (Permissible). The
`method` field describes how the column is derived. A `DOMAIN.COLUMN` pattern
(e.g., `DM.STUDYID`) means the column is a predecessor --- its metadata can be
inherited from the referenced source via `populate_sparse()`.
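
To make the `DOMAIN.COLUMN` convention concrete, here is a minimal base R sketch that splits such a `method` string into its two parts. The helper `parse_predecessor()` and its regular expression are assumptions made for illustration; they are not part of mighty.metadata, and the package's own matching rules may differ.

```r
# Hypothetical helper, not part of mighty.metadata: split a `method` string
# of the form DOMAIN.COLUMN into its two parts, or return NULL otherwise.
parse_predecessor <- function(method) {
  # Assumed pattern: uppercase alphanumeric names separated by a single dot
  pattern <- "^([A-Z][A-Z0-9]*)\\.([A-Z][A-Z0-9]*)$"
  if (!grepl(pattern, method)) {
    return(NULL)
  }
  list(
    domain = sub(pattern, "\\1", method),
    column = sub(pattern, "\\2", method)
  )
}

# A predecessor reference yields the source domain and column
str(parse_predecessor("DM.STUDYID"))

# A free-text derivation does not match the pattern
is.null(parse_predecessor("Derived from height and weight"))
```

Free-text methods fall through to `NULL`, which mirrors the idea that only predecessor columns have a source to inherit from.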

BDS (Basic Data Structure) domains like ADVS add `parameters` and `rows`:

```yaml
id: ADVS
label: Vital Signs Analysis Dataset
keys: [USUBJID, PARAMCD, AVISITN]

parameters:
  - id: BMI
    label: Body Mass Index (kg/m^2)
    columns:
      - id: AVAL
        method: Derived from height and weight

rows:
  - id: BASELINE
    method: Add baseline visit as a new row
```

See `vignette("adam-schema")` for the full schema reference.
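
To see how the nesting in the BDS fragment above maps onto R lists, the following sketch parses the same text with the yaml package. This is only an illustration of the file layout; it bypasses `mighty_domain()` and its schema validation, and it is an assumption that the package exposes the data in a comparable shape.

```r
library(yaml)

# The BDS fragment shown above, supplied as a string
spec_text <- "
id: ADVS
label: Vital Signs Analysis Dataset
keys: [USUBJID, PARAMCD, AVISITN]

parameters:
  - id: BMI
    label: Body Mass Index (kg/m^2)
    columns:
      - id: AVAL
        method: Derived from height and weight

rows:
  - id: BASELINE
    method: Add baseline visit as a new row
"

spec <- yaml.load(spec_text)

# Top-level keys become list elements
spec$keys

# Parameter-specific column overrides are nested one level down
spec$parameters[[1]]$columns[[1]]$method

# Rows sit alongside parameters at the top level
spec$rows[[1]]$id
```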

## Working with a Single Domain

The package provides a consistent set of verbs for columns, parameters, and
rows:

| Action | Columns              | Parameters              | Rows               |
|--------|----------------------|-------------------------|--------------------|
| List   | `list_columns()`     | `list_parameters()`     | `list_rows()`      |
| Select | `select_column()`    | `select_parameter()`    | `select_row()`     |
| Add    | `add_column()`       | `add_parameter()`       | `add_row()`        |
| Update | `update_column()`    | `update_parameter()`    | `update_row()`     |
| Move   | `move_column()`      | `move_parameter()`      | `move_row()`       |
| Remove | `remove_columns()`   | `remove_parameters()`   | `remove_rows()`    |

The `remove_*` functions accept a character vector to remove multiple items at
once.

### Loading and Inspecting

Load a domain specification from a YAML file with `mighty_domain()`. The file is
validated against the ADaM JSON schema on load.

```{r}
path <- system.file("examples", "advs.yml", package = "mighty.metadata")
advs <- mighty_domain(path)
advs
```

Use `list_*()` functions to see what the specification contains:

```{r}
list_columns(advs)
list_parameters(advs)
list_rows(advs)
```

Drill into a specific column with `select_column()`:

```{r}
select_column(advs, id = "AVAL") |>
  str()
```

### Modifying Columns

Every modification re-validates the domain against the schema. The modifying
functions (`add_column()`, `update_column()`, `move_column()`, and
`remove_columns()`) return the modified domain, so they compose naturally into
a pipe chain. Here we add an actual treatment column sourced from ADSL, update
a label, and drop an unused column:

```{r}
advs <- mighty_domain(path) |>
  add_column(
    id = "TRTA",
    label = "Actual Treatment",
    method = "ADSL.TRT01A",
    .pos = 5
  ) |>
  update_column(id = "AVAL", label = "Analysis Value (Numeric)") |>
  remove_columns(id = "AVALC")

list_columns(advs)
```

#### Schema Validation

Validation runs on every modification and on initial load. You can also call
`validate()` explicitly at any time:

```{r}
validate(advs)
```

If a modification violates the schema, you get an immediate error. For example,
adding a column with a duplicate ID fails:

```{r, error = TRUE}
advs |> add_column(id = "AVAL", label = "Duplicate")
```
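
The duplicate-ID failure above can be pictured with a small base R sketch that works on plain lists. The helper `add_column_checked()`, the `AVAL` label, and the error message are hypothetical; the package itself validates against its schema, and its actual error text is likely to differ.

```r
# Hypothetical helper, not part of mighty.metadata: refuse to append a column
# whose id is already present in a plain list of column definitions.
add_column_checked <- function(columns, new) {
  ids <- vapply(columns, function(col) col$id, character(1))
  if (new$id %in% ids) {
    stop("Column id '", new$id, "' already exists", call. = FALSE)
  }
  c(columns, list(new))
}

# Illustrative column definitions (labels are made up for this sketch)
columns <- list(
  list(id = "USUBJID", label = "Unique Subject Identifier"),
  list(id = "AVAL", label = "Analysis Value")
)

# Capture the error message instead of aborting the script
result <- tryCatch(
  add_column_checked(columns, list(id = "AVAL", label = "Duplicate")),
  error = function(e) conditionMessage(e)
)
result
```

Wrapping the call in `tryCatch()` is a general R pattern for scripts that should report a failed modification and carry on.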

### Modifying Parameters

Parameters use the same verbs. The key difference is the `columns` argument in
`add_parameter()`, which accepts a nested list of column overrides specific to
that parameter:

```{r}
select_parameter(advs, id = "BMI") |>
  str()
```

```{r}
advs <- advs |>
  add_parameter(
    id = "WSTCIR",
    label = "Waist Circumference (cm)",
    columns = list(
      list(id = "AVAL", method = "VS.VSSTRESN")
    )
  )

list_parameters(advs)
```
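
One way to think about a parameter-level column override is as a partial definition layered on top of the domain-level column. The base R sketch below shows that idea with `utils::modifyList()`. The helper `apply_override()` and the sample field values are hypothetical, and it is an assumption that overrides are resolved this way downstream.

```r
# Hypothetical helper, not part of mighty.metadata: apply a parameter's column
# override on top of the domain-level definition of the same column.
apply_override <- function(domain_column, override) {
  # modifyList() keeps fields absent from `override` and replaces the rest
  utils::modifyList(domain_column, override)
}

# Illustrative domain-level definition and a BMI-specific override
domain_aval <- list(
  id = "AVAL",
  label = "Analysis Value",
  method = "VS.VSSTRESN"
)
bmi_override <- list(id = "AVAL", method = "Derived from height and weight")

merged <- apply_override(domain_aval, bmi_override)
str(merged)
```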

`update_parameter()` and `remove_parameters()` work like their column
counterparts:

```{r}
advs <- advs |>
  update_parameter(id = "WSTCIR", label = "Waist Circumference") |>
  remove_parameters(id = "BMIGRP")

list_parameters(advs)
```

### Rows

Rows follow the same pattern. Inspect a row with `select_row()`:

```{r}
select_row(advs, id = "BASELINE") |>
  str()
```

### Saving Changes

Write the modified domain back to YAML with `write_config()`:

```{r}
out <- tempfile(fileext = ".yml")
write_config(advs, path = out)
```

The written file can be loaded back with `mighty_domain()`.

## Working with a Study

### Loading a Study

Load all domain specifications from a directory with `mighty_study()`. The
directory can contain `_study.yml` (study-level properties) and `_mighty.yml`
(mighty framework configuration).

```{r}
study_path <- system.file("examples", package = "mighty.metadata")
study <- mighty_study(study_path)
study
```

Access individual domains with `$`. Study-level properties from `_study.yml`
are stored in `@study`, and mighty framework configuration from `_mighty.yml`
is stored in `@mighty`. The `@` operator accesses properties of S7 objects:

```{r}
names(study)
str(study@study)
str(study@mighty)
```

The `_study.yml` file provides the `study_id`. The `_mighty.yml` file provides
`external_data` definitions (source domains and their keys).

### Populating Core Variables

A "core variable" is an ADSL column that should appear in every consumer domain
(ADVS, ADAE, etc.) as a predecessor column --- for example, SEX and RACE for
subgroup analyses.

Note: `core` (string: Req/Cond/Perm) records the ADaM conformance level and is
unrelated to "core variable" propagation. The two fields that control
propagation are:

- `is_core` (Boolean, column-level in ADSL) --- marks a column for propagation
  to consumer domains.
- `usecore` (Boolean, domain-level) --- signals that a domain should receive the
  propagated columns. This is a top-level YAML property on the consumer domain,
  so it is set via list assignment rather than `update_column()`.

`populate_core()` reads these flags and adds the marked ADSL columns to each
consumer domain as predecessor columns.
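
The interplay of the two flags can be sketched in base R with plain lists standing in for domain objects. The helper `propagate_core()` is hypothetical and only mimics the behaviour described above; the real `populate_core()` may copy more properties or handle existing columns differently.

```r
# Hypothetical helper, not part of mighty.metadata. Domains are plain lists
# here, not the package's domain objects.
propagate_core <- function(adsl, consumer) {
  # Domains that have not opted in are returned unchanged
  if (!isTRUE(consumer$usecore)) {
    return(consumer)
  }
  existing <- vapply(consumer$columns, function(col) col$id, character(1))
  for (col in adsl$columns) {
    if (isTRUE(col$is_core) && !(col$id %in% existing)) {
      # Add the column as a predecessor that points back at ADSL
      new_col <- list(id = col$id, method = paste0(adsl$id, ".", col$id))
      consumer$columns <- c(consumer$columns, list(new_col))
    }
  }
  consumer
}

adsl <- list(
  id = "ADSL",
  columns = list(
    list(id = "USUBJID"),
    list(id = "SEX", is_core = TRUE),
    list(id = "RACE", is_core = TRUE)
  )
)
adae <- list(id = "ADAE", usecore = TRUE, columns = list(list(id = "USUBJID")))

adae <- propagate_core(adsl, adae)
vapply(adae$columns, function(col) col$id, character(1))
```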

The bundled examples do not include these fields, so we add them here to
demonstrate the workflow:

```{r}
study$ADSL <- study$ADSL |>
  update_column(id = "SEX", is_core = TRUE) |>
  update_column(id = "RACE", is_core = TRUE)

study$ADAE[["usecore"]] <- TRUE

study <- study |>
  populate_core()

list_columns(study$ADAE)
```

SEX and RACE now appear in ADAE as predecessor columns sourced from ADSL.

### Populating Predecessor Metadata

Columns that reference another domain (e.g., `method: ADSL.SAFFL`) can inherit
metadata from the referenced column. `populate_sparse()` performs this lookup
across the study, filling in only missing properties.

```{r}
# Before: SAFFL in ADVS references ADSL.SAFFL
select_column(study$ADVS, id = "SAFFL") |>
  str()
```

```{r}
study <- study |> populate_sparse()

# After: origin inherited from ADSL
select_column(study$ADVS, id = "SAFFL") |>
  str()
```
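
The "fill in only missing properties" rule can be illustrated with a short base R sketch on plain lists. The helper `fill_missing()` and the sample `label` and `origin` values are hypothetical; which properties the package actually inherits is determined by `populate_sparse()` itself.

```r
# Hypothetical helper, not part of mighty.metadata: copy properties from the
# referenced column only where the referencing column has none.
fill_missing <- function(target, source) {
  missing_props <- setdiff(names(source), names(target))
  target[missing_props] <- source[missing_props]
  target
}

# Illustrative values; the ADSL column carries the full metadata
adsl_saffl <- list(
  id = "SAFFL",
  label = "Safety Population Flag",
  origin = "Derived"
)
# The ADVS column only records where it comes from
advs_saffl <- list(id = "SAFFL", method = "ADSL.SAFFL")

filled <- fill_missing(advs_saffl, adsl_saffl)
str(filled)
```

Properties already set on the referencing column, such as `method` here, are left untouched.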

### The Complete Study Pipeline

Here is the full pipeline in one block:

```{r}
study <- mighty_study(study_path)

# Mark ADSL core variables
study$ADSL <- study$ADSL |>
  update_column(id = "SEX", is_core = TRUE) |>
  update_column(id = "RACE", is_core = TRUE)

# Mark consumer domains
study$ADAE[["usecore"]] <- TRUE

# Run the pipeline
study <- study |>
  populate_core() |>
  populate_sparse()
```

If your YAML files already include the `is_core` and `usecore` flags, the
entire pipeline collapses to a single call. Passing `populate = TRUE` is
equivalent to calling `populate_core()` then `populate_sparse()` after loading:

```{r}
study <- mighty_study(study_path, populate = TRUE)
```

### Saving a Study

Write all domain files, `_study.yml`, and `_mighty.yml` back to disk
with `write_config()`:

```{r}
out <- withr::local_tempdir()
write_config(study, path = out)
list.files(out)
```

Omit `path` to write back to the original directory (`study@path`).

## Conditional Metadata

Pooled specifications can serve multiple studies by using conditional `include`
fields. Conditions are wrapped in `{braces}` and evaluated as R expressions
(via `glue::glue_data()`) against the study's `@study` values.

```{r}
study <- mighty_study(study_path)

study$ADVS <- study$ADVS |>
  update_column(
    id = "STUDYID",
    include = "{study_id == 'example_study'}"
  )
```

When `study_id` in `@study` matches, the condition is `TRUE` and the column is
kept:

```{r}
resolved <- resolve_includes(study)
list_columns(resolved$ADVS)
```

Override `study_id` through the `info` argument and the condition evaluates to
`FALSE`, so the column is removed:

```{r}
resolved <- resolve_includes(study, info = list(study_id = "other"))
list_columns(resolved$ADVS)
```

`include` works on parameters and rows too, not just columns.
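
To show what evaluating a braced condition looks like in isolation, here is a minimal sketch that calls `glue::glue_data()` directly. The helper `eval_include()` is hypothetical and is not how `resolve_includes()` is necessarily implemented; it only demonstrates the evaluation step on a named list of study values.

```r
library(glue)

# Hypothetical helper, not part of mighty.metadata: evaluate an `include`
# condition against a named list of study values.
eval_include <- function(condition, info) {
  # glue_data() evaluates the R expression inside the braces with `info`
  # as the data and returns the result as a string
  as.logical(as.character(glue_data(info, condition)))
}

condition <- "{study_id == 'example_study'}"

# Matching study: the item would be kept
eval_include(condition, list(study_id = "example_study"))

# Different study: the item would be dropped
eval_include(condition, list(study_id = "other"))
```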

## Creating a Flat Column Table

`create_md_col()` flattens the study's column specifications into a single
tibble. This is the format consumed by downstream mighty tools.

```{r}
create_md_col(study)
```
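
As a rough picture of what "flattening" means, the base R sketch below turns nested column lists into one row per column. The helper `flatten_columns()` and its output column names are hypothetical, and it returns a base data frame for simplicity; the actual columns returned by `create_md_col()` are defined by the package.

```r
# Hypothetical helper, not part of mighty.metadata: one row per column, with
# the owning domain repeated on every row. Domains are plain lists here.
flatten_columns <- function(domains) {
  rows <- lapply(domains, function(domain) {
    data.frame(
      domain = domain$id,
      column = vapply(domain$columns, function(col) col$id, character(1)),
      label = vapply(domain$columns, function(col) col$label, character(1)),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

domains <- list(
  list(id = "ADSL", columns = list(
    list(id = "STUDYID", label = "Study Identifier"),
    list(id = "USUBJID", label = "Unique Subject Identifier")
  )),
  list(id = "ADVS", columns = list(
    list(id = "USUBJID", label = "Unique Subject Identifier")
  ))
)

flat <- flatten_columns(domains)
flat
```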

## Next Steps

This vignette covered the core workflows: `mighty_domain()` for single datasets,
`mighty_study()` for collections, `populate_core()` and `populate_sparse()` for
metadata propagation, `write_config()` for saving changes, `resolve_includes()`
for conditional specifications, and `create_md_col()` for flat output.

To learn more:

- `vignette("adam-schema")` is the reference for the domain YAML schema
- `vignette("study-schema")` is the reference for the study YAML schema
- `vignette("mighty-schema")` is the reference for the mighty YAML schema
- The [package reference](https://novonordisk-opensource.github.io/mighty.metadata/reference/index.html)
  lists all available functions