---
title: "Getting Started"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include = FALSE}
library(mighty.metadata)
```

This vignette walks you through the key workflows for defining and working with ADaM specifications: loading and editing a single domain, assembling a multi-domain study, propagating metadata across domains, applying conditional includes, and producing flat output for downstream tooling.

## The YAML Specification Format

Each ADaM domain is defined in a YAML file. The simplest case is a subject-level dataset like ADSL, which only has columns:

```yaml
id: ADSL
label: Subject-Level Analysis Dataset
class: SUBJECT LEVEL ANALYSIS DATASET
structure: One record per subject
keys: USUBJID
population:
  base:
    - domain: DM
      depends: [USUBJID]
      filter: USUBJID != ""
columns:
  - id: STUDYID
    label: Study Identifier
    method: DM.STUDYID
    core: Req
  - id: USUBJID
    label: Unique Subject Identifier
    method: DM.USUBJID
    core: Req
  # ... more columns ...
```

Top-level keys define the domain identity (`id`, `label`, `class`, `structure`, `keys`).

The `population` block declares which source domains supply raw data and what row-level filters apply. The mighty code generator reads this to build the initial population step; mighty.metadata stores it but does not execute it.

Each column has an `id`, `label`, and `core` conformance level: `Req` (Required), `Cond` (Conditionally Required), or `Perm` (Permissible).

The `method` field describes how the column is derived. A `DOMAIN.COLUMN` pattern (e.g., `DM.STUDYID`) means the column is a predecessor: its metadata can be inherited from the referenced source via `populate_sparse()`.
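To make the `DOMAIN.COLUMN` convention concrete, here is a minimal base R sketch of how a `method` string could be recognised as a predecessor reference and split into its source domain and column. The helper `parse_predecessor()` and its regular expression are illustrative assumptions, not functions from mighty.metadata; the package's actual matching rules may differ.

```r
# Hypothetical helper (not part of mighty.metadata): split a method string of
# the form DOMAIN.COLUMN into its parts, or return NULL for free-text methods.
parse_predecessor <- function(method) {
  # Assumed pattern: an upper-case name, a single dot, another upper-case name
  pattern <- "^([A-Z][A-Z0-9]*)\\.([A-Z][A-Z0-9]*)$"
  if (!grepl(pattern, method)) {
    return(NULL)
  }
  parts <- strsplit(method, ".", fixed = TRUE)[[1]]
  list(domain = parts[1], column = parts[2])
}

# A predecessor reference is split into domain and column
parse_predecessor("DM.STUDYID")

# A free-text derivation does not match and yields NULL
parse_predecessor("Derived from height and weight")
```

Under these assumptions, a free-text method such as the BMI derivation is left alone, while `DM.STUDYID` resolves to domain `DM` and column `STUDYID`.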

BDS (Basic Data Structure) domains like ADVS add `parameters` and `rows`:

```yaml
id: ADVS
label: Vital Signs Analysis Dataset
keys: [USUBJID, PARAMCD, AVISITN]
parameters:
  - id: BMI
    label: Body Mass Index (kg/m^2)
    columns:
      - id: AVAL
        method: Derived from height and weight
rows:
  - id: BASELINE
    method: Add baseline visit as a new row
```

See `vignette("adam-schema")` for the full schema reference.

## Working with a Single Domain

The package provides a consistent set of verbs for columns, parameters, and rows:

| Action | Columns            | Parameters            | Rows            |
|--------|--------------------|-----------------------|-----------------|
| List   | `list_columns()`   | `list_parameters()`   | `list_rows()`   |
| Select | `select_column()`  | `select_parameter()`  | `select_row()`  |
| Add    | `add_column()`     | `add_parameter()`     | `add_row()`     |
| Update | `update_column()`  | `update_parameter()`  | `update_row()`  |
| Move   | `move_column()`    | `move_parameter()`    | `move_row()`    |
| Remove | `remove_columns()` | `remove_parameters()` | `remove_rows()` |

The `remove_*` functions accept a character vector to remove multiple items at once.

### Loading and Inspecting

Load a domain specification from a YAML file with `mighty_domain()`. The file is validated against the ADaM JSON schema on load.

```{r}
path <- system.file("examples", "advs.yml", package = "mighty.metadata")
advs <- mighty_domain(path)
advs
```

Use `list_*()` functions to see what the specification contains:

```{r}
list_columns(advs)
list_parameters(advs)
list_rows(advs)
```

Drill into a specific column with `select_column()`:

```{r}
select_column(advs, id = "AVAL") |> str()
```

### Modifying Columns

Every modification automatically re-validates the domain against the schema. All column functions return the modified domain, so they compose naturally into a pipe chain.

Here we add an actual treatment column sourced from ADSL, update a label, and drop an unused column:

```{r}
advs <- mighty_domain(path) |>
  add_column(
    id = "TRTA",
    label = "Actual Treatment",
    method = "ADSL.TRT01A",
    .pos = 5
  ) |>
  update_column(id = "AVAL", label = "Analysis Value (Numeric)") |>
  remove_columns(id = "AVALC")

list_columns(advs)
```

#### Schema Validation

Validation runs on every modification and on initial load. You can also call `validate()` explicitly at any time:

```{r}
validate(advs)
```

If a modification violates the schema, you get an immediate error. For example, adding a column with a duplicate ID fails:

```{r, error = TRUE}
advs |> add_column(id = "AVAL", label = "Duplicate")
```

### Modifying Parameters

Parameters use the same verbs. The key difference is the `columns` argument in `add_parameter()`, which accepts a nested list of column overrides specific to that parameter:

```{r}
select_parameter(advs, id = "BMI") |> str()
```

```{r}
advs <- advs |>
  add_parameter(
    id = "WSTCIR",
    label = "Waist Circumference (cm)",
    columns = list(
      list(id = "AVAL", method = "VS.VSSTRESN")
    )
  )

list_parameters(advs)
```

Update and remove work as expected:

```{r}
advs <- advs |>
  update_parameter(id = "WSTCIR", label = "Waist Circumference") |>
  remove_parameters(id = "BMIGRP")

list_parameters(advs)
```

### Rows

Rows follow the same pattern. Inspect a row with `select_row()`:

```{r}
select_row(advs, id = "BASELINE") |> str()
```

### Saving Changes

Write the modified domain back to YAML with `write_config()`:

```{r}
out <- tempfile(fileext = ".yml")
write_config(advs, path = out)
```

The written file can be loaded back with `mighty_domain()`.

## Working with a Study

### Loading a Study

Load all domain specifications from a directory with `mighty_study()`. The directory can contain `_study.yml` (study-level properties) and `_mighty.yml` (mighty framework configuration).

```{r}
study_path <- system.file("examples", package = "mighty.metadata")
study <- mighty_study(study_path)
study
```

Access individual domains with `$`. Study-level properties from `_study.yml` are stored in `@study` and mighty framework configuration from `_mighty.yml` is stored in `@mighty`. The `@` operator accesses properties of S7 objects:

```{r}
names(study)
str(study@study)
str(study@mighty)
```

The `_study.yml` file provides the `study_id`. The `_mighty.yml` file provides `external_data` definitions (source domains and their keys).

### Populating Core Variables

A "core variable" is an ADSL column that should appear in every consumer domain (ADVS, ADAE, etc.) as a predecessor column, for example SEX and RACE for subgroup analyses.

Note: `core` (string: Req/Cond/Perm) records the ADaM conformance level and is unrelated to "core variable" propagation. The two fields that control propagation are:

- `is_core` (Boolean, column-level in ADSL): marks a column for propagation to consumer domains.
- `usecore` (Boolean, domain-level): signals that a domain should receive the propagated columns. This is a top-level YAML property on the consumer domain, so it is set via list assignment rather than `update_column()`.

`populate_core()` reads these flags and adds the marked ADSL columns to each consumer domain as predecessor columns. The bundled examples do not include these fields, so we add them here to demonstrate the workflow:

```{r}
study$ADSL <- study$ADSL |>
  update_column(id = "SEX", is_core = TRUE) |>
  update_column(id = "RACE", is_core = TRUE)

study$ADAE[["usecore"]] <- TRUE

study <- study |> populate_core()

list_columns(study$ADAE)
```

SEX and RACE now appear in ADAE as predecessor columns sourced from ADSL.

### Populating Predecessor Metadata

Columns that reference another domain (e.g., `method: ADSL.SAFFL`) can inherit metadata from the referenced column. `populate_sparse()` performs this lookup across the study, filling in only missing properties.
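
The "fill in only missing properties" rule can be pictured with a small base R sketch. This is a conceptual illustration only: `fill_missing()` and the two example records are hypothetical, their property values are made up, and nothing here claims to reproduce how `populate_sparse()` is implemented.

```r
# Hypothetical helper (not part of mighty.metadata): copy properties from a
# source column into a target column, but only where the target has none.
fill_missing <- function(target, source) {
  missing_keys <- setdiff(names(source), names(target))
  c(target, source[missing_keys])
}

# Illustrative column records; the property values are invented for the example
adsl_saffl <- list(id = "SAFFL", label = "Safety Population Flag", origin = "Derived")
advs_saffl <- list(id = "SAFFL", label = "Safety Flag", method = "ADSL.SAFFL")

# The target keeps its own label and method and gains only the missing origin
str(fill_missing(advs_saffl, adsl_saffl))
```

The design point is that values already set on the consumer column take precedence, so inheriting metadata never overwrites a deliberate local override.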

```{r}
# Before: SAFFL in ADVS references ADSL.SAFFL
select_column(study$ADVS, id = "SAFFL") |> str()
```

```{r}
study <- study |> populate_sparse()

# After: origin inherited from ADSL
select_column(study$ADVS, id = "SAFFL") |> str()
```

### The Complete Study Pipeline

Here is the full pipeline in one block:

```{r}
study <- mighty_study(study_path)

# Mark ADSL core variables
study$ADSL <- study$ADSL |>
  update_column(id = "SEX", is_core = TRUE) |>
  update_column(id = "RACE", is_core = TRUE)

# Mark consumer domains
study$ADAE[["usecore"]] <- TRUE

# Run the pipeline
study <- study |>
  populate_core() |>
  populate_sparse()
```

If your YAML files already include the `is_core` and `usecore` flags, the entire pipeline collapses to a single call. Passing `populate = TRUE` is equivalent to calling `populate_core()` then `populate_sparse()` after loading:

```{r}
study <- mighty_study(study_path, populate = TRUE)
```

### Saving a Study

Write all domain files, `_study.yml`, and `_mighty.yml` back to disk with `write_config()`:

```{r}
out <- withr::local_tempdir()
write_config(study, path = out)
list.files(out)
```

Omit `path` to write back to the original directory (`study@path`).

## Conditional Metadata

Pooled specifications can serve multiple studies by using conditional `include` fields. Conditions are wrapped in `{braces}` and evaluated as R expressions (via `glue::glue_data()`) against the study's `@study` values.

```{r}
study <- mighty_study(study_path)

study$ADVS <- study$ADVS |>
  update_column(
    id = "STUDYID",
    include = "{study_id == 'example_study'}"
  )
```

When `study_id` in `@study` matches, the condition is `TRUE` and the column is kept:

```{r}
resolved <- resolve_includes(study)
list_columns(resolved$ADVS)
```

Override with a different value and the column is removed:

```{r}
resolved <- resolve_includes(study, info = list(study_id = "other"))
list_columns(resolved$ADVS)
```

`include` works on parameters and rows too, not just columns.
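
To see what evaluating a brace-wrapped condition with `glue::glue_data()` looks like in isolation, the following sketch renders a condition against a plain list of study values and converts the result to a logical. The helper `eval_include()` is a hypothetical name introduced for illustration, and the sketch assumes the glue package is installed; `resolve_includes()` may handle conditions differently internally.

```r
library(glue)

# Hypothetical helper (not part of mighty.metadata): evaluate a brace-wrapped
# condition against a named list of study values and return TRUE or FALSE.
eval_include <- function(condition, info) {
  # glue_data() evaluates the R expression inside {} using `info` as the data
  rendered <- glue_data(info, condition)
  # The rendered result is text such as "TRUE", so convert it to a logical
  as.logical(as.character(rendered))
}

# Matching study_id: the condition holds
eval_include("{study_id == 'example_study'}", list(study_id = "example_study"))

# Different study_id: the condition does not hold
eval_include("{study_id == 'example_study'}", list(study_id = "other"))
```

Because the braces contain an ordinary R expression, compound conditions using `&`, `|`, or `%in%` would be evaluated the same way in this sketch.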

## Creating a Flat Column Table

`create_md_col()` flattens the study's column specifications into a single tibble. This is the format consumed by downstream mighty tools.

```{r}
create_md_col(study)
```

## Next Steps

This vignette covered the core workflows: `mighty_domain()` for single datasets, `mighty_study()` for collections, `populate_core()` and `populate_sparse()` for metadata propagation, `write_config()` for saving changes, `resolve_includes()` for conditional specifications, and `create_md_col()` for flat output.

To learn more:

- `vignette("adam-schema")` is the reference for the domain YAML schema
- `vignette("study-schema")` is the reference for the study YAML schema
- `vignette("mighty-schema")` is the reference for the mighty YAML schema
- The [package reference](https://novonordisk-opensource.github.io/mighty.metadata/reference/index.html) lists all available functions