--- title: "Get started with artoo" vignette: > %\VignetteIndexEntry{Get started with artoo} %\VignetteEngine{quarto::html} %\VignetteEncoding{UTF-8} format: html --- ```{r} #| include: false library(artoo) ``` This guide walks the whole artoo round-trip once, on the bundled demo data. A **spec** plus your **data** go through `apply_spec()`, write to a **file**, and read back **identical** — that loop is artoo's lossless guarantee. Every step below runs as-is; there is nothing to download. ## The round-trip at a glance ```{=html}
fix · inspect the spec set_type() · check_spec() spec + data apply_spec() scaffold · coerce · order · sort · stamp write_*() file read_*() identical data lossless round-trip — what you read back equals what you wrote
``` ## 1. Get a spec A `artoo_spec` is the canonical description of your datasets: variables, CDISC data types, lengths, labels, controlled-terminology codelists, and sort keys — always for exactly **one** CDISC standard. `read_spec()` reads one from Define-XML, a Pinnacle 21 workbook, or artoo's native JSON; `artoo_spec()` assembles one from metadata frames. The package bundles ready-made specs built from the official CDISC Define-XML 2.1 release examples — `adam_spec` for ADaM (ADSL, ADAE) and `sdtm_spec` for SDTM (DM, VS, TS, SUPPDM). Each also ships as a P21 workbook you can open in Excel: ```{r} adam_spec p21 <- system.file("extdata", "adam-spec.xlsx", package = "artoo") identical(spec_standard(read_spec(p21)), spec_standard(adam_spec)) ``` Because `read_spec()` and `write_spec()` are inverses on each format, format conversion is one composition — Define-XML in, P21 workbook out: ```r read_spec("define.xml") |> write_spec("spec.xlsx") ``` ## 2. Apply the spec `apply_spec()` is the conform pipeline: it coerces each column to its CDISC data type, orders the columns, sorts by the dataset keys, and stamps the result with its metadata. A variable the spec declares but the data lacks is reported, never fabricated as an empty column. The input is never mutated, no column is ever dropped, and a coercion that would damage values aborts before it runs — with two honest one-line exits: keep the wider source type with `apply_spec(..., on_coercion_loss = "keep")`, or retype the spec with `set_type()`. ```{r} adsl <- apply_spec(cdisc_adsl, adam_spec, "ADSL") ``` The conformance findings ride along on the result — read them back as a frame with `conformance()`: ```{r} nrow(conformance(adsl)) ``` The pipeline is standard-neutral: an SDTM domain conforms identically — only the spec and the dataset change. ```{r} dm <- apply_spec(cdisc_dm, sdtm_spec, "DM") nrow(conformance(dm)) ``` ## 3. Inspect the columns `columns()` is the quick look a SAS programmer expects from `PROC CONTENTS`: one row per variable with position, type, length, format, label, and the CDISC key sequence. It works on a conformed frame, any plain data frame, or a file path: ```{r} columns(adsl) ``` ## 4. Write to any format — losslessly Every writer carries the full metadata model, so the write is lossless by construction. The writers return their input invisibly, so one conformed frame fans out to every deliverable: ```{r} xpt <- tempfile(fileext = ".xpt") json <- tempfile(fileext = ".json") adsl |> write_xpt(xpt) |> write_json(json) ``` Any file converts to any other without re-applying the spec — the metadata travels inside (or beside) the container: ```{r} parquet <- tempfile(fileext = ".parquet") write_parquet(read_json(json), parquet) ``` ## 5. Read back, intact Reading restores the values, the R classes (dates as `Date`, times as `hms`), the labels, and the metadata — identically from every format: ```{r} back <- read_json(json) get_meta(back)@dataset$records columns(back) ``` One honest caveat: the XPORT byte layout stores only name, label, length, and formats, so `columns()` on an `.xpt` path shows a blank `Key` — the key sequence (like codelist references) rides the metadata-carrying formats and the in-session frame, never the 1980s transport bytes. That round-trip identity is the whole point: what you submit is what you archived is what you analysed. ## Where to next - [Specifications](https://vthanik.github.io/artoo/articles/specs.html) — read a spec from Define-XML or a workbook, inspect it with the `spec_*` accessors, and fix it in place with `set_type()` / `repair_spec()`. - [Conform & validate](https://vthanik.github.io/artoo/articles/conform.html) — `apply_spec()` in depth, then every conformance finding from `check_spec()` and `check_study()`, and the errors artoo raises. - [Formats & lossless conversion](https://vthanik.github.io/artoo/articles/convert.html) — any-to-any round trips, encodings, the `on_invalid` policy, and the qualification evidence a regulated pipeline needs. - [Recipes](https://vthanik.github.io/artoo/articles/recipes.html) — end-to-end ADaM and SDTM builds, dates and `--DTC`, and codelist decoding, each rendered live on the demo data. ```