---
title: "Auditing an R package you have just received"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Auditing an R package you have just received}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

```{r setup}
library(checkhelper)
```

This vignette is the canonical end-to-end walkthrough: a colleague hands you
an R package and asks "is this CRAN-ready?". The goal is to surface every
CRAN-blocking issue with the **smallest possible number of `R CMD check`
runs**, then apply the safe automatic fixes.

For audits that have their own pipeline (full CRAN environment, file-system
snapshots), see the companion vignette
`vignette("pre-submission-gates", package = "checkhelper")`.

## TL;DR - the audit script

```{r}
pkg <- "/path/to/the/package"

# 1. Run R CMD check ONCE and reuse it everywhere it's needed.
chk <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")

# 2. Static audits (no extra check needed).
audit_tags(pkg)         # exported funs without @return / internals without @noRd
audit_ascii(pkg)        # non-ASCII characters in R/, tests/, vignettes/, man/, DESCRIPTION, NAMESPACE
audit_dataset_doc(pkg)  # datasets in data/ without a roxygen block
audit_citation(pkg)     # old-style personList() / citEntry() in inst/CITATION
audit_dontrun(pkg)      # \dontrun{} blocks in man/*.Rd
audit_description(pkg)  # unquoted package names in DESCRIPTION's Description field
audit_downloads(pkg)    # network / download call sites to review for offline-safe guards

# 3. Audits that need the check output - pass `chk` to skip a 2nd run.
audit_globals(pkg, checks = chk)

# 4. Apply the safe fixes.
fix_globals(pkg, checks = chk, write = TRUE)
# Preview before applying: fix_ascii() returns invisibly, so capture
# it to see which files would change.
preview <- fix_ascii(pkg, dry_run = TRUE)
preview[preview$changed, ]
fix_ascii(pkg, dry_run = FALSE)  # then apply
fix_dataset_doc("my_data", pkg = pkg,
                description = "Description of my_data",
                source = "Internal")  # one call per undocumented dataset
```

## Why share the check object?

`audit_globals()` and `fix_globals()` parse the `notes` field of an
`rcmdcheck::rcmdcheck()` result to extract the
`no visible binding for global variable` and
`no visible global function definition` notes. By default each call runs its
own check, which is slow on a real package.

Both functions accept a `checks =` argument. When supplied, they skip the
`rcmdcheck()` call and parse the existing object. This lets you run the check
**once** and reuse the result for the whole audit.

```{r}
chk <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")

audit_globals(pkg, checks = chk)
fix_globals(pkg, checks = chk, write = TRUE)
```

The other audits do not need a check at all:

| Audit                 | Needs `R CMD check`? | Notes                                             |
|-----------------------|----------------------|---------------------------------------------------|
| `audit_tags()`        | no                   | static via roxygen2                               |
| `audit_ascii()`       | no                   | line-by-line via `stringi::stri_enc_isascii()`    |
| `audit_dataset_doc()` | no                   | inspects `data/` and `R/`                         |
| `audit_citation()`    | no                   | static parse of `inst/CITATION`                   |
| `audit_description()` | no                   | tokenises DESCRIPTION's Description               |
| `audit_dontrun()`     | no                   | line-by-line scan of `man/*.Rd`                   |
| `audit_downloads()`   | no                   | AST walk of `R/`, `tests/`, `vignettes/`, `inst/` |
| `audit_globals()`     | **yes** (reusable)   | accepts `checks =`                                |
| `audit_userspace()`   | yes (own pipeline)   | takes file-system snapshots, separate             |
| `audit_check()`       | yes                  | this **is** the check, with CRAN env              |

## Per-issue cheatsheet

### Globals (`no visible binding`)

`audit_globals()` returns a 3-element list of names CRAN flagged:

- `globalVariables` - undeclared variables that need a
  `utils::globalVariables()` declaration.
- `functions` - external functions that need an `@importFrom` line.
- `operators` - NSE tokens, data.table / rlang pronouns (`:=`, `.SD`, `.N`,
  `.data`, `!!`, ...) that also need an `@importFrom` rather than a
  `globalVariables()` entry.

`fix_globals(write = TRUE)` writes the `globalVariables` set into
`R/globals.R` (merging with whatever names that file already declares - the
freshly detected names are added on top of the existing ones, deduplicated).
The operators section is printed on stdout so you wire each one into a
roxygen `@importFrom` block by hand:

```{r}
audit_globals(pkg, checks = chk)
fix_globals(pkg, checks = chk, write = TRUE)
```

When a token is exported by more than one candidate package (e.g. `:=` is
exported by both data.table and rlang), every candidate is listed and you
pick one consciously - no silent guessing.

Without `write = TRUE`, `fix_globals()` only prints both blocks to
copy-paste.

### Missing roxygen tags

`audit_tags()` flags exported functions without `@return` and documented
internals without `@noRd`. Read-only - no automatic fix because adding
accurate `@return` text needs a human:

```{r}
audit_tags(pkg)
```

### Non-ASCII characters

`audit_ascii()` walks `R/`, `tests/`, `vignettes/`, `man/`, `DESCRIPTION` and
`NAMESPACE` line-by-line and reports every line containing non-ASCII
characters (columns: `file`, `line`, `text`, `n_tokens`). `fix_ascii()` then
rewrites them - using the parser AST so each token is rewritten per its
context: string literals become `\uXXXX` escapes, comments and roxygen get
`Latin-ASCII` transliteration. **It dry-runs by default**:

```{r}
audit_ascii(pkg)

# Always preview which files would change. fix_ascii() returns
# invisibly - capture the result to inspect per-file detail
# (path, changed, n_tokens, n_chars).
preview <- fix_ascii(pkg, dry_run = TRUE)
preview[preview$changed, ]

# Apply when you've reviewed the proposed rewrite.
fix_ascii(pkg, dry_run = FALSE)
```

Identifiers with non-ASCII characters are refused by default (renaming would
be a breaking change).

### Undocumented datasets

`audit_dataset_doc()` lists every `data/*.rda` without a matching roxygen
block under `R/`. `fix_dataset_doc()` writes a documentation skeleton (one
call per dataset, takes the dataset name):

```{r}
audit_dataset_doc(pkg)
fix_dataset_doc("my_data", pkg = pkg,
                description = "Description of my_data",
                source = "Internal")
```

The skeleton is editable: you fill in the description / source /
column-by-column comments by hand, then re-run `devtools::document()`.

### Old-style `inst/CITATION`

`audit_citation()` parses `inst/CITATION` statically (no `eval()`) and
surfaces every call to `personList()`, `as.personList()` or `citEntry()` that
CRAN rejects on submission with
`Package CITATION file contains call(s) to old-style ...`. It returns a
tibble with `call`, `line` and a one-line `suggestion` for the modern
equivalent (`c()` on `person()` objects; `bibentry()` instead of
`citEntry()`):

```{r}
audit_citation(pkg)
```

Read-only - rewriting a CITATION file usually needs editorial judgment, so
there is no automated `fix_citation()`.

### Unquoted package names in `Description`

CRAN incoming pretest emits
`Package names should be quoted in the Description field` when a package name
(or any software name) appears in the `Description` field of `DESCRIPTION`
without surrounding single quotes.

`audit_description()` reads the `Description` field, tokenises it, and
surfaces every word that matches an installed package name yet is not wrapped
in single quotes. The package's own name is intentionally skipped, and so are
compound forms like `dplyr-style` or `httr2-based` (a hyphen on either side
disqualifies the token from being a standalone package reference).
Returns a tibble with `word`, `position` and `suggestion`:

```{r}
audit_description(pkg)
```

Read-only - the fix is editorial (decide whether each hit is a real package
reference or a coincidental word, then wrap with single quotes).

### Network / download calls

CRAN policy: package code that downloads files or hits the network at install
or runtime must degrade gracefully when the network is unavailable (offline
build farms, sandboxed CI, locked-down user environment). Common rejection
causes: downloads from inside `.onLoad()`, `.onAttach()`, vignettes or
examples that have no `tryCatch()` / `skip_if_offline()` / `\dontrun{}`
guard.

`audit_downloads()` walks `R/`, `tests/`, `vignettes/` and `inst/`, parses
each file, and surfaces every call to a known download or HTTP function:
`download.file()`, `httr::GET()`, `httr2::req_perform()`,
`curl::curl_download()`, etc. The call site (file + line) is paired with a
one-line suggestion. Detection is purely static, so a user-defined function
that shadows a downloader name (`download.file <- function(...) { ... }`)
does not trigger a false positive on the definition site - only call sites
are flagged:

```{r}
audit_downloads(pkg)
```

Read-only - the fix is editorial: decide for each call whether the right
CRAN-safe pattern is `tryCatch()` (continue on offline),
`testthat::skip_if_offline()` (skip the test), or `\dontrun{}` (drop the
example from the test surface).

### `\dontrun{}` blocks in examples

CRAN policy is that `\dontrun{}` should only wrap example code that genuinely
cannot be executed (missing API key, missing system dependency, side effect
on the user's filespace). Otherwise prefer `\donttest{}`, which still gets
exercised by `R CMD check --run-donttest` but is skipped by default.

`audit_dontrun()` walks `man/*.Rd` line-by-line and surfaces every
`\dontrun{}` opener (commented-out `% \dontrun{` mentions are ignored), with
the source Rd file, the documented topic, the line number and a one-line
suggestion.
Read-only - the call is your review checklist:

```{r}
audit_dontrun(pkg)
```

## Minimal end-to-end on a fake package

`create_example_pkg()` builds a fake package that deliberately trips each
audit. The two `with_*` flags below activate the non-ASCII and
undocumented-dataset fixtures so every audit has something to surface:

```{r}
pkg <- create_example_pkg(with_nonascii = TRUE,
                          with_undocumented_data = TRUE)
chk <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")

audit_tags(pkg)         # @return / @noRd issues
audit_ascii(pkg)        # accents in comments / strings
audit_dataset_doc(pkg)  # data/demo_dataset.rda has no doc
audit_citation(pkg)     # old-style personList() / citEntry()
audit_dontrun(pkg)      # \dontrun{} blocks in examples
audit_description(pkg)  # unquoted package names in Description
audit_downloads(pkg)    # network call sites to review for offline-safe guards
audit_globals(pkg, checks = chk)

fix_globals(pkg, checks = chk, write = TRUE)
fix_ascii(pkg, dry_run = FALSE)
fix_dataset_doc("demo_dataset", pkg = pkg,
                description = "A small demo dataset",
                source = "Generated by create_example_pkg()")
```

After applying the fixes, re-run the check (the package state has changed, so
a new `rcmdcheck()` is needed) and confirm 0 / 0 / 0.

## Next step: pre-submission gates

When the dev-time audits above are clean, run the heavier gates that have
their own pipeline and **cannot** reuse `chk`:

- `audit_check()` - `R CMD check` with the full CRAN incoming environment.
- `audit_userspace()` - checks that tests / examples / vignettes leave no
  files behind.

Both are documented in
`vignette("pre-submission-gates", package = "checkhelper")`.
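
Whether the dev-time audits are clean (the 0 / 0 / 0 confirmation after the fixes) can be scripted instead of read off the console. The sketch below assumes the `rcmdcheck::rcmdcheck()` result stores its findings in the character vectors `errors`, `warnings` and `notes`. `check_counts()` and `is_clean()` are hypothetical helpers, not checkhelper functions, and `chk2` is a hand-built stand-in so the example runs without a package on disk:

```{r}
# Hypothetical helper (not part of checkhelper): count the errors, warnings
# and notes stored in an rcmdcheck result.
check_counts <- function(chk) {
  c(
    errors   = length(chk$errors),
    warnings = length(chk$warnings),
    notes    = length(chk$notes)
  )
}

# TRUE only when the check reports 0 errors / 0 warnings / 0 notes.
is_clean <- function(chk) {
  all(check_counts(chk) == 0L)
}

# Stand-in for a fresh result (assumption for illustration only). In practice:
# chk2 <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")
chk2 <- list(
  errors   = character(0),
  warnings = character(0),
  notes    = "checking for future file timestamps ... NOTE"
)

check_counts(chk2)
is_clean(chk2)
```

With the stand-in above one note remains, so `is_clean()` returns `FALSE`. A real run would move on to the pre-submission gates only once it returns `TRUE`.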