--- title: "Writing to a REDCap Project" author: Will Beasley [Biomedical & Behavior Methodology Core](https://www.ouhsc.edu/bbmc/team/), OUHSC Pediatrics;;
Raymond Balise, University of Miami School of Medicine
Stephan Kadauke, Children's Hospital of Philadelphia output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Writing to a REDCap Project} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: markdown: wrap: 72 --- ```{r} #| include = FALSE knitr::opts_chunk$set( collapse = TRUE, comment = "#>", tidy = FALSE ) ``` Writing data *to* REDCap is more difficult than reading data *from* REDCap. When you read, you receive data in the structure that the REDCap provides you. You have some control about the columns, rows, and data types, but there is not a lot you have to be concerned. In contrast, the structure of the dataset you send to the REDCap server must be precise. You need to pass special variables so that the REDCap server understands the hierarchical structure of the data points. This vignette walks you through that process. If you are new to REDCap and its API, please first understand the concepts described in these two [vignettes](https://ouhscbbmc.github.io/REDCapR/articles/): - [Typical REDCap Workflow for a Data Analyst](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html) - [Retrieving Longitudinal and Repeating Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html) # Part 1 - Intro ## Strategy As described in the [Retrieving Longitudinal and Repeating Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html) vignette, the best way to read and write data from projects with longitudinal/repeating elements is to break up the "block matrix" dataset into individual datasets. Each rectangle should have a coherent grain. Following this strategy, we'll write to the REDCap server in two distinct steps: 1. Upload the patient-level instrument(s) 2. Upload the each repeating instrument separately. The actual upload phase is pretty straight-forward --it's just a call to `REDCapR::redcap_write()`. Most of the vignette's code prepares the dataset so that the upload will run smoothly. ## Pre-requisites See the [Typical REDCap Workflow for a Data Analyst](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html) vignette and 1. [Verify REDCapR is installed](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#verify-redcapr-is-installed) 2. [Verify REDCap Access](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#verify-redcap-access) 3. [Review Codebook](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#review-codebook) ## Retrieve Token Please closely read the [Retrieve Protected Token](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#part-2---retrieve-protected-token) section, which has important security implications. The current vignette imports a fake dataset into REDCap, and we'll use a token stored in a local file. ``` r # retrieve-credential path_credential <- system.file("misc/dev-2.credentials", package = "REDCapR") credential <- REDCapR::retrieve_credential_local( path_credential = path_credential, project_id = 3748 ) c(credential$redcap_uri, credential$token) ``` ## Datasets to Write to Server To keep this vignette focused on writing/importing/uploading to the server, we'll start with the data that needs to be written. These example tables were prepared by [Raymond Balise](https://github.com/RaymondBalise) for our 2023 [R/Medicine](https://events.linuxfoundation.org/r-medicine/) workshop, "Using REDCap and R to Rapidly Produce Biomedical Publications". 
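Before moving on, it can be reassuring to confirm that the credential actually works. Here is a minimal sketch (not part of the workshop code) that pulls the project's metadata (the data dictionary) and lists the fields you are about to populate; it assumes the token has export rights and that the standard metadata columns (`field_name`, `form_name`, `field_type`) are present.

``` r
# optional-metadata-check (a sketch; adjust or skip as needed)
ds_metadata <-
  REDCapR::redcap_metadata_read(
    redcap_uri = credential$redcap_uri,
    token      = credential$token
  )$data

ds_metadata |>
  dplyr::select(field_name, form_name, field_type)
```

If this call fails, fix the credential or the project permissions before attempting any writes.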
## Datasets to Write to Server

To keep this vignette focused on writing/importing/uploading to the server, we'll start with the data that needs to be written. These example tables were prepared by [Raymond Balise](https://github.com/RaymondBalise) for our 2023 [R/Medicine](https://events.linuxfoundation.org/r-medicine/) workshop, "Using REDCap and R to Rapidly Produce Biomedical Publications".

There are two tables, each with a different [granularity](https://www.1keydata.com/datawarehousing/fact-table-granularity.html):

- `ds_patient`: each row represents one patient,
- `ds_daily`: each row represents one daily measurement per patient.

``` r
# load-patient
ds_patient <-
  "test-data/vignette-repeating-write/data-patient.rds" |>
  system.file(package = "REDCapR") |>
  readr::read_rds()

ds_patient
```

``` r
# load-repeating
ds_daily <-
  "test-data/vignette-repeating-write/data-daily.rds" |>
  system.file(package = "REDCapR") |>
  readr::read_rds()

ds_daily
```

# Part 2 - Write Data: One row per patient

Besides the [`data.frame`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/data.frame.html) to write to REDCap, the only required arguments of the [`REDCapR::redcap_write()`](https://ouhscbbmc.github.io/REDCapR/reference/redcap_write.html) function are `redcap_uri` and `token`; both are contained in the credential object created in the previous section. As discussed in the [Troubleshooting vignette](https://ouhscbbmc.github.io/REDCapR/articles/TroubleshootingApiCalls.html#writing), we recommend running the two preliminary checks described below (validating the data.frame and writing a small subset first) before trying to write the full dataset to the server for the very first time.

## Prep: Stoplight Fields

If the REDCap project isn't longitudinal and doesn't have arms, uploading a patient-level data.frame to REDCap doesn't require adding variables. However, we typically populate the `*_complete` variables to communicate the record's status. If the row needs a human to add more values or inspect the existing values, consider [marking the instrument](https://ouhscbbmc.github.io/REDCapR/reference/constant.html) "incomplete" or "unverified"; the patient's instrument record will appear red or yellow in REDCap's Record Dashboard. Otherwise, consider marking the instrument "complete" so it will appear green.

With this example project, the only patient-level instrument is "enrollment", so the corresponding variable is `enrollment_complete`.

``` r
# patient-complete
ds_patient <-
  ds_patient |>
  dplyr::mutate(
    enrollment_complete = REDCapR::constant("form_complete"),
  )
```

## Prep: `REDCapR::validate_for_write()`

`REDCapR::validate_for_write()` inspects a data frame to anticipate potential problems before writing with REDCap's API. A tibble is returned, with one row per potential problem (and a suggestion for how to avoid it). Ideally, a 0-row tibble is returned.

``` r
REDCapR::validate_for_write(ds_patient, convert_logical_to_integer = TRUE)
```

If you encounter problems that can be checked with automation, please tell us in [an issue](https://github.com/OuhscBbmc/REDCapR/issues). We'll work with you to incorporate the new check into `REDCapR::validate_for_write()`. When a dataset's problems are caught before reaching the server, the solutions are easier to identify and implement.

## Prep: Write Small Subset First

If this is your first time with a complicated project, consider loading a small subset of rows and columns. In this case, we start with only three columns and two rows.

``` r
# patient-subset
ds_patient |>
  dplyr::select(   # First three columns
    id_code,
    date,
    is_mobile,
  ) |>
  dplyr::slice(1:2) |>   # First two rows
  REDCapR::redcap_write(
    ds_to_write = _,
    redcap_uri  = credential$redcap_uri,
    token       = credential$token,
    convert_logical_to_integer = TRUE
  )
```
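`REDCapR::redcap_write()` returns a list describing how the call went. As a sketch, the same small-subset call can be captured in an object and inspected; the object name below is only illustrative, and the elements shown (`success`, `outcome_message`, `records_affected_count`) are the ones we usually glance at.

``` r
# patient-subset-result (a sketch; repeats the call above, but captures the result)
result_subset <-
  ds_patient |>
  dplyr::select(id_code, date, is_mobile) |>
  dplyr::slice(1:2) |>
  REDCapR::redcap_write(
    ds_to_write = _,
    redcap_uri  = credential$redcap_uri,
    token       = credential$token,
    convert_logical_to_integer = TRUE
  )

result_subset$success                  # Did the server accept the records?
result_subset$outcome_message          # Human-friendly summary of the call
result_subset$records_affected_count   # How many records were written
```

Checking `success` programmatically is handy when the write is part of a larger script that should stop early on failure.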
## Prep: Recode Variables where Necessary

Some variables in the data.frame might be represented differently than in REDCap. A common transformation is changing strings into the integers that underlie radio buttons.

Common approaches are [`dplyr::case_match()`](https://dplyr.tidyverse.org/reference/case_match.html) and joining to lookup tables (if the mappings are expressed in a csv). Here's an in-line example of `dplyr::case_match()`.

``` r
ds_patient <-
  ds_patient |>
  dplyr::mutate(
    race =
      dplyr::case_match(
        race,
        "White"                     ~ 1L,
        "Black or African American" ~ 2L,
        "Asian"                     ~ 3L,
        "Native American"           ~ 4L,
        "Pacific Islander"          ~ 5L,
        "Multiracial"               ~ 6L,
        "Refused or don't know"     ~ 7L
      )
  )
```

```{r codebook-race}
#| echo = FALSE,
#| out.extra = 'style = "fig.width=1200px"'
knitr::include_graphics("images/codebook-race.png")
```

## Write Entire Patient-level Table

If the small subset works, we usually jump ahead and try all columns and rows. If this larger table fails, split the difference between (a) the smaller working example and (b) the larger failing example. See if this middle point (which has fewer rows and/or columns than the failing point) succeeds or fails. Then repeat. This "bisection" or "binary search" [debugging technique](https://medium.com/codecastpublication/debugging-tools-and-techniques-binary-search-2da5bb4282c7) is helpful in many areas of programming and statistical modeling.

``` r
# patient-entire
ds_patient |>
  REDCapR::redcap_write(
    ds_to_write = _,
    redcap_uri  = credential$redcap_uri,
    token       = credential$token,
    convert_logical_to_integer = TRUE
  )
```

# Part 3 - Write Data: Repeating Instrument

## Add Plumbing Variables

As stated in the vignette's intro, the structure of the dataset uploaded to the server must be precise. When uploading repeating instruments, there are several important columns:

1. `record_id`: Typically indicates the patient's id. (This field can be renamed for the project.)
2. `redcap_event_name`: If the project is longitudinal or has arms, this indicates the event. Otherwise, you don't need to add this variable.
3. `redcap_repeat_instrument`: Indicates which instrument/form is repeating for this row's values.
4. `redcap_repeat_instance`: Typically a sequential positive integer (*e.g.*, 1, 2, 3, ...) indicating the order.

The combination of these variables needs to be unique; the chunk below includes a quick check of that. Please read the [Retrieving Longitudinal and Repeating Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html) vignette for details of these variables and their meanings. You need to pass these specific variables so that the REDCap server understands the hierarchical structure of the data points.

``` r
# repeat-plumbing
ds_daily <-
  ds_daily |>
  dplyr::group_by(id_code) |>
  dplyr::mutate(
    redcap_repeat_instrument = "daily",
    redcap_repeat_instance   = dplyr::row_number(da_date),
    daily_complete           = REDCapR::constant("form_complete"),
  ) |>
  dplyr::ungroup() |>
  dplyr::select(
    id_code,                   # Or `record_id`, if you didn't rename it
    # redcap_event_name,       # If the project is longitudinal or has arms
    redcap_repeat_instrument,  # The name of the repeating instrument/form
    redcap_repeat_instance,    # The sequence of the repeating instrument
    tidyselect::everything(),  # All columns not explicitly passed to `dplyr::select()`
    daily_complete,            # Indicates incomplete, unverified, or complete
  )
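
# Optional sanity check (a sketch): the combination of id, repeating
# instrument, and repeating instance must be unique, so this should
# return zero rows.
ds_daily |>
  dplyr::count(id_code, redcap_repeat_instrument, redcap_repeat_instance) |>
  dplyr::filter(1L < n)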

# Check for potential problems.  (Remember, zero rows are good.)
REDCapR::validate_for_write(ds_daily, convert_logical_to_integer = TRUE)

ds_daily
```

## Writing Repeating Instrument Variables

With the plumbing variables in place, the upload call mirrors the patient-level write.

``` r
# daily-entire
ds_daily |>
  REDCapR::redcap_write(
    ds_to_write = _,
    redcap_uri  = credential$redcap_uri,
    token       = credential$token,
    convert_logical_to_integer = TRUE
  )
```

# Part 4 - Next Steps

## More Complexity

This vignette required only two data.frames, but more complex projects sometimes need more. Each repeating instrument should be its own data.frame and its own writing step. Arms and longitudinal events need to be considered too.

## Batching

By default, `REDCapR::redcap_write()` uploads the dataset in batches of 100 patients at a time, and combines the results of the batches before returning. This can be adjusted to improve performance; the 'Details' section of `REDCapR::redcap_write()` discusses the trade-offs. I usually shoot for \~10 seconds per batch.
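The batch size is controlled by the `batch_size` argument of `REDCapR::redcap_write()`. A minimal sketch follows; the value of 250 is arbitrary (tune it against your own project, record width, and network), and re-running the upload here is only for illustration.

``` r
# Sketch: adjust the batch size.  Larger batches mean fewer round trips,
# but each call takes longer and a failed batch wastes more work.
ds_daily |>
  REDCapR::redcap_write(
    ds_to_write = _,
    redcap_uri  = credential$redcap_uri,
    token       = credential$token,
    convert_logical_to_integer = TRUE,
    batch_size  = 250L
  )
```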
## Manual vs API

Manual downloading/uploading might make sense if you're doing the operation only once. But when does it ever stop after the first time?

If you have trouble uploading, consider adding a few fake patients & measurements and then downloading the csv. It might reveal something you didn't anticipate. But be aware that it will be in the block matrix format (*i.e.*, everything jammed into one rectangle).

## REDCap's CDIS

The [Clinical Data Interoperability Services](https://projectredcap.org/software/cdis/) (CDIS) use [FHIR](https://www.hl7.org/fhir/overview.html) to move data from your institution's [EMR/EHR](https://www.healthit.gov/faq/what-are-differences-between-electronic-medical-records-electronic-health-records-and-personal) (*e.g.*, Epic, Cerner) to REDCap. Research staff have control over which patient records are selected or eligible. Conceptually it's similar to writing to REDCap with the API, but at a much bigger scale. Realistically, it takes months to get through your institution's human layers. Once established, though, a project can be populated with EMR data in much less development time, assuming the desired data models correspond with FHIR [endpoints](https://hl7.org/fhir/endpoint.html).

# Notes

This vignette was originally designed for the [2023 R/Medicine](https://events.linuxfoundation.org/r-medicine/) workshop, *Using REDCap and R to Rapidly Produce Biomedical Publications: Cleaning Medical Data*, with [Raymond R. Balise](https://github.com/RaymondBalise), Belén Hervera, Daniel Maya, Anna Calderon, Tyler Bartholomew, Stephan Kadauke, and João Pedro Carmezim Correia, and for the [2024 R/Medicine](https://rconsortium.github.io/RMedicine_website/Program.html) workshop, *REDCap + R: Teaming Up in the Tidyverse*, with Stephan Kadauke. The workshop slides are available for [2023](https://github.com/RaymondBalise/r_med_redcap_2023_public) and [2024](https://github.com/skadauke/rmedicine_2024_redcap_r_workshop).

This work was made possible in part by the NIH grant [U54GM104938](https://taggs.hhs.gov/Detail/AwardDetail?arg_AwardNum=U54GM104938&arg_ProgOfficeCode=127) to the [Oklahoma Shared Clinical and Translational Resource](http://osctr.ouhsc.edu).

# Session Information

For the sake of documentation and reproducibility, the current report was rendered in the following environment. Click the line below to expand.

<details>
<summary>Environment</summary>

```{r session-info, echo=FALSE}
if (requireNamespace("sessioninfo", quietly = TRUE)) {
  sessioninfo::session_info()
} else {
  sessionInfo()
}
```

</details>