Evaluating Output

For this example we take Viral Sinusitis as the target and several treatments as events. We set minEraDuration = 7, minPostCombinationDuration = 7 and combinationWindow = 7. We treat multiple occurrences of Viral Sinusitis as separate cases by setting concatTargets = FALSE. Setting it to TRUE would instead concatenate multiple cases, which may be useful for time-invariant target cohorts such as chronic conditions.
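To make the role of minEraDuration concrete, here is a minimal sketch on made-up data. The data frame, its column names, and the duration arithmetic (end date minus start date, in days) are assumptions for illustration only; the package's own implementation may count days differently.

```r
# Toy event eras; every value here is invented for illustration.
eras <- data.frame(
  personId = c(1, 1, 2),
  event = c("aspirin", "amoxicillin", "aspirin"),
  startDate = as.Date(c("2020-01-01", "2020-01-20", "2020-02-01")),
  endDate = as.Date(c("2020-01-03", "2020-02-05", "2020-02-10")),
  stringsAsFactors = FALSE
)

minEraDuration <- 7

# Assumption: an era's duration is its end date minus its start date, in days.
eras$durationDays <- as.numeric(eras$endDate - eras$startDate)

# Keep only eras that last at least minEraDuration days.
keptEras <- eras[eras$durationDays >= minEraDuration, ]
print(keptEras)
```

In this toy data the first era lasts only two days and is dropped, which mirrors the "Removing records < minEraDuration (7)" step in the log output further down.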

library(CDMConnector)
library(dplyr)
library(TreatmentPatterns)

cohortSet <- readCohortSet(
  path = system.file(package = "TreatmentPatterns", "exampleCohorts")
)

con <- DBI::dbConnect(
  drv = duckdb::duckdb(),
  dbdir = eunomiaDir()
)

cdm <- cdmFromCon(
  con = con,
  cdmSchema = "main",
  writeSchema = "main"
)

cdm <- generateCohortSet(
  cdm = cdm,
  cohortSet = cohortSet,
  name = "cohort_table",
  overwrite = TRUE
)

## ℹ Generating 8 cohorts

## ✔ Generating cohort (1/8) - acetaminophen [163ms]
## ✔ Generating cohort (2/8) - amoxicillin [147ms]
## ✔ Generating cohort (3/8) - aspirin [141ms]
## ✔ Generating cohort (4/8) - clavulanate [140ms]
## ✔ Generating cohort (5/8) - death [93ms]
## ✔ Generating cohort (6/8) - doxylamine [146ms]
## ✔ Generating cohort (7/8) - penicillinv [137ms]
## ✔ Generating cohort (8/8) - viralsinusitis [197ms]

cohorts <- cohortSet %>%
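  # Remove the 'cohort', 'json', and 'cohort_name_snakecase' columns
  select(-"cohort", -"json", -"cohort_name_snakecase") %>%
  # Label each cohort as event, exit, or target (in cohort order)
  mutate(type = c("event", "event", "event", "event", "exit", "event", "event", "target")) %>%
  # Rename the identifier columns to cohortId and cohortName
  rename(
    cohortId = "cohort_definition_id",
    cohortName = "cohort_name"
  )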

outputEnv <- computePathways(
  cohorts = cohorts,
  cohortTableName = "cohort_table",
  cdm = cdm,
  minEraDuration = 7,
  combinationWindow = 7,
  minPostCombinationDuration = 7,
  concatTargets = FALSE
)

## -- Qualifying records for cohort definitions: 1, 2, 3, 4, 5, 6, 7, 8
##  Records: 14041
##  Subjects: 2693
## -- Removing records < minEraDuration (7)
##  Records: 11347
##  Subjects: 2159
## >> Starting on target: 8 (viralsinusitis)
## -- Removing events outside window (startDate: 0 | endDate: 0)
##  Records: 8327
##  Subjects: 2142
## -- splitEventCohorts
##  Records: 8327
##  Subjects: 2142
## -- Collapsing eras, eraCollapse (30)
##  Records: 8327
##  Subjects: 2142
## -- Iteration 1: minPostCombinationDuration (7), combinatinoWindow (7)
##  Records: 6799
##  Subjects: 2142
## -- Iteration 2: minPostCombinationDuration (7), combinatinoWindow (7)
##  Records: 6663
##  Subjects: 2142
## -- Iteration 3: minPostCombinationDuration (7), combinatinoWindow (7)
##  Records: 6662
##  Subjects: 2142
## -- After Combination
##  Records: 6662
##  Subjects: 2142
## -- filterTreatments (First)
##  Records: 6657
##  Subjects: 2142
## -- Max path length (5)
##  Records: 6653
##  Subjects: 2142
## -- treatment construction done
##  Records: 6653
##  Subjects: 2142

results <- export(
  andromeda = outputEnv,
  minCellCount = 1,
  nonePaths = TRUE,
  outputPath = tempdir()
)

## Wrote csv-files to: C:\Users\mvankessel\AppData\Local\Temp\RtmpY1v2gG

Saving results

Now that we have run our TreatmentPatterns analysis and exported the results, we can evaluate the output. The export() function returns an R6 object of class TreatmentPatternsResults, from which all results can be queried. The files are also written to the specified outputPath. If no outputPath is set, only the result object is returned and no files are written.

If you want to save the results to csv files or a zip file after the fact, you can still do so. You can also upload them to a database:

# Save to csv files or a zip file
results$saveAsCsv(path = tempdir())

## Wrote csv-files to: C:\Users\mvankessel\AppData\Local\Temp\RtmpY1v2gG

results$saveAsZip(path = tempdir(), name = "tp-results.zip")

## Wrote zip-file to: C:\Users\mvankessel\AppData\Local\Temp\RtmpY1v2gG

# Upload to database
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "sqlite",
  server = file.path(tempdir(), "db.sqlite")
)

results$uploadResultsToDb(
  connectionDetails = connectionDetails,
  schema = "main",
  prefix = "tp_",
  overwrite = TRUE,
  purgeSiteDataBeforeUploading = FALSE
)

## 
## Attaching package: 'DatabaseConnector'

## The following objects are masked from 'package:CDMConnector':
## 
##     dbms, insertTable

## Connecting using SQLite driver
## Uploading file: attrition.csv to table: attrition

## - Preparing to upload rows 1 through 12

## Inserting data took 0.0251 secs
## Uploading file: counts_age.csv to table: counts_age

## - Preparing to upload rows 1 through 63

## Inserting data took 0.0372 secs
## Uploading file: counts_sex.csv to table: counts_sex

## - Preparing to upload rows 1 through 2

## Inserting data took 0.016 secs
## Uploading file: counts_year.csv to table: counts_year

## - Preparing to upload rows 1 through 52

## Inserting data took 0.019 secs
## Uploading file: metadata.csv to table: metadata

## - Preparing to upload rows 1 through 1

## Inserting data took 0.00762 secs
## Uploading file: summary_event_duration.csv to table: summary_event_duration

## - Preparing to upload rows 1 through 88

## Inserting data took 0.0171 secs
## Uploading file: treatment_pathways.csv to table: treatment_pathways

## - Preparing to upload rows 1 through 372

## Inserting data took 0.0242 secs
## Uploading file: cdm_source_info.csv to table: cdm_source_info

## - Preparing to upload rows 1 through 1

## Inserting data took 0.0171 secs
## Uploading file: analyses.csv to table: analyses

## - Preparing to upload rows 1 through 1

## Warning: Column 'description' is of type 'logical', but this is not supported
## by many DBMSs. Converting to numeric (1 = TRUE, 0 = FALSE)

## Inserting data took 0.0154 secs
## Uploading file: arguments.csv to table: arguments

## - Preparing to upload rows 1 through 1

## Inserting data took 0.0157 secs

## Uploading data took 5.63 secs

Evaluating Results

treatmentPathways

The treatmentPathways file contains all pathways found, each with its frequency, stratified pairwise by age group, sex, and index year.

head(results$treatment_pathways)

We can see that the pathways contain the treatment names we provided in our event cohorts. The paths are also annotated with a + or a -. The + indicates that two treatments form a combination therapy, e.g. amoxicillin+clavulanate is a combination of amoxicillin and clavulanate. The - indicates a switch between treatments, e.g. acetaminophen-penicillinv is a switch from acetaminophen to penicillin v. Combinations and switches can occur in the same pathway, e.g. acetaminophen-amoxicillin+clavulanate: the first treatment is acetaminophen, which is followed by a switch to a combination of amoxicillin and clavulanate.
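The notation above can be unpacked with a few lines of base R. This is a minimal sketch, not part of TreatmentPatterns: parsePathway() is a hypothetical helper, and it assumes that treatment names themselves contain no - or + characters.

```r
# Hypothetical helper: split a pathway string into its steps.
# Each step is a character vector; length > 1 means a combination therapy.
parsePathway <- function(pathway) {
  # "-" separates consecutive treatments (switches)
  steps <- strsplit(pathway, "-", fixed = TRUE)[[1]]
  # "+" separates the treatments inside one combination
  lapply(steps, function(step) strsplit(step, "+", fixed = TRUE)[[1]])
}

parsed <- parsePathway("acetaminophen-amoxicillin+clavulanate")
str(parsed)
```

For the example pathway this yields two steps: acetaminophen on its own, followed by the combination of amoxicillin and clavulanate.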

countsAge, countsSex, and countsYear

The countsAge, countsSex, and countsYear files contain counts per age, sex, and index year, respectively.

head(results$counts_age)

head(results$counts_sex)

head(results$counts_year)

summaryEventDuration

The summaryEventDuration file contains summary statistics on the duration of the different events, across all “lines” found. A “line” corresponds to a level in the Sunburst or Sankey diagrams. The summary statistics can be plotted as boxplots with the plotEventDuration() function.

results$plotEventDuration()

Note that besides our events there are two extra rows: mono-event and combination-event. These summarise all single-treatment events and all combination events, respectively.

We see that most events last between 0 and 100 days. We can see that for combination-events and amoxicillin+clavulanate there is a tendency for events to last longer than that. amoxicillin+clavulanate most likely skews the duration in the combination-events group.
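To see how one long-lasting combination can pull up a pooled group, consider the following sketch on made-up durations. The data, the drugA+drugB label, and the use of a five-number summary (via base R's fivenum()) are assumptions for illustration; which statistics the package actually stores is not shown here.

```r
# Toy durations in days; all values are invented for illustration.
toyDurations <- data.frame(
  event = rep(c("drugA+drugB", "amoxicillin+clavulanate"), each = 3),
  days = c(10, 20, 30, 100, 200, 300),
  stringsAsFactors = FALSE
)

# Five-number summary (Tukey's hinges), the usual basis of a boxplot.
fiveNumber <- function(days) {
  stats <- fivenum(days)
  names(stats) <- c("min", "q1", "median", "q3", "max")
  stats
}

# One row of statistics per event ...
perEvent <- t(sapply(split(toyDurations$days, toyDurations$event), fiveNumber))
# ... and one summary for all combination events pooled together.
pooled <- fiveNumber(toyDurations$days)

print(perEvent)
print(pooled)
```

In this toy data the pooled median and upper quartile land well above those of drugA+drugB alone, purely because of the longer amoxicillin+clavulanate durations.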

We can alter the x-axis to get a clearer view of the durations of the events:

results$plotEventDuration() +
  ggplot2::xlim(0, 100)

Now we can investigate particular treatments more clearly. We can see that penicillin v tends to be short-lived across all treatment lines, while aspirin and acetaminophen skew towards longer durations.

We can also set a minCellCount for the individual events:

results$plotEventDuration(minCellCount = 10) +
  ggplot2::xlim(0, 100)
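One caveat on the xlim() calls above: in ggplot2, xlim() sets scale limits, so values outside the range are dropped before statistics are computed, whereas coord_cartesian() only zooms the view. The sketch below shows the difference on a generic boxplot built from made-up raw data (horizontal boxplots require ggplot2 3.3.0 or later). Whether and how this affects the plot returned by plotEventDuration() depends on how the package constructs it, which is an open assumption here.

```r
library(ggplot2)

# Toy raw durations with one value far outside the 0-100 range.
toyDurations <- data.frame(event = "toy-event", days = c(10, 20, 30, 40, 400))

basePlot <- ggplot(toyDurations, aes(x = days, y = event)) +
  geom_boxplot()

# xlim() discards the value 400 before the boxplot statistics are computed.
clipped <- basePlot + xlim(0, 100)
# coord_cartesian() keeps all data and only restricts the visible range.
zoomed <- basePlot + coord_cartesian(xlim = c(0, 100))

# Extract the median that each plot would actually draw.
boxMedian <- function(plot) {
  suppressWarnings(ggplot_build(plot))$data[[1]]$xmiddle
}

boxMedian(clipped)
boxMedian(zoomed)
```

The two medians differ because the clipped plot is computed from four values and the zoomed plot from all five. If a zoomed plot looks unexpectedly different from the full one, this distinction is worth checking.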

metadata

The metadata file contains information about the circumstances under which the analysis was performed, including information about R and the CDM.

results$metadata

Sunburst Plot & Sankey Diagram

From the filtered treatmentPathways file we can create a sunburst plot.

The innermost layer is the first event that occurs; each layer further outwards is the next event in the pathway. These layers correspond to the lines in the event duration plot we looked at earlier.

results$plotSunburst()
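As a rough illustration of what the innermost layer represents, the sketch below sums pathway frequencies by first event. The data frame and its column names (pathway, freq) are hypothetical stand-ins, not the actual structure of the treatmentPathways file, and the numbers are invented.

```r
# Toy pathways with made-up frequencies; column names are hypothetical.
toyPathways <- data.frame(
  pathway = c("acetaminophen", "acetaminophen-penicillinv", "amoxicillin+clavulanate"),
  freq = c(10, 5, 3),
  stringsAsFactors = FALSE
)

# The first event is everything before the first "-" (a combination stays intact).
firstEvent <- vapply(
  strsplit(toyPathways$pathway, "-", fixed = TRUE),
  function(steps) steps[1],
  character(1)
)

# Total frequency per first event: the sizes of the innermost sunburst segments.
innerLayer <- tapply(toyPathways$freq, firstEvent, sum)
print(innerLayer)
```

Repeating the same grouping on the second, third, and later steps would give the sizes of the outer layers.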

We can also create a Sankey diagram, which in theory displays the same data. The Sankey diagram additionally contains a Stopped node, which indicates the end of a pathway. It is mostly a practical addition, so that single-layer Sankey diagrams can still be plotted.

results$plotSankey()