---
title: "Evaluating Output"
always_allow_html: yes
output:
html_document:
toc: yes
toc_depth: '3'
df_print: paged
html_vignette:
toc: yes
toc_depth: 3
vignette: >
%\VignetteIndexEntry{Evaluating Output}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
pdf_document:
toc: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
withr::local_envvar(
R_USER_CACHE_DIR = tempfile(),
EUNOMIA_DATA_FOLDER = Sys.getenv("EUNOMIA_DATA_FOLDER", unset = tempfile())
)
```
For this example we take the case of *Viral Sinusitis* and several treatments as events. We set our `minEraDuration = 7`, `minCombinationDuration = 7` and `combinationWindow = 7`. We treat multiple events of *Viral Sinusitis* as separate cases by setting `concatTargets = FALSE`. When set to `TRUE` it would append multiple cases, which might be useful for time invariant target cohorts like chronic conditions.
```{r setup_treatment_patterns, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE), warning=FALSE, error=FALSE}
library(CDMConnector)
library(dplyr)
library(TreatmentPatterns)
cohortSet <- readCohortSet(
path = system.file(package = "TreatmentPatterns", "exampleCohorts")
)
con <- DBI::dbConnect(
drv = duckdb::duckdb(),
dbdir = eunomiaDir()
)
cdm <- cdmFromCon(
con = con,
cdmSchema = "main",
writeSchema = "main"
)
cdm <- generateCohortSet(
cdm = cdm,
cohortSet = cohortSet,
name = "cohort_table",
overwrite = TRUE
)
cohorts <- cohortSet %>%
# Remove 'cohort' and 'json' columns
select(-"cohort", -"json", -"cohort_name_snakecase") %>%
mutate(type = c("event", "event", "event", "event", "exit", "event", "event", "target")) %>%
rename(
cohortId = "cohort_definition_id",
cohortName = "cohort_name",
)
outputEnv <- computePathways(
cohorts = cohorts,
cohortTableName = "cohort_table",
cdm = cdm,
minEraDuration = 7,
combinationWindow = 7,
minPostCombinationDuration = 7,
concatTargets = FALSE
)
results <- export(
andromeda = outputEnv,
minCellCount = 1,
nonePaths = TRUE,
outputPath = tempdir()
)
```
## Saving results
Now that we ran our TreatmentPatterns analysis and have exported our results, we can evaluate the output. The `export()` function in TreatmentPatterns returns an R6 class of `TreatmentPatternsResults`. All results are query-able from this object. Additionally the files are written to the specified `outputPath`. If no `outputPath` is set, only the result object is returned, and no files are written.
If you would like to save the results to csv-, or zip-file after the fact you can still do this. Or upload it to a database:
```{r save, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
# Save to csv-, zip-file
results$saveAsCsv(path = tempdir())
results$saveAsZip(path = tempdir(), name = "tp-results.zip")
# Upload to database
connectionDetails <- DatabaseConnector::createConnectionDetails(
dbms = "sqlite",
server = file.path(tempdir(), "db.sqlite")
)
results$uploadResultsToDb(
connectionDetails = connectionDetails,
schema = "main",
prefix = "tp_",
overwrite = TRUE,
purgeSiteDataBeforeUploading = FALSE
)
```
## Evaluating Results
### treatmentPathways
The treatmentPathways file contains all the pathways found, with a frequency, pairwise stratified by age group, sex and index year.
```{r readTreatmentPathways, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
head(results$treatment_pathways)
```
We can see the pathways contain the treatment names we provided in our event cohorts. Besides that we also see the paths are annoted with a `+` or `-`. The `+` indicates two treatments are a combination therapy, i.e. `amoxicillin+clavulanate` is a combination of _amoxicillin_ and _clavulanate_. The `-` indicates a switch between treatments, i.e. `acetaminophen-penicillinv` is a switch from _acetaminophen_ to _penicillin v_. Note that these combinations and switches can occur in the same pathway, i.e. `acetaminophen-amoxicillin+clavulanate`. The first treatment is _acetaminophen_ that *switches* to a combination of _amoxicillin_ and _clavulanate_.
### countsAge, countsSex, and countsYear
The countsAge, countsSex, and countsYear contain counts per age, sex, and index year.
```{r counts, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
head(results$counts_age)
head(results$counts_sex)
head(results$counts_year)
```
### summaryStatsTherapyDuration
The summaryEventDuration contains summary statistics from different events, across all found "lines". A "line" is equal to the level in the Sunburst or Sankey diagrams. The summary statistics allow for plotting of boxplots with the `plotEventDuration()` function.
```{r summaryStatsTherapyDuration, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$plotEventDuration()
```
Not that besides our events there are two extra rows: *mono-event*, and *combination-event*. These are both types of events on average.
We see that most events last between 0 and 100 days. We can see that for *combination-events* and *amoxicillin+clavulanate* there is a tendency for events to last longer than that. *amoxicillin+clavulanate* most likely skews the duration in the *combination-events* group.
We can alter the x-axis to get a clearer view of the durations of the events:
```{r, warning=FALSE, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$plotEventDuration() +
ggplot2::xlim(0, 100)
```
Now we can more clearly investigate particular treatments. We can see that *penicilin v* tends to last quite short across all treatment lines, while *aspirin* and *acetaminophen* seem to skew to a longer duration.
Additionally we can also set a `minCellCount` for the individual events.
```{r, warning=FALSE, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$plotEventDuration(minCellCount = 10) +
ggplot2::xlim(0, 100)
```
### metadata
The metadata file is a file that contains information about the circumstances the analysis was performed in, and information about R, and the CDM.
```{r metadata, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$metadata
```
### Sunburst Plot & Sankey Diagram
From the filtered treatmentPathways file we are able to create a sunburst plot.
The inner most layer is the first event that occurs, going outwards. This aligns with the event duration plot we looked at earlier.
```{r sunburstPlot, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$plotSunburst()
```
We can also create a Sankey Diagram, which in theory displays the same data. Additionally you see the *Stopped* node in the Sankey diagram. This indicates the end of the pathway. It is mostly a practical addition so that single layer Sankey diagrams can still be plotted.
```{r sankeyDiagram, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
results$plotSankey()
```
```{r cleanup, include=FALSE, eval=require("CDMConnector", quietly = TRUE, warn.conflicts = FALSE, character.only = TRUE)}
# Close Andromeda objects
Andromeda::close(outputEnv)
# Close connection to CDM Reference
DBI::dbDisconnect(conn = con)
rm(defaultSettings, minEra60, splitAcuteTherapy, includeEndDate, con, cdm)
```