--- title: "Introduction to Jellyfisher" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to Jellyfisher} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(jellyfisher) ``` **Jellyfisher** is an R package (a [htmlwidget](https://www.htmlwidgets.org/)) for visualizing tumor evolution and subclonal compositions using Jellyfish plots. The package is based on the [Jellyfish](https://github.com/HautaniemiLab/jellyfish) visualization tool, bringing its functionality to R users. Jellyfisher supports both [ClonEvol](https://github.com/hdng/clonevol) results and plain data frames, making it compatible with various tools and workflows. ## Input data The input data should follow specific structures for _samples_, _phylogeny_, subclonal _compositions_, and optional _ranks_. The `jellyfisher` package includes an example data set (`jellyfisher_example_tables`) based on the following publication: Lahtinen, A., Lavikka, K., Virtanen, A., et al. "Evolutionary states and trajectories characterized by distinct pathways stratify patients with ovarian high-grade serous carcinoma." _Cancer Cell_ **41**, 1103–1117.e12 (2023). DOI: [10.1016/j.ccell.2023.04.017](https://doi.org/10.1016/j.ccell.2023.04.017). ```{r echo=F, message=F} # Subset the data to keep the compiled vignette a bit smaller jellyfisher_example_tables <- jellyfisher_example_tables |> select_patients(c("EOC69", "EOC677", "EOC495", "EOC809")) ``` ### Samples - `sample` (string): specifies the unique identifier for each sample. - `displayName` (string, optional): allows for specifying a custom name for each sample. If the column is omitted, the `sample` column is used as the display name. - `rank` (integer): specifies the position of each sample in the Jellyfish plot. For example, different stages of a disease can be ranked in chronological order: diagnosis (1), interval (2), and relapse (3). The zeroth rank is reserved for the root of the sample tree. Ranks can be any integer, and unused ranks are automatically excluded from the plot. If the `rank` column is absent, ranks are assigned based on each sample’s depth in the sample tree. - `parent` (string): identifies the parent sample for each entry. Samples without a specified parent are treated as children of an imaginary root sample. ```{r} head(jellyfisher_example_tables$samples, 25) ``` ### Phylogeny - `subclone` (string): specifies subclone IDs, which can be any string. - `parent` (string): designates the parent subclone. The subclone without a parent is considered the root of the phylogeny. - `color` (string, optional): specifies the color for the subclone. If the column is omitted, colors will be generated automatically. - `branchLength` (number): specifies the length of the branch leading to the subclone. The length may be based on, for example, the number of unique mutations in the subclone. The branch length is shown in the Jellyfish plot's legend as a bar chart. It is also used when generating a phylogeny-aware color scheme. ```{r} head(jellyfisher_example_tables$phylogeny, 25) ``` ### Subclonal compositions Subclonal compositions are specified in a [tidy](https://vita.had.co.nz/papers/tidy-data.pdf) format, where each row represents a subclone in a sample. - `sample` (string): specifies the sample ID. - `subclone` (string): specifies the subclone ID. - `clonalPrevalence` (number): specifies the clonal prevalence of the subclone in the sample. The clonal prevalence is the proportion of the subclone in the sample. The clonal prevalences in a sample must sum to 1. ```{r} head(jellyfisher_example_tables$compositions, 25) ``` ### Ranks The ranks in the example data set are used to indicate the time points when the samples were acquired. - `rank` (integer): specifies the rank number. The zeroth rank is reserved for the inferred root of the sample tree. However, you are free to define a title for it. - `title` (string): specifies the title for the rank. ```{r} head(jellyfisher_example_tables$ranks, 6) ``` ## Plotting ### Basic plotting The three tables are passed to the `jellyfisher` function as a named list. The function generates an interactive Jellyfish plot based on the input data. If the data set contains multiple patients, the Jellyfisher htmlwidget shows navigation buttons to switch between patients. ```{r} jellyfisher(jellyfisher_example_tables, width = "100%", height = 450) ``` ### Plotting with custom options ```{r} jellyfisher(jellyfisher_example_tables, options = list( sampleHeight = 70, sampleTakenGuide = "none", tentacleWidth = 3, showLegend = FALSE ), width = "100%", height = 400) ``` ### Plotting a single patient When plotting multiple patients, Jellyfisher shows buttons (Previous and Next) to navigate between patients. When the data contains only one patient, these buttons are hidden. The package also provides a `select_patients` function to filter the data set with ease. ```{r} jellyfisher_example_tables |> select_patients("EOC677") |> jellyfisher(width = "100%", height = 400) ``` ## Adjusting the sample tree structure The sample trees in the example data set were constructed as follows: *"For each sample, we checked whether an earlier time point included exactly one sample from the same anatomical location. If such a sample existed, it was assigned as the parent; otherwise, the inferred root was used as the parent."* However, this mechanistic approach may not always produce credible sample trees. ### Changing parent The *r1Bow1* (bowel) sample in the following jellyfish plot is derived from an earlier bowel sample *p2Bow1_c*, which has no traces of the subclone 12. ```{r} jellyfisher_example_tables |> select_patients("EOC809") |> jellyfisher(width = "100%", height = 600) ``` Using the `set_parents` function, we can adjust the parent of the *r1Bow1* sample to be *p2Per1_cO* (peritoneum), which is a possible source of the metastasis due to its proximity. The high prevalence of subclone 12 in this sample suggests that it is the likely source of the metastasis in the *r1Bow1* sample. ```{r} jellyfisher_example_tables |> select_patients("EOC809") |> set_parents(list("EOC809_r1Bow1_DNA1" = "EOC809_p2Per1_cO_DNA2")) |> jellyfisher(width = "100%", height = 600) ``` ### Changing topology While ranks (the columns) can indicate the time points when the samples were acquired, they can also be used to simply show the sample's depth in the sample tree. For instance, the following plot shows all the samples on the same rank, indicating that they were diagnostic samples acquired at the same time. ```{r} jellyfisher_example_tables |> select_patients("EOC495") |> jellyfisher(width = "100%", height = 650) ``` However, one can argue that the LN (lymph node) samples represent a later development in the disease, and thus, they should be placed on a later rank. We can remove the existing ranks, define new parent-child relationships, and let Jellyfisher assign the ranks based on the sample tree depth. ```{r} tables <- jellyfisher_example_tables |> select_patients("EOC495") # Remove existing ranks. The ranks will be assigned automatically based # on samples' depths in the sample tree. tables$samples$rank <- NA # Rank titles should be removed as well because they are no longer valid. tables$ranks <- NULL tables |> set_parents(list("EOC495_pLNL1_DNA1" = "EOC495_pLNR_DNA1", "EOC495_pLNL2_DNA1" = "EOC495_pLNL1_DNA1")) |> jellyfisher(width = "100%", height = 500) ``` If we think that the lymph node samples represent an even later development, we can manually assign ranks to the samples. The `set_ranks` function provides an easy way to do this. ```{r} tables |> set_parents(list("EOC495_pLNL1_DNA1" = "EOC495_pLNR_DNA1", "EOC495_pLNL2_DNA1" = "EOC495_pLNL1_DNA1")) |> set_ranks(list("EOC495_pLNR_DNA1" = 2, "EOC495_pLNL1_DNA1" = 3, "EOC495_pLNL2_DNA1" = 4), default = 1) |> jellyfisher(width = "100%", height = 400) ``` ```{r echo=F} rm(tables) ``` ## Handling non-aberrant cells Typically in tumor evolution studies, the focus is on the subclonal compositions of the tumor cells. Thus, the root node in the phylogenetic tree represents the founding clone. However, the data may also contain non-aberrant cells, i.e., the tumor purity is less than 100%. For these cases, the `normalsInPhylogenyRoot` option instructs Jellyfisher to treat the root node as non-aberrant cells. The option has two consequences: (1) No tentacles are drawn between the root subclones, and (2) the root subclone is colored as white when the phylogeny-aware color scheme is used. ```{r} # Subclone N at the root represents the non-aberrant cells. # The letter N has no special meaning in Jellyfisher. non_aberrant <- list( samples = data.frame(sample = c("A", "B")), compositions = data.frame( sample = c("A", "A", "A", "B", "B", "B"), subclone = c("N", "1", "2", "N", "1", "2"), clonalPrevalence = c(0.2, 0.4, 0.4, 0.3, 0.3, 0.4) ), phylogeny = data.frame( subclone = c("N", "1", "2"), parent = c(NA, "N", "1") ) ) ``` ```{r} non_aberrant |> jellyfisher(options = list( normalsAtPhylogenyRoot = TRUE ), width = "100%", height = 350) ``` In the above plot, the root clone, which represents the non-aberrant cells, is hidden from the inferred root sample. However, sometimes a patient may have multiple independent clones, and in these cases the root clone is shown: ```{r} # Change the parent of subclone 2 to N non_aberrant$phylogeny$parent[non_aberrant$phylogeny$subclone == "2"] <- "N" non_aberrant |> jellyfisher(options = list( normalsAtPhylogenyRoot = TRUE ), width = "100%", height = 350) ``` ## Session info ```{r} sessionInfo() ```