--- title: "Introduction to forestploter" author: "Alimu Dayimu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: yes vignette: > %\VignetteIndexEntry{Introduction to forestploter} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, dpi = 300, comment = "#>" ) ``` [Forest plots](https://en.wikipedia.org/wiki/Forest_plot) are commonly used in medical research publications, especially in [meta-analysis](https://en.wikipedia.org/wiki/Meta-analysis). They can also be used to report the coefficients and confidence intervals (CIs) of regression models. There are many packages available for drawing forest plots. The most popular one is [forestplot](https://CRAN.R-project.org/package=forestplot). Other packages specialized for meta-analysis include [meta](https://CRAN.R-project.org/package=meta), [metafor](https://CRAN.R-project.org/package=metafor), and [rmeta](https://CRAN.R-project.org/package=rmeta). Some packages, like [ggforestplot](https://nightingalehealth.github.io/ggforestplot/index.html), use [ggplot2](https://CRAN.R-project.org/package=ggplot2) to draw forest plots, though ggforestplot is not yet available on CRAN. The main differences between `forestploter` and other packages are: * Focuses specifically on forest plots. * Treats the forest plot as a table, where elements are aligned in rows and columns. Users have full control over what and how to display the forest plot contents. * Graphical parameters are controlled with a theme. * Allows post-hoc plot editing. * Supports CIs in multiple columns and by groups. # Basic Forest Plot The layout of the forest plot is determined by the dataset provided. Please refer to the other vignette for instructions on changing text or background, adding or inserting text, adding borders to cells, and editing the color of the CI in specific cells. ## Simple Forest Plot The first step is to prepare a `data.frame` to be used as the basic layout of the forest plot. Column names of the data will be drawn as the header, and the contents inside the data will be displayed in the forest plot. One or more blank columns without any content (blanks) should be provided to draw a confidence interval. **The width of the box to draw the CI is determined by the width of this column. Increase the number of spaces in the column to provide more space for drawing the CI.** First, we need to prepare the data for plotting. ```{r prepare-data} library(grid) library(forestploter) # Read provided sample example data dt <- read.csv(system.file("extdata", "example_data.csv", package = "forestploter")) # Keep needed columns dt <- dt[, 1:6] # Indent the subgroup if there is a number in the placebo column dt$Subgroup <- ifelse(is.na(dt$Placebo), dt$Subgroup, paste0(" ", dt$Subgroup)) # Replace NA with blank or NA will be transformed to character dt$Treatment <- ifelse(is.na(dt$Treatment), "", dt$Treatment) dt$Placebo <- ifelse(is.na(dt$Placebo), "", dt$Placebo) dt$se <- (log(dt$hi) - log(dt$est)) / 1.96 # Add a blank column for the forest plot to display CI # Adjust the column width with spaces; increase the number of spaces below # to provide a larger area for drawing the CI dt$` ` <- paste(rep(" ", 20), collapse = " ") # Create a confidence interval column to display dt$`HR (95% CI)` <- ifelse(is.na(dt$se), "", sprintf("%.2f (%.2f to %.2f)", dt$est, dt$low, dt$hi)) head(dt) ``` The data prepared above will be used as the basic layout of the forest plot. The example below demonstrates how to draw a simple forest plot. A footnote is added as a demonstration. ```{r simple-plot, out.width="80%", fig.width = 8, fig.height = 6} p <- forest(dt[, c(1:3, 8:9)], est = dt$est, lower = dt$low, upper = dt$hi, sizes = dt$se, ci_column = 4, ref_line = 1, arrow_lab = c("Placebo Better", "Treatment Better"), xlim = c(0, 4), ticks_at = c(0.5, 1, 2, 3), footnote = "This is the demo data. Please feel free to change\nanything you want.") # Print plot plot(p) ``` ## Change Theme We will now use the same data as above and add a summary point. Additionally, we will change the graphical parameters for the confidence interval and other parts of the plot. The theme of the forest plot can be adjusted with the `forest_theme` function. Check the manual for more details. ```{r simple-plot-theme, out.width="80%", fig.width = 7, fig.height = 3.3} dt_tmp <- rbind(dt[-1, ], dt[1, ]) dt_tmp[nrow(dt_tmp), 1] <- "Overall" dt_tmp <- dt_tmp[1:11, ] # Define theme tm <- forest_theme(base_size = 10, # Confidence interval point shape, line type/color/width ci_pch = 15, ci_col = "#762a83", ci_fill = "black", ci_alpha = 0.8, ci_lty = 1, ci_lwd = 1.5, ci_Theight = 0.2, # Set a T end at the end of CI # Reference line width/type/color refline_gp = gpar(lwd = 1, lty = "dashed", col = "grey20"), # Vertical line width/type/color vertline_lwd = 1, vertline_lty = "dashed", vertline_col = "grey20", # Change summary color for filling and borders summary_fill = "#4575b4", summary_col = "#4575b4", # Footnote font size/face/color footnote_gp = gpar(cex = 0.6, fontface = "italic", col = "blue")) pt <- forest(dt_tmp[, c(1:3, 8:9)], est = dt_tmp$est, lower = dt_tmp$low, upper = dt_tmp$hi, sizes = dt_tmp$se, is_summary = c(rep(FALSE, nrow(dt_tmp) - 1), TRUE), ci_column = 4, ref_line = 1, arrow_lab = c("Placebo Better", "Treatment Better"), xlim = c(0, 4), ticks_at = c(0.5, 1, 2, 3), footnote = "This is the demo data. Please feel free to change\nanything you want.", theme = tm) # Print plot plot(pt) ``` ## Text Justification and Background with Theme By default, all cells are left-aligned. However, it is possible to justify any cells in the forest plot by setting parameters in `forest_theme`. Set `core = list(fg_params = list(hjust = 0, x = 0))` to left-align content, and `rowhead = list(fg_params = list(hjust = 0.5, x = 0.5))` to center the header. Set `hjust = 1` and `x = 0.9` to right-align text. **You can also change the justification of text with `edit_plot`. See details in another vignette.** The same rule applies to changing the background color by setting `core = list(bg_params = list(fill = c("#edf8e9", "#c7e9c0", "#a1d99b")))`. Change settings in `core` if you want to change graphical parameters of contents, and `colhead` for the header. Change settings in `fg_params` to modify the text. See parameters for `textGrob()` in the `grid` package. Change `bg_params` to modify settings for background graphical parameters. See `gpar()` in the `grid` package. You should pass parameters as a list. More details can be found [here](https://CRAN.R-project.org/package=gridExtra/vignettes/tableGrob.html). Provide a single value if you want cells to have the same justification or a vector for each cell. As you can see, the second example justifies text by row using the provided vector, and the vector will be recycled. ```{r text-justification, out.width="80%", fig.width = 7, fig.height = 2} dt <- dt[1:4, ] # Header center and content right tm <- forest_theme(core = list(fg_params = list(hjust = 1, x = 0.9), bg_params = list(fill = c("#edf8e9", "#c7e9c0", "#a1d99b"))), colhead = list(fg_params = list(hjust = 0.5, x = 0.5))) p <- forest(dt[, c(1:3, 8:9)], est = dt$est, lower = dt$low, upper = dt$hi, sizes = dt$se, ci_column = 4, title = "Header center content right", theme = tm) # Print plot plot(p) # Mixed justification tm <- forest_theme(core = list(fg_params = list(hjust = c(1, 0, 0, 0.5), x = c(0.9, 0.1, 0, 0.5)), bg_params = list(fill = c("#f6eff7", "#d0d1e6", "#a6bddb", "#67a9cf"))), colhead = list(fg_params = list(hjust = c(1, 0, 0, 0, 0.5), x = c(0.9, 0.1, 0, 0, 0.5)))) p <- forest(dt[, c(1:3, 8:9)], est = dt$est, lower = dt$low, upper = dt$hi, sizes = dt$se, ci_column = 4, title = "Mixed justification", theme = tm) plot(p) ``` ## Text Parsing Similar to text justification, you can parse text in any cells. Parsing all text will remove the blanks in the data, which will also apply to the blank columns used to draw the whisker. ```{r text-parsing, out.width="80%", fig.width = 7, fig.height = 2} # Check out the `plotmath` function for math expression. dt <- data.frame( Study = c("Study ~1^a", "Study ~2^b", "NO[2]"), low = c(0.2, -0.03, 1.11), est = c(0.71, 0.35, 1.79), hi = c(1.22, 0.74, 2.47) ) dt$SMD <- sprintf("%.2f (%.2f, %.2f)", dt$est, dt$low, dt$hi) dt$` ` <- paste(rep(" ", 20), collapse = " ") fig_dt <- dt[, c(1, 5:6)] # Get a matrix of which row and columns to parse parse_mat <- matrix(FALSE, nrow = nrow(fig_dt), ncol = ncol(fig_dt)) # Here we want to parse the first column only, you can amend this to whatever you want. parse_mat[, 1] <- TRUE # Remove this if you don't want to parse the column head. tm <- forest_theme(colhead = list(fg_params = list(parse = TRUE)), core = list(fg_params = list(parse = parse_mat))) p <- forest(fig_dt, est = dt$est, lower = dt$low, upper = dt$hi, ci_column = 3, theme = tm) # Add customized footnote. # Due to the limitation of the textGrob, passing a parsed text with linebreak # has some issues. We use a different approach here. txt <- "a This is study A
b This is study B" add_grob(p, row = 4, col = 1:2, order = "background", gb_fn = gridtext::richtext_grob, text = txt, gp = gpar(fontsize = 8), hjust = 0, vjust = 1, halign = 0, valign = 1, x = unit(0, "npc"), y = unit(1, "npc")) ``` # Multiple CI Columns Sometimes one may want to have multiple CI columns, each column representing a different outcome. If this is the case, one only needs to provide a vector of the positions of the columns to be drawn in the data. If the number of columns provided to draw the CI columns is the same as the number of `est`, one CI will be drawn into each CI column. If the number of columns provided is less than the number of `est`, the extra `est` will be considered as a group and will be drawn to the CI columns sequentially. In the latter case, the group number equals the number of `est` divided by the number of `ci_column`, and multiple columns will be drawn into one cell. As seen in the example below, the CI will be drawn in columns 3 and 5. The first and second elements in `est`, `lower`, and `upper` will be drawn in columns 3 and column 5. In a multiple groups example, two or more CIs in one cell. The solution is simple: provide all the values sequentially to `est`, `lower`, and `upper`. This means that the first `n` elements in the `est`, `lower`, and `upper` are considered as the same group, and the same for the next `n` elements. The `n` is determined by the number of `ci_column`. As shown in the example below, `est_gp1` and `est_gp2` will be drawn in column 3 and column 5, considered as **group 1**. The `est_gp3` and `est_gp4` will be drawn in column 3 and column 5, considered as **group 2**. This is an example of multiple CI columns and groups: ```{r multiple-group, out.width="80%", fig.width = 8, fig.height = 5} dt <- read.csv(system.file("extdata", "example_data.csv", package = "forestploter")) dt <- dt[1:7, ] # Indent the subgroup if there is a number in the placebo column dt$Subgroup <- ifelse(is.na(dt$Placebo), dt$Subgroup, paste0(" ", dt$Subgroup)) # Replace NA with blank or NA will be transformed to character dt$n1 <- ifelse(is.na(dt$Treatment), "", dt$Treatment) dt$n2 <- ifelse(is.na(dt$Placebo), "", dt$Placebo) # Add two blank columns for CI dt$`CVD outcome` <- paste(rep(" ", 20), collapse = " ") dt$`COPD outcome` <- paste(rep(" ", 20), collapse = " ") # Generate point estimation and 95% CI. Paste two CIs together and separate by line break. dt$ci1 <- paste(sprintf("%.1f (%.1f, %.1f)", dt$est_gp1, dt$low_gp1, dt$hi_gp1), sprintf("%.1f (%.1f, %.1f)", dt$est_gp3, dt$low_gp3, dt$hi_gp3), sep = "\n") dt$ci1[grepl("NA", dt$ci1)] <- "" # Any NA to blank dt$ci2 <- paste(sprintf("%.1f (%.1f, %.1f)", dt$est_gp2, dt$low_gp2, dt$hi_gp2), sprintf("%.1f (%.1f, %.1f)", dt$est_gp4, dt$low_gp4, dt$hi_gp4), sep = "\n") dt$ci2[grepl("NA", dt$ci2)] <- "" # Set-up theme tm <- forest_theme(base_size = 10, refline_lty = "solid", ci_pch = c(15, 18), ci_col = c("#377eb8", "#4daf4a"), footnote_gp = gpar(col = "blue"), legend_name = "Group", legend_value = c("Trt 1", "Trt 2"), vertline_lty = c("dashed", "dotted"), vertline_col = c("#d6604d", "#bababa"), # Table cell padding, width 4 and heights 3 core = list(padding = unit(c(4, 3), "mm"))) p <- forest(dt[, c(1, 19, 23, 21, 20, 24, 22)], est = list(dt$est_gp1, dt$est_gp2, dt$est_gp3, dt$est_gp4), lower = list(dt$low_gp1, dt$low_gp2, dt$low_gp3, dt$low_gp4), upper = list(dt$hi_gp1, dt$hi_gp2, dt$hi_gp3, dt$hi_gp4), ci_column = c(4, 7), ref_line = 1, vert_line = c(0.5, 2), nudge_y = 0.4, theme = tm) plot(p) ``` It is obvious that the `forest` uses whatever you provided as the skeleton of the forest plot. You can use your imagination and put whatever you want in a cell, including line breaks. Please check out the other vignette to modify the alignment of the text. # Different Parameters for Different CI Columns If the desired forest plot has multiple columns, some may want to have different settings for different columns. For example, different CI columns may have different `xlim`, x-axis ticks, x-axis labels, `x_trans`, reference lines, vertical lines, or arrow labels. This can be easily done by providing a list or vector. Provide a list for `xlim`, `vert_line`, `arrow_lab`, and `ticks_at`, and an atomic vector for `xlab`, `x_trans`, and `ref_line`. See the example below. ```{r multiple-param, out.width="70%", fig.width = 10, fig.height = 6.5} dt$`HR (95% CI)` <- ifelse(is.na(dt$est_gp1), "", sprintf("%.2f (%.2f to %.2f)", dt$est_gp1, dt$low_gp1, dt$hi_gp1)) dt$`Beta (95% CI)` <- ifelse(is.na(dt$est_gp2), "", sprintf("%.2f (%.2f to %.2f)", dt$est_gp2, dt$low_gp2, dt$hi_gp2)) tm <- forest_theme(arrow_type = "closed", arrow_label_just = "end") p <- forest(dt[, c(1, 21, 23, 22, 24)], est = list(dt$est_gp1, dt$est_gp2), lower = list(dt$low_gp1, dt$low_gp2), upper = list(dt$hi_gp1, dt$hi_gp2), ci_column = c(2, 4), ref_line = c(1, 0), vert_line = list(c(0.3, 1.4), c(0.6, 2)), x_trans = c("log", "none"), arrow_lab = list(c("L1", "R1"), c("L2", "R2")), xlim = list(c(0, 3), c(-1, 3)), ticks_at = list(c(0.1, 0.5, 1, 2.5), c(-1, 0, 2)), xlab = c("OR", "Beta"), nudge_y = 0.2, theme = tm) plot(p) ``` # Custom CIs It is possible to pass a custom CI drawing function to `forest`. The `fn_ci` accepts the CI drawing function for normal confidence intervals and `fn_summary` for summary CI. Other parameters for those functions can be passed via `forest`. If you need to pass row values as `est` and `lower` to those functions, you need to define the name of the parameters you have passed via `index_args`. This is an advanced technique, and the purpose of this vignette is not to show how to create a function to draw CI, but you can find some tutorials [here](https://www.stat.auckland.ac.nz/~paul/RG3e/chapter8.html) if you are interested. Below is an example of the usage for a box plot CI with the built-in `make_boxplot` function. ```{r custom-ci, out.width="70%", fig.width = 3, fig.height = 3} # Function to calculate Box plot values box_func <- function(x){ iqr <- IQR(x) q3 <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE) c("min" = q3[1] - 1.5 * iqr, "q1" = q3[1], "med" = q3[2], "q3" = q3[3], "max" = q3[3] + 1.5 * iqr) } # Prepare data val <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose)) val <- lapply(val, box_func) dat <- do.call(rbind, val) dat <- data.frame(Dose = row.names(dat), dat, row.names = NULL) dat$Box <- paste(rep(" ", 20), collapse = " ") # Draw a single group box plot tm <- forest_theme(ci_Theight = 0.2) p <- forest(dat[, c(1, 7)], est = dat$med, lower = dat$min, upper = dat$max, # sizes = sizes, fn_ci = make_boxplot, ci_column = 2, lowhinge = dat$q1, uphinge = dat$q3, hinge_height = 0.2, # values of the lowhinge and uphinge will be used as row values index_args = c("lowhinge", "uphinge"), gp_box = gpar(fill = "black", alpha = 0.4), theme = tm ) p ``` # Saving Plot One can use the base method or the `ggsave` function to save the plot. For the `ggsave` function, please don't ignore the `plot` parameter. The width and height should be tuned to get the desired plot. You can also set `autofit = TRUE` in the `print` or `plot` function to auto-fit the plot, but this may change and not be as compact as it should be. ```{r eval=FALSE} # Base method png('rplot.png', res = 300, width = 7.5, height = 7.5, units = "in") p dev.off() # ggsave function ggplot2::ggsave(filename = "rplot.png", plot = p, dpi = 300, width = 7.5, height = 7.5, units = "in") ``` Or you can get the width and height of the forest plot with `get_wh`, and use this width and height for saving. ```{r eval=FALSE} # Get width and height p_wh <- get_wh(plot = p, unit = "in") png('rplot.png', res = 300, width = p_wh[1], height = p_wh[2], units = "in") p dev.off() # Or get scale get_scale <- function(plot, width_wanted, height_wanted, unit = "in"){ h <- convertHeight(sum(plot$heights), unit, TRUE) w <- convertWidth(sum(plot$widths), unit, TRUE) max(c(w / width_wanted, h / height_wanted)) } p_sc <- get_scale(plot = p, width_wanted = 6, height_wanted = 4, unit = "in") ggplot2::ggsave(filename = "rplot.png", plot = p, dpi = 300, width = 6, height = 4, units = "in", scale = p_sc) ``` # FAQs **Q: The whisker/CI plot area is too narrow. Please help!** **A:** I have to admit that the vignettes were not well written, but you should be able to get the idea if you look at the vignette carefully and check the examples. Increase the widths by having more blank space in the column where the CI is to be drawn. Please check out the first example for how to do this. **Q: Can I modify the width and height of each row and column?** **A:** Yes, although the content of the data decides the heights and widths of the rows and columns, you can also modify these after you have finished plotting. See [this here for details](https://github.com/adayim/forestploter/issues/30#issuecomment-1459038988). You can also use `core = list(padding = unit(c(4, 3), "mm"))` in the `forest_theme` to add some padding to the width and height of each cell. **Q: How should I use weight for sizes?** **A:** The `forest` function will not transform the size, so it will be used as it is. If you want to weigh the size on your own, check [here for some options](https://github.com/adayim/forestploter/issues/37#issuecomment-1450208581). **Q: How can I plot a grouped forest plot?** **A:** You can leave a few blank lines to indicate the group break. You can also use `arrangeGrob` from the `gridExtra` package or `wrap_elements` from `patchwork` to combine two or more forest plots.