--- title: "The `mc2d` package" author: "R. POUILLOT, M.-L. DELIGNETTE-MULLER, D.L. KELLY & J.-B. DENIS" output: pdf_document: toc: true number_sections: true citation_package: natbib geometry: "scale=0.8, centering" biblio-style: plain bibliography: docmcEnglish.bib vignette: > %\VignetteIndexEntry{A Manual for mc2d: Tools for Two-Dimensional Monte-Carlo Simulations} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(dpi = 96, fig.width = 6, fig.height = 4, dev = "pdf") ``` This documentation is intended for readers with: - A medium level of experience in R. Please refer to *An Introduction to R* available with the R distribution if needed. - Some knowledge about Monte-Carlo simulation and Quantitative Risk Assessment (QRA). The definitive reference for all function arguments remains the package documentation. It is recommended to try the examples while reading. # Introduction ## What is `mc2d`? `mc2d` means Two-Dimensional Monte-Carlo (*Monte-Carlo à 2 Dimensions*). This package provides: - additional probability distributions; - tools to construct One-Dimensional and Two-Dimensional Monte-Carlo simulations; - tools to analyse One-Dimensional and Two-Dimensional Monte-Carlo simulations. In a previous version, distribution-fitting tools were included; they have since been moved to the separate package `fitdistrplus` \cite{FITDISTRPLUS}. `mc2d` was built for QRA in the Food Safety domain but can be used in other frameworks. ## What is Two-Dimensional Monte-Carlo Simulation (briefly)? The following text and Figure \ref{fig:mc2d} are adapted from \cite{POUILLOT-2004} and \cite{POUILLOT-2007}. The principal reference remains \cite{CULLEN-FREY-1999}. A QRA should reflect the **variability** in the risk and take into account the **uncertainty** associated with the risk estimate. Variability represents temporal, geographical and/or individual heterogeneity of the risk across the population. Uncertainty stems from a lack of perfect knowledge about the QRA model structure and parameters.[^1] [^1]: In the engineering risk community, these are called *aleatoric uncertainty* for variability and *epistemic uncertainty* for uncertainty. A two-dimensional Monte-Carlo simulation separates variability and uncertainty by sampling their respective distributions independently \cite{CULLEN-FREY-1999}. The method proceeds as follows (see Figure \ref{fig:mc2d}): 1. Parameters are divided into four categories: *fixed*, *variable* (variability only), *uncertain* (uncertainty only), and *variable and uncertain*. For the last category, a hierarchical structure is required, e.g.: $$r \mid a,b \sim N(a,b)$$ where the normal distribution represents variability in $r$ conditional on uncertain parameters $a$ and $b$, with hyper-distributions such as $a \sim \text{Unif}(l_a, u_a)$ and $b \sim \text{Unif}(l_b, u_b)$. 2. A set of uncertain parameters is randomly sampled from their distributions. 3. The model is evaluated with a one-dimensional Monte-Carlo simulation of size $N_v$, treating uncertain parameters as fixed. Statistics (mean, SD, percentiles) are computed and stored. 4. Steps 2--3 are repeated $N_u$ times. 5. The $50^\text{th}$ percentile (median) of each statistic is the point estimate; the $2.5^\text{th}$ and $97.5^\text{th}$ percentiles constitute the 95% credible interval. \begin{figure} \noindent \begin{centering} \includegraphics[angle=270, width=1\textwidth]{mc2dsaumon} \par\end{centering} \caption{\label{fig:mc2d}Schematic Representation of a Two-Dimensional Monte-Carlo Simulation.} \end{figure} More formally, in its simplest version the framework is a chain of three random variables: $$p \rightarrow \pi \rightarrow Y$$ with marginal distribution $[p]$ and conditional distributions $[\pi \mid p]$ and $[Y \mid \pi]$, under the assumption $[p,\pi,Y]=[p][\pi \mid p][Y \mid \pi]$. - $Y$ is the variable of interest. - $\pi$ is the parameter governing the distribution of $Y$; its uncertainty is characterised through $[\pi \mid p]$. - $p$ is the parameter governing the uncertainty of $\pi$; its distribution $[p]$ is assumed known. A two-dimensional Monte-Carlo simulation provides a bundle of $N_u$ distributions $[Y \mid \pi=\pi_i]$ where $\pi_i, i=\{1,\ldots,N_u\}$ are drawn from $[\pi \mid p]$. `mc2d` uses arrays of (at least) two dimensions: the first dimension reflects variability, the second reflects uncertainty. ## A basic example {#sec:Example1} **Quantitative Risk Assessment: *Escherichia coli* O157:H7 infection linked to the consumption of frozen ground beef in \<3 year old children.** *Warning*: the data are fictitious and the model is over-simplified; results should not be interpreted. The model assumes: - *E. coli* O157:H7 are randomly distributed in a batch with mean concentration $c=10$ CFU/g. - No bacterial growth occurs (product is frozen until cooked just before consumption). - 2.7% of consumers cook their beef rare, 37.3% medium, and 60.0% well done. - Bacterial inactivation by cooking: no inactivation (rare), 1/5 surviving (medium), 1/50 surviving (well done). - Serving size $s$ for \<3 year children follows a Gamma distribution: $shape=3.93$, $rate=0.0806$. - The dose-response is a one-hit model with probability of illness per hit $r=0.001$. *What is the risk of illness in the population that consumed the contaminated lot?* ### One-Dimensional Monte-Carlo Simulation All distributions represent variability only: $$c = 10, \quad i \sim emp(\{1, 1/5, 1/50\},\{0.027, 0.373, 0.600\}), \quad s \sim \text{Gamma}(3.93, 0.0806)$$ $$n \sim \text{Poisson}(c \times i \times s), \quad r = 0.001, \quad P = 1-(1-r)^n$$ ```{r sh0, echo=FALSE} set.seed(666) options("width"=90, "digits"=3) knitr::opts_chunk$set(fig.path = "figs/") ``` ```{r sh1} library(mc2d) ndvar(1001) conc <- 10 cook <- mcstoc(rempiricalD, values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600)) serving <- mcstoc(rgamma, shape = 3.93, rate = 0.0806) expo <- conc * cook * serving dose <- mcstoc(rpois, lambda = expo) r <- 0.001 risk <- 1 - (1 - r)^dose EC1 <- mc(cook, serving, expo, dose, risk) print(EC1) summary(EC1) ``` The `print` output shows for each node: the variable mode, `nsv` (variability dimension size), `nsu` (uncertainty dimension size), `nva` (number of variates), basic statistics (min, mean, median, max), number of missing values (`Nas`), the node type (`0` = fixed, `V` = variable, `U` = uncertain, `VU` = variable and uncertain), and the output level `outm`. This one-dimensional simulation gives a mean risk of approximately 5%, with 2.5% of the population having a risk greater than 20.3%. ### Two-Dimensional Monte-Carlo Simulation Adding uncertainty: the mean concentration $c$ follows $N(10,2)$ and the parameter $r$ follows $\text{Unif}(0.0005, 0.0015)$: $$c \sim N(10,2), \quad i,s \text{ as before}, \quad n \sim \text{Poisson}(c \times i \times s), \quad r \sim \text{Unif}(0.0005, 0.0015), \quad P = 1-(1-r)^n$$ ```{r sh2} ndunc(101) conc <- mcstoc(rnorm, type = "U", mean = 10, sd = 2) cook <- mcstoc(rempiricalD, type = "V", values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600)) serving <- mcstoc(rgamma, type = "V", shape = 3.93, rate = 0.0806) expo <- conc * cook * serving dose <- mcstoc(rpois, type = "VU", lambda = expo) r <- mcstoc(runif, type = "U", min = 0.0005, max = 0.0015) risk <- 1 - (1 - r)^dose EC2 <- mc(conc, cook, serving, expo, dose, r, risk) print(EC2, digits = 2) summary(EC2) ``` The `type` argument indicates whether a distribution represents variability (`"V"`, default), uncertainty (`"U"`), or both (`"VU"`). The median of the 101 simulations gives a best estimate of 0.0457 with a 95% credible interval of [0.0238, 0.0794]. # Basic Principles and Functions A typical `mc2d` session: 1. Choose an empirical or parametric distribution for each parameter (`fitdistrplus` \cite{FITDISTRPLUS} is a convenient fitting tool). 2. Construct an `mcnode` object for each parameter (`mcdata`, `mcstoc`). 3. Group `mcnode` objects into an `mc` object (`mc`). 4. Study the `mc` object via summaries, graphs, and sensitivity analysis (`summary.mc`, `plot.mc`, `tornado`, `tornadounc`). ## Preliminary Step Load `mc2d` at the start of your R session with `library(mc2d)`. Set the default Monte-Carlo dimensions with `ndvar()` (variability) and `ndunc()` (uncertainty). ## The `mcnode` Object ### `mcnode` Object Structure An `mcnode` is the basic element of an `mc` object — it is associated with one variable, while an `mc` is a set of associated variables. An `mcnode` is an array of dimension $(nsv \times nsu \times nvariates)$.[^2] Four types exist: [^2]: In this section we only consider univariate nodes with $nvariates=1$. - **`V` mcnode** (Variability): dimension $(nsv \times 1 \times nvariates)$. - **`U` mcnode** (Uncertainty): dimension $(1 \times nsu \times nvariates)$. - **`VU` mcnode** (Variability and Uncertainty): dimension $(nsv \times nsu \times nvariates)$. - **`0` mcnode** (Neither): dimension $(1 \times 1 \times nvariates)$. Not needed in the univariate context but useful for multivariate nodes (Section \ref{sec:Multivar}). \begin{figure} \noindent \begin{centering} \includegraphics[angle=270, width=.7\textwidth]{Illustration} \par\end{centering} \caption{\label{fig:Structure}Structure of the various `mcnode` objects.} \end{figure} Ways to construct an `mcnode`: 1. `mcstoc` — from random number generating functions. 2. `mcdata` — from data. 3. Direct operations on `mcnode` objects. 4. `mcprobtree` — from a mixture of distributions (probability tree). 5. Functions such as `==`, `>`, `is.na`, `is.finite` applied to an existing `mcnode`. ### The `mcstoc` function {#sub:mcstoc} ``` mcstoc(func=runif, type=c("V","U","VU","0"), ..., nsv=ndvar(), nsu=ndunc(), nvariates=1, outm="each", nsample="n", seed=NULL, rtrunc=FALSE, linf=-Inf, lsup=Inf, lhs=FALSE) ``` - `func`: random-data function or its name as a character. Table \ref{tab:Distributions} lists available distributions. - `type`: type of `mcnode` to build. Default: `"V"`. - `...`: arguments passed to `func` (except the sample size). *All must be named.* - `nsv`, `nsu`: number of samples in variability/uncertainty dimensions. - `nvariates`: number of variates (Section \ref{sec:Multivar}). - `outm`: default output for multivariate nodes (Section \ref{sec:Multivar}). - `nsample`: name of the sample-size argument of `func` (usually `"n"`; use `"nn"` for `rhyper` and `rwilcox`). - `seed`: random seed. If `NULL`, seed is unchanged. - `rtrunc`: truncate the distribution between `linf` and `lsup`. - `lhs`: use Latin Hypercube Sampling. \begin{table} \caption{\label{tab:Distributions}Available distributions} \centering{}\begin{tabular}{|c|c|c|c|c|c|c|} \hline Package & Distribution & Function & Size arg. & Other parameters & trunc & lhs\tabularnewline \hline\hline `stats` & beta & `rbeta` & `n` & `shape1, shape2, ncp` & Y & Y\tabularnewline\hline & binomial & `rbinom` & `n` & `size, prob` & Y & Y\tabularnewline\hline & Cauchy & `rcauchy` & `n` & `location, scale` & Y & Y\tabularnewline\hline & chi-squared & `rchisq` & `n` & `df, ncp` & Y & Y\tabularnewline\hline & exponential & `rexp` & `n` & `rate` & Y & Y\tabularnewline\hline & F & `rf` & `n` & `df1, df2, ncp` & Y & Y\tabularnewline\hline & gamma & `rgamma` & `n` & `shape, rate (or scale)` & Y & Y\tabularnewline\hline & geometric & `rgeom` & `n` & `prob` & Y & Y\tabularnewline\hline & hypergeometric & `rhyper` & `nn` & `m, n, k` & Y & Y\tabularnewline\hline & lognormal & `rlnorm` & `n` & `meanlog, sdlog` & Y & Y\tabularnewline\hline & logistic & `rlogis` & `n` & `location, scale` & Y & Y\tabularnewline\hline & neg.\ binomial & `rnbinom` & `n` & `size, prob (or mu)` & Y & Y\tabularnewline\hline & normal & `rnorm` & `n` & `mean, sd` & Y & Y\tabularnewline\hline & Poisson & `rpois` & `n` & `lambda` & Y & Y\tabularnewline\hline & Student's t & `rt` & `n` & `df, ncp` & Y & Y\tabularnewline\hline & uniform & `runif` & `n` & `min, max` & Y & Y\tabularnewline\hline & Weibull & `rweibull` & `n` & `shape, scale` & Y & Y\tabularnewline\hline & Wilcoxon & `rwilcox` & `nn` & `m, n` & Y & Y\tabularnewline\hline `mc2d` & Bernoulli & `rbern` & `n` & `prob` & Y & Y\tabularnewline\hline & emp.\ discrete & `rempiricalD`& `n` & `values, prob` & Y & Y\tabularnewline\hline & emp.\ cont. & `rempiricalC`& `n` & `min, max, values, prob` & Y & Y\tabularnewline\hline & PERT & `rpert` & `n` & `min, mode, max, shape` & Y & Y\tabularnewline\hline & triangular & `rtriang` & `n` & `min, mode, max` & Y & Y\tabularnewline\hline & gen.\ beta & `rbetagen` & `n` & `shape1,shape2,min,max,ncp` & Y & Y\tabularnewline\hline & multinomial & `rmultinomial`& `n` & `size, prob` & N & N\tabularnewline\hline & Dirichlet & `rdirichlet` & `n` & `alpha` & N & N\tabularnewline\hline & mv.\ normal & `rmultinormal`& `n` & `mean, sigma` & N & N\tabularnewline\hline & beta subjective & `rbetasubj` & `n` & `min, mode, mean, max` & Y & Y\tabularnewline\hline & min.\ info. & `rmqi` & `n` & `mqi, mqi.quantile, ...` & Y & Y\tabularnewline\hline \end{tabular} \end{table} In our example, `mcstoc` specified `conc` (normal), `cook` (empirical discrete), `serving` (gamma), and `dose` (Poisson with `mcnode` argument `lambda`). ```{r sh3, eval=FALSE} conc <- mcstoc(rnorm, type = "U", mean = 10, sd = 2) cook <- mcstoc(rempiricalD, type = "V", values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600)) serving <- mcstoc(rgamma, type = "V", shape = 3.93, rate = 0.0806) dose <- mcstoc(rpois, type = "VU", lambda = expo) r <- mcstoc(runif, type = "U", min = 0.0005, max = 0.0015) ``` A normal distribution $N(2,3)$ truncated on $[1.5, 2]$ with Latin Hypercube Sampling:[^3] [^3]: The mean and SD of the non-truncated distribution are not preserved after truncation. ```{r sh3bis} x <- mcstoc(rnorm, mean = 2, sd = 3, rtrunc = TRUE, linf = 1.5, lsup = 2, lhs = TRUE) summary(x) ``` Additional distributions in `mc2d`: `rbern` (Bernoulli), `rempiricalD` (empirical discrete), `rpert` \cite{VOSE-2000}, `rtriang`, `rdirichlet`, `rmultinormal`, `rmqi` (min. quantile info.). The multinomial distribution is vectorized as `rmultinomial` (use instead of `stats::rmultinom`). ### The `mcdata` function {#sub:mcnode} ``` mcdata(data, type=c("V","U","VU","0"), nsv=ndvar(), nsu=ndunc(), nvariates=1, outm="each") ``` See the function documentation for accepted data sizes and types. Example — placing `TRUE` in a `"U"` node for half the simulations: ```{r sh3ter} nu <- ndunc() tmp <- (1:nu) > (nu/2) mcdata(tmp, type = "U") ``` ### Operations on an `mcnode` `mcnode` objects support arithmetic operations with coherent type propagation (Table \ref{tab:Ops}).[^4] [^4]: These rules differ from standard R recycling rules. - `0` + `0` = `0`; `0` + `V` = `V`; `0` + `U` = `U`; `0` + `VU` = `VU` - `V` + `V` = `V`; `V` + `U` = `VU`[^5]; `V` + `VU` = `VU`[^6] - `U` + `U` = `U`; `U` + `VU` = `VU`[^7]; `VU` + `VU` = `VU` [^5]: `U` is recycled by row; `V` classically by column. [^6]: `V` recycled by column. [^7]: `U` recycled by row. \begin{table} \caption{\label{tab:Ops}`mcnode` combination types} \centering{}\begin{tabular}{|c||c|c|c|c|} \hline & 0 & V & U & VU\tabularnewline\hline\hline 0 & 0 & V & U & VU\tabularnewline\hline V & V & V & VU & VU\tabularnewline\hline U & U & VU & U & VU\tabularnewline\hline VU & VU & VU & VU & VU\tabularnewline\hline \end{tabular} \end{table} ```{r sh4, eval=FALSE} expo <- conc * cook * serving # U * V * V → VU risk <- 1 - (1 - r)^dose # U * VU → VU ``` ### The `mcprobtree` function {#sub:The-mcprobtree-function} `mcprobtree` builds an `mcnode` as a mixture of distributions. If the microbiologists are 75% sure that $conc \sim N(10,2)$ and 25% sure that $conc \sim U(8,12)$:[^8] [^8]: Alternatives for `whichdist`: `mcstoc(rempiricalD, type="U", values=c(0,1), prob=c(75,25))` or `mcstoc(rbern, type="U", prob=0.25)`. ```{r sh4b} conc1 <- mcstoc(rnorm, type = "U", mean = 10, sd = 2) conc2 <- mcstoc(runif, type = "U", min = 8, max = 12) whichdist <- c(0.75, 0.25) concbis <- mcprobtree(whichdist, list("0" = conc1, "1" = conc2), type = "U") ``` `mcprobtree` can also generate samples from a mixture for variability. ### Other functions for constructing an `mcnode` Comparison operators (`==`, `<`, `<=`, `>=`, `>`) generate an `mcnode` when applied to one. Special functions `is.na`, `is.nan`, `is.finite`, `is.infinite` are also implemented. ```{r sh5} cook < 1 suppressWarnings(tmp <- log(mcstoc(runif, min = -1, max = 1))) tmp is.na(tmp) ``` ### Specifying a correlation between `mcnode`s A Spearman rank correlation structure between 2 or more nodes can be specified with `cornode`, which uses the Iman & Conover method \cite{IMAN-CONOVER-1982}.[^9] [^9]: The resulting correlation (\~0.4) is an approximation because a discrete distribution (`cook`: 3 categories) is correlated with a continuous distribution (`serving`). ```{r sh5bis} cornode(cook, serving, target = 0.5, result = TRUE) ``` Correlations can be specified between `V`, `U`, or `VU` nodes, or between one `V` node and multiple `VU` nodes. A multivariate normal distribution (`rmultinormal`) is another way to specify correlations assuming normality. ## The `mc` Object Once `mcnode` objects are constructed, group them into an `mc` object for analysis. An `mc` is a list of `mcnode`s. It can be constructed with `mc()`, `evalmcmod()`, or within `evalmccut()`. ### The `mc` function ``` mc(..., name=NULL, devname=FALSE) ``` `...` are `mcnode` or `mc` objects. In our example: ```{r sh6, eval=FALSE} EC2 <- mc(conc, cook, serving, expo, dose, r, risk) print(EC2) summary(EC2) ``` ### The `mcmodel` and `evalmcmod` functions `mcmodel` wraps a model expression for later evaluation with `evalmcmod`. Use this once the model is validated, to re-run it with different dimensions or seeds in one line. ```{r sh10} modelEC3 <- mcmodel({ conc <- mcstoc(rnorm, type = "U", mean = 10, sd = 2) cook <- mcstoc(rempiricalD, type = "V", values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600)) serving <- mcstoc(rgamma, type = "V", shape = 3.93, rate = 0.0806) r <- mcstoc(runif, type = "U", min = 0.0005, max = 0.0015) expo <- conc * cook * serving dose <- mcstoc(rpois, type = "VU", lambda = expo) risk <- 1 - (1 - r)^dose mc(conc, cook, serving, expo, dose, r, risk) }) modelEC3 ``` Notes: - The model is wrapped between `{` and `}`. - Any valid R code is allowed inside.[^10] - The model must end with an `mc()` call. [^10]: The simulation dimensions can be referenced via `ndvar()` and `ndunc()` if needed. ``` evalmcmod(expr, nsv=ndvar(), nsu=ndunc(), seed=NULL) ``` ```{r sh11, eval=FALSE} EC3 <- evalmcmod(modelEC3, nsv = 100, nsu = 10, seed = 666) EC4 <- evalmcmod(modelEC3, nsv = 100, nsu = 1000, seed = 666) ``` ### The `mcmodelcut` and `evalmccut` functions For high-dimensional models that exceed R's memory limit, `evalmccut` evaluates the model in a loop over the uncertainty dimension, computing and storing statistics at each step.[^11] [^11]: Using a `tornado` inside the model should be avoided as it slows `evalmccut` considerably. ```{r sh12, eval=FALSE} modEC4 <- mcmodelcut({ ## First block: unidimensional nodes { cook <- mcstoc(rempiricalD, type = "V", values = c(0, 1/5, 1/50), prob = c(0.027, 0.373, 0.6)) serving <- mcstoc(rgamma, type = "V", shape = 3.93, rate = 0.0806) conc <- mcstoc(rnorm, type = "U", mean = 10, sd = 2) r <- mcstoc(runif, type = "U", min = 5e-04, max = 0.0015) } ## Second block: two-dimensional nodes → mc object { expo <- conc * cook * serving dose <- mcstoc(rpois, type = "VU", lambda = expo) risk <- 1 - (1 - r)^dose res <- mc(conc, cook, serving, expo, dose, r, risk) } ## Third block: statistics { list( sum = summary(res), plot = plot(res, draw = FALSE), minmax = lapply(res, range), tor = tornado(res), et = sapply(res, sd) ) } }) res <- evalmccut(modEC4, nsv = 10001, nsu = 101, seed = 666) summary(res) ``` ## Analysing an `mc` Object ### The `summary` function ``` summary(object, probs=c(0,0.025,0.25,0.5,0.75,0.975,1), lim=c(0.025,0.975), ...) ``` Quantiles in `probs` are evaluated in the variability dimension; then the median and `lim` quantiles are evaluated across those statistics. ```{r sh14} tmp <- summary(EC2, probs = c(0.995, 0.999), digits = 12) tmp$risk ``` ### The `hist` function ``` hist(x, griddim=NULL, xlab=names(x), ylab="Frequency", main="", ...) ``` Provides histograms of each `mcnode` in the `mc` object. Note: variability and uncertainty are collapsed, so the histogram may be approximate. ```{r sh15, eval=FALSE} hist(EC2) ``` ```{r Function-hist, echo=FALSE, fig.cap="The `hist` function."} hist(EC2) ``` From `mc2d` version 0.2-0, `gghist` provides a `ggplot2`-based histogram, returning a `ggplot2` object that can be further customised with `+` operators. ### The `plot` function ``` plot(x, prec=0.001, stat=c("median","mean"), lim=c(0.025,0.25,0.75,0.975), na.rm=TRUE, griddim=NULL, xlab=NULL, ylab="Fn(x)", main="", draw=TRUE, paint=TRUE, ...) ``` Plots the empirical CDF with uncertainty envelope. The 0.25 and 0.75 quantiles (default `lim`) form the inner envelope. ```{r sh17, eval=FALSE} plot(EC2) ``` ```{r Function-plot, echo=FALSE, fig.cap="The `plot` function."} plot(EC2) ``` From `mc2d` version 0.2-0, `ggplotmc` provides a `ggplot2`-based version. `spaghetti` draws individual empirical CDF step-functions, one per uncertainty sample, instead of the summary envelope produced by `plot`. The `maxlines` argument caps the number of lines drawn. ``` spaghetti(x, griddim=NULL, xlab=names(x), ylab="F(n)", main="", maxlines=100, ...) ``` ```{r sh17ter, eval=FALSE} spaghetti(EC2) ``` ```{r Function-spaghetti, echo=FALSE, fig.cap="The `spaghetti` function."} spaghetti(EC2) ``` `ggspaghetti` provides the same plot via `ggplot2`. All `mcnode` objects support the same `print`, `summary`, `plot`, and `hist` methods. `ggplotmc`, `gghist`, and `ggspaghetti` on an `mcnode` allow post-processing of the graph. ### The `tornado` function ``` tornado(x, output=length(x), use="all.obs", method=c("spearman","kendall","pearson"), lim=c(0.025,0.975)) ``` Computes Spearman (default) rank correlation between nodes. `output` specifies the output node (default: last). `tornado` creates a `tornado` object with a `plot` method. ```{r sh19} torEC2 <- tornado(EC2) plot(torEC2) ``` ```{r Function-plottor, echo=FALSE, fig.cap="The `plot.tornado` function."} plot(torEC2) ``` From `mc2d` version 0.2-0, `ggtornado` provides a `ggplot2`-based version. ### The `tornadounc` function ``` tornadounc(mc, output=length(mc), quant=c(0.5,0.75,0.975), use="all.obs", method=c("spearman","kendall","pearson"), ...) ``` Examines the impact of uncertainty on an output estimate. Computes Spearman rank correlation between statistics of `mcnode` objects in the variability dimension. `quant` specifies which quantiles to use. ```{r sh19c} tornadounc(EC2, output = "risk", quant = .99) ``` The output shows the impact of uncertain `U` nodes and statistics (mean, median, 99th percentile) computed in the variability dimension of `VU` nodes. ### The `mcratio` function Provides variability, uncertainty, and combined measures \cite{OZKAYNAK-2009}. Given: - **A** = median of uncertainty for the median of variability - **B** = median of uncertainty for the 97.5th percentile of variability - **C** = 97.5th percentile of uncertainty for the median of variability - **D** = 97.5th percentile of uncertainty for the 97.5th percentile of variability Ratios: Variability = B/A; Uncertainty = C/A; Overall Uncertainty = D/A. ```{r sh19d} mcratio(risk) ``` ## Other Functions and `mc` Objects `mc` objects are simply lists of three-dimensional arrays. `$` extracts an `mcnode`. `unmc` removes attributes and collapses unit dimensions to return vectors, matrices, or arrays. ```{r sh19ter} tmp <- unmc(EC2, drop = TRUE) dimu <- ncol(tmp$risk) coef <- sapply(1:dimu, function(x) lm(tmp$risk[, x] ~ tmp$dose[, x])$coef) apply(coef, 1, summary) ``` # Multivariate Nodes {#sec:Multivar} The `nvariates` dimension is the third dimension of an `mcnode`. It is mandatory for multivariate distributions, and useful in other situations. Note that: ```{r sh19ter_b} mcstoc(runif, nvariates = 3, min = c(1, 2, 3), max = 4) ``` does **not** produce a node with 3 variates each having a different lower limit — the vector `c(1,2,3)` is recycled along the variability dimension (standard R behaviour). Use instead: ```{r sh19quatr} lim <- mcdata(c(1, 2, 3), type = "0", nvariates = 3) mcstoc(runif, nvariates = 3, min = lim, max = 4) ``` ## Multivariate Nodes for Multivariate Distributions The primary use of multivariate nodes is for multivariate distributions: Dirichlet, multinomial, multivariate normal, and empirical. **Example 1.** Three-member families each buy 500 g of ground beef. The proportions eaten by the baby, older brother, and mother follow a Dirichlet $(\alpha=(2,3,5))$ distribution. ```{r sh20} (p <- mcstoc(rdirichlet, type = "U", nvariates = 3, alpha = c(2, 3, 5))) s <- mcstoc(rmultinomial, type = "VU", nvariates = 3, size = 500, prob = p) summary(s) ``` **Example 2.** Each family member eats a normal distribution of steak (100, 150, 250 g mean) with positive correlation between the children's servings and negative with the mother's. ```{r sh21} sigma <- matrix(c(10, 2, -5, 2, 10, -5, -5, -5, 10), ncol = 3) (x <- mcstoc(rmultinormal, type = "V", nvariates = 3, mean = c(100, 150, 250), sigma = as.vector(sigma))) cor(x[, 1, ]) ``` Both `mean` and `sigma` can be variable or uncertain.[^12] Example with uncertain mean: [^12]: `rmultinormal` is a vectorized version of `rmvnorm` from `mvtnorm`. ```{r sh21b} m <- mcdata(c(100, 150, 250), type = "0", nvariates = 3) mun <- mcstoc(rnorm, type = "U", nvariates = 3, mean = m, sd = 20) x <- mcstoc(rmultinormal, type = "VU", nvariates = 3, mean = mun, sigma = as.vector(sigma)) cor(x[, 1, ]) ``` **Example 3.** Non-parametric bootstrap: 6 individuals eat 100 g, 12 eat 150 g, 6 eat 170 g, and 6 eat 200 g \cite{CULLEN-FREY-1999}. ```{r sh22} val <- c(100, 150, 170, 200) pr <- c(6, 12, 6, 6) out <- c("min", "mean", "max") (x <- mcstoc(rempiricalD, type = "U", outm = out, nvariates = 30, values = val, prob = pr)) mcstoc(rempiricalD, type = "VU", values = x) ``` The `outm` option controls which statistics to show: `"none"`, `"each"` (default), or a vector of function names applied across all 30 variates. ## Multivariate Nodes as a Third Dimension for Multiple Options Assume uncertainty in `conc`: 75% sure $conc \sim N(10,2)$, 25% sure $conc \sim U(8,12)$. Build a bivariate node and propagate both hypotheses simultaneously. ```{r sh23} conc1 <- mcstoc(rnorm, type = "U", mean = 10, sd = 2) conc2 <- mcstoc(runif, type = "U", min = 8, max = 12) conc <- mcdata(c(conc1, conc2), type = "U", nvariates = 2) cook <- mcstoc(rempiricalD, type = "V", values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600)) serving <- mcstoc(rgamma, type = "V", shape = 3.93, rate = 0.0806) expo <- conc * cook * serving dose <- mcstoc(rpois, type = "VU", nvariates = 2, lambda = expo) r <- mcstoc(runif, type = "U", min = 0.0005, max = 0.0015) risk <- 1 - (1 - r)^dose EC5 <- mc(conc, risk) summary(EC5) ``` (Remember to specify `nvariates = 2` in `mcstoc` for `dose` — `mc2d` cannot infer it.) ## Multivariate Nodes as a Third Dimension for Multiple Contaminants Two contaminants with uncertain concentrations $N(10,2)$ and $N(14,2)$:[^13] [^13]: A correlation between contaminants could be specified via `rmultinormal`. ```{r sh24} mconc <- mcdata(c(10, 14), type = "0", nvariates = 2) conc <- mcstoc(rnorm, type = "U", nvariates = 2, mean = mconc, sd = 2) cook <- mcstoc(rempiricalD, type = "V", values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600)) serving <- mcstoc(rgamma, type = "V", shape = 3.93, rate = 0.0806) expo <- conc * cook * serving dose <- mcstoc(rpois, type = "VU", nvariates = 2, lambda = expo) dosetot <- mcapply(dose, margin = "variates", fun = sum) r <- mcstoc(runif, type = "U", min = 0.0005, max = 0.0015) risk <- 1 - (1 - r)^dosetot EC6 <- mc(conc, risk) summary(EC6) ``` # Another Example: A QRA of Waterborne Cryptosporidiosis in France Adapted from \cite{POUILLOT-2004}. Goal: evaluate the risk of infection with *Cryptosporidium parvum* from consumption of tap water given $n$ oocysts/100 l observed in a storage reservoir (simplified model). ## Tap Water Consumption Model ```{r Crypto1, echo=FALSE} library(mc2d) inca <- structure(c(0, 22.08, 60, 64.4, 72, 82.8, 90, 96, 100, 110, 120, 137.5, 144, 150, 160, 162.5, 165, 180, 182.5, 184, 192, 192.5, 200, 216, 220, 225, 230, 240, 250, 264, 270, 276, 288, 290, 300, 304, 312.8, 320, 322, 325, 330, 336, 340, 350, 360, 370, 375, 380, 384, 390, 400, 414, 420, 425, 430, 432, 432.5, 440, 442, 450, 456, 460, 460.8, 464, 470, 470.4, 480, 490, 500, 504, 510, 510.4, 516, 520, 525, 525.6, 528, 530, 540, 544, 550, 552, 560, 562, 565, 570, 576, 580, 582.5, 584, 585.6, 590, 596, 600, 606, 610, 614, 620, 625, 630, 635.4, 640, 648, 650, 656.2, 660, 664.4, 670, 670.4, 672, 675, 680, 682, 690, 696, 700, 710, 716, 720, 730, 730.4, 740, 744, 750, 756, 760, 774.8, 780, 784, 792, 796, 800, 810, 820, 828, 830, 840, 850, 850.4, 860, 864, 866.4, 870, 880, 890, 900, 908, 910, 915.2, 920, 930, 936, 950, 960, 970, 980, 984, 986.4, 990, 996, 1000, 1015.2, 1020, 1028, 1032, 1036, 1040, 1042, 1050, 1070, 1075, 1078.8, 1080, 1090, 1096, 1100, 1110, 1120, 1126.4, 1128, 1130, 1140, 1148, 1150, 1152, 1160, 1170, 1175, 1176.2, 1190, 1196, 1200, 1214, 1220, 1230, 1240, 1248, 1250, 1260, 1276, 1280, 1290, 1296, 1300, 1320, 1322, 1330, 1340, 1350, 1360, 1370, 1386.4, 1400, 1410, 1414, 1420, 1430, 1440, 1446, 1450, 1460, 1480, 1500, 1520, 1530, 1550, 1560, 1600, 1650, 1680, 1696, 1700, 1710, 1720, 1750, 1760, 1800, 1830, 1840, 1850, 1900, 1920, 1936, 1954, 1980, 1990, 2000, 2014, 2050, 2100, 2200, 2220, 2248, 2250, 2276, 2300, 2310, 2340, 2400, 2550, 2568, 2700, 2720, 2784, 2820, 2876, 3000, 3100, 3108, 3200, 2578, 7, 1, 8, 14, 3, 1, 1, 10, 1, 250, 1, 2, 120, 8, 6, 1, 5, 3, 12, 5, 5, 375, 2, 8, 7, 41, 408, 53, 4, 24, 7, 3, 2, 217, 1, 1, 44, 9, 1, 31, 1, 1, 17, 294, 5, 3, 9, 3, 12, 525, 5, 23, 1, 3, 4, 1, 28, 3, 154, 2, 5, 1, 2, 6, 1, 299, 8, 148, 1, 2, 1, 1, 8, 3, 1, 2, 14, 20, 1, 18, 2, 20, 6, 1, 8, 2, 8, 1, 1, 1, 4, 1, 487, 3, 5, 1, 7, 1, 5, 1, 24, 3, 17, 1, 42, 1, 2, 1, 1, 1, 16, 1, 3, 1, 30, 4, 1, 183, 4, 1, 5, 1, 141, 1, 14, 1, 12, 1, 2, 1, 206, 6, 2, 1, 4, 92, 10, 1, 5, 1, 3, 5, 5, 2, 87, 1, 1, 1, 5, 5, 4, 4, 78, 1, 3, 2, 1, 16, 1, 133, 1, 5, 1, 1, 1, 4, 1, 43, 1, 1, 1, 30, 1, 1, 7, 2, 6, 1, 1, 3, 3, 1, 10, 1, 5, 1, 1, 1, 1, 1, 159, 2, 1, 1, 10, 1, 16, 4, 2, 5, 3, 1, 3, 11, 4, 1, 2, 12, 5, 1, 1, 44, 3, 2, 1, 2, 17, 1, 4, 1, 1, 17, 1, 1, 3, 4, 18, 14, 4, 1, 2, 1, 1, 1, 2, 12, 1, 2, 1, 1, 1, 1, 1, 3, 1, 20, 1, 1, 1, 7, 1, 1, 3, 1, 2, 2, 1, 6, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2), .Dim = c(270L, 2L)) inca <- rep(inca[, 1], inca[, 2]) / 1000 ``` We have raw data on daily tap water consumption from 1,180 consumers (`inca`, histogram below). We could use the empirical distribution directly: ```{r Function-inca, echo=FALSE, fig.cap="Histogram of daily tap water intake."} hist(inca, xlab = "l/day", freq = FALSE, main = "") ``` ```{r Crypto3} ndvar(1001) ndunc(101) mcstoc(rempiricalD, type = "V", values = inca) ``` Instead, we use `fitdistrplus` \cite{FITDISTRPLUS}. The data include many zeros (days without tap water consumption). We model them as a mixture. ```{r Crypto4, fig.show='hide'} library(fitdistrplus) pzero <- sum(inca == 0) / length(inca) inca_non_0 <- inca[inca != 0] descdist(inca_non_0) ``` ```{r Function-descdist, echo=FALSE, fig.cap="Graph from `descdist`.", results='hide', message=FALSE} descdist(inca_non_0) ``` From the shape descriptor plot above, we try a lognormal distribution: ```{r Crypto6} Adj_water <- fitdist(inca_non_0, "lnorm", method = "mle") meanlog <- Adj_water$est[1] sdlog <- Adj_water$est[2] summary(Adj_water) ``` The fit is satisfactory. Uncertainty around the MLE estimates could be incorporated via `bootdist`: ```{r Crypto7bis, eval=FALSE} Boot <- bootdist(Adj_water, bootmethod = "param", niter = ndunc()) Mean_conso <- mcdata(Boot$estim$meanlog, type = "U") Sd_conso <- mcdata(Boot$estim$sdlog, type = "U") conso1 <- mcstoc(rlnorm, type = "VU", meanlog = Mean_conso, sdlog = Sd_conso) ``` For simplicity, we ignore that uncertainty and use `mcprobtree` to build the mixture: ```{r Crypto8} conso0 <- mcdata(0, type = "V") conso1 <- mcstoc(rlnorm, type = "V", meanlog = meanlog, sdlog = sdlog) v <- mcprobtree(c(pzero, 1 - pzero), list("0" = conso0, "1" = conso1), type = "V") summary(v) ``` ## The Dose-Response Model Bootstrap from data `datDR` \cite{CHAPPELL-1996}. A function `DR` with argument `n` is defined and passed to `mcstoc`: ```{r Crypto9} datDR <- list(dose = c(30, 100, 300, 500, 1000, 10000, 100000, 1000000), pi = c(2, 4, 2, 5, 2, 3, 1, 1), ni = c(5, 8, 3, 6, 2, 3, 1, 1)) estDR <- function(pos, ref) { suppressWarnings( -glm(cbind(ref$ni - pos, pos) ~ ref$dose + 0, binomial(link = "log"))$coefficients) } ml <- 1 - exp(-estDR(datDR$pi, datDR) * datDR$dose) DR <- function(n) { boot <- matrix(rbinom(length(datDR$dose) * n, datDR$ni, ml), nrow = length(datDR$dose)) apply(boot, 2, estDR, ref = datDR) } r <- mcstoc(DR, type = "U") summary(r) ``` ## The Model ```{r Crypto10} Rr <- mcstoc(rbeta, type = "U", shape1 = 2.65, shape2 = 3.64) w <- mcstoc(rbeta, type = "V", shape1 = 2.6, shape2 = 3.4) ``` Given $O_o = 2$ oocysts observed in 100 l, the expected number of oocysts in the sample `l`: ```{r Crypto11} Oo <- 2 l <- (Oo + mcstoc(rnbinom, type = "U", size = Oo + 1, prob = Rr)) / 100 ``` The expected number drunk by individuals and the risk ($\times 10000$): ```{r Crypto12} Or <- l * v * w P <- 10000 * (1 - exp(-r * Or)) summary(P) ``` This can be compared to Table 2 in \cite{POUILLOT-2004}. Results for $O_o = \{0,1,2,5,10,20,50,100,1000\}$ can be obtained in one step using: ```{r Crypto13, eval=FALSE} Oo <- mcdata(c(0, 1, 2, 5, 10, 20, 50, 100, 1000), type = "0", nvariates = 9) ``` # As a Conclusion {.unnumbered} We hope that `mc2d` will help risk assessors to construct and analyse their models, and to develop two-dimensional simulations. *Please report any bugs to [rpouillot\@yahoo.fr](mailto:rpouillot@yahoo.fr){.email}.*