---
title: "Tutorial"
output:
  html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Tutorial}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r,include=FALSE, echo=FALSE}
knitr::opts_chunk$set(fig.width = 7,
                      fig.height = 4,
                      # cache = TRUE,
                      warning = FALSE)
```


```{r load_package, echo=FALSE, results='hide', message=FALSE}
library(morseTKTD)
```

The package `morse` is devoted to the analysis of data from standard toxicity
tests. It provides a simple workflow to explore/visualize a data set, and
compute estimations of risk assessment indicators. This document illustrates
a typical use of `morse` on survival and reproduction data, which can be followed
step-by-step to analyze new data sets.

# Survival analysis using a toxicokinetic-toxicodynamic (TKTD) model: GUTS

The steps for a TKTD data analysis are absolutely analogous to what we
described for the analysis at target time.
Here the goal is to estimate the relationship
between chemical compound concentration, time and survival rate using the GUTS models.
GUTS, for General Unified Threshold models of Survival, is a TKTD models generalising most of existing mechanistic models for survival description.
For details about GUTS models, see the vignette *Models in 'morse' package*, and the included references.

## GUTS model with constant exposure concentrations

Here is a typical session to analyse concentration-dependent time-course data
using the so-called "Stochastic Death" (SD) model:

```{r TKTDcst}
# (1) load data set
data(propiconazole)

# (2) check structure and integrity of the data set
survDataCheck(propiconazole)

# (3) create a `survData` object
PRZcst <- survData(propiconazole)

# (4) represent the number of survivors as a function of time
plot(PRZcst)
```

### GUTS model "SD"

To fit the *Stochastic Death* model, we have to specify the `model_type` as `"SD"`:

```{r fitcstSD, echo=TRUE, warning=FALSE}
# (6) fit the TKTD model SD
fit_cstSD <- fit(PRZcst, model_type = "SD", refresh = 0)
```


A function allows to simulated within priors density function:

```{r priorsParams}
in_param <- priors_distribution(fit_cstSD)
head(in_param)
```

Once fitting is done, we can compute posteriors vs. priors distribution with the function `plot()` on
a `PriorPosterior` object as follow:

```{r fitcst_pp1}
# prior/posterior estimates
pp_cstSD = priorPosterior(fit_cstSD)
head(pp_cstSD)
```

Or having the plot:

```{r plot_pp_cstSD}
# plot of parameters estimates
plot(pp_cstSD)
```

The `plot()` function provides a representation of the fitting for each replicates

```{r plot_fit_cstSD}
plot(fit_cstSD)
```

Original data can be removed by using the option `adddata = FALSE`

```{r plot_fit_cstSD_withData}
plot(fit_cstSD, add_data = FALSE)
```

A posterior predictive check is also possible using function `ppc()`:

```{r plot_ppc_cstSD}
# PPC_fit_cstSD <- ppc(fit_cstSD)
# plot(PPC_fit_cstSD)
```

#### Extract `Nsurv`

We can extract the number of survival `Nsurv` estimated by the inference process.

To do so, there is two function, either `extract_Nsurv_ppc` or
`extract_Nsurv_sim`.

```{r extractNsurvPPC}
extNsurvPPC_cstSD <- extract_Nsurv_ppc(fit_cstSD)
head(extNsurvPPC_cstSD)
```

```{r extractNsurvSIM}
extNsurvSIM_cstSD <- extract_Nsurv_sim(fit_cstSD)
head(extNsurvSIM_cstSD)
```

#### Fix background mortality

You can fix the background mortality, parameter `hb`.

```{r fitcstSDFIXhb, warning=FALSE}
# fit the TKTD model SD with fixed hb value
fit_cstSDFIXhb <- fit(PRZcst, model_type = "SD", hb_value = 0.2, refresh = 0)
```


```{r plotcstSDFIXhb_PrintVal}
plot(fit_cstSDFIXhb)
```

Using the `priorPosterior` function, you can recover the summary of each parameters:

```{r summarycstSDFIXhbSummary}
pp_cstSDFIXhb = priorPosterior(fit_cstSDFIXhb)
head(pp_cstSDFIXhb[pp_cstSDFIXhb$pp == "posterior", ])
```

### GUTS model "IT"

The *Individual Tolerance* (IT) model is a variant of TKTD survival
analysis. It can also be used with `morse` as demonstrated hereafter. For the *IT* model, we have to specify the `model_type` as `"IT"`:

```{r fitcstIT}
fit_cstIT <- fit(PRZcst, model_type = "IT", refresh = 0)
```

And the plot of posteriors vs. priors distributions:

```{r PLOTpriors_post_cstIT}
PP_cstIT <- priorPosterior(fit_cstIT)
plot(PP_cstIT)
```

```{r plot_fit_cstIT}
plot(fit_cstIT)
```

```{r ppc_fit_cstIT}
ppc_cstIT = ppc(fit_cstIT)
head(ppc_cstIT)
```

So we can plot the PPC:

```{r plot_ppc_fit_cstIT}
# plot(ppc_cstIT)
```

## Computing prediction

GUTS models can be used to simulate the survival of the organisms under any exposure pattern, using the
calibration done with  function `survFit()` from observed data.
The function for prediction is called `predict()` and returns an object of class `SurvFitPredict`.

```{r predict}
# (1) upload or build a data frame with the exposure profile
# argument `replicate` is used to provide several profiles of exposure
data_4prediction <- data.frame(time = c(1:10, 1:10),
                               conc = c(c(0,0,40,0,0,0,40,0,0,0),
                                        c(21,19,18,23,20,14,25,8,13,5)),
                               replicate = c(rep("pulse", 10), rep("random", 10)))

# (2) Use the fit on constant exposure propiconazole with model SD (see previously)
predict_PRZ_cstSD_4pred <- predict(fit_cstSD, data_4prediction)
```

If `NA` are produced an `error` message is returned.

From an object `survFitPredict`, results can ben plotted with function `plot()`:

```{r predictPlot}
# (3) Plot the predicted survival rate under the new exposure profiles.
plot(predict_PRZ_cstSD_4pred)
```

```{r predictPlotBackground}
# (3) Plot the predicted survival rate under the new exposure profiles.
plot(predict_PRZ_cstSD_4pred, background_concentration = TRUE)
```

```{r predictPlotBackgroundLegend}
# (3) Plot the predicted survival rate under the new exposure profiles.
plot(predict_PRZ_cstSD_4pred, background_concentration = TRUE, add_legend = TRUE)
```

#### Computing `Nsurv` on prediction

We can compute the number of survival based on the initial number of individuals
 by using the `df_mcmc` return by a `SurvPredict` object.
 
To do so, you have to give the initial number of intidivuals.

For instance:

```{r MCMCPredicExternal, warning=FALSE}
Nsurv_predict_PRZ_cst4pred = compute_Nsurv(predict_PRZ_cstSD_4pred, Ninit = 10)
Nsurv_predict_PRZ_cst4pred$df_quantile
```

### Removing background mortality in predictions

While the model has been estimated using the background mortality parameter `hb`, it can be interesting to see the prediction without it.
This is possible with the argument `hb_value`. If `TRUE`, the background mortality is taken into account, and if `FALSE`, the background mortality is set to $0$ in the prediction.

```{r hb_value}
# Use the same data set profile to predict without 'hb'
predict_PRZ_cstSD_4pred_hbOUT <- predict(
  fit = fit_cstSD,
  display.exposure = data_4prediction,
  hb_value = 0)
# Plot the prediction:
plot(predict_PRZ_cstSD_4pred_hbOUT)
```


```{r hb_valueFIX2}
# Use the same data set profile to predict without 'hb'
predict_PRZ_cstSD_4pred_hbFIX2 <- predict(
  fit = fit_cstSD,
  display.exposure = data_4prediction,
  hb_value = 2)
# Plot the prediction:
plot(predict_PRZ_cstSD_4pred_hbFIX2)
```

## Validation criteria: EFSA recommendations

Following EFSA recommendations, the next functions compute qualitative and quantitative model performance criteria suitable for GUTS, and TKTD modelling in general: the percentage of observations within the 95% credible interval of the Posterior Prediction Check (PPC), the Normalised Root Mean Square Error (NRMSE) and the Survival-Probability Prediction Error (SPPE).

**PPC**

The PPC compares the predicted median numbers of survivors associated to their uncertainty limits with the observed numbers of survivors. This can be visualised by plotting the predicted versus the observed values and counting how frequently the confidence/credible limits intersect with the 1:1 prediction line [see previous plot]. Based on experience, PPC resulting in less than 50% of the observations within the uncertainty limits indicate poor model performance.


**Normalised Root Mean Square Error NRMSE**

NRMSE criterion is also based on the expectation that predicted and observed survival numbers matches the 1:1 line in a scatter plot. The criterion is based on the classical root-mean-square error (RMSE), used to aggregate the magnitudes of the errors in predictions for various time-points into a single measure of predictive power. In order to provide a criterion expressed as a percentage, it is suggested using a normalised RMSE by the mean of the observations.


\[
NRMSE = \frac{RMSE}{\overline{Y}} = \frac{1}{\overline{Y}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_{obs,i} - Y_{pred,i})^2} \times 100
\]

**Survival Probability Prediction Error (SPPE)**

The SPPE indicator is negative (between 0 and -100%) for an underestimation of effects, and positive (between 0 and 100%) for an overestimation of effects. An SPPE value of 0% means an exact prediction of the observed survival probability at the end of the experiment.

\[
SPPE = \left( \frac{Y_{obs, t_{end}}}{Y_{init}} - \frac{Y_{pred, t_{end}}}{Y_{init}} \right) \times 100 =  \frac{Y_{obs, t_{end}} - Y_{pred, t_{end}}}{Y_{init}}  \times 100
\]

### Computing predictions with number of survivors

For *NRMSE* and *SPPE*, we need to compute the number of survivors. To do so, we use the function  `predict_Nsurv()` where two arguments are required: the first argument is a `survFit` object, and the other is a data set with four columns (`time`, `conc`, `replicate` and `Nsurv`). Contrary to the function `predict()`, here the column `Nsurv` is necessary. 

```{r cstTOcst, warning=FALSE}
predict_psurv_PRZ_SD_cstTOcst <- predict(fit_cstSD, propiconazole)
compute_Nsurv_PRZ_SD_cstTOcst <- compute_Nsurv(
  predict_psurv_PRZ_SD_cstTOcst,
  Ninit = compute_Ninit(propiconazole)
)
head(predict_psurv_PRZ_SD_cstTOcst
     $df_quantile)
```

```{r cstTOvar, warning=FALSE}
data(propiconazole_pulse_exposure)
predict_psurv_PRZ_SD_cstTOvar <- predict(fit_cstSD, propiconazole_pulse_exposure)
predict_Nsurv_PRZ_SD_cstTOvar <- compute_Nsurv(
  predict_psurv_PRZ_SD_cstTOvar,
  Ninit = compute_Ninit(propiconazole)
)
head(predict_Nsurv_PRZ_SD_cstTOvar$df_quantile)
```

### Validation criteria

Then, using object produce with the function `compute_Nsurv()` we can compute *PPC*, *NRMSE* and *SPPE* for all models. 

```{r checkNsurvPRED_1}
# compute_check(compute_Nsurv_PRZ_SD_cstTOvar)
```

### Plot and PPC of predict_Nsurv objects

```{r plotPredict_Nsurv}
# plot(predict_Nsurv_PRZ_SD_cstTOvar)
```

When ploting a PPC for a `survFitPredict_Nsurv` object,
3 types of lines are represented (following EFSA recommendations).
- A plain line corresponding to the 1:1 line ($y=x$): prediction match perfectly with observation when dots are on this line. 
- A band of dashed lines corresponding to the range of 25% deviation.
- A band of dotted lines corresponding to the range of 50% deviation.

```{r ppcPredict_Nsurv}
# ppc(predict_Nsurv_PRZ_SD_cstTOvar)
```

### Additional options for EFSA user

Following the naming of parameters in the EFSA Scientific Opinion (2018), which
differs from our naming of parameters, we add an option to be in agreement with
EFSA.

Several names of parameters are used in the TKTD GUTS models. The 'R-package' 
`morse`, and more specifically since the GUTS implementation, several name of 
parameters have been used.

For stability reason of algorithms and package, we do not change parameters name
in implemented algorithms. However, we added argument `EFSA_name` to use EFSA 
naming in the functions `priors_distribution()` providing the distributions of 
priors (note: distributions of posteriors are obtained with `$mcmc` element of
a `survFit`object) and `plot_prior_post()` plotting priors distributions versus 
posteriors distributions.

For instance:

```{r nameEFSA_SD}
# head(priors_distribution(fit_cstSD, EFSA_name = TRUE))
# plot_prior_post(fit_cstSD, EFSA_name = TRUE)
```

```{r nameEFSA_IT}
# head(priors_distribution(fit_cstIT, EFSA_name = TRUE))
# plot_prior_post(fit_cstIT, EFSA_name = TRUE)
```

## Lethal concentration

Compared to the target time analysis, TKTD modelling allows to compute and plot the lethal concentration for any *x* percentage and at any time-point. The chosen time-point can be specified with `time_LCx`, by default the maximal time-point in the data set is used.

```{r cstSDLCx}
# LC50 at the maximum time-point:
LCx_cstSD <- lcxt(fit_cstSD, x = 0.5)
plot(LCx_cstSD)

# LC50 at time = 2
lcxt(fit_cstSD, x = 0.5, t = 2) |> plot()
## Note the use of the pipe operator, `|>`, which is a powerful tool for clearly expressing a sequence of multiple operations.
## For more information on pipes, see: http://r4ds.had.co.nz/pipes.html
```

Warning messages are returned when the range of concentrations is not appropriated for one or more LCx calculation(s). 

```{r cstSDLCx_3015}
# LC50 at time = 15
lcxt(fit_cstSD, x = 0.5, t = 15) |> plot()
```

```{r cstITLCx}
# LC50 at the maximum time-point:
LCx_cstIT <- lcxt(fit_cstIT, x = 0.5)
plot(LCx_cstIT)

# LC50 at time = 2
lcxt(fit_cstIT, x = 0.5, t = 2) |> plot()

# LC30 at time = 15
lcxt(fit_cstIT, x = 0.3, t = 15) |> plot()
```

## Multiplication factors: 'margin of safety'

Using prediction functions, GUTS models can be used to simulate the survival rate of organisms exposed to a given exposure pattern.
In general, this realistic exposure profile does not result in any related mortality, but a critical question is to know how far the exposure profile is from adverse effect, that is a "margin of safety".

This idea is then to multiply the concentration in the realistic exposure profile by a "multiplication factor", denoted $MF_x$, resulting in $x\%$ (classically $10\%$ or $50\%$) of additional death at a specified time (by default, at the end of the exposure period).

The multiplication factor $MF_x$ then informs the "margin of safety" that could be used to assess if the risk should be considered as acceptable or not.

Computing an $MF_x$ is easy with function `MFx()`. It only requires object `survFit` and the exposure profile, argument `data_predict` in the function. The chosen percentage of survival reduction is specified with argument `X`, the default is $50$, and the chosen time-point can be specified with `time_MFx`, by default the maximal time-point in the data set is used. 

There is no explicit formulation of $MF_x$ (at least for the GUTS-SD model), so the `accuracy` argument can be used to change the accuracy of the convergence level.

```{r MFx_compt}
# (1) upload or build a data frame with the exposure profile
data_4MFx <- data.frame(time = 1:10,
                        conc = c(0,0.5,8,3,0,0,0.5,8,3.5,0),
                        replicate = "A")

# (2) Use the fit on constant exposure propiconazole with model SD (see previously)
MFx_PRZ_cstSD_4MFx <- lpxt(fit = fit_cstSD, display.exposure = data_4MFx)
```

As the computing time can be long, the function prints the `accuracy` for each step of the tested multiplication factor, for the median and the 95% credible interval.

To compare the initial survival rate (corresponding to a multiplication factor set to 1) with the survival rate at the asked multiplication factor leading to a reduction of $x\%$ of survival (provided with argument `x`).

```{r MFx_plot}
# (3) Plot the survival rate as function of the multiplication factors.
plot(MFx_PRZ_cstSD_4MFx)
```

Then, we can plot the survival rate as a function of the tested multiplication factors. Note that it is a linear interpolation between tested multiplication factor (cross dots on the graph).

```{r MFx_plotMfX}
# (3 bis) Plot the  binary-search of MFx
plot(MFx_PRZ_cstSD_4MFx, plot  = "MFx")
```

As indicated, the warning message just remind you how multiplication factors and linear interpolations between them have been computed to obtain the graph.

What is provided with the function `plot()` is direclty accessible within the object of class `MFx`:

```{r print_df_MFx}
MFx_PRZ_cstSD_4MFx$df_q50
```

### Change the level of multiplication factor

Here is an other example with a 10 percent Multiplication Factor:

```{r MFx_x10}
# (2 bis) fit on constant exposure propiconazole with model SD (see previously)
MFx_PRZ_cstSD_4MFx_x10 <- lpxt(fit = fit_cstSD, x = 0.1, display.exposure = data_4MFx)
plot(MFx_PRZ_cstSD_4MFx_x10)
```

```{r MFx_x10plot2}
plot(MFx_PRZ_cstSD_4MFx_x10, plot = "MFx")
```


### Change the default time of multiplication factor assessment

Multiplication factor is also available for the GUTS IT model (option `quiet = TRUE` remove the output):

```{r MFx_IT}
# (2) Use the fit on constant exposure propiconazole with model IT. No print of run messages.
MFx_PRZ_cstIT_4pred <- lpxt(
  fit = fit_cstIT, x = 0.5, t = 4,
  display.exposure = data_4MFx)

# (3) Plot the survival rate versus multiplication factors.
plot(MFx_PRZ_cstIT_4pred)

# (4) Plot the serching
plot(MFx_PRZ_cstIT_4pred, plot = "MFx")
```

This last example set a reduction of $10\%$ of the survival rate, remove the
background mortality by setting `hb_value = FALSE` and is computed at time `time_MFx = 4`.

```{r MFx_ITplot}
# (2) Use the fit on constant exposure propiconazole with model IT. No print of run messages.
MFx_PRZ_cstIT_4pred <- lpxt(fit = fit_cstIT, x = 0.1, t = 4, display.exposure = data_4MFx)
# 
plot(MFx_PRZ_cstIT_4pred)
# 
plot(MFx_PRZ_cstIT_4pred, plot = "MFx")
```