\documentclass[article,nojss]{jss}
\DeclareGraphicsExtensions{.pdf}
\usepackage{amsmath,amssymb}
\usepackage{textcomp} % has degree celsius
\renewcommand{\textfraction}{0.2}
\renewcommand{\bottomfraction}{0.7}
\renewcommand{\topfraction}{0.7}
\renewcommand{\floatpagefraction}{0.8}
%% Numbering of sections
\setcounter{secnumdepth}{3}
%% Numbering within TOC
\setcounter{tocdepth}{3}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% declarations for jss.cls %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\author{Thomas Petzoldt\\ Technische Universit\"at Dresden}
\Plainauthor{Thomas Petzoldt} %% comma-separated
\title{\pkg{simecol}-Howto: Tips, Tricks and Building Blocks}
\Plaintitle{simecol-Howto: Tips, Tricks and Building Blocks}
% \VignetteIndexEntry{Simecol-Howto: Tips, Tricks and Building Blocks}
\Shorttitle{simecol-Howto} %% a short title (if necessary)

%% an abstract and keywords
\Abstract{
  \flushleft %% workaround for the report style
  This document is a loose collection of chapters that describe different
  aspects of modelling and model implementation in \proglang{R} with the
  \pkg{simecol} package. It supplements the original publication of
  \citet{Petzoldt2007}, of which an updated version,
  \textbf{simecol-introduction}, is also part of this package. Please
  refer to this publication when citing this work.
}
\Keywords{\proglang{R}, \pkg{simecol}, ecological modeling,
  object-oriented programming (OOP), compiled code, debugging}
\Plainkeywords{R, simecol, ecological modeling, compiled code, debugging}

\Address{
  Thomas Petzoldt\\
  Institut f\"ur Hydrobiologie\\
  Technische Universit\"at Dresden\\
  01062 Dresden, Germany\\
  E-mail: \email{thomas.petzoldt@tu-dresden.de}\\
  URL: \url{https://tu-dresden.de/Members/thomas.petzoldt/}\\
}

%% need no \usepackage{Sweave.sty}
\SweaveOpts{engine=R, eps=FALSE, fig=FALSE, echo=TRUE, include=FALSE, prefix.string=howto, width=6, height=4}

\begin{document}

<<echo=FALSE, results=hide>>=
library("simecol")
data(lv, package = "simecol")
options("width" = 72)
options("prompt" = "R> ", "continue" = "+ ")
@

\tableofcontents

%% ======================================================================
\section{The Basics}

\subsection[Building simecol objects]{Building \pkg{simecol} objects}

The intention behind \pkg{simecol} is the construction of ``all-in-one''
model objects. That is, everything that defines one particular model,
equations and data, is stored together in one \code{simObj} (pronounced:
sim-Object), and only some general algorithms (e.g. differential equation
solvers or interpolation routines) remain external, preferably as package
functions (e.g. function \code{lsoda} in the package \pkg{deSolve}
\citep{deSolve_jss}) or as functions in the user workspace. This strategy
has three main advantages:

\begin{enumerate}
  \item We can have several independent versions of one model in the
    computer memory at the same time. These instances may have different
    settings, parameters and data, or even use different formulae, but
    they do not interfere with each other. Moreover, if all data and
    functions are \emph{encapsulated} in their \code{simObj}ects,
    identifiers can be re-used and it is, for example, not necessary to
    keep track of a large number of variable names or to invent new
    identifiers for the parameter sets of different scenarios.
  \item We can give \code{simObj}ects away, either in binary form or as
    source code objects. Everything essential to run such a model is
    included, not only the formulae but also defaults for parameters and
    data. You, or your users, need only \proglang{R}, some packages and
    your model object.
    It is also possible to load model objects directly from the internet
    or, on the other hand, to distribute model collections as
    \proglang{R} packages.
  \item All \code{simObj}ects can be handled, simulated and modified with
    the same generic functions, e.g. \code{sim}, \code{plot} or
    \code{parms}. Our users can start playing with the models without the
    need to understand all the internals.
\end{enumerate}

While it is, of course, preferable to have all parts of a model
encapsulated in one object, it is not mandatory to have the complete
working model object before starting to use \pkg{simecol}. \pkg{simecol}
models (in the following called \code{simObj}ects) can be built step by
step, starting with mixed applications composed of rudimentary
\code{simObj}ects and ordinary user-space functions. When everything
works, you should encapsulate all the main parts of your model in the
\code{simObj}ect to get a clean object that does not interfere with
others.

\subsubsection{An example}

We start with the example given in the simecol-introduction
\citep{Petzoldt2007}, an implementation of the UPCA model of
\citet{Blasius1999}, but we write it in the usual \pkg{deSolve} style,
i.e. without using \pkg{simecol}:

<<>>=
f <- function(x, y, k){x*y / (1+k*x)} # Holling II

func <- function(time, y, parms) {
  with(as.list(c(parms, y)), {
    du <- a * u - alpha1 * f(u, v, k1)
    dv <- -b * v + alpha1 * f(u, v, k1) - alpha2 * f(v, w, k2)
    dw <- -c * (w - wstar) + alpha2 * f(v, w, k2)
    list(c(du, dv, dw))
  })
}

times <- seq(0, 100, 0.1)
parms <- c(a=1, b=1, c=10, alpha1=0.2, alpha2=1,
           k1=0.05, k2=0, wstar=0.006)
y <- c(u=10, v=5, w=0.1)
@

The model is defined by five variables in the \proglang{R} user
workspace, namely \code{f}, \code{func}, \code{times}, \code{parms} and
\code{y}. The implementation is similar to the help page examples of
package \pkg{deSolve}, and we can solve it in exactly the same manner:

<<upca2, fig=TRUE>>=
library(deSolve)
out <- lsoda(y, times, func, parms)
matplot(out[,1], out[,-1], type="l")
@

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-upca2}
\caption{\label{howto-upca2} Output of the UPCA model, solved with
  \code{lsoda} from package \pkg{deSolve}.}
\end{figure}

\subsubsection{Transition to simecol}
\label{s4definition}

If we compare this example with the \pkg{simecol} structure, we see that
they are quite similar. This correspondence is natural, because the
notation of both \pkg{deSolve} and \pkg{simecol} is based on the
state-space notation of control theory\footnote{see
\url{https://en.wikipedia.org/wiki/State_space_(controls)}, last access
2018-05-07}. Therefore, only minor restructuring and renaming is needed
to form a \code{simObj}:

<<>>=
library("simecol")

f <- function(x, y, k){x*y / (1+k*x)} # Holling II

upca <- new("odeModel",
  main = function(time, y, parms) {
    with(as.list(c(parms, y)), {
      du <- a * u - alpha1 * f(u, v, k1)
      dv <- -b * v + alpha1 * f(u, v, k1) - alpha2 * f(v, w, k2)
      dw <- -c * (w - wstar) + alpha2 * f(v, w, k2)
      list(c(du, dv, dw))
    })
  },
  times = seq(0, 100, 0.1),
  parms = c(a=1, b=1, c=10, alpha1=0.2, alpha2=1,
            k1=0.05, k2=0, wstar=0.006),
  init = c(u=10, v=5, w=0.1),
  solver = "lsoda"
)
@

We may notice that the assignment operators ``\code{<-}'' changed to
declarative equal signs ``\code{=}'' for the slot definitions, that some
of the names were changed to the pre-defined slot names of \pkg{simecol}
(\code{y} to \code{init}, \code{func} to \code{main}) and that all the
slot definitions are now comma-separated arguments of the \code{new}
function that creates the \code{upca} object.
The solver method \code{lsoda} is also given as a character string
pointing to the original \code{lsoda} function in package \pkg{deSolve}.
The new object can now be simulated very easily with the \code{sim}
function of \pkg{simecol}, which returns the object with all original
slots and one additional slot \code{out} holding the output values. A
generic \code{plot} function is also available for basic plotting of the
output:

<<>>=
upca <- sim(upca)
plot(upca)
@

It is now also possible to extract the results from \code{upca} with a
so-called accessor function \code{out}, and to use arbitrary,
user-defined plot functions:

<<upca5, fig=TRUE>>=
plotupca <- function(obj, ...) {
  o <- out(obj)
  matplot(o[,1], o[,-1], type="l", ...)
  legend("topright", legend = c("u", "v", "w"),
         lty = 1:3, bg = "white", col = 1:3)
}
plotupca(upca)
@

%\begin{figure}
%\centering
%\includegraphics[width=\textwidth]{howto-upca5}
%\caption{\label{howto-upca5} A user-defined plot function.}
%\end{figure}

Okay, that's it, but note that function \code{f} is not yet part of the
\pkg{simecol} object; that is why we call this a ``mixed
implementation''. Function \code{f} is rather simple here, but it would
also be possible to call functions of arbitrary complexity from
\code{main}.

\subsubsection{Creating scenarios}

After defining one \pkg{simecol} object (that we can call a parent object
or a \textbf{prototype}), we may create derived objects, simply by
copying (cloning) and modification. As an example, we create two
scenarios with different parameter sets:

<<upca6, fig=TRUE>>=
sc1 <- sc2 <- upca
parms(sc1)["wstar"] <- 0
parms(sc2)["wstar"] <- 0.1
sc1 <- sim(sc1)
sc2 <- sim(sc2)
par(mfrow=c(1,2))
plotupca(sc1, ylim=c(0, 250))
plotupca(sc2, ylim=c(0, 250))
@

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-upca6}
\caption{\label{howto-upca6} Two scenarios of the UPCA model (left:
  wstar=0, right: wstar=0.1; functional response f is Holling II).}
\end{figure}

If we simulate and plot these scenarios, we see an exponentially growing
$u$ in both cases, while $v$ and $w$ show cycles in the first scenario
and an equilibrium state in the second (figure \ref{howto-upca6}). If we
now change the functional response function $f$ from Holling II to
Lotka-Volterra:

<<>>=
f <- function(x, y, k){x * y}
@

both model scenarios, \code{sc1} and \code{sc2}, are affected by this
new definition.

<<upca8, fig=TRUE>>=
sc1 <- sim(sc1)
sc2 <- sim(sc2)
par(mfrow=c(1,2))
plotupca(sc1, ylim=c(0, 20))
plotupca(sc2, ylim=c(0, 20))
@

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-upca8}
\caption{\label{howto-upca8} Two scenarios of the UPCA model (left:
  wstar=0, right: wstar=0.1; functional response f is Lotka-Volterra).}
\end{figure}

Now, we get a stable cycle for $u$ and $v$ in scenario 1 and an
equilibrium for all state variables in scenario 2 (figure
\ref{howto-upca8}). Note that the new function \code{f} has exactly the
same arguments as before, including the now unused parameter \code{k}.

In the examples above, function \code{f} was an ordinary function in the
user workspace, but it is also possible to implement such functions (or
sub-models) directly as part of the model object. As one possibility,
one might consider defining local functions within \code{main}, but that
would have the disadvantage that such functions are not easily
accessible from outside. To allow the latter, \pkg{simecol} has an
optional slot \code{equations}, which can hold a list of sub-models.
Such an \code{equations} slot can be defined either during object
creation, or functions may be added afterwards. In the following, we
derive two new clones with default parameter settings from the original
\code{upca} object, and then assign one version of \code{f} (the Holling
II functional response) to scenario 1 and the other version (the simple
multiplicative Lotka-Volterra functional response) to scenario 2 (figure
\ref{howto-upca9}):

<<upca9, fig=TRUE>>=
sc1 <- sc2 <- upca
equations(sc1)$f <- function(x, y, k){x*y / (1+k*x)}
equations(sc2)$f <- function(x, y, k){x * y}
sc1 <- sim(sc1)
sc2 <- sim(sc2)
par(mfrow=c(1,2))
plotupca(sc1, ylim=c(0, 20))
plotupca(sc2, ylim=c(0, 20))
@

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-upca9}
\caption{\label{howto-upca9} Two scenarios of the UPCA model (left:
  functional response \code{f} is Holling II, right: functional response
  is Lotka-Volterra).}
\end{figure}

This method allows us to compare models with different structures in the
same way as scenarios with different parameter values. In addition, it
is also possible to define model objects with different versions of
\textbf{built-in} sub-models that can be enabled alternatively:

<<>>=
upca <- new("odeModel",
  main = function(time, y, parms) {
    with(as.list(c(parms, y)), {
      du <- a * u - alpha1 * f(u, v, k1)
      dv <- -b * v + alpha1 * f(u, v, k1) - alpha2 * f(v, w, k2)
      dw <- -c * (w - wstar) + alpha2 * f(v, w, k2)
      list(c(du, dv, dw))
    })
  },
  equations = list(
    f1 = function(x, y, k){x*y},          # Lotka-Volterra
    f2 = function(x, y, k){x*y / (1+k*x)} # Holling II
  ),
  times = seq(0, 100, 0.1),
  parms = c(a=1, b=1, c=10, alpha1=0.2, alpha2=1,
            k1=0.05, k2=0, wstar=0.006),
  init = c(u=10, v=5, w=0.1),
  solver = "lsoda"
)

equations(upca)$f <- equations(upca)$f1
@

\subsubsection{Debugging}

As stated before, the all-in-one encapsulation of all functions and data
in \code{simObj}ects has many advantages, but there is also one
disadvantage, namely debugging. Debugging of \proglang{S4} objects is
sometimes cumbersome, especially if slot functions (e.g. \code{main},
\code{equations}, \code{initfunc}) come into play. These difficulties
matter little for well-functioning ready-made model objects, but they
become an additional burden during model building, in particular if the
models are technically more complex than in our example.

%\subsubsection{Method 1: Debug functions in the user workspace}

Fortunately, there are easy workarounds. One of them is to implement the
technically challenging parts in the user workspace first, using the
above-mentioned mixed style. Then, after developing and debugging the
model, and if everything works satisfactorily, integrating the parts
into the object is straightforward, given that you keep the general
structure in mind.
In the example below, we implement the main model as a workspace
function \code{fmain}\footnote{Note that this function must not be named
\code{func}.} with the same interface (arguments and return values) as
above, which is then called by the \code{main} function of the
\code{simObj}:

<<>>=
f <- function(x, y, k){x*y / (1+k*x)} # Holling II

fmain <- function(time, y, parms) {
  with(as.list(c(parms, y)), {
    du <- a * u - alpha1 * f(u, v, k1)
    dv <- -b * v + alpha1 * f(u, v, k1) - alpha2 * f(v, w, k2)
    dw <- -c * (w - wstar) + alpha2 * f(v, w, k2)
    list(c(du, dv, dw))
  })
}

upca <- new("odeModel",
  main = function(time, y, parms) fmain(time, y, parms),
  times = seq(0, 100, 0.1),
  parms = c(a=1, b=1, c=10, alpha1=0.2, alpha2=1,
            k1=0.05, k2=0, wstar=0.006),
  init = c(u=10, v=5, w=0.1),
  solver = "lsoda"
)
@

This function \code{fmain}, as well as any other sub-models like
\code{f}, can now be debugged with the usual \proglang{R} tools, e.g.
\code{debug}:

<<eval=FALSE>>=
debug(fmain)
upca <- sim(upca)
@

Debugging can be stopped with \code{undebug(fmain)}. If everything
works, you can add the body of \code{fmain} to \code{upca} manually, and
it is even possible to do this in the formalized \pkg{simecol} way of
object modification:

<<>>=
main(upca) <- fmain    # assign workspace function to main slot
equations(upca)$f <- f # assign workspace function to equations
rm(fmain, f)           # optional, for saving memory and avoiding confusion
str(upca)              # show the object
@

Deleting \code{f} and \code{fmain} (as done with \code{rm} above) leaves
a clean workspace with only the necessary objects.

%\subsubsection{Method 2: Extract functions temporarily}

\subsection{Different ways to store simObjects}

One of the main advantages of \pkg{simecol} is that model objects can be
made persistent and that it is easy to distribute and share
\code{simObj}ects over the internet. The most obvious and simple form
is, of course, to use the original source code of the objects, i.e. the
call to \code{new} with all the slots that creates the \proglang{S4}
object (see section \ref{s4definition}), but there are also other
possibilities.

\pkg{simecol} objects can be saved in machine-readable form as
\proglang{S4} object binaries with the \code{save} method of
\proglang{R}, which stores the whole object with all its equations,
initial values, parameters etc., and also the simulation output if the
model was simulated before saving.

<<eval=FALSE>>=
save(upca, file="upca.Rdata") # persistent storage of the model object
load("upca.Rdata")            # load the model
@

Conversion of the \proglang{S4} object to a list is another possibility
that yields a representation readable by humans \textbf{and} by
\proglang{R}:

<<>>=
l.upca <- as.list(upca)
@

This provides an alternative text representation of the \code{simObj}
that can be manipulated by code-parsing programs or dumped to the hard
disk:

<<>>=
dput(l.upca, file="upca_list.R")
@

and this is completely reversible via:

<<>>=
l.upca <- dget("upca_list.R")
upca <- as.simObj(l.upca)
@

Sometimes it may be useful to store \code{simObj}ects in an
uninitialized form, in particular if they are to be distributed in
packages. Let's demonstrate this again with a simple Lotka-Volterra
model.
In the first step, we define a function that returns a \pkg{simecol}
object:

<<>>=
genLV <- function() {
  new("odeModel",
    main = function (time, init, parms) {
      x <- init
      p <- parms
      dx1 <-   p["k1"] * x[1] - p["k2"] * x[1] * x[2]
      dx2 <- - p["k3"] * x[2] + p["k2"] * x[1] * x[2]
      list(c(dx1, dx2))
    },
    parms = c(k1=0.2, k2=0.2, k3=0.2),
    times = c(from=0, to=100, by=0.5),
    init = c(prey=0.5, predator=1),
    solver = "lsoda"
  )
}
@

The function now contains the instructions how \proglang{R} can create a
new instance of such a model. The \pkg{simecol} object itself is not
created yet, but a call to the creator function brings it to life:

<<>>=
lv1 <- genLV()
plot(sim(lv1))
@

This style is used in package \pkg{simecolModels}\footnote{%
\pkg{simecolModels} can be downloaded from the R-Forge server,
\url{https://simecol.r-forge.r-project.org/}.}, a collection of (mostly)
published ecological models.

%\subsection{The initfunc}
% basics already explained in the manual
% avoiding re-initialization

%\subsection{Using alternative solver packages}
%\pkg{PBSddeSolve}-example \dots
% use a time delayed differential model as example
%user-defined solvers and assignment of functions to the solver-slot \dots

%\subsection{Using vectorized equations}

\subsection{Methods to work with S4 objects}

The \proglang{S4} scheme includes several utility functions that can be
used to inspect objects and their methods. As an example,
\code{showMethods} lists the methods that are available for one generic
function (here \code{sim}), depending on the classes involved:

<<>>=
showMethods("sim")
@

Based on this information, it is possible to inspect the source code of
a particular method, e.g. the \code{sim} function for differential
equation models (class \code{odeModel}):

<<>>=
getMethod("sim", "odeModel")
@

In addition, \proglang{R} has several other functions to inspect or
manipulate objects, e.g. \code{hasMethod}, \code{findMethod} or
\code{setMethod}; please see the documentation of these functions for
details.

% hasMethod("sim", "odeModel")
% findMethod("sim", "odeModel")
% setMethod

%% ======================================================================
\section{Fitting Parameters}

The preferred method to obtain model parameters is direct measurement of
process rates, either in dedicated controlled experiments or by taking
results from the literature. However, not all processes are accessible
by direct measurement. In such cases, process parameters can be
identified indirectly by calibration against observed data.

\subsection{Example model and data}

In order to demonstrate parameter fitting, we use a simple ODE model
with two coupled differential equations:

\begin{align}
\frac{dX}{dt} &= \mu X - D X \\
\frac{dS}{dt} &= D (S_0 - S) - \frac{1}{Y} \mu X \\
\intertext{with Monod functional response:}
\mu &= v_m \frac{S}{k_m + S} \\
\intertext{and:}
X   &= \text{abundance or biomass concentration} \nonumber\\
S   &= \text{substrate concentration} \nonumber\\
D   &= \text{dilution rate} \nonumber\\
Y   &= \text{yield constant (i.e. conversion factor between $S$ and $X$)} \nonumber\\
\mu &= \text{growth rate} \nonumber\\
v_m &= \text{maximum growth rate} \nonumber\\
k_m &= \text{half saturation constant} \nonumber
\end{align}

\subsection{Load the model and assign a solver}

The chemostat model is one of the standard examples in package
\pkg{simecol} that can be loaded like a data set.
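For orientation, the equations above translate almost directly into an
\code{odeModel}, in the same style as the UPCA example. The following is
an illustrative sketch only; the object name \code{chemo} and all
numerical values are hypothetical, and the \code{chemostat} object
shipped with \pkg{simecol} may differ in details:

<<eval=FALSE>>=
## illustrative sketch of the chemostat equations as an odeModel;
## parameter and start values are hypothetical
chemo <- new("odeModel",
  main = function(time, init, parms) {
    with(as.list(c(init, parms)), {
      mu <- vm * S / (km + S)         # Monod functional response
      dX <- mu * X - D * X            # biomass
      dS <- D * (S0 - S) - mu * X / Y # substrate
      list(c(dX, dS))
    })
  },
  parms = c(vm = 1, km = 2, Y = 100, D = 0.1, S0 = 10),
  times = c(from = 0, to = 20, by = 0.1),
  init  = c(X = 10, S = 10),
  solver = "lsoda"
)
@

The slots of the real object can be inspected after loading, e.g. with
\code{parms(chemostat)} and \code{init(chemostat)}.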
In the following, we use the explicit solver \code{adams} instead of the
default solver \code{lsoda}, because the problem is not stiff. However,
package \pkg{deSolve} contains dedicated functions only for the most
common solvers like \code{lsoda} or \code{rk4}; other solvers like
\code{adams} (for non-stiff systems) or \code{bdf} (for stiff systems)
are specified by calling \code{ode} with an appropriate \code{method}
option.

<<>>=
data(chemostat)
solver(chemostat) # shows which solver we have
## assign an alternative solver
solver(chemostat) <- function(y, times, func, parms, ...) {
  ode(y, times, func, parms, method = "adams", ...)
}
@

Now, we create two working copies \code{cs1} and \code{cs2} for further
usage:

<<>>=
cs1 <- cs2 <- chemostat
@

Let's also assume we have the following test data:

<<>>=
obstime <- seq(0, 20, 2)
yobs <- data.frame(
  X = c(10, 26, 120, 197, 354, 577, 628, 661, 654, 608, 642),
  S = c(9.6, 10.2, 9.5, 8.2, 6.4, 4.9, 4.2, 3.8, 2.5, 3.8, 3.9)
)
@

Note that in this example, \code{yobs} contains two columns (and only
two) with exactly the same column names as the state variables.

\subsection{Parameter estimation in simecol}

Package \pkg{simecol} has a basic function for fitting parameters of ODE
models built in (\code{fitOdeModel}). This function is a wrapper for
existing optimization functions of \proglang{R}, currently \code{nlminb}
with the ``PORT'' algorithm, \code{newuoa} and \code{bobyqa} from
package \pkg{minqa} \citep{Powell2009,minqa}, and \code{optim} with
``Nelder-Mead'', ``BFGS'', ``CG'', ``L-BFGS-B'' and ``SANN''
\cite[cf.][and others]{Rcore2015,Nash1990}. Among these, only ``PORT'',
``bobyqa'' and ``L-BFGS-B'' support constrained optimization natively,
whereas for the others \code{fitOdeModel} tries to emulate box
constraints (more or less successfully) using an arctan transformation
(see \code{p.constrain}).

In order to save computation time, it is suggested to use an efficient
variable time step algorithm (e.g. \code{adams} or \code{bdf}, depending
on the system) and to set the external time steps of the model to the
time steps contained in the observational data:

<<>>=
times(cs1) <- obstime
@

For the most basic call to the parameter fitting function, we need only
the model object, the time steps and the observational data:

<<>>=
res <- fitOdeModel(cs1, obstime = obstime, yobs = yobs)
@

In this case, \textbf{all} parameters are fitted by least squares
between \textbf{all} state variables and their corresponding
observations. The start values for parameter estimation are taken from
the \code{simObj}ect. The sum of squares of the individual state
variables is weighted by the inverse standard deviations of the
observations, and the Nelder-Mead algorithm is used by default.

Many options are available to control the parameter fitting, e.g. to fit
only a subset of parameters (\code{whichpar}), to control the amount of
information displayed (\code{debuglevel}, \code{trace}), the weighting
of state variables (\code{sd.yobs}) or of individual data points
(\code{weights}), or the algorithm used (\code{method}). In the
following, we fit only $v_m$, $k_m$ and $Y$ with the PORT algorithm,
which is in many cases faster than the other methods, especially if the
range of the parameters is known (constrained optimization, e.g. to
avoid negative values for $k_m$). It is also a good idea to assign
reasonable start values to the \code{simObj}ect. For interactive use it
is recommended to set \code{trace = TRUE}.
<<>>=
whichpar <- c("vm", "km", "Y")
lower <- c(vm=0, km=0, Y=0)
upper <- c(vm=100, km=500, Y=200)
parms(cs1)[whichpar] <- c(vm=5, km=10, Y=100)

res <- fitOdeModel(cs1, whichpar = whichpar,
  lower = lower, upper = upper,
  obstime = obstime, yobs = yobs,
  method = "PORT", control = list(trace = FALSE))
@

The results are now stored in a list structure \code{res}, from which
the parameters, the objective function value (sum of squares) and
information about convergence can be extracted:

<<>>=
res
@

Future versions of \pkg{simecol} will probably contain specific
functions to extract this information in a more user-friendly style.

The success of the parameter estimation can now be assessed numerically
and graphically. First we assign the new parameters to a copy of the
model object:

<<>>=
parms(cs2)[whichpar] <- res$par
@

We get $r^2$ values for both state variables with:

<<>>=
times(cs2) <- obstime
ysim <- out(sim(cs2))
1 - var(ysim$X - yobs$X) / var(yobs$X)
1 - var(ysim$S - yobs$S) / var(yobs$S)
@

Note that the time steps in data and model must be the same here.
However, for producing a smooth figure it is recommended to use a
smaller time step. We will also need this figure for further examples,
so it makes sense to define a function:

<<fit1, fig=TRUE>>=
plotFit <- function() {
  times(cs2) <- c(from=0, to=20, by=.1)
  ysim <- out(sim(cs2))
  par(mfrow=c(1,2))
  plot(obstime, yobs$X, ylim = range(yobs$X, ysim$X))
  lines(ysim$time, ysim$X, col="red")
  plot(obstime, yobs$S, ylim = range(yobs$S, ysim$S))
  lines(ysim$time, ysim$S, col="red")
}
plotFit()
@

\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-fit1}
\caption{\label{howto-fit1} Observed data and fitted model.}
\end{figure}

\subsection{Estimation of initial values}

In the example above, we assumed that the initial values were known, and
in fact we used the built-in initial values of the default example. In
many cases, however, initial values must be estimated from observed
data, because they are either completely unknown or known only with a
certain error. In order to estimate initial values from data, we have to
add them, in a technical sense, to the list of parameters:

<<>>=
parms(cs1) <- c(parms(cs1), init(cs1))
parms(cs1)
@

The second step is to assign these parameters back to the vector of
initial values at the beginning of each new simulation, i.e. to
initialize the model object with the start values determined by the
optimization algorithm before the model is simulated and the sum of
squares is calculated. For such purposes, \pkg{simecol} objects have a
special function slot \code{initfunc}, and the only thing we have to do
is to assign a function that copies the respective values from
\code{parms} to \code{init}:

<<>>=
initfunc(cs1) <- function(obj) {
  init(obj) <- parms(obj)[c("X", "S")]
  obj
}
@

Note that \code{initfunc} receives the model object as input, which is
then modified and returned. Note also that the number and order of the
initial values must be consistent with \code{main}.

<<>>=
whichpar <- c("vm", "km", "X", "S")
lower <- c(vm=0, km=0, X=0, S=0)
upper <- c(vm=100, km=500, X=100, S=100)
parms(cs1)[whichpar] <- c(vm=10, km=10, X=10, S=10)

res <- fitOdeModel(cs1, whichpar = whichpar,
  lower = lower, upper = upper,
  obstime = obstime, yobs = yobs,
  method = "Nelder", control = list(trace = FALSE))
@

Assigning the fitted parameters and running a new simulation now results
in:

<<fit2, fig=TRUE>>=
initfunc(cs2) <- initfunc(cs1)
parms(cs2)[whichpar] <- res$par
plotFit()
@

Note that the model object \code{cs2} needs a copy of the
\code{initfunc} too; otherwise it is necessary to assign the initial
values manually.
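Such a manual assignment could look as follows, a minimal sketch using
the fitted values stored in \code{res\$par} from above:

<<eval=FALSE>>=
## manual alternative to the initfunc: copy the fitted values by hand
init(cs2) <- res$par[c("X", "S")]
parms(cs2)[c("vm", "km")] <- res$par[c("vm", "km")]
@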
\begin{figure}
\centering
\includegraphics[width=\textwidth]{howto-fit2}
\caption{\label{howto-fit2} Observed data and fitted model ($k_m$, $v_m$
  and initial values were estimated).}
\end{figure}

\subsection{Scaling and weighting}

Appropriate scaling of \textbf{observational variables} is crucial. On
the one hand, scaling is necessary because variables may be of different
orders of magnitude or have different measurement units (comparing
apples and oranges); on the other hand, the estimated parameters depend
heavily on this decision. Weighting is related to scaling. It may be
required to weight \textbf{individual data points}, depending on their
accuracy or the number of measurements they rely on. A third type of
scaling is sometimes necessary for numerical reasons if
\textbf{parameters} have very different orders of magnitude. Some
algorithms (e.g. PORT) try to do this automatically, but sometimes even
for this an optional scaling argument is required.

\subsection{Scaling of variables}

The default scaling method used in \pkg{simecol} is division by the
standard deviation of the observational values (per variable, i.e. per
column), but sometimes it is necessary to provide user-specified
scaling. Depending on the question and the kind of data, different
assumptions may be appropriate, for example mean, median, range, an
appropriate conversion constant or some kind of ``expert judgement''
that helps to make the state variables comparable. The following example
uses default scaling (standard deviations of the observations):

<<>>=
whichpar <- c("vm", "km")
parms(cs1)[whichpar] <- c(vm = 5, km = 2)
res <- fitOdeModel(cs1, whichpar = whichpar,
  obstime = obstime, yobs = yobs, method = "Nelder")
res$value
res$par
@

In the second example, we set the scaling for both variables to one
(i.e. divide both variables by 1). This means that no scaling is applied
at all and the original dimensions remain:

<<>>=
res <- fitOdeModel(cs1, whichpar = whichpar,
  obstime = obstime, yobs = yobs, method = "Nelder",
  sd.yobs = c(1, 1))
res$value
res$par
@

It is obvious that the resulting sums of squares and also the estimated
parameters of these two approaches differ.

%% ToDo:
%% - make assignment of the scaling factors to variables more robust
%%   (e.g. named vector)
%% - rename sd.yobs ???

\subsection{Weighting of observations}

Weighting of observations (i.e. per row or per individual value) can be
useful in different circumstances, e.g. if measurements have different
precision, if the variance is not constant over the range of
observations (violation of homoscedasticity) or if data points represent
multiple measurements. Let's assume that, for whatever reason, we want
to downweight the last three values (9\dots 11) to one third. We simply
define a data frame with the same structure as the observational data
(\code{yobs}) and assign appropriate weights:

<<>>=
weights <- data.frame(X = rep(1, nrow(yobs)), S = rep(1, nrow(yobs)))
weights[9:11,] <- 1/3
res <- fitOdeModel(cs1, whichpar = whichpar,
  obstime = obstime, yobs = yobs, method = "Nelder",
  weights = weights)
res$value
res$par
@

\subsection{Scaling of parameters}

If the parameters to be fitted have very different orders of magnitude
(e.g. one is 0.1 and another is $10^7$), then it is possible that
optimization fails due to numerical problems. One possibility to avoid
this is to reformulate the problem in such a way that the sizes of the
parameters do not differ too much. Depending on the algorithm used, it
may also be possible to let the optimizer do this rescaling of the
parameters, e.g.
via argument \code{scale} for the PORT algorithm or with \code{control =
list(parscale = ...)} for the algorithms in \code{optim}. Please consult
the original help pages for details. Normally, the PORT algorithm (in
function \code{nlminb}) does automatic rescaling, but if required,
scaling with $\mathrm{scale} = 1/\mathrm{par}_{\max}$ may help (see
\url{http://netlib.bell-labs.com/cm/cs/cstr/153.pdf}):
% == thpe 2021-09-22 bell-labs link outdated ==

<<>>=
res <- fitOdeModel(cs1, whichpar = whichpar,
  obstime = obstime, yobs = yobs, method = "PORT",
  scale = 1/c(vm = 2, km = 5))
res$value
res$par
@

\subsection{Parameter estimation with FME}

\pkg{FME} \cite[Flexible Modeling Environment,][]{Soetaert2010} is a
package with functions for complex applications of models consisting of
differential equations, especially fitting models to data, sensitivity
analysis and Markov chain Monte Carlo (MCMC). The essential principles
of model fitting with \pkg{FME} are quite similar to the above,
including fine-tuning of the underlying optimization algorithms.
However, in contrast to \code{fitOdeModel}, \pkg{FME}'s \code{modFit} is
\textbf{much more powerful}. It allows the use of several additional
optimizers and gives more detailed output, especially standard errors
and covariances (correlations) of the parameter estimates. Other
advantages are the flexible way to define one's own minimization
criteria (cost functions), the existence of summary functions to extract
the outputs and the availability of an advanced MCMC algorithm.

In the following, we give a full example to demonstrate its use with the
same model and data as above. We use the chemostat model again, together
with the test data set. The user-defined cost function (\code{Cost})
runs the simulation and compares simulated and observed data; the call
to \code{modCost} is the last statement of the function. Note that
\code{modCost} requires that both the simulated and the observed data
contain a \code{time} column. This is already available in \code{ysim}
(returned by the solver), but we have to add it to \code{yobs} as well.
As above, appropriate scaling (resp. weighting) of the variables is an
important point. Here, setting \code{weight = "std"} does the same as
the default \code{sd.yobs} in the former example; other possibilities
are described in the respective online help pages.

%% workaround to omit minpack.lm start-up messages
<<echo=FALSE, results=hide>>=
fme <- require(FME)
@

<<>>=
library(FME)
library(simecol)

data(chemostat)
cs1 <- chemostat

obstime <- seq(0, 20, 2)
yobs <- data.frame(
  X = c(10, 26, 120, 197, 354, 577, 628, 661, 654, 608, 642),
  S = c(9.6, 10.2, 9.5, 8.2, 6.4, 4.9, 4.2, 3.8, 2.5, 3.8, 3.9)
)

Cost <- function(p, simObj, obstime, yobs) {
  whichpar <- names(p)
  parms(simObj)[whichpar] <- p
  times(simObj) <- obstime
  ysim <- out(sim(simObj))
  modCost(ysim, yobs, weight = "std")
}

yobs <- cbind(time = obstime, yobs)

Fit <- modFit(p = c(vm = 10, km = 10), f = Cost,
  simObj = cs1, obstime = obstime, yobs = yobs,
  method = "Nelder", control = list(trace = FALSE))
@

Setting \code{trace = TRUE} in interactive sessions helps to observe how
the minimization proceeds. Moreover, it is of course also possible to
use other optimization algorithms or to constrain the parameters within
sensible ranges.
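For example, \code{modFit} accepts box constraints via its \code{lower}
and \code{upper} arguments. The following sketch reuses the objects
defined above; the bounds are illustrative only:

<<eval=FALSE>>=
## box-constrained fit with illustrative bounds
Fit <- modFit(p = c(vm = 10, km = 10), f = Cost,
  simObj = cs1, obstime = obstime, yobs = yobs,
  lower = c(vm = 0, km = 0),
  upper = c(vm = 100, km = 500),
  method = "L-BFGS-B")
@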
The output can now be extracted with the appropriate methods:

<<>>=
summary(Fit)
deviance(Fit)
coef(Fit)
@

You may also test the following, but we omit the output here to save
space:

<<eval=FALSE>>=
residuals(Fit)
df.residual(Fit)
plot(Fit)
@

%% ======================================================================
\section{Implementing Models in Compiled Languages}

Compilation of model code can speed up simulations considerably, and
there are several ways to call compiled code from \proglang{R}. It is
possible to use functions written in \proglang{C/C++} or
\proglang{Fortran} in the ordinary way described in the ``Writing R
Extensions'' manual \citep{Rexts2006}. This speeds up computations, but
a certain amount of communication overhead remains, because control is
given back to \proglang{R} in every simulation step. In addition to this
basic method, it is also possible to enable direct communication between
the integration routines and the model code if both are available in
compiled form and if the direct call of a compiled model is supported by
the integrator. All integrators of package \pkg{deSolve} support this;
see the \pkg{deSolve} documentation for details.

\subsection{An Example in C}

Now, let's inspect an example. We first provide our model as described
in the \pkg{deSolve} vignette ``Writing Code in Compiled Languages'',
here again the Lotka-Volterra model:

\begin{verbatim}
/* file: clotka.c */
#include <R.h>

static double parms[3];
#define k1 parms[0]
#define k2 parms[1]
#define k3 parms[2]

/* It is possible to define global variables here */
static double aGlobalVar = 99.99; // for testing only

/* initializer: same name as the dll (without extension) */
void clotka(void (* odeparms)(int *, double *)) {
  int N = 3;
  odeparms(&N, parms);
  Rprintf("model parameters successfully initialized\n");
}

/* Derivatives */
void dlotka(int *neq, double *t, double *y,
            double *ydot, double *yout, int *ip) {
  // sanity check for the 2 'additional outputs'
  if (ip[0] < 2) error("nout should be at least 2");

  // derivatives
  ydot[0] = k1 * y[0] - k2 * y[0] * y[1];
  ydot[1] = k2 * y[0] * y[1] - k3 * y[1];

  // the 2 additional outputs, here for demo purposes only
  yout[0] = aGlobalVar;
  yout[1] = ydot[0];
}
\end{verbatim}

Using \verb!#define! macros is a typical \proglang{C} trick to get
readable names for the parameters. This method is simple and efficient,
and of course there are more elaborate possibilities; one alternative is
using dynamic variables, another is doing call-backs to \proglang{R}.

The \proglang{C} code can now be compiled into a so-called shared
library (on Linux) or a DLL (on Windows) that can be linked to
\proglang{R}. Compilation requires an installed \proglang{C} compiler
(\pkg{gcc}) and some other tools that are quite standard on Linux, and
which are also available for the Macintosh or, in the form of the
\pkg{Rtools} collection\footnote{%
\url{https://cran.r-project.org/bin/windows/Rtools/}} provided by Duncan
Murdoch, for Windows. If the tools are installed, compilation can be
done directly from \proglang{R} with:

<<eval=FALSE>>=
system("R CMD SHLIB clotka.c")
@

The result, a shared library or DLL, can now be linked to the current
\proglang{R} session with \code{dyn.load}. We show this here for
Windows; it is quite similar on Linux \citep[see][for
details]{Rexts2006}. Note that you must either set the working directory
of \proglang{R} to the path where the DLL resides or use the full path
in the call to \code{dyn.load}.
<<eval=FALSE>>=
modeldll <- dyn.load("clotka.dll")
@

You can now call the derivative function \code{dlotka} of the model from
the \code{main} function of a \pkg{simecol} object.

\subsection{Enabling Direct Communication Between Model and Solver}

Another, more efficient way is to tell the solver (e.g. \code{lsoda})
directly where to find the model in the DLL. This method circumvents the
communication overhead that normally occurs for every call from the
solver to the model and is especially effective if small models are
called many times, e.g. in the case of small time steps or if a model is
embedded in an optimization routine. The trick consists of two parts:

\begin{enumerate}
  \item We write an almost empty \code{main} function that returns all
    the information that the ODE solver needs in the form of a list.
  \item Instead of putting a character reference to an existing solver
    function into the \code{solver} slot (e.g. \code{"lsoda"}), we write
    a user-defined interface to the solver and assign it to the solver
    slot, as shown in the example.
\end{enumerate}

Now, we can simulate our model as usual, but avoid the interpretation
and communication overhead of \proglang{R} during the integration.

\begin{Schunk}
\begin{Sinput}
clotka <- new("odeModel",
  ## note that 'main' does not contain any equations directly
  ## but returns information where these can be found;
  ## 'nout' is the number of 'additional outputs'
  main = function(time, init, parms) {
    # a list with: dllname, func, [jacfunc], nout
    list(lib     = "clotka",
         func    = "dlotka",
         jacfunc = NULL,
         nout    = 2)
  },
  ## parms, times, init are provided as usual, enabling
  ## scenario control like for 'ordinary' simecol models
  parms = c(k1=0.2, k2=0.2, k3=0.2),
  times = c(from=0, to=100, by=0.5),
  init = c(prey=0.5, predator=1),
  ## special solver function that evaluates funclist
  ## and passes its contents directly to lsoda
  ## in the 'compiled function' mode
  solver = function(init, times, funclist, parms, ...) {
    f <- funclist()
    as.data.frame(lsoda(init, times, func = f$func, parms = parms,
      dllname = f$lib, jacfunc = f$jacfunc, nout = f$nout, ...))
  }
)

clotka <- sim(clotka)

## the two graphics on top are the states,
## the others are additional variables returned by the C code
## (for demonstration purposes here)
plot(clotka)

## another simulation with more time steps
times(clotka)["to"] <- 1000
plot(sim(clotka))

## another simulation with intentionally reduced accuracy
## for testing
plot(sim(clotka, atol=1))

dyn.unload(as.character(modeldll[2]))
\end{Sinput}
\end{Schunk}

You should note a considerable speed-up. You may ask whether this is
still a \pkg{simecol} object, because the main parts are now in
\proglang{C}, and you may also ask why one should still write models in
\proglang{R} at all if \proglang{C} or \proglang{Fortran} are so much
faster. The answer is that speed of computation is not the only factor;
what counts is a good compromise between execution speed and programming
effort. Programming in scripting languages like \proglang{R} is much
more convenient than programming in compiled languages like \proglang{C}
or \proglang{Fortran}. Also, programming in compiled languages only pays
off if models are quite large or if a large number of model runs is
performed. Even in such cases, a mixed \proglang{R} \textbf{and}
\proglang{C} approach can be efficient, because it is only necessary to
implement the core functionality of the model in \proglang{C}, while
most of the data manipulation and scenario control can be done in
\proglang{R}. \pkg{simecol} follows exactly this philosophy.
Implementing everything in \proglang{R} is highly productive if speed is
of minor importance, but you \textbf{may} use \proglang{C} etc. whenever
necessary, and even in that case you still have the scenario management
and data manipulation features of \pkg{simecol}.

%%=======================================================================
\section{Troubleshooting}

This chapter lists the reasons for the most common problems and error
messages. You can contribute to these sections by submitting bug
reports.

\subsection{Error messages when creating simObjects}

\begin{description}
\item[Message:]
\begin{verbatim}
Error in getClass(Class, where = topenv(parent.frame())) :
  "odeModel" is not a defined class
\end{verbatim}
Other class names than \code{odeModel} may appear here as well.
\item[Reasons:] The most common reason is forgetting to load the
  required \pkg{simecol} package. Another reason may be that you use a
  model class that is not contained in \pkg{simecol} (e.g. mistakenly
  \code{odemodel} instead of \code{odeModel}).
\item[Solution:] Load the \pkg{simecol} package with
  \code{library(simecol)}. If this does not help, check the spelling of
  the class name in \code{new}.
\end{description}

\begin{description}
\item[Message:]
\begin{verbatim}
Error in assign(nm, L[[nm]], p) :
  attempt to use zero-length variable name
\end{verbatim}
\item[Solution:] Use ``\code{=}'' instead of ``\code{<-}'' for list
  elements (e.g. function definitions in \code{equations}) and for
  function arguments.
\end{description}

\begin{description}
\item[Message:] \texttt{"Warning: a final empty element has been
  omitted"}
\item[Solution:] Look for obsolete commas after the last element of
  lists (e.g. \code{parms}).
\end{description}

\subsection{Error messages during model simulations}

\begin{description}
\item[Message:]
\begin{verbatim}
Error in lsoda(...... :
  The number of derivatives returned by func() (3) must equal
  the length of the initial conditions vector (2)
\end{verbatim}
\item[Reasons:] The number of state variables of an ODE system must be
  consistent between the input and the output of the derivative
  function, which is called \code{func} in package \pkg{deSolve}, from
  which this error message is printed. In \pkg{simecol}, this function
  is called \code{main}. The other solvers of \pkg{deSolve} (e.g.
  \code{rk4}) or other compatible solver packages may issue similar
  messages.
\item[Solution:] Check the number and the naming of the state variables
  in:
\begin{itemize}
  \item the \code{init} slot of the model definition,
  \item the usage of the second argument (commonly named \code{init},
    \code{x} or \code{y}) in the \code{main} function,
  \item the return value of \code{main}. The return value is a list
    whose first element must be a vector of the same length as
    \code{init}. Note that also the order of \code{init} and the return
    value must be identical.
\end{itemize}
\end{description}

%%=======================================================================
%\bibliographystyle{jss}
\phantomsection
\addcontentsline{toc}{section}{References}
\bibliography{simecol}

% clean up
<<echo=FALSE>>=
options("prompt" = "> ", "continue" = "+ ")
@

\end{document}