\documentclass{article} \setlength{\parindent}{0pt} % Eliminate the indent at the beginning of a new paragraph %\setcounter{secnumdepth}{0} % Elimate the section numbering starting from a specific depth (see WikiBook) \usepackage[sort]{natbib} % Bibliography \usepackage{fixltx2e} % Fix some errors \usepackage{graphicx} % To manage external pictures \usepackage{float} % Improves float environment and force the placement figures \usepackage{caption} % customise the captions in floating environments \usepackage{subcaption} % To add subfigures within figures, with labels (see WikiBooks) \usepackage{verbatim} % To improve the verbatim environment, fixing some bugs. \usepackage[colorlinks=true,linkcolor=blue,citecolor=blue,urlcolor=blue]{hyperref} % Manage cross-references and hyperlinks \usepackage{amssymb,amsbsy,amsmath} % Packages for maths \usepackage{bm} % Allow use of bold greek letters in math mode using the \bm{} command. \usepackage{setspace} % Allow doublespacing %\usepackage{epsfig} % Don't remember!!! %\usepackage{fullpage} % Standardized smaller margins for the page \usepackage[left=3cm,top=3cm,bottom=3.5cm,right=3cm]{geometry} % For easy management of document margins \usepackage{fancyhdr} % To customize the header/footer (see WikiBooks) %\usepackage{rotating} % To rotate any objects \numberwithin{equation}{section} % Equation numbers relative to sections %-------------------------%%-------------------------% % \VignetteIndexEntry{Extension of the dlnm package} % \VignettePackage{dlnm} % \VignetteDepends{dlnm} % \VignetteKeyword{Distributed lag non-linear models} \newcommand{\Robj}[1]{{\texttt{#1}}} \newcommand{\Rfun}[1]{{\texttt{#1()}}} \newcommand{\Rdata}[1]{{\texttt{#1}}} \newcommand{\Rcode}[1]{{\texttt{#1}}} \newcommand{\Rcomm}[1]{{\textsl{\texttt{#1}}}} \newcommand{\Rpkg}[1]{{\textsf{#1}}} \newcommand{\Rclass}[1]{{\emph{"#1"}}} \newcommand{\Rmethod}[1]{{\texttt{#1()}}} \newcommand{\Rarg}[1]{{\texttt{#1}}} \newcommand{\R}{{\textsf{R}}} \newcommand{\vign}[1]{{\textsc{#1}}} \newcommand{\PM}{{PM\textsubscript{10}}} \newcommand{\ozone}{{O\textsubscript{3}}} \newcommand{\microg}{{$\mu$gr/m\textsuperscript{3}}} \newcommand{\Ctemp}{{$^{\circ}$C}} \begin{document} \SweaveOpts{concordance=TRUE} \SweaveOpts{prefix.string=fig,include=F,keep.source=T,eps=FALSE} <>= options(continue=" ") set.seed(13041975) @ @% TO ELIMINATE THE "+" IN CONSECUTIVE SCRIPT LINES \title{Extensions of the \Rpkg{dlnm} package} \author{Antonio Gasparrini\\ \emph{London School of Hygiene \& Tropical Medicine, UK} } \date{\Rpkg{dlnm} version \Sexpr{packageDescription("dlnm")[["Version"]]} , \Sexpr{Sys.Date()} } \maketitle \tableofcontents \setcounter{footnote}{1} \footnotetext{This document is included as a vignette (a \LaTeX\ document created using the \R{} function \Rfun{Sweave}) of the package \Rpkg{dlnm}. It is automatically downloaded together with the package and can be simply accessed through \R{} by typing \Rcomm{vignette("dlnmExtended")}.} \newpage \setlength{\parskip}{4pt} % Space between paragraph %-------------------------%%-------------------------%%-------------------------% \section{Preamble} \label{sec:preamble} This vignette \vign{dlnmExtended} illustrates recent extensions to the \R{} package \Rpkg{dlnm}, some of them implementing development of the modelling framework of distributed lag linear and non-linear models (DLMs and DLNMs). Primarily, this document describes the generalization of the DLM/DLNM methodology beyond time series data, described in more detail in \citet{gasparrini2014statmed}. In addition, this vignette illustrates other developments, specifically the definition of extended prediction summaries, the flexible application of existing or user-defined functions, and a more general use of the functions for regression analysis. The results included in this document are not meant to represent scientific findings, but are reported with the only purpose of illustrating the capabilities of the \Rpkg{dlnm} package. A general overview of functions included in the package, with information on its installation and a brief summary of the DLNM methodology are included in the vignette \vign{dlnmOverview}, which represents the main documentation of \Rpkg{dlnm}. The user can refer to that vignette for a general introduction to the package. Please send comments or suggestions and report bugs to \href{mailto:antonio.gasparrini@lshtm.ac.uk}{\texttt{antonio.gasparrini@lshtm.ac.uk}}. %-------------------------%%-------------------------%%-------------------------% \section{Data} \label{sec:data} These extensions of the software are illustrated mainly through two examples, using the data sets \Robj{drug} and \Robj{nested} included as data frames objects in the package. In particular, these data are ideal for illustrating the main development presented in this vignette, namely the extension of the modelling framework beyond time series. These data sets contain simulated data from an hypothetical trial on a drug and a nested case-control study, respectively, both including measures of time-varying exposures. They are described in the related help pages, available by typing \Rcomm{help(drug)} or \Rcomm{help(nested}), and in the main vignette \vign{dlnmOverview}. After loading the package in the \R{} session, let's have a look at the first three observations of the data frame \Robj{drug}: <>= library(dlnm) head(drug, 3) @ The data set contains data from a trial, with records for 200 randomized subjects, each receiving doses of a drug for two out of four random weeks, with daily doses varying each week. The exposure level is reported on 7-day intervals corresponding to each week. The data set contains also information on the outcome measured on the 28\textsuperscript{th} day and the sex of the subject. The second data frame \Robj{nested} includes one record for each of 300 cancer cases and 300 controls matched by age. The first four observations are: <>= head(nested, 4) @ The variable \Robj{case} defines the case/control status, while other variables report the age of the subject and the risk set he/she belongs to. The time-varying occupational exposure profiles are stored in the variables \Robj{exp15}--\Robj{exp60}, corresponding to average yearly exposure experienced in age intervals 15--19, 20--24 and so on up to 65 years. Note how the fourth subject, a control sampled at the age of 52, has the exposure profile set to \Robj{NA} from age 55 on. %-------------------------%%-------------------------%%-------------------------% \section{The matrix of exposure histories} \label{sec:Qmatrix} The main difference between the extended and standard DLNM framework is the definition of a \emph{matrix of exposure histories}, namely the series of exposures experienced at lag $\ell$ for each of the $n$ observations, with $\ell=\ell_0,\ldots,L$ and $\ell_0$ and $L$ as minimum and maximum lag, respectively. This $n \times (L-\ell_0+1)$ matrix needs to be put together in different ways depending on the study design and the information available on the time-varying exposure. The same process applies to time series data, although in this case the matrix is reconstructed internally from the vector of exposure series. In time series data the value for the entry $[t,\ell]$ of the matrix of exposure histories is equal to the entry at $[t+1,\ell+1]$, due to the ordered nature of time series data. This correspondence does not apply any more in the extended framework, as the exposure histories for two observations can be completely unrelated. In the first example, I build the matrix of exposure histories for the trial data in the data frame \Robj{drug} (see Section~\ref{sec:data}). The exposure profile for each subject is used to reconstruct the matrix of exposure histories. In this case, the exposure at lag 0 corresponds to that experienced on the 28\textsuperscript{th} day when the outcome is measured for all the subjects. The rest of the exposure history is traced backward up to lag 27, corresponding to exposure in the first day. This is a simple code to expand and reverse the exposure profiles stored by week into a matrix of daily exposure histories: <>= Qdrug <- as.matrix(drug[,rep(7:4, each=7)]) colnames(Qdrug) <- paste("lag", 0:27, sep="") Qdrug[1:3,1:14] @ The exposure histories for lag 0--13 are reported above for the first three subject. The first seven lags (0--6) correspond to exposures during the last week, while lags 7--13 correspond to the third week, and so on. In a second example, I reconstruct the matrix of exposure histories for the data frame \Robj{nested} using the exposure profiles stored in 5-year intervals. These data are expanded to a matrix of exposure histories over lag 3--40, with lag unit equal to a year. However, in this case the computation is more complex, as each subject is sampled at a different age. Specifically, the exposure history is computed backward along the exposure profile starting from the age of the subject. This step requires some additional computation and data manipulation. The function \Rfun{exphist}, which derives an exposure history at a given time of an exposure profile, may be of help: <>= Qnest <- t(apply(nested, 1, function(sub) exphist(rep(c(0,0,0,sub[5:14]), each=5), sub["age"], lag=c(3,40)))) colnames(Qnest) <- paste("lag", 3:40, sep="") Qnest[1:3,1:11] @ The exposure histories for lag 3--13 are reported above for the first three subject. The first subject, sampled at the age of 81, is assumed to experience the exposure at lag 0 between 80 and 81, the exposure at lag 1 between 79 and 80, and so on. As his/her last exposure is at age 65, the exposure history up to lag 13 is set to 0. The second subject, sampled at the age of 69, has the exposure history set to 0 for lag 3, corresponding to the exposure event at 66, and then to 10 for lags 4--8 and 16 for lags 9--13, corresponding to exposure experienced at age periods 60--64 and 55--59, respectively. These exposure histories are consistent with the exposure profiles and age shown in Section~\ref{sec:data}. An example of computation of the matrix of exposure histories for time-to-event analysis using cohort data is illustrated in the code provided as supplementary material in \citet{gasparrini2014statmed}. In that case, multiple exposure histories are computed for each subject at the times he/she contributed to different risk sets, using the same exposure profile. In general, the computation of this matrix depends on study design, information on exposure, lag unit and desired level of approximation. This prevents the definition of functions in the \Rpkg{dlnm} package applicable for this purpose. Nonetheless this issue represents the only additional computational step for using the extended DLNM methodology beyond time series analysis. As shown in the next sections, the use of the functions and interpretation of the results are mostly identical to the standard applications in time series data illustrated in the vignette \vign{dlnmTS}. %-------------------------%%-------------------------%%-------------------------% \section{Applications beyond time series} \label{sec:beyondts} %-------------------------%%-------------------------% \subsection{A simple DLM} \label{sec:dlm} In this first example, I analyse the temporal dependency between the daily doses of a drug and an unspecified health outcome, applying the functions in the \Rpkg{dlnm} package to the data set \Robj{drug}. Specific information on the use of the functions is provided in the related help pages and the vignette \vign{dlnmOverview}. The first step is the definition of a cross-basis function and the derivation of a cross-basis matrix. This is obtained through the function \Rfun{crossbasis}: <>= cbdrug <- crossbasis(Qdrug, lag=27, argvar=list("lin"), arglag=list(fun="ns",knots=c(9,18))) @ The results is stored in the object \Robj{cbdrug}, namely a matrix of transformed variables with special attributes. The first unnamed argument \Rarg{x}, differently from the original applications in time series described in the vignette \vign{dlnmTS}, is the matrix of exposure histories. However, the rest of the syntax is identical. The argument \Rarg{lag} specifies the lag period, with minimum lag placed by default at 0. The lag period must be consistent with the dimension (\emph{i.e.} number of columns) of the matrix \Robj{Qdrug}. The arguments \Rarg{argvar} and \Rarg{arglag} define the exposure-response and lag-response functions, respectively, chosen here as a simple linear function and a natural cubic spline with knots at lag 9 and 18. An intercept is included by default in the lag-response function if not otherwise stated. Given the linearity assumption, this can be technically defined as a DLM. See \Rcomm{?crossbasis} for a complete list of options and additional details. A summary of the transformation can be obtained by the method function \Rfun{summary} for objects of class \Rclass{crossbasis}: <>= summary(cbdrug) @ The matrix \Robj{cbdrug} can be included in the formula of a regression model, in this case a simple linear model assuming a Gaussian distribution, controlling for the effect of sex. This simplified approach does not consider any inter-subject variability, which is beyond the scope of this illustrative example. The estimated exposure-lag-response association can be interpreted by predicting specific effect summaries through the function \Rfun{crosspred}: <>= mdrug <- lm(out~cbdrug+sex, drug) pdrug <- crosspred(cbdrug, mdrug, at=0:20*5) @ The function \Rfun{crosspred} accepts the cross-basis matrix and the related model object as the first two arguments. The argument \Rarg{at}, if provided as a numeric vector, determines the predictor levels at which predictions should be computed. The reference value, not directly defined here, is set by default to 0 for the function \Rfun{lin}. The effect summaries are saved in the object \Robj{pdrug} of class \Rclass{crosspred}, from which can be extracted: <>= with(pdrug,cbind(allfit,alllow,allhigh)["50",]) @ The code above extracts the estimate for the \emph{overall cumulative effects} associated with an exposure to 50, interpreted using two perspectives: either as the overall increase in the outcome after a constant exposure to 50 sustained throughout the lag period of 28 days (\emph{backward} perspective), or as the sum of the contributions of an exposure to 50 in the next 28 days (\emph{forward} perspective). The 95\% confidence intervals are also included, with the confidence level that can be changed with the argument \Rarg{ci.level} in \Rfun{crosspred}. Alternatively, specific combinations of exposure levels and lag values can be extracted from different effect summaries with prefix \Rcode{mat-} stored in \Robj{pdrug}: <>= pdrug$matfit["20","lag3"] @ This is interpreted as the increase in the outcome associated with an intake of a dose level of 20 three days earlier. See \Rcomm{?crosspred} for a full list of the predicted effect summaries. Alternatively, the \Rfun{plot} methods for objects of class \Rclass{crosspred} can be used to generate graphs: <>= plot(pdrug, zlab="Effect", xlab="Dose", ylab="Lag (days)") plot(pdrug, var=60, ylab="Effect at dose 60", xlab="Lag (days)", ylim=c(-1,5)) plot(pdrug, lag=10, ylab="Effect at lag 10", xlab="Dose", ylim=c(-1,5)) @ <>= plot(pdrug, zlab="Effect", xlab="Dose", ylab="Lag (days)") @ <>= plot(pdrug, var=60, ylab="Effect at dose 60", xlab="Lag (days)", ylim=c(-1,5)) @ <>= plot(pdrug, lag=10, ylab="Effect at lag 10", xlab="Dose",ylim=c(-1,5)) @ \begin{figure} \centering \caption{} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotdrug3d.pdf} \label{fig:plotdrug3d} \end{subfigure} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotdruglag.pdf} \label{fig:plotdruglag} \end{subfigure} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotdrugvar.pdf} \label{fig:plotdrugvar} \end{subfigure} \label{fig:plotdrug} \end{figure} The first line of code produces the graph in Figure~\ref{fig:plotdrug3d}, namely the bi-dimensional exposure-lag-response association estimated by the regression model, showing how the effect varies across the range of dose and lag values. This type of graph is obtained by leaving the argument \Rarg{ptype} unselected, thus choosing the default value \Rcode{"3d"}. The graphs suggests that the effect of a dose of the drug is pronounced in the first days after the intake and then tends to disappear after 15-20 days. The second and third lines of code produce the graphs in Figures~\ref{fig:plotdruglag}--\ref{fig:plotdrugvar}, respectively, showing the lag-response curve specific to exposure 60 and the exposure-response curve specific to lag 10. The shape of these curves depends on the specific choices for the basis functions selected for producing the cross-basis \Robj{cbdrug}. In particular, the lag-response curve in Figure~\ref{fig:plotdruglag} indicates an exponential decay in the effects. This type of graphs represents slices cut in the 3-D surface of Figure~\ref{fig:plotdrug3d}, with the argument \Rarg{ptype} set by default to \Rcode{"slices"} if one of the argument \Rarg{var} or \Rarg{lag} is specified. Additional argument such as \Rarg{xlab} and \Rarg{ylim} are internally passed to \Rarg{plot.default} to control the graphical parameters. See \Rarg{?plot.default} and \Rcomm{?par} for a complete list, and generally \Rarg{?plot.crosspred} for using plotting functions. %-------------------------%%-------------------------% \subsection{A more complex DLNM} \label{sec:dlnm} In a second example, I assess how protracted exposures to an occupational agent affect the risk of occurrence of cancer, using the data set \Robj{nested}. The steps of the analysis are the same illustrated in Section~\ref{sec:dlm}. An initial assumption is that the exposures sustained in the last three years, corresponding to lag 0--2, are not affecting the risk of occurrence of cancer. Consistently with this assumption, the matrix of exposure history \Robj{Qnest} has been derived for lag period 3--40. The cross-basis matrix can therefore be created by: <>= cbnest <- crossbasis(Qnest, lag=c(3,40), argvar=list("bs",degree=2,df=3), arglag=list(fun="ns",knots=c(10,30),intercept=F)) @ The chosen basis functions are a quadratic spline for the dimensions of predictor and a natural cubic spline for lags. In the former, only the number of degrees of freedom are chosen, and the single knot is placed by default at the median. Note that in the spline for the lag-response the intercept is excluded, so the function is forced to predict a null effect at the beginning of the lag period, consistently with the assumption above. The command \Rcomm{summary(cbnest)} can show additional details. The cross-basis objects can be included again in the formula of a regression model. Compatibly with the nested case-control design, a conditional logistic regression is performed through the function \Rfun{clogit} included in the package \Rpkg{survival}, which needs to be loaded into the session. Effect summaries are then predicted. The code is: <>= library(survival) mnest <- clogit(case~cbnest+strata(riskset), nested) pnest <- crosspred(cbnest, mnest, cen=0, at=0:20*5) @ Note how in this case the centering value must be selected directly through the argument \Rarg{cen}, as no straightforward reference exist for non-linear functions such as \Rfun{bs}. Similarly to the previous example, estimates of the effect summaries can be extracted from the object \Robj{pnest}, although this time using the objects with prefix \Rcode{allRR-} and \Rcode{matRR-} which store the exponentiated predictions in the scale of OR (see \Rcomm{?crosspred}). The same types of graphs displayed in Figure~\ref{fig:plotdrug} are obtained by: <>= plot(pnest, zlab="OR", xlab="Exposure", ylab="Lag (years)") plot(pnest, var=50, ylab="OR for exposure 50", xlab="Lag (years)", xlim=c(0,40)) plot(pnest, lag=5, ylab="OR at lag 5", xlab="Exposure", ylim=c(0.95,1.15)) @ <>= plot(pnest, zlab="OR", xlab="Exposure", ylab="Lag (years)") @ <>= plot(pnest, var=50, ylab="OR for exposure 50", xlab="Lag (years)", xlim=c(0,40)) @ <>= plot(pnest, lag=5, ylab="OR at lag 5", xlab="Exposure", ylim=c(0.95,1.15)) @ \begin{figure} \centering \caption{} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotnest3d.pdf} \label{fig:plotnest3d} \end{subfigure} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotnestlag.pdf} \label{fig:plotnestlag} \end{subfigure} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotnestvar.pdf} \label{fig:plotnestvar} \end{subfigure} \label{fig:plotnest} \end{figure} The 3-D graph in Figure~\ref{fig:plotnest3d} is again interpreted as the bi-dimensional exposure-lag-response association between the occupational exposure and the risk of cancer. Note how the lag period is expressed in years in this example. The graph suggests an initial increase in risk, measured as odds ratio (OR), followed by a decrease. The slice graphs in Figures~\ref{fig:plotnestlag}--\ref{fig:plotnestvar} provide additional details. Specifically, the estimated lag-response curve in Figure~\ref{fig:plotnestlag} displays a peak in risk 10 to 15 years after the exposure, with the risk then returning to the baseline level 30 years after the exposure, although the confidence intervals are quite wide. The exposure-response curve in Figure~\ref{fig:plotnestvar} suggests an attenuation of the effect at higher exposures, although again the confidence intervals do not rule out a linear association. %-------------------------%%-------------------------%%-------------------------% \section{Extended prediction summaries} \label{sec:extpred} The usual effect summaries obtained by \Rfun{crosspred} are computed over a grid of exposure and lag values, specified directly or through default selection. In particular, a vector of exposure values is usually passed through the argument \Rarg{at}, while the lag period is selected through \Rarg{lag}. The function then computes \emph{overall cumulative} summaries, stored in vectors with prefix \Rcode{all-}, and \emph{specific} summaries associated to combinations of exposure and lag values, stored in matrices with prefix \Rcode{mat-}. The former refer either to effects associated to a constant exposure throughout the lag period backward in time, or to the total effect contribution of an exposure event within the lag period forward in time. The latter can be used to display exposure-response and lag-response curves. However, in the extended setting described in this vignette it is useful to define alternative and complementary effect summaries. In particular, it may be of interest to predict what is the overall cumulative effect associated to a specific exposure history, possibly characterized by time-varying exposures. These new effect summaries can be easily computed using the function \Rfun{exphist} used in Section~\ref{sec:Qmatrix}, which produces a matrix of exposure histories given an exposure profile. This matrix can be directly passed as the argument \Rarg{at} of \Rfun{crosspred}, rather than a vector of exposure values. As an example, we can use the nested case-control analysis in Section~\ref{sec:dlnm} for computing the overall cumulative OR for an hypothetical subject exposed to exposure 10 for five years, then unexposed for five more years, then exposed to 13 for ten years. From this exposure profile, we can compute the exposure history at the end of the exposure period, looking backward in time. Specifically: <>= expnested <- rep(c(10,0,13), c(5,5,10)) hist <- exphist(expnested, time=length(expnested), lag=c(3,40)) hist @ The function \Rfun{exphist} produces the exposure history at time 20 over lag 3--40. The specific time is set through the argument \Rarg{time} and in this case corresponds to the end of the exposure period in \Robj{expnested}. The last 21 exposures to 0 are included to complete the exposure history up to 40 years. Now we can predict the overall cumulative effect by using \Robj{hist} as the argument \Rarg{at} of \Rfun{crosspred}. Note that the lag period must be consistent with that used in estimation. This is the code: <>= pnesthist <- crosspred(cbnest, mnest, cen=0, at=hist) with(pnesthist, c(allRRfit,allRRlow,allRRhigh)) @ The estimated OR is \Sexpr{round(pnesthist$allRRfit,1)} (95\%CI: \Sexpr{round(pnesthist$allRRlow,1)}--\Sexpr{round(pnesthist$allRRhigh,1)}) compared to a subject with no exposure throughout the whole lag period. The same approach can be used to obtain \emph{dynamic predictions} along time for a specific exposure profile. The idea behind this more complex effect summary is that the risk can be predicted dynamically in time given time-varying exposure histories, based on an assumed exposure-lag-response association. In practice, for each given time, moving forward, the exposure history changes as specific exposure events refer to different lag periods. As an example, I show how the dynamic predicted effect following a specific drug prescription can be estimated using the analysis of the trial data illustrated in Section~\ref{sec:dlm}. Let's assume that a patient is treated with a dose 10 for two weeks, then he/she increases to 50 for one week, then stops for 1 week and starts again with a dose 20 for two weeks. First, I create the daily exposure profile: <>= expdrug <- rep(c(10,50,0,20),c(2,1,1,2)*7) @ The function \Rfun{exphist} can now be used sequentially along the exposure profile to create the matrix of exposure histories for all the time points: <>= dynhist <- exphist(expdrug, lag=27) @ The argument \Rarg{time} of \Rfun{exphist} by default takes the values of all the time points of \Robj{expdrug}, creating the matrix of exposure histories for all of them. This matrix can now be used in \Rfun{crosspred} to obtain the dynamic predictions: <>= pdyndrug <- crosspred(cbdrug, mdrug, at=dynhist) @ The object can now be used to plot the dynamic prediction: <>= plot(pdyndrug,"overall", ylab="Effect", xlab="Time (days)", ylim=c(-10,27), xlim=c(1,50), yaxt="n") axis(2, at=-1:5*5) par(new=TRUE) plot(expdrug, type="h", xlim=c(1,50), ylim=c(0,300), axes=F, ann=F) axis(4, at=0:6*10, cex.axis=0.8) mtext("Dose", 4, line=-1.5, at=30, cex=0.8) @ The unnamed argument \Rarg{ptype} in \Rfun{plot} is set to \Rcode{"overall"}, thus plotting this extended version of overall cumulative association in Figure~\ref{fig:pdynnest}. This graph displays the variation from the baseline outcome associated with the drug prescription profile detailed above, represented as histogram-like vertical lines. As expected, the effect changes dynamically in time, depending on the doses but with a delay determined by the lag structure. \begin{figure} \centering \caption{} \includegraphics[width=0.40\textwidth]{fig-plotdynnest.pdf} \label{fig:pdynnest} \end{figure} %-------------------------%%-------------------------%%-------------------------% \section{Applying user-defined functions} \label{sec:userdeffun} Since version 2.0.0 of the \Rpkg{dlnm} package, important changes have also been implemented in the use of the main functions. In particular, the functions \Rfun{onebasis}, called independently or internally through \Rfun{crossbasis} for applying transformations, now simply acts as a wrapper to other functions (see \Rcomm{onebasis}). The new functions \Rfun{strata}, \Rfun{poly}, \Rfun{thr} and \Rfun{integer} in the \Rpkg{dlnm} package (see the related help pages), together with the functions \Rfun{ns} and \Rfun{bs} in the \Rpkg{splines} package, offer all the previous options of basis transformation. However, this flexible approach offers the possibility of using different functions available in other \R{} packages, or functions directly defined by the user. The called function must have \Rarg{x} as its first argument, and it must return a vector or matrix of transformed variables with attributes storing the arguments which exactly define the transformation. This information will be used later by \Rfun{crosspred} to produce the predictions. Also, the function must be defined as a closure containing formal arguments, meaning that primitive functions such as \Rfun{exp}, \Rfun{sin} or \Rfun{log} cannot be used directly (see the example below). As a first example of applying user-defined functions within the DLNM framework, we can revise the previous analysis illustrated in Section~\ref{sec:dlnm}. Figure~\ref{fig:plotnestvar} suggested a possible attenuation of the effect at high exposures. This fact and the skewness of the exposure distribution can be addressed through a logarithmic transformation. As shown in \citet{gasparrini2014statmed}, this is equivalent to apply a logarithm as exposure-response function in the cross-basis transformation. First, let's define a new log function: <>= mylog <- function(x) log(x+1) @ This step is required as \Rfun{log} is a primitive function and cannot be used directly. The original exposure is summed to 1 to prevent problems with 0 values in the logarithm. The new function \Rfun{mylog} can now be used directly in \Rfun{crossbasis} in place of \Rfun{bs} to model the exposure-response curve: <>= cbnest2 <- crossbasis(Qnest, lag=c(3,40), argvar=list("mylog"), arglag=list(fun="ns",knots=c(10,30),intercept=F)) summary(cbnest2) @ Note how the total \emph{df} of the cross-basis are now 3, if compared to the 9 in the original transformation, as the exposure-response is modelled with only 1 \emph{df}. The rest of the code is almost identical, just substituting the newly created objects: <>= mnest2 <- clogit(case~cbnest2+strata(riskset), nested) pnest2 <- crosspred(cbnest2, mnest2, cen=0, at=0:20*5) plot(pnest2, zlab="OR", xlab="Exposure", ylab="Lag (years)") plot(pnest2, var=50, ylab="OR for exposure 50", xlab="Lag (years)", xlim=c(0,40)) lines(pnest, var=50, lty=2) plot(pnest2, lag=5, ylab="OR at lag 5", xlab="Exposure", ylim=c(0.95,1.15)) lines(pnest, lag=5, lty=2) @ <>= plot(pnest2, zlab="OR", xlab="Exposure", ylab="Lag (years)") @ <>= plot(pnest2, var=50, ylab="OR for exposure 50", xlab="Lag (years)", xlim=c(0,40)) lines(pnest, var=50, lty=2) @ <>= plot(pnest2, lag=5, ylab="OR at lag 5", xlab="Exposure", ylim=c(0.95,1.15)) lines(pnest, lag=5, lty=2) @ \begin{figure}[h] \centering \caption{} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotnest3d2.pdf} \label{fig:plotnest3d2} \end{subfigure} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotnestlag2.pdf} \label{fig:plotnestlag2} \end{subfigure} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotnestvar2.pdf} \label{fig:plotnestvar2} \end{subfigure} \label{fig:plotnest2} \end{figure} The results presented in Figure~\ref{fig:plotnest2} can be compared with those originally displayed in Figure~\ref{fig:plotnest} of Section~\ref{sec:dlnm}. The method function \Rfun{lines} are used to add the original curves to the new plots. The comparison also indicates how the assumption of a logarithmic shape produces a substantial increase in precision. Another example of application of user-defined functions is provided by extending the analysis illustrated in Section~\ref{sec:dlm}. The inspection of Figure~\ref{fig:plotdruglag} suggested that the lag-response curve follows an exponential decay trajectory. It may be reasonable to apply a function modelling this shape instead than a natural cubic spline. This decay function can be defined as: <>= fdecay <- function(x,scale=5) { basis <- exp(-x/scale) attributes(basis)$scale <- scale return(basis) } @ The argument \Rarg{scale}, with default value 5, is used to control the degree of decay. Note how this must be included as an attribute of the vector returned by the new function \Rfun{fdecay}. Again, we can use this new function to obtain the alternative cross-basis transformation: <>= cbdrug2 <- crossbasis(Qdrug, lag=27, argvar=list("lin"), arglag=list(fun="fdecay",scale=6)) summary(cbdrug2) @ Again, the computational step used in Section~\ref{sec:dlm} can be repeated to perform the modified analysis: <>= mdrug2 <- lm(out~cbdrug2+sex, drug) pdrug2 <- crosspred(cbdrug2, mdrug2, at=0:20*5) plot(pdrug2, zlab="Effect", xlab="Dose", ylab="Lag (days)") plot(pdrug2, var=60, ylab="Effect at dose 60", xlab="Lag (days)", ylim=c(-1,5)) lines(pdrug, var=60, lty=2) plot(pdrug2, lag=10, ylab="Effect at lag 10", xlab="Dose", ylim=c(-1,5)) lines(pdrug, lag=10, lty=2) @ <>= plot(pdrug2, zlab="Effect", xlab="Dose", ylab="Lag (days)") @ <>= plot(pdrug2, var=60, ylab="Effect at dose 60", xlab="Lag (days)", ylim=c(-1,5)) lines(pdrug, var=60, lty=2) @ <>= plot(pdrug2, lag=10, ylab="Effect at lag 10", xlab="Dose",ylim=c(-1,5)) lines(pdrug, lag=10, lty=2) @ \begin{figure} \centering \caption{} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotdrug3d2.pdf} \label{fig:plotdrug3d2} \end{subfigure} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotdruglag2.pdf} \label{fig:plotdruglag2} \end{subfigure} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotdrugvar2.pdf} \label{fig:plotdrugvar2} \end{subfigure} \label{fig:plotdrug2} \end{figure} The results are reported in Figure~\ref{fig:plotdrug2}. The comparison with results in Figure~\ref{fig:plotdrug} of Section~\ref{sec:dlm}, included again as dashed lines, shows a dramatic increase in precision, as a strict structure is assumed for the exposure-lag-response surface, which is entirely estimated with only 1 \emph{df}. Note how the argument \Rarg{scale} is selected a priori to 6 and not estimated here, as the function \Rfun{fdecay} is non-linear for this parameter. However, the DLNM framework can also be used with non-linear regression functions such as \Rfun{nls} for estimating the scale parameter. More generally, a critical discussion and some guidance on inference and model selection in the DLNM framework are offered in \citet{gasparrini2014statmed}. %-------------------------%%-------------------------%%-------------------------% \section{A general tool for regression analysis} \label{sec:generaltool} The functions in the package \Rpkg{dlnm} can also be used as a general tool for regression analysis. In particular, the facilities for prediction and graphical representation, developed for assessing bi-dimensional exposure-lag-response associations, can be more generally applied for unlagged exposure-response relationships. Standard method for estimating these uni-dimensional associations include the use of regression splines in unpenalized models, such as generalized linear models (GLMs) and Cox proportional hazard models, or penalized splines in generalized additive models (GAMs). The user can extract predictions in the usual way using the function \Rfun{crosspred}, which can be applied with unpenalized models in conjunction with the function \Rfun{onebasis}, or directly with regression outputs of penalized models fitted using the function \Rfun{gam} in the \Rpkg{mgcv} package. In order to illustrate these options, I replicate examples illustrated in the help pages of the functions \Rfun{ns} in the package \Rpkg{splines} and \Rfun{gam} in the package \Rfun{mgcv}. The first example deomonstrates the use of regression splines with the regression function \Rfun{lm} to assess the relationship between average height (in inches) and weight (in pounds) in a sample of American women aged 30--39. The code is reported in the Examples section of the help page of \Rfun{ns} (see \Rcode{help(ns)}). The same transformation can be obtained by applying the wrapper function \Rfun{onebasis}: <>= library(splines) oneheight <- onebasis(women$height, "ns", df=5) mwomen <- lm(weight ~ oneheight, data=women) @ The use of \Rfun{onebasis} allows the use of other functions in \Rpkg{dlnm} for obtaining predictions and plots, using a simple code: <>= pwomen <- crosspred(oneheight, mwomen, cen=65, at=58:72) with(pwomen, cbind(allfit, alllow, allhigh)["70",]) plot(pwomen, ci="l", ylab="Weight (lb) difference", xlab="Height (in)", col=4) @ The function \Rfun{crosspred} can be applied as usual, just including the object of class \Rclass{onebasis} as its first argument. The results are reported using a height of 65 inches as references. The estimated association, with confidence intervals, can be retrieved simply by accessing the \Rcode{all-} components. Note that, as no lagged effect is allowed, these are identical to the \Rcode{mat-} components, which are however reported as 1-column matrices. The association can be plotted in the usual way with the method function \Rfun{plot}, with the graph shown in Figure~\ref{fig:plotwomen}. Note how it is not important to select the type of the plot with the argument \Rarg{ptype}, as only uni-dimensional graphs can be created. \begin{figure}[h] \centering \caption{} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotwomen.pdf} \label{fig:plotwomen} \end{subfigure} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotgam.pdf} \label{fig:plotgam} \end{subfigure} \begin{subfigure}{0.32\textwidth} \centering \caption{} \includegraphics[width=\textwidth]{fig-plotgam2.pdf} \label{fig:plotgam2} \end{subfigure} \label{fig:plotgentool} \end{figure} The second example illustrates the application of functions in the package \Rpkg{dlnm} to facilitate the analysis of smooth associations using penalized splines. The (slightly modified) original code, reported in the Examples section of the help page of \Rfun{gam} (see \Rcode{help(gam)}), is: <>= library(mgcv) dat <- gamSim(1,n=200,dist="poisson",scale=.1) b2 <- gam(y ~ s(x0,bs="cr") + s(x1,bs="cr") + s(x2,bs="cr") + s(x3,bs="cr"), family=poisson, data=dat, method="REML") plot(b2, select=3) @ <>= plot(b2, select=3) @ The code performs a GAM estimating smoothed relationships in simulated data with several variables using penalized cubic regression splines through the function \Rfun{s}, and generates the graph for the variable \Robj{x2}, displayed in Figure~\ref{fig:plotgam}. Predictions and plots can also be obtained using \Rpkg{dlnm} functions, with: <>= pgam <- crosspred("x2", b2, cen=0, at=0:100/100) with(pgam, cbind(allRRfit, allRRlow, allRRhigh)["0.7",]) plot(pgam, ylim=c(0,3), ylab="RR", xlab="x2", col=2) @ As shown above, the \Rfun{crosspred} function also works directly with associations defined through the function \Rfun{s} within \Rfun{gam}, simply including the character string with the name of the variable as its first argument. This usage is similar to the more complex cross-basis parameterization obtained using the related smooth \Rcode{cb} smooth constructor (see the vignette \vign{dlnmPenalized}). This step simplifies the computations of estimated assocations, together with measures of uncertainty. In addition, it is possible to plot the smoothed relationship in the response scale or relative risk (RR), and using a reference value that makes easier to interpret the association, as shown in Figure~\ref{fig:plotgam2}. %-------------------------%%-------------------------%%-------------------------% \bibliographystyle{plainnat} \bibliography{biblioVignette} \addcontentsline{toc}{section}{Bibliography} % To add bibliography to the TOC %-------------------------%%-------------------------%%-------------------------% \end{document}