--- title: "Vignette_tneh" author: Juste Goungounga, Judith Breaud, Olayidé Boussari, Valérie Jooste format: html editor: visual output: rmarkdown::html_document: theme: paper toc: true toc_float: true vignette: > %\VignetteIndexEntry{Vignette_tneh} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} #| echo: true #| label: load-packages library(curesurv) ``` # How to analyse survival data using Time-To-Null excess hazard model (TNEH model) ## Additive hazard assumption in net survival setting When the cause of death is unknown, the most common method to estimate the cancer-related survival is net survival. Its estimation assumes that the observed hazard $\lambda_{obs}$ is equal to the sum of the known background mortality hazard in the general population $\lambda_{pop}$ (obtained from national Statistic Institutes such as INSEE in France) and the excess hazard (due to cancer) $\lambda_{exc}$. For one individual $i$, this relation can be expressed as: $$ \lambda_{obs}(t_i|z_i) = \lambda_{pop}(t_i+a_i|z_{pop,i}) + \lambda_{exc}(t_i|z_i) $$ where $Z_{pop}\subset Z$ are the variables of the mortality table and of the model respectively. The cumulative observed hazard can be written as: $$ \Lambda_{obs}(t|z) = \Lambda_{pop}(t+a|z) + \Lambda_{exc}(t|z) $$ and the net survival is obtained as following: $$ S_{n}(t|z) = \exp(-\Lambda_{exc}(t|z)) $$ # TNEH model The TNEH model is a relatively recent excess hazard model developed by Boussari et al. $\\$ The particularity of this model is that it enables the estimation, at the same time as the classical parameters of a model of the excess rate, of a quantity which is obtained by post-estimation by the classical models: it concerns the time after which the excess rate becomes null i.e. the cure point. ## Instantaneous excess hazard The excess hazard proposed can be expressed as following: $$ \lambda_{exc}(t|z;\theta) = \left(\dfrac{t}{\tau(z;\tau*)}\right)^{\alpha(z;\alpha*)-1} \left(1 - \dfrac{t}{\tau(z;\tau*)}\right)^{\beta-1} 1_{\left\{0 \le t \le \tau(z;\tau*)\right\}} $$ where : $\\$ $\tau(z;\tau*) > 0$ is the time to cure, depends on covariates z and vector of parameters $\tau*$. It corresponds to the vector of parameters fitting the time-to-null excess hazard. $\\$ $\alpha(z;\alpha*) > 0$ and $\beta > 1$ are shape parameters. With $\beta>1$, the excess hazard is forced to be null and continuous in $\tau(z;\tau*)$. $\\$ The vector of parameters to be estimated is $\theta = (\alpha*, \beta, \tau*)$ with $\alpha(z;\alpha*) > 0$ . ## Cumulative excess hazard $$ \Lambda_{exc}(t|z;\theta) = \tau(z;\tau*) B \left( \alpha(z;\alpha*), \beta \right) F_{Be} \left( \dfrac{t}{\tau(z;\tau*)} ; \alpha(z;\alpha*) , \beta \right) $$ where B is the beta function $\\$ $F_{Be}$ is the cumulative distribution function (cdf) of the beta distribution ## Net survival $$ S_n(t|z) = \exp(-\Lambda_{exc}(t|z)) = \exp\left(-\tau(z;\tau*) B \left( \alpha(z;\alpha*), \beta \right) F_{Be} \left( \dfrac{t}{\tau(z;\tau*)} ; \alpha(z;\alpha*),\beta \right)\right) $$ ## The cure fraction $\pi$ The cure fraction corresponds to the net survival at $t = \tau$ in TNEH model. It can be expressed as: $$ \pi(z|\theta) = \exp\left(-\Lambda_{exc}(\tau(z;\tau*)|z)\right) = \exp\left(-\tau(z;\tau*) B \left( \alpha(z;\alpha*), \beta \right)\right) $$ ## The probability P(t) This quantity corresponds to the probability P(t) of being cured at a given time t after diagnosis knowing that he/she was alive up to time t. It can be expressed as following: $$ P(t|z) = \dfrac{\pi(z|\theta)}{S_n(t|z)} = \exp \left( \tau(z;\tau*) \left( B \left( \dfrac{t}{\tau(z;\tau*)} ; \alpha(z;\alpha*) , \beta \right) - B(\alpha, \beta) \right) \right) $$ To calculates the confidence intervals of $P(t|z)$, can be obtained using the delta method. The application of this method requires the partial derivatives of $P(t|z)$ with respect of the parameters of the model. This can be written as: $$ \dfrac{\partial P(t|z)}{\partial \theta} = \dfrac{1}{S_n(t|z)^2} \left( \dfrac{\partial \pi(z|\theta)}{\partial \theta} S_n(t|z) - \dfrac{\partial S_n(t|z)}{\partial \theta} \pi(z|\theta) \right) $$ # Fit of tneh model using R ## Without covariates There are no covariables acting on parameters $\tau$ ($\tau=\tau_0$) and $\alpha$ ($\alpha=\alpha_0$) ```{r } #| echo: true #| warning: false #| message: false #| label: nocov fit_ad_tneh_nocov <- curesurv(Surv(time_obs, event) ~ 1, pophaz = "ehazard", cumpophaz = "cumehazard", model = "nmixture", dist = "tneh", link_tau = "linear", data = testiscancer, method_opt = "L-BFGS-B") ``` ```{r} summary(fit_ad_tneh_nocov) ``` We can see that the time-to-cure $\tau=\tau_0$ is estimated at 5.102 years, and the cure proportion is estimated at 84.74%. ### Prediction We predict for varying time after diagnosis ```{r } #| echo: true #| label: predictnocov newdata1 <- with(testiscancer, expand.grid(event = 0, time_obs = seq(0.1,10,0.1))) p_28 <- predict(object = fit_ad_tneh_nocov, newdata = newdata1) ``` ### Plot of different estimators (hazard, survival, probability of being cured conditionally on being alive) ```{r} #| echo: true #| label: plotpredictnocov #| fig.height: 8 #| fig.width: 10 plot(p_28) ``` ### Cure fraction estimation precision The confidence intervals at $1-\alpha$ level for the cure fraction $\pi$ can be written as: $$ \left[\hat{\pi} \pm z_{1 - \alpha / 2} \sqrt{Var(\hat{\hat{\pi}})}\right] $$ where $$ Var(\hat{\pi}) = \dfrac{\partial \hat{\pi}}{\partial \theta} Var(\theta) \left(\dfrac{\partial \hat{\pi}}{\partial \theta}\right)^T $$ ```{r} p_28[1,c("lower_bound_pi_tneh","cured","upper_bound_pi_tneh")] ``` ### Time-to-cure estimation We search the time $\text{t}=\text{TTC}_i$ from which $\text{P}_i(t) = 1-\epsilon$. $\epsilon$ can be fixed to 0.95. The variance formula can be expressed as: $$ Var(TTC) = Var(g(\theta;z_i)) \simeq \left(\dfrac{\partial P(t|z_i;\theta)}{\partial t}_{|t = TTC}\right)^{-2} Var(P(t|z_i;\theta))_{|t=TTC} $$ ```{r} p_28[1,c("lower_bound_TTC_tneh","time_to_cure_ttc","upper_bound_TTC_tneh")] ``` ## With covariates acting both on parameters tau and alpha We create a new age variable : age_crmin the reduced age and "centered" around the age of the youngest patient ```{r} #| echo: true #| warning: false #| message: false testiscancer$age_crmin <- (testiscancer$age- min(testiscancer$age)) / sd(testiscancer$age) ``` ```{r} #| echo: false #| warning: false #| message: false oldpar<-par(no.readonly = TRUE) par(mfrow=c(1,1)) plot(testiscancer$age,testiscancer$age_crmin,type="l",xlab="age",ylab="age_crmin") text(x=min(testiscancer$age),y=0,col="blue",round(min(testiscancer$age),2)) lines(x=rep(min(testiscancer$age),2),y=c(-1,0),col="blue",lty=2) lines(x=c(-1,min(testiscancer$age)),y=c(0,0),col="blue",lty=2) text(x=min(testiscancer$age)+sd(testiscancer$age),y=1.2,col="red",round(sd(testiscancer$age)+min(testiscancer$age),2)) lines(x=rep(min(testiscancer$age)+sd(testiscancer$age),2),y=c(-1,1),col="red",lty=2) lines(x=c(-1,min(testiscancer$age)+sd(testiscancer$age)),y=c(1,1),col="red",lty=2) par(oldpar) ``` This model has the variable age_crmin acting on both $\alpha$ and $\tau$ : ($\alpha=\alpha_0+Z_{age\_crmin}\times\alpha_1$ et $\tau=\tau_0+Z_{age\_crmin}\times\tau_1$) ```{r} #| echo: true #| warning: false #| message: false #| label: withcov fit_m1_ad_tneh <- curesurv(Surv(time_obs, event) ~ z_tau(age_crmin) + z_alpha(age_crmin), pophaz = "ehazard", cumpophaz = "cumehazard", model = "nmixture", dist = "tneh", link_tau = "linear", data = testiscancer, method_opt = "L-BFGS-B") ``` ```{r} summary(fit_m1_ad_tneh) ``` For the reference individual (that is the patient with age_crmin=0, so the youngest patient, approximately 20y old at diagnosis), the cure proportion is estimated at 96,75% and the time to null excess hazard at 3.258 year. For an individual whose age\_ is_crmin is 1 (that is they are aged the standard deviation more than the youngest, that is approximately 39y old at diagnosis), the time to null excess hazard is 6.721 year. ### Predictions : With varying time for patient of mean age ```{r} #| echo: true #time varying prediction for patient with age mean newdata1 <- with(testiscancer, expand.grid(event = 0, age_crmin = mean(age_crmin), time_obs = seq(0.001,10,0.1))) pred_agemean <- predict(object = fit_m1_ad_tneh, newdata = newdata1) ``` With varying time for oldest patient ```{r} #| echo: true #time varying prediction for patient with age max newdata2 <- with(testiscancer, expand.grid(event = 0, age_crmin = max(age_crmin), time_obs = seq(0.001,10,0.1))) pred_agemax <- predict(object = fit_m1_ad_tneh, newdata = newdata2) ``` At 2 years after diagnostic for patients of different ages ```{r} #| echo: true # predictions at time 2 years with varying age newdata3 <- with(testiscancer, expand.grid(event = 0, age_crmin = seq(min(testiscancer$age_crmin), max(testiscancer$age_crmin), 0.1), time_obs = 2)) pred_age_val <- predict(object = fit_m1_ad_tneh, newdata = newdata3) val_age <- seq(min(testiscancer$age_crmin), max(testiscancer$age_crmin), 0.1) * sd(testiscancer$age) + min(testiscancer$age) ``` ### Plot of main indicators as a function of time for two patients of age mean and age max ```{r } #| echo: true #| label: predict_tnehcov #| fig.height: 10 #| fig.width: 10 par(mfrow = c(2, 2),cex = 1.0) plot(pred_agemax$time,pred_agemax$ex_haz,type = "l",lty = 1,lwd = 2, xlab = "Time since diagnosis", ylab = "excess hazard") lines(pred_agemean$time,pred_agemean$ex_haz,type = "l",lty = 2,lwd = 2) legend("topright",horiz = FALSE, legend = c("hE(t) age.max = 79.9", "hE(t) age.mean = 50.8"), col = c("black", "black"),lty = c(1, 2, 1, 1, 2, 2)) grid() plot(pred_agemax$time,pred_agemax$netsurv,type = "l",lty = 1,lwd = 2, ylim = c(0, 1),xlab = "Time since diagnosis",ylab = "net survival") lines(pred_agemean$time,pred_agemean$netsurv,type = "l",lty = 2,lwd = 2) legend("bottomleft",horiz = FALSE, legend = c("Sn(t) age.max = 79.9", "Sn(t) age.mean = 50.8"), col = c("black", "black"),lty = c(1, 2, 1, 1, 2, 2)) grid() plot(pred_agemax$time,pred_agemax$pt_cure,type = "l",lty = 1,lwd = 2,ylim = c(0, 1), xlab = "Time since diagnosis",ylab = "probability of being cured P(t)") lines(pred_agemean$time,pred_agemean$pt_cure,type = "l",lty = 2,lwd = 2) abline(v = pred_agemean$tau[1],lty = 2,lwd = 2,col = "blue") abline(v = pred_agemean$time_to_cure_ttc[1],lty = 2,lwd = 2,col = "red") abline(v = pred_agemax$tau[1],lty = 1,lwd = 2,col = "blue") abline(v = pred_agemax$time_to_cure_ttc[1],lty = 1,lwd = 2,col = "red") grid() legend("bottomright",horiz = FALSE, legend = c("P(t) age.max = 79.9","P(t) age.mean = 50.8", "TNEH age.max = 79.9","TTC age.max = 79.9", "TNEH age.mean = 50.8","TTC age.mean = 50.8"), col = c("black", "black", "blue", "red", "blue", "red"), lty = c(1, 2, 1, 1, 2, 2)) par(oldpar) ``` ### Plot of the main indicators 2 years after diagnosis, for patients of varying ages ```{r} #| echo: true #| label: predicted_to_plot_by_age #| fig.height: 8 #| fig.width: 10 par(mfrow=c(2,2)) plot(val_age,pred_age_val$ex_haz, type = "l",lty=1, lwd=2, xlab = "age",ylab = "excess hazard 2y after diagnosis") grid() plot(val_age,pred_age_val$netsurv, type = "l", lty=1,lwd=2,ylim=c(0,1), xlab = "age", ylab = "net survival 2y after diagnosis") grid() plot(val_age,pred_age_val$pt_cure, type = "l", lty=1, lwd=2,ylim=c(0,1), xlab = "age",ylab = "P(t) 2y after diagnosis") grid() plot(val_age,pred_age_val$cured, type = "l", lty=1, lwd=2,ylim=c(0,1), xlab = "age", ylab = "cure proportion") grid() par(oldpar) ``` ## With covariates acting only on parameters adjusting the time-to-null excess hazard tau Effects of centered and reduced age on $\tau$ ($tau=\tau_0+Z_{age\_cr}\times\tau_1$, $\alpha=\alpha_0$) ```{r } #| echo: true #| label: withtauonly #| warning: false #| message: false fit_ad_tneh_covtau <- curesurv( Surv(time_obs, event) ~ z_tau(age_cr), pophaz = "ehazard", cumpophaz = "cumehazard", model = "nmixture", dist = "tneh", link_tau = "linear", data = testiscancer, method_opt = "L-BFGS-B" ) ``` ```{r} #| echo: true #| label: withtauonly summary(fit_ad_tneh_covtau) ``` This model estimates a cure proportion of 79.4% for individuals of mean age (51.05 year old at diagnosis) with a time to null excess hazard 7.44 year. For an individual of age at diagnosis 70.01 (age_cr=1), the time to null excess hazard is estimated at 9.754. This model estimates that the time to null excess hazard increases by 2.3159 when age_cr increases by (that is, when age at diagnosis increases by 18.96) ### Prediction ```{r } #| echo: true #| label: predicted summary(testiscancer$age_cr) summary(testiscancer$age) newdata2 <- with(testiscancer, expand.grid(event = 0, time_obs = seq(0.1, 10, 0.1), age_cr = c(-0.9358, 0.0276, 0.9525) )) newdata2_1stqu <- newdata2[newdata2$age_cr==-0.9358,] newdata2_2rdqu <- newdata2[newdata2$age_cr==0.0276,] newdata2_3rdqu <- newdata2[newdata2$age_cr==0.9525,] p1stqu <- predict(object = fit_ad_tneh_covtau, newdata = newdata2_1stqu) p2rdqu <- predict(object = fit_ad_tneh_covtau, newdata = newdata2_2rdqu) p3rdqu <- predict(object = fit_ad_tneh_covtau, newdata = newdata2_3rdqu) ``` ### Plot of excess hazard at varying times for 3 different ages at diagnosis (33.3, 51.6, 69,1) ```{r} #| echo: true #| label: plot_for_specific_age #| fig.height: 8 #| fig.width: 10 par(mfrow=c(2,2)) plot(p1stqu, main = "Excess hazard for age 33.3", fun = "haz") plot(p2rdqu, fun = "haz", main = "Excess hazard for age 51.57") plot(p3rdqu, fun = "haz", main = "Excess hazard for age 69.11") par(oldpar) ``` ## With covariates acting only on scale parameter alpha Effect of age_cr on only $\alpha$ ($\alpha=\alpha_0+Z_{age\_cr}*\alpha_1$, $\tau=\tau_0$) ```{r } #| echo: true #| label: only_covariate_on_alpha #| message: false #| warning: false fit_ad_tneh_covalpha <- curesurv( Surv(time_obs, event) ~ z_alpha(age_cr), pophaz = "ehazard", cumpophaz = "cumehazard", model = "nmixture", dist = "tneh", link_tau = "linear", data = testiscancer, method_opt = "L-BFGS-B" ) ``` ```{r} #| echo: true #| message: false #| warning: false summary(fit_ad_tneh_covalpha) ``` The time to null excess hazard is estimated at 6.099 year for all individuals, and the cure proportion is estimated at 81.81% for patient of mean age (51.05 yo at diagnosis) ### Predictions ```{r } #| echo: true #| message: false #| warning: false #| include: true p4_33.3 <- predict(object = fit_ad_tneh_covalpha, newdata = newdata2_1stqu) p4_51.6 <- predict(object = fit_ad_tneh_covalpha, newdata = newdata2_2rdqu) p4_69.1 <- predict(object = fit_ad_tneh_covalpha, newdata = newdata2_3rdqu) ``` ### Plot of estimation of probability P(t) of being cured at a given time t after diagnosis knowing that he/she was alive up to time t ```{r} #| echo: true #| label: plot_for_specific_age_2 #| fig.height: 8 #| fig.width: 10 par(mfrow=c(2,2)) plot(p4_33.3, main = "Pt_cure for age 33.3", fun = "pt_cure") plot(p4_51.6, fun = "pt_cure", main = "Pt_cure for age 51.6") plot(p4_69.1, fun = "pt_cure", main = "Pt_cure for age 69.1") par(oldpar) ``` ## Comparing models We fitted 4 models and to compare them we can use the AIC criteria ```{r} AIC(fit_ad_tneh_nocov,fit_ad_tneh_covalpha,fit_ad_tneh_covtau,fit_m1_ad_tneh) ``` The best model is the one with the lowest AIC : it's the one with an effect of age on both \$ \tau\$ and $\alpha$. # Session info ```{r} date() sessionInfo() ```