---
title: "NNMoMo: An R Package for Mortality Modeling with Neural Networks"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{NNMoMo: An R Package for Mortality Modeling with Neural Networks}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

# Check whether a working Torch installation is available
torch_works <- tryCatch({
  torch::torch_is_installed()
}, error = function(e) FALSE)

# Without Torch, code chunks are shown but not evaluated
if (!torch_works) {
  knitr::opts_chunk$set(eval = FALSE)
  warning_msg <- "**Note:** This vignette was built on a system where Torch is not installed. Code chunks are displayed but not evaluated."
} else {
  warning_msg <- ""
}
```

```{r setup, echo=TRUE, results='hide', message=FALSE, warning=FALSE}
library(NNMoMo)
```

Before any training can take place, the model needs to be set up. The neural network can be customized through the parameters of `lcNN()`, but the predefined configuration has been tested and found to be sufficient for most applications.

In this example, two different models are configured to show how the results differ. First, a basic model is created with the predefined parameters, i.e. a "linear" activation function, an "MSE" loss function and "FCN" connections. Second, a model whose neurons are connected as a "CNN" is defined. Its activation function is switched to "tanh" and `q_z1` is increased to 50; these two modifications are rather minor and are expected to have little effect on the overall results. Additionally, a "Poisson" loss function is chosen for this model.

```{r model setup}
model_basic <- lcNN()
model_CNN_Poisson <- lcNN(loss_type = "Poisson", activation = "tanh",
                          model_type = "CNN", q_z1 = 50)
```

The configuration of a model can be checked by printing it or by calling `summary()`.

```{r print of model setup}
model_CNN_Poisson
summary(model_CNN_Poisson)
```

Next, the data must be prepared for analysis.
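As background for this step: Lee–Carter-type models work with central death rates $m_{x,t} = D_{x,t} / E_{x,t}$, i.e. the number of deaths divided by the exposure to risk at age $x$ in year $t$, usually on the log scale. The base-R sketch below uses made-up numbers rather than the datasets shipped with NNMoMo, and it does not show how `NNMoMoData()` processes its inputs internally; it only illustrates the transformation itself.

```r
# Hypothetical deaths D[x, t] and exposures E[x, t] for ages 0-2 in 1950-1951
# (made-up numbers, not taken from the datasets shipped with NNMoMo)
deaths <- matrix(c(120, 15, 40,
                   110, 14, 38),
                 nrow = 3,
                 dimnames = list(age = 0:2, year = 1950:1951))
exposures <- matrix(c(10000, 9800, 9700,
                      10100, 9900, 9750),
                    nrow = 3,
                    dimnames = dimnames(deaths))

# Central death rates m[x, t] = D[x, t] / E[x, t] (element-wise division)
rates <- deaths / exposures

# Lee-Carter-type models describe the logarithm of these rates
log_rates <- log(rates)
round(log_rates, 3)
```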
Country-specific datasets can be obtained, for example, from the Human Mortality Database (HMD) or from other reliable sources of mortality data. Datasets for the USA, Canada, Australia, Japan and Great Britain are included with the package and are used in this example; note that they may be outdated.

```{r data setup}
nn_data <- NNMoMoData(NNMoMo_data_USA, NNMoMo_data_CAN, NNMoMo_data_AUS,
                      NNMoMo_data_JPN, NNMoMo_data_GBR)
```

Alternatively, data can be downloaded with `demography::hmd.mx()`, which requires an HMD account, or by calling `NNMoMoData()` without any arguments; in the latter case, a predefined list of 40 countries is downloaded automatically. To ensure that the neural network has sufficient data, at least 10 countries should normally be included, so the five countries used here serve demonstration purposes only.

As before, the resulting data object can be printed or summarized.

```{r print of data}
nn_data
summary(nn_data)
```

Once the model has been configured and the data prepared, the model can be fitted with `fit()`. At this stage, additional parameters can be specified. Most importantly, `fitting.epochs` sets the number of training epochs, while `years.fit` and `ages.fit` define the ranges of years and ages used for fitting. In this example, the years 1950–1999 and ages 0–99 are used, and the number of epochs is set to 5 to keep the computation time of the vignette short. This is far too few for a proper fit; around 2000 epochs are recommended.

```{r fitting, echo=TRUE, results='hide', message=FALSE, warning=FALSE}
fitted_basic <- fit(model_basic, nn_data,
                    years.fit = 1950:1999, ages.fit = 0:99,
                    fitting.epochs = 5)
fitted_CNN_Poisson <- fit(model_CNN_Poisson, nn_data,
                          years.fit = 1950:1999, ages.fit = 0:99,
                          fitting.epochs = 5)
```

After fitting, the resulting Lee–Carter models for each country and gender are stored as elements of a list-like object (for example `USA_female`). Information about the fitting process and the fitted models can again be obtained by printing or summarizing this object.
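Each of these Lee–Carter models describes the log central death rate at age $x$ in year $t$ as $\log m_{x,t} = a_x + b_x k_t$, with an age profile $a_x$, age-specific sensitivities $b_x$ and a period index $k_t$. The base-R sketch below uses made-up parameter values (not NNMoMo output) that satisfy the usual identifiability constraints $\sum_x b_x = 1$ and $\sum_t k_t = 0$, and reconstructs the implied death rates.

```r
# Hypothetical Lee-Carter parameters for ages 0-2 and years 1950-1953
ax <- c(-4.0, -6.5, -7.0)  # average log mortality by age (since kt sums to 0)
bx <- c(0.5, 0.3, 0.2)     # sensitivity of each age to the period index (sums to 1)
kt <- c(3, 1, -1, -3)      # period index (sums to 0)

# log m[x, t] = a_x + b_x * k_t as an age-by-year matrix;
# ax (length 3) is recycled down each column, i.e. added for every year
log_m <- ax + outer(bx, kt)
rates <- exp(log_m)
dimnames(rates) <- list(age = 0:2, year = 1950:1953)

round(rates, 5)
```

In StMoMo, fitted objects store these parameters in components such as `ax`, `bx` and `kt`; whether the NNMoMo objects expose them in exactly the same way is an assumption that can be checked with `str()`.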
```{r print of fitted model}
fitted_CNN_Poisson
summary(fitted_CNN_Poisson)
```

The fitted models can then be analyzed and processed further with native StMoMo functions. Since the results after only 5 epochs are insufficient for meaningful visualizations, models pre-trained with 2000 epochs on a larger set of countries are loaded for the plots below. If the saved files are not available, the models are refitted with 2000 epochs on the data prepared above.

```{r loading pretrained sets, echo = FALSE, results = 'hide', message = TRUE, warning = FALSE}
# Load the pre-trained models if available; otherwise refit them (slow)
if (file.exists("fitted_basic_2000.RData")) {
  load("fitted_basic_2000.RData")
} else {
  message("Pre-fitted model 'fitted_basic_2000' not found. Fitting a new one...")
  fitted_basic_2000 <- fit(model_basic, nn_data,
                           years.fit = 1950:1999, ages.fit = 0:99,
                           fitting.epochs = 2000)
}

if (file.exists("fitted_CNN_Poisson_2000.RData")) {
  load("fitted_CNN_Poisson_2000.RData")
} else {
  message("Pre-fitted model 'fitted_CNN_Poisson_2000' not found. Fitting a new one...")
  fitted_CNN_Poisson_2000 <- fit(model_CNN_Poisson, nn_data,
                                 years.fit = 1950:1999, ages.fit = 0:99,
                                 fitting.epochs = 2000)
}
```

For example, the fitted models for females in the USA can be plotted with:

```{r plotting, fig.width = 7, fig.height = 5}
plot(fitted_basic_2000$USA_female)
plot(fitted_CNN_Poisson_2000$USA_female)
```

StMoMo's `residuals()` method supports scaled residuals, which the current NNMoMo implementation does not. NNMoMo therefore provides its own `residuals()` method, which raises an error when `scale = TRUE` is requested:

```{r residuals}
try(residuals(fitted_basic_2000$USA_female, scale = TRUE))
```

Likewise, `logLik()` has not yet been implemented for fitted models, so calling it raises an error:

```{r logLik}
try(logLik(fitted_basic_2000$USA_female))
```
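Until `logLik()` is available, a Poisson log-likelihood can be computed by hand, assuming that observed deaths $D_{x,t}$, exposures $E_{x,t}$ and fitted rates $\hat m_{x,t}$ are available as matrices of equal dimensions and that $D_{x,t} \sim \text{Poisson}(E_{x,t} \hat m_{x,t})$ independently. The helper `poisson_loglik()` below is hypothetical and not part of NNMoMo or StMoMo; it is the natural likelihood for models trained with the "Poisson" loss, but not necessarily what a future `logLik()` method in NNMoMo will compute.

```r
# Hypothetical helper, not part of NNMoMo or StMoMo: Poisson log-likelihood
# of observed deaths given exposures and fitted central death rates,
# assuming D[x, t] ~ Poisson(E[x, t] * m[x, t]) independently.
poisson_loglik <- function(deaths, exposures, fitted_rates) {
  stopifnot(identical(dim(deaths), dim(exposures)),
            identical(dim(deaths), dim(fitted_rates)))
  expected <- exposures * fitted_rates  # expected deaths E[x, t] * m[x, t]
  # lgamma(D + 1) = log(D!), which also accepts non-integer death counts
  sum(deaths * log(expected) - expected - lgamma(deaths + 1))
}

# Made-up data for two ages and two years
deaths       <- matrix(c(12, 3, 10, 2), nrow = 2)
exposures    <- matrix(c(1000, 500, 1000, 500), nrow = 2)
fitted_rates <- matrix(c(0.011, 0.005, 0.011, 0.005), nrow = 2)

poisson_loglik(deaths, exposures, fitted_rates)
```

The explicit formula is used instead of `dpois(..., log = TRUE)` because published death counts are not always whole numbers. To apply it to a fitted model, matching matrices of deaths, exposures and fitted rates for the same ages and years would first have to be extracted.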