---
title: "3 Extensions of the basic model structure"
author: "Jan-Ole Fischer"
output: rmarkdown::html_vignette
# output: pdf_document
vignette: >
  %\VignetteIndexEntry{Model extensions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: refs.bib
link-citations: yes
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  # fig.path = "img/",
  fig.align = "center",
  fig.dim = c(8, 6),
  out.width = "85%"
)
```

> Before diving into this vignette, we recommend reading the vignettes [**Introduction to LaMa**](https://janolefi.github.io/LaMa/articles/Intro_to_LaMa.html) and [**Automatic differentiation via RTMB**](https://janolefi.github.io/LaMa/articles/LaMa_and_RTMB.html).

In this vignette, we discuss several extensions of the basic model structure, assuming you know how to fit basic models using automatic differentiation.

# Multiple data streams

It is often the case that we observe more than one time series of interest; in animal movement, for example, we often have both step lengths and turning angles. If we assume *contemporaneous conditional independence*, i.e. that the observations at each time $t$ are independent of each other given the underlying state $S_t$, modelling such data is straightforward: inside the likelihood, we calculate the state-dependent densities for each data stream and multiply them, which is justified by the independence assumption.

Let's see this in the `trex` example:

```{r setup and data}
library(LaMa)

head(trex, 5)
```

We now include the variable `angle` in order to incorporate additional information about the hidden state process. For this variable, we assume a state-dependent *von Mises* distribution, which can be thought of as the circular analogue of the normal distribution, wrapped around the unit circle. It is parameterised by a mean direction $\mu \in [-\pi, \pi]$ and a concentration parameter $\kappa > 0$. For movement analyses, the state-dependent turning angle means are usually assumed to be zero, hence we fix them at zero.
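To get a feel for this distribution, here is a small base-R sketch of its density (the helper `dvm_sketch()` is hypothetical and for illustration only; inside a likelihood one should use LaMa's `dvm()`). The normalising constant involves the modified Bessel function of the first kind, available in base R as `besselI()`:

```r
# von Mises density: f(x) = exp(kappa * cos(x - mu)) / (2 * pi * I_0(kappa)),
# where I_0 is the modified Bessel function of the first kind
dvm_sketch = function(x, mu = 0, kappa = 1) {
  exp(kappa * cos(x - mu)) / (2 * pi * besselI(kappa, nu = 0))
}

# the density integrates to one over the circle (-pi, pi]
integrate(dvm_sketch, -pi, pi, kappa = 2)$value # approximately 1

# kappa = 0 gives the uniform distribution on the circle, while larger
# kappa concentrates the density around the mean direction mu
dvm_sketch(0, 0, 0) * 2 * pi                # = 1, i.e. uniform density 1/(2*pi)
dvm_sketch(0, 0, 5) > dvm_sketch(0, 0, 1)   # TRUE
```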
The likelihood function is easily modified to incorporate the turning angle:

```{r nll multiple data streams}
nll_multi = function(par) {
  getAll(par, dat)
  Gamma = tpm(eta)
  delta = stationary(Gamma)
  # Parameter transformations for strictly positive parameters
  mu = exp(log_mu); REPORT(mu)
  sigma = exp(log_sigma); REPORT(sigma)
  # additional state-dependent turning angle concentration parameter
  kappa = exp(log_kappa); REPORT(kappa)
  # Calculating all state-dependent densities
  allprobs = matrix(1, nrow = length(step), ncol = N)
  ind = which(!is.na(step) & !is.na(angle)) # only for non-NA obs.
  for(j in 1:N){
    allprobs[ind,j] = dgamma2(step[ind], mu[j], sigma[j]) *
      dvm(angle[ind], 0, kappa[j]) # simply multiplying the state-dep. densities
  }
  -forward(delta, Gamma, allprobs) # simple forward algorithm
}
```

We now simply have to include initial values for the turning angle concentration parameters and fit the model:

```{r multiple streams fit}
N = 2 # number of hidden states

par = list(
  eta = rep(-2, N*(N-1)),                   # (logit) initial t.p.m. parameters
  log_mu = log(seq(0.3, 1, length=N)),      # (log) initial means for step length
  log_sigma = log(seq(0.2, 0.7, length=N)), # (log) initial sds for step length
  log_kappa = log(seq(0.2, 0.4, length=N))  # (log) initial concentration for angle
)

dat = list(
  step = trex$step,   # hourly step lengths
  angle = trex$angle, # hourly turning angles
  N = N
)

# Fitting the model
obj_multi = MakeADFun(nll_multi, par, silent = TRUE)
opt_multi = nlminb(obj_multi$par, obj_multi$fn, obj_multi$gr)

mod_multi = report(obj_multi) # reporting from fitted model
```

We can now plot the estimated state-dependent densities separately for each state:

```{r state dist multi, fig.width = 8, fig.height = 4}
# extracting estimated parameters
delta = mod_multi$delta
mu = mod_multi$mu
sigma = mod_multi$sigma
kappa = mod_multi$kappa

# defining color vector
color = c("orange", "deepskyblue")

oldpar = par(mfrow = c(1,2))
# step length
hist(trex$step, prob = TRUE, breaks = 40, bor = "white",
     main = "", xlab = "step length")
for(j in 1:N) curve(delta[j] * dgamma2(x, mu[j], sigma[j]),
                    lwd = 2, add = TRUE, col = color[j])
curve(delta[1]*dgamma2(x, mu[1], sigma[1]) + delta[2]*dgamma2(x, mu[2], sigma[2]),
      lwd = 2, lty = 2, add = TRUE)
legend("top", lwd = 2, col = c(color, 1), lty = c(1,1,2),
       legend = c("state 1", "state 2", "marginal"), bty = "n")
# turning angle
hist(trex$angle, prob = TRUE, breaks = seq(-pi, pi, length = 20),
     bor = "white", main = "", xlab = "turning angle")
for(j in 1:2) curve(delta[j] * dvm(x, 0, kappa[j]),
                    lwd = 2, add = TRUE, col = color[j])
curve(delta[1] * dvm(x, 0, kappa[1]) + delta[2] * dvm(x, 0, kappa[2]),
      lwd = 2, lty = 2, add = TRUE)
par(oldpar)
```

# Covariate effects

In many empirical applications, either the state-process model or the state-dependent-process model depends on covariates. For example, in animal movement, the transition probabilities between states might depend on environmental covariates such as temperature or vegetation cover.
We will first look at covariate effects in the state process, and then at covariate effects in the state-dependent process, which is typically called Markov-switching regression.

## Covariate effects in the state process

Covariate effects in the state process are usually modelled by expressing the transition probability matrix $\boldsymbol{\Gamma}$ as a function of covariates. Let $\boldsymbol{z}_t$ denote a vector of covariates of length $p+1$ for time points $t = 1, \dots, T$, where the first entry is always equal to $1$ to include an intercept. Moreover, let $\boldsymbol{\beta}_{ij}$ be a vector of regression coefficients, also of length $p+1$, for each off-diagonal element ($i \neq j$) of the t.p.m.

First, consider forming linear predictors
$$
\eta_{ij}^{(t)} = \boldsymbol{\beta}_{ij}^\top \boldsymbol{z}_t = \beta_{ij0} + \beta_{ij1} z_{t1} + \dots + \beta_{ijp} z_{tp},
$$
for $t = 1, \dots, T$. In order to link the transition probabilities to these predictors, we need to ensure that the former lie in the interval $(0,1)$ and that each row of the transition matrix sums to one. This is achieved by using the inverse multinomial logistic link function (softmax), which gives us
$$
\gamma_{ij}^{(t)} = \Pr(S_t = j \mid S_{t-1} = i) = \frac{\exp(\eta_{ij}^{(t)})}{\sum_{k=1}^N \exp(\eta_{ik}^{(t)})},
$$
where $\eta_{ii}^{(t)}$ is set to zero for $i = 1, \dots, N$ for identifiability and $N$ is the number of hidden states.

We can do this computation with `LaMa` by providing a $T \times p$ covariate design matrix `Z` and an $N(N-1) \times (p+1)$ coefficient matrix `beta` to `tpm()`. The function then computes the associated transition matrix for each time point, which is returned as an array of dimension $N \times N \times T$. Note that `tpm()` handles the intercept automatically: if `Z` does not already contain a leading column of ones, it is added internally, so `Z` only needs to contain the actual covariates.
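To make the link function concrete, here is a minimal base-R sketch for $N = 2$ states and a single time point (`tpm_softmax()` is a hypothetical helper for illustration, not `tpm()`'s actual implementation); the off-diagonal elements are filled by column, mirroring `tpm()`'s default:

```r
# inverse multinomial logit (softmax) for N = 2 states and one covariate vector z;
# beta has one row per off-diagonal element of the t.p.m.
tpm_softmax = function(beta, z) {
  eta = beta %*% c(1, z) # linear predictors for the off-diagonal elements
  Gamma = diag(2)        # eta_ii = 0 implies exp(eta_ii) = 1 on the diagonal
  Gamma[2, 1] = exp(eta[1])
  Gamma[1, 2] = exp(eta[2])
  Gamma / rowSums(Gamma) # normalise so that each row sums to one
}

beta = matrix(c(-2, -2, -1, 1.5, 0.25, -0.5), nrow = 2)
Gamma_t = tpm_softmax(beta, c(0.5, 0.5^2)) # t.p.m. for z = 0.5 (with z and z^2)
rowSums(Gamma_t) # both rows sum to one
```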
### Simulation example

Let's start by simulating data from the above specified model, assuming $N = 2$ states and Gaussian state-dependent distributions. The covariate effects for the state process are fully specified by a parameter matrix `beta` of dimension `c(N*(N-1), p+1)`. By default, the function `tpm()` will fill the off-diagonal elements of each transition matrix by column, which can be changed by setting `byrow = TRUE`. The latter is useful, as popular HMM packages like `moveHMM` or `momentuHMM` return the parameter matrix such that the t.p.m. needs to be filled by row.

```{r parameters}
# parameters
mu = c(5, 20) # state-dependent means
sigma = c(4, 5) # state-dependent standard deviations

# state process regression parameters
beta = matrix(c(-2, -2,      # intercepts
                -1, 1.5,     # linear effects
                0.25, -0.5), # quadratic effects
              nrow = 2)

n = 1000 # number of observations
set.seed(123)
z = rnorm(n)
Z = cbind(z, z^2) # quadratic effect of z

Gamma = tpm(beta, Z) # of dimension c(2, 2, n)
delta = c(0.5, 0.5) # non-stationary initial distribution

# Let's plot the covariate effects that we simulated:
z_p = seq(-2, 2, length = 100) # prediction grid for covariate plot
Z_p = cbind(z_p, z_p^2) # prediction matrix
Gamma_p = tpm(beta, Z_p)

plot(z_p, Gamma_p[1,2,], type = "l", lwd = 3, bty = "n", ylim = c(0,1),
     xlab = "z", ylab = expression(gamma[ij]), col = color[1])
lines(z_p, Gamma_p[2,1,], lwd = 3, col = color[2])
```

Let's now simulate synthetic data from the above specified model.
```{r data}
s = rep(NA, n)
s[1] = sample(1:2, 1, prob = delta) # sampling first state from initial distribution
for(t in 2:n){
  # sampling next state conditional on previous one with tpm at that time point
  s[t] = sample(1:2, 1, prob = Gamma[s[t-1],,t])
}
# sampling observations conditional on the states
x = rnorm(n, mu[s], sigma[s])

plot(x[1:200], bty = "n", pch = 20, ylab = "x",
     col = c(color[1], color[2])[s[1:200]])
```

### Fitting an HMM to the data

In order to fit the model to the simulated data, we specify the likelihood function, pretending to know the polynomial degree of the effect of $z$ on the transition probabilities, which in reality we would have to determine based on model selection criteria.

Note that because the Markov chain is now inhomogeneous — its transition probabilities change over time — it does not have a single stationary distribution. We therefore cannot use `stationary()` to fix the initial distribution and instead have to estimate it. We do this by adding a single unconstrained parameter `logit_delta` to `par` and constructing the initial distribution from it inside the likelihood.

```{r mllk}
nll_cov = function(par) {
  getAll(par, dat)
  Gamma = tpm(beta, Z)
  # unconstrained parameterisation of a 2-state initial distribution
  delta = c(1, exp(logit_delta)) # make > 0
  delta = delta / sum(delta) # distribution -> needs to sum to 1
  sigma = exp(log_sigma); REPORT(sigma); REPORT(mu)
  # calculate all state-dependent probabilities
  allprobs = matrix(1, length(x), N)
  for(j in 1:N) allprobs[,j] = dnorm(x, mu[j], sigma[j])
  # forward algorithm
  -forward(delta, Gamma, allprobs)
}
```

Using automatic differentiation with `RTMB`, we can -- very neatly -- specify `beta` as a parameter matrix in the desired shape.
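As a side note: the optimiser itself only ever sees a flat numeric vector, and `RTMB` takes care of mapping this vector back to the shapes in the parameter list. A base-R sketch of the same idea (illustrative only, not `RTMB`'s internals; `par_demo` is a hypothetical name):

```r
# a hypothetical parameter list, mirroring the one used for model fitting below
par_demo = list(
  beta = matrix(0, 2, 3), # 2 x 3 coefficient matrix
  logit_delta = 0,
  mu = c(4, 14),
  log_sigma = log(c(3, 5))
)
theta = unlist(par_demo) # the flat vector an optimiser would work on
length(theta) # 6 + 1 + 2 + 2 = 11

# relist() restores the original shapes from the flat vector
str(relist(theta, skeleton = par_demo))
```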
Model fitting is also extremely quick:

```{r model, warning=FALSE}
par = list(
  beta = cbind(rep(-2, 2), matrix(0, 2, 2)), # initialising with slopes = 0
  logit_delta = 0, # starting value for initial distribution
  mu = c(4, 14), # initial state-dependent means
  log_sigma = c(log(3), log(5)) # initial state-dependent sds
)

dat <- list(
  x = x,
  Z = Z,
  N = 2
)

obj_cov = MakeADFun(nll_cov, par, silent = TRUE)
system.time(
  opt <- nlminb(obj_cov$par, obj_cov$fn, obj_cov$gr)
)

mod_cov <- report(obj_cov) # reporting from fitted model
```

Let's plot the fitted model, starting with the estimated state-dependent densities. We can approximate the overall state probabilities by the relative frequencies of the decoded states.

```{r mod_cov_densities}
mod_cov$states = viterbi(mod = mod_cov)
delta = prop.table(table(mod_cov$states))

# get state-dependent parameters
mu = mod_cov$mu
sigma = mod_cov$sigma

hist(x, prob = TRUE, bor = "white", breaks = 20, main = "", ylim = c(0, 0.06))
curve(delta[1] * dnorm(x, mu[1], sigma[1]), add = TRUE, lwd = 3, col = color[1])
curve(delta[2] * dnorm(x, mu[2], sigma[2]), add = TRUE, lwd = 3, col = color[2])
curve(delta[1] * dnorm(x, mu[1], sigma[1]) + delta[2] * dnorm(x, mu[2], sigma[2]),
      add = TRUE, lwd = 3, lty = "dashed")
legend("topright", col = c(color[1], color[2], "black"), lwd = 3, bty = "n",
       lty = c(1,1,2), legend = c("state 1", "state 2", "marginal"))
```

In order to plot the estimated effect with 95 percent confidence intervals, we can use `MCreport()` to get samples of `beta` from the approximate normal distribution of the MLE and then use `tpm()` to get the associated transition probabilities on our prediction grid.
```{r cov_visualization}
samples = MCreport(obj_cov) # compute samples

g12 = sapply(samples$beta, function(b) tpm(b, Z_p)[1,2,]) # gamma_12
g21 = sapply(samples$beta, function(b) tpm(b, Z_p)[2,1,]) # gamma_21

# compute quantiles
g12.q = apply(g12, 1, quantile, probs = c(0.025, 0.975))
g21.q = apply(g21, 1, quantile, probs = c(0.025, 0.975))

# at MLE
Gamma_p = tpm(mod_cov$beta, Z_p) # predicted tpm

# plot fitted state-process regression
plot(NA, bty = "n", ylim = c(0,1), xlim = c(-2,2),
     xlab = "z", ylab = "Transition probability")
# confidence interval
polygon(c(z_p, rev(z_p)), c(g12.q[1,], rev(g12.q[2,])),
        col = adjustcolor(color[1], 0.3), border = NA)
# MLE
lines(z_p, Gamma_p[1,2,], lwd = 3, col = color[1])
# confidence interval
polygon(c(z_p, rev(z_p)), c(g21.q[1,], rev(g21.q[2,])),
        col = adjustcolor(color[2], 0.3), border = NA)
# MLE
lines(z_p, Gamma_p[2,1,], lwd = 3, col = color[2])
legend("top", col = color,
       legend = c(expression(gamma[12]), expression(gamma[21])),
       lwd = 3, bty = "n")
```

## Covariate effects in the state-dependent process

We now look at a setting where covariates influence the mean of the state-dependent distribution, while the state switching is controlled by a homogeneous Markov chain. This is often called **Markov-switching regression**. Assuming the observation process to be conditionally normally distributed, this means
$$
X_t \mid S_t = j \sim N(\beta_j^\top z_t, \: \sigma_j^2), \quad j = 1, \dots, N.
$$

### Simulation example

First we specify parameters for the simulation. The important change here is that `beta` now contains the regression coefficients for the state-dependent regressions.
```{r parameters2}
sigma = c(1, 1) # state-dependent standard deviations (homoscedasticity)

# parameter matrix
# each row contains the parameter vector for the corresponding state
beta = matrix(c(8, 10,      # intercepts
                -2, 1,
                0.5, -0.5), # slopes
              nrow = 2)

n = 1000 # number of observations
set.seed(123)
z = rnorm(n)
Z = cbind(z, z^2) # quadratic effect of z

Gamma = matrix(c(0.9, 0.1,
                 0.05, 0.95), nrow = 2, byrow = TRUE) # homogeneous t.p.m.
delta = stationary(Gamma) # stationary Markov chain
```

### Simulation

In the simulation code, the state-dependent mean is no longer fixed but changes with the covariate values in `Z`.

```{r data2}
s = x = rep(NA, n)
s[1] = sample(1:2, 1, prob = delta)
x[1] = rnorm(1, beta[s[1],] %*% c(1, Z[1,]), # state-dependent regression
             sigma[s[1]])
for(t in 2:n){
  s[t] = sample(1:2, 1, prob = Gamma[s[t-1],])
  x[t] = rnorm(1, beta[s[t],] %*% c(1, Z[t,]), # state-dependent regression
               sigma[s[t]])
}

oldpar = par(mfrow = c(1,2))
plot(x[1:400], bty = "n", pch = 20, ylab = "x",
     col = c(color[1], color[2])[s[1:400]])
plot(z[which(s==1)], x[which(s==1)], pch = 16, col = color[1],
     bty = "n", ylim = c(0,15), xlab = "z", ylab = "x")
points(z[which(s==2)], x[which(s==2)], pch = 16, col = color[2])
par(oldpar)
```

### Writing the negative log-likelihood function

In the likelihood function, we add the state-dependent regression inside the loop that calculates the state-dependent probabilities. The code `cbind(1, Z) %*% t(beta)` computes all linear predictors for states 1 and 2 at once and thus returns a matrix of dimension $T \times 2$.
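To see why, here is a quick base-R check that this single matrix product reproduces the observation-wise linear predictors (`beta_demo` and `Z_demo` are small stand-ins for illustration):

```r
# small hypothetical design matrix with 5 observations
beta_demo = matrix(c(8, 10, -2, 1, 0.5, -0.5), nrow = 2) # as above
set.seed(1)
z_demo = rnorm(5)
Z_demo = cbind(z_demo, z_demo^2)

Mu = cbind(1, Z_demo) %*% t(beta_demo) # 5 x 2 matrix of state-dependent means

# element (t, j) equals the single linear predictor beta_demo[j,] %*% c(1, Z_demo[t,])
all.equal(Mu[3, 2], drop(beta_demo[2, ] %*% c(1, Z_demo[3, ]))) # TRUE
```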
```{r mllk3}
nll_msr = function(par) {
  getAll(par, dat)
  Gamma = tpm(eta)
  delta = stationary(Gamma)
  sigma = exp(log_sigma); REPORT(sigma)
  # calculate all state-dependent probabilities
  allprobs = matrix(1, length(x), 2)
  # state-dependent regression
  REPORT(beta)
  Mu = cbind(1, Z) %*% t(beta) # cbind(1, Z) %*% beta[j,] for all j at once
  for(j in 1:2) allprobs[,j] = dnorm(x, Mu[,j], sigma[j])
  # forward algorithm
  -forward(delta, Gamma, allprobs)
}
```

### Fitting a Markov-switching regression model

```{r model3, warning=FALSE}
par = list(
  eta = rep(-2, 2), # starting values state process
  beta = cbind(c(10, 10), matrix(0, 2, 2)), # starting values for regression
  log_sigma = c(log(1), log(1)) # starting values for sigma
)

dat = list(
  x = x,
  Z = Z
)

obj_msr = MakeADFun(nll_msr, par, silent = TRUE)
opt_msr = nlminb(obj_msr$par, obj_msr$fn, obj_msr$gr)
mod_msr = report(obj_msr)
```

### Visualising results

To visualise the results, we extract the estimated parameters and add the two estimated state-specific regression curves to the scatter plot.

```{r visualization3}
Gamma_hat_reg = mod_msr$Gamma
delta_hat_reg = mod_msr$delta
sigma_hat_reg = mod_msr$sigma
beta_hat_reg = mod_msr$beta

# we have some label switching
plot(z, x, pch = 16, bty = "n", xlab = "z", ylab = "x", col = color[s])
points(z, x, pch = 20)
curve(beta_hat_reg[1,1] + beta_hat_reg[1,2]*x + beta_hat_reg[1,3]*x^2,
      add = TRUE, lwd = 4, col = color[2])
curve(beta_hat_reg[2,1] + beta_hat_reg[2,2]*x + beta_hat_reg[2,3]*x^2,
      add = TRUE, lwd = 4, col = color[1])
```

# A special covariate: Seasonality

A notable special case of covariate effects is *seasonality*. For example, the transition probabilities might vary with the time of day or the day of the year. Functions of such covariates need to fulfil the boundary condition that their values at the beginning and end of a cycle match. In practice, this is usually achieved by a *trigonometric basis expansion* which, in its simplest form, for a cycle length of e.g. $24$, looks like
$$
\eta^{(t)}_{ij} = \beta_0^{(ij)} + \beta_{1}^{(ij)} \sin \bigl(\frac{2\pi \, \text{time}_t}{24} \bigr) + \beta_{2}^{(ij)} \cos\bigl(\frac{2 \pi \, \text{time}_t}{24}\bigr).
$$
While this is a non-linear function of the variable *time*, it is still linear in the model parameters. Hence, the sine and cosine terms can be precomputed and stored in a design matrix. In `LaMa`, this can be done using the `cosinor()` function, e.g.

```{r cosinor}
Z_tod = cosinor(1:24, period = 24)
head(Z_tod)
```

If we now express the transition matrix as a function of the linear predictors $\eta^{(t)}_{ij}$ above, the Markov chain has a special property: its transition probabilities are *periodically varying*, i.e. for all $t$
$$
\Gamma^{(t+L)} = \Gamma^{(t)},
$$
where $L$ is the cycle length. As shown by @koslik2023inference, such chains exhibit a special periodic behaviour. In particular, they converge to a unique periodically stationary state distribution. This distribution, giving the probability of the process being in any of the states at a particular time point in the cycle, can be computed using the function `stationary_p()`. It expects a t.p.m. array with third dimension equal to the cycle length and returns a matrix in which each row gives the distribution for the corresponding time point in the cycle.

Let's try out everything we learned using the `trex` data set as an example. It has a time of day (`tod`) covariate:

```{r trex data set}
head(trex, 6)
```

As usual, we set initial parameter values. In the `dat` list, we include the trigonometric design matrix as well as the integer time of day variable.

```{r parameters seasonal}
par = list(
  log_mu = log(c(0.3, 1)), # initial means for step length
  log_sigma = log(c(0.2, 0.7)), # initial sds for step length
  beta = cbind(c(-2,-2), matrix(0, 2, 2)) # initial t.p.m. parameters
)

dat = list(
  step = trex$step, # hourly step lengths
  N = 2,
  Z_tod = Z_tod,
  tod = trex$tod
)
```

In the likelihood function, we compute only the 24 unique transition matrices and, based on these, the corresponding periodically stationary distribution using `stationary_p()`. When running the forward algorithm in the last line, we then have to (i) pick the stationary distribution corresponding to the first time of day value (`Delta[tod[1],]`) and (ii) build an array of t.p.m.s for the entire time series by indexing the 24 matrices with `Gamma[,,tod]`.

```{r nll seasonal}
nll_seas = function(par) {
  getAll(par, dat)
  Gamma = tpm(beta, Z_tod)
  Delta = stationary_p(Gamma) # periodically stationary distribution
  ADREPORT(Delta)
  # Parameter transformations for strictly positive parameters
  mu = exp(log_mu); REPORT(mu)
  sigma = exp(log_sigma); REPORT(sigma)
  # Calculating all state-dependent densities
  allprobs = matrix(1, nrow = length(step), ncol = N)
  ind = which(!is.na(step)) # only for non-NA obs.
  for(j in 1:N){
    allprobs[ind,j] = dgamma2(step[ind], mu[j], sigma[j])
  }
  -forward(Delta[tod[1],], Gamma[,,tod], allprobs)
}
```

Let's fit the model:

```{r seasonal fit}
obj_seas = MakeADFun(nll_seas, par, silent = TRUE)
opt_seas = nlminb(obj_seas$par, obj_seas$fn, obj_seas$gr)

sdr_seas = sdreport(obj_seas)
mod_seas = report(obj_seas)
```

And look at the estimated periodically stationary distribution:

```{r p stationary}
Delta = as.list(sdr_seas, "Est", report = TRUE)$Delta
Delta_sd = as.list(sdr_seas, "Std", report = TRUE)$Delta

# Only plot for state 2 (state 1 is just 1-delta_2)
id = c(24, 1:24) # copy value at 24 to 0 for nicer plot

plot(0:24, Delta[id,2], type = "b", col = "deepskyblue", bty = "n",
     lwd = 2, pch = 16, ylim = c(0,1), xlim = c(0, 24),
     xlab = "time of day", ylab = "Pr(state 2)", xaxt = "n")
axis(1, at = c(0, 6, 12, 18, 24), labels = c(0, 6, 12, 18, 24))
polygon(c(0:24, 24:0), c(Delta[id,2] + 2*Delta_sd[id,2],
                         rev(Delta[id,2] - 2*Delta_sd[id,2])),
        col = adjustcolor("deepskyblue", 0.2), border = FALSE)
```

> Continue reading with [**Longitudinal data**](https://janolefi.github.io/LaMa/articles/Longitudinal_data.html) or [**Penalised splines**](https://janolefi.github.io/LaMa/articles/Penalised_splines.html).

# References