--- title: "Time Series Tests" output: rmarkdown::html_vignette: css: custom.css code_folding: hide citation_package: natbib toc: yes bibliography: tests.bib bibliography-style: apalike natbiboptions: round vignette: > %\VignetteIndexEntry{Time Series Tests} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, warning = FALSE, message = FALSE, comment = "##", R.options = list(width = 60) ) ``` ```{r packages,warning=FALSE,message=FALSE} library(tstests) library(tsdistributions) library(tsgarch) library(flextable) library(xts) data("spy") data("arma_forecast") data("garch_forecast") spyr <- na.omit(diff(log(spy))) ``` # Introduction {#sec-introduction} The `tstests` package provides a number of different statistical tests for evaluating the goodness of fit of time series models as well as their out of sample predictions. The table below summarizes the tests currently implemented, together with the reference paper and type of test. The tests are broadly categorized as Wald [W], Likelihood Ratio [LR], Hausman [H], Lagrange Multiplier [LM] and Portmanteau [P].
| Test | Function | Reference | Type| |:---------------------------------------|:------------------|:--------------------------------------------|----------:| | GMM Orthogonality Test | gmm_test | [@Hansen1982] | [W]| | Nyblom Parameter Constancy Test | nyblom_test | [@Nyblom1989] | [LM]| | Sign Bias Test | signbias_test | [@Engleng1993] | [W]| | Berkowitz Forecast Density Test | berkowitz_test | [@Berkowitz2001] | [LR]| | Hong-Li Non Parametric Density Test. | hongli_test | [@Hong2005] | [P]| | Directional Accuracy Tests | dac_test | [@Pesaran1992],
[@Anatolyev2005] | [H]| | Mincer-Zarnowitz Test | minzar_test | [@Mincer1969] | [W]| | Expected Shortfall Test | shortfall_de_test | [@Du2017] | [LR]| | Value at Risk Test | var_cp_test | [@Christoffersen1998],
[@Christoffersen2004] | [LR]| : Package Tests
One of the key features of the package includes customization of the test output using the `as_flextable` method to generate non-console compatible and nicely formatted tables. The next section provides an overview of the tests together with examples. # Goodness of Fit Tests ## GMM Orthogonality The Generalized Method of Moments framework of @Hansen1982 provides for a set of orthogonality conditions for testing the specification of a model. These conditions test for how well a model captures a set of lower to higher order moments and co-moments by testing the following hypothesis: $$ \begin{aligned} E&\left[z_t\right]=0,\quad E\left[z_t^2-1\right]=0\\ E&\left[z_t^3\right]=0,\quad E\left[z_t^3\right]-3=0\\ E&\left[\left(z^2_t-1\right)\left(z_{t-j}^2-1\right)\right]=0,\quad j=1,\ldots,m\\ E&\left[\left(z^3_t\right)\left(z_{t-j}^3\right)\right]=0,\quad j=1,\ldots,m\\ E&\left[\left(z^4_t-3\right)\left(z_{t-j}^4-3\right)\right]=0,\quad j=1,\ldots,m\\ \end{aligned} $$ where $z_t$ are the standardized residuals from an estimated model, and $m$ the number of lags used for testing the hypothesis. Define the moment (M) generating conditions as: $$ g_T\left(\theta\right) = \frac{1}{T}\sum^T_{t=1}M_t\left(\theta\right) $$ with $\theta$ being the parameter vector. For large T we expect that $g_T\left(\theta\right)$ converges to $E\left[M_t\left(\theta\right)\right]$, and should be equal to zero under a correctly specified model. The GMM estimator of $\theta$ is obtained by minimizing: $$ g_T\left(\theta\right)'S^{-1}g_T\left(\theta\right) $$ where S is the weighting matrix and defined as $$ S = \frac{1}{T}\sum^T_{t=1}M_t\left(\theta\right)M_t\left(\theta\right)'. $$ which turns out to be the asymptotic covariance matrix.^[It is also possible to correct this estimator for serial correlation as suggested by @Hamilton2020, but we leave this for future investigation.] The test is conducted on individual moment conditions as a t-test with $T-1$ degrees of freedom, on the co-moment conditions as a Wald test^[Asymptotically distributed as $\chi^2$.] with $j$ degrees of freedom, and joint tests on the co-moment conditions and all conditions ($J$^[All joint tests are denoted by the capital letter $J$ in the package.]) as Wald tests with $m$ and $4+3m$ degrees of freedom, respectively. As an example, we use the SPY log returns to estimate an exponential GARCH (2,1) model with Johnson's SU distribution, and test all these conditions using the `gmm_test`. ```{r gmmtest} spec <- garch_modelspec(spyr, model = "egarch", order = c(2,1), constant = TRUE, distribution = "jsu") mod <- estimate(spec) skewness <- dskewness("jsu", skew = coef(mod)["skew"], shape = coef(mod)["shape"]) kurtosis <- dkurtosis("jsu", skew = coef(mod)["skew"], shape = coef(mod)["shape"]) + 3 test <- gmm_test(residuals(mod, standardize = TRUE), lags = 2, skewness = skewness, kurtosis = kurtosis, conf_level = 0.95) print(test) ``` None of the p-values are below 5%, therefore we fail to reject the null hypothesis. Notice that the joint conditions (Q2-Q4 and J) have no Mean or Std.Error as these are vector valued. It is also possible to create a nicer looking table, with symbols, using [flextable](https://cran.r-project.org/package=flextable), and we also include the test paper reference in the footnote. ```{r} as_flextable(test, include.decision = FALSE, footnote.reference = TRUE) ``` ## Non Parametric Density Test @Hong2005 introduced a nonparametric portmanteau test, building on the work of @Ait1996, which tests the joint hypothesis of i.i.d **and** $U(0,1)$ for the sequence $x_t$. As noted by the authors, testing $x_t$ using a standard goodness-of-fit test (such as the Kolmogorov-Smirnov) would only check the $U(0,1)$ assumption under i.i.d. and not the joint assumption of $U(0,1)$ and i.i.d. Their approach is to compare a kernel estimator $\hat g_j\left(x_1,x_2\right)$ for the joint density $g_j\left(x_1,x_2\right)$ of the pair $\left\{ {x_t,x_{t-j}} \right\}$ (where $j$ is the lag order) with unity, the product of two $U(0,1)$ densities. Given a sample size $n$ and lag order $j>0$, their joint density estimator is: $$ {{\hat g}_j}\left( {{x_1},{x_2}} \right) \equiv {\left( {n - j} \right)^{ - 1}}\sum\limits_{t = j + 1}^n {{K_h}\left( {{x_1},{{\hat X}_t}} \right){K_h}\left( {{x_2},{{\hat X}_{t - j}}} \right)} $$ where ${{\hat X}_t} = {X_t}\left( {\hat \theta } \right)$, and $\hat \theta$ is a $\sqrt n$ consistent estimator of $\theta_0$. The function $K_h$ is a boundary modified kernel defined as: $$ K_{h}\left({x,y}\right)\equiv\left\{{\matrix{ {h^{-1}k\left({{{x-y}\over{h}}}\right)/\int_{-\left({x/h}\right)}^{1}{k\left({u}\right)du},\quad \textbf{if}\quad x\in\left[{0,h}\right)}\cr {h^{-1}k\left({{{x-y}\over{h}}}\right),\quad \textbf{if}\quad x\in\left[{h,1-h}\right]}\cr {h^{-1}k\left({{{x-y}\over{h}}}\right)/\int_{-1}^{\left({1-x}\right)/h}{k\left({u}\right)du,\quad \textbf{if}\quad x\in\left({1-h,1}\right]}}\cr }}\right. $$ where $h\equiv h\left(n \right)$ is a bandwidth such that $h\rightarrow 0$ as $n\rightarrow \infty$, and the kernel $k(.)$ is a pre-specified symmetric probability density, which is implemented as suggested by the authors using a quartic kernel, $$ k\left( u \right) = \frac{{15}}{{16}}{\left( {1 - {u^2}} \right)^2}{\bf{1}}\left( {\left| u \right| \le 1} \right), $$ where $\bf{1}\left(.\right)$ is the indicator function. Their portmanteau test statistic is defined as: $$ \hat W\left(p \right) \equiv {p^{ - 1/2}}\sum\limits_{j = 1}^p {\hat Q\left( j \right)}, $$ where $$ \hat Q\left( j \right) \equiv {{\left[ {\left( {n - j} \right)h\hat M\left( j \right) - A_h^0} \right]} \mathord{\left/ {\vphantom {{\left[ {\left( {n - j} \right)h\hat M\left( j \right) - A_h^0} \right]} {V_0^{1/2}}}} \right.} {V_0^{1/2}}}, $$ and $$ \hat M\left( j \right) \equiv \int_0^1 {\int_0^1 {{{\left[ {{{\hat g}_j}\left( {{x_1},{x_2}} \right) - 1} \right]}^2}d{x_1}d{x_2}} }. $$ The centering and scaling factors $A_h^0$ and $V_0$ are defined as: $$ \begin{array}{l} A_h^0 \equiv {\left[ {\left( {{h^{ - 1}} - 2} \right)\int_{ - 1}^1 {{k^2}\left( u \right)du + 2\int_0^1 {\int_{ - 1}^b {k_b^2\left( u \right)dudb} } } } \right]^2} - 1\\ {V_0} \equiv 2{\left[ {\int_{ - 1}^1 {{{\left[ {\int_{ - 1}^1 {k\left( {u + v} \right)k\left( v \right)dv} } \right]}^2}du} } \right]^2} \end{array} $$ where, $$ {k_b}\left( . \right) \equiv {{k\left( . \right)} \mathord{\left/{\vphantom {{k\left( . \right)} {\int_{ - 1}^b {k\left( v \right)dv} }}} \right.} {\int_{ - 1}^b {k\left( v \right)dv} }}. $$ Under the correct model specification, the authors show that $\hat W\left( p \right)\rightarrow N\left(0,1\right)$ in distribution. Because negative values of the test statistic only occur under the Null Hypothesis of a correctly specified model, the authors indicate that only upper tail critical values need be considered. The test is quite robust to model misspecification as parameter uncertainty has no impact on the asymptotic distribution of the test statistic as long as the parameters are $\sqrt n$ consistent. Finally, in order to explore possible causes of misspecification when the statistic rejects a model, the authors develop the following test statistic: $$ M\left({m,l}\right)\equiv\left[{\sum\limits_{j=1}^{n-1}{w^{2}\left({j/p}\right)\left({n-j}\right){\rm {\hat\rho}}_{ml}^{2}\left({j}\right)-\sum\limits_{j=1}^{n-1}{w^{2}\left({j/p}\right)}}}\right]/{\left[{2\sum\limits_{j=1}^{n-2}{w^{4}\left({j/p}\right)}}\right]}^{1/2} $$ where ${{\hat \rho }_{ml}}\left( j \right)$ is the sample cross-correlation between $\hat X_t^m$ and $\hat X_{t - \left| j \right|}^l$, and $w\left(.\right)$ is a weighting function of lag order j, and as suggested by the authors implemented as the Bartlett kernel. As in the $\hat W\left( p \right)$ statistic, the asymptotic distribution of $M\left( {m,l} \right)$ is $N\left(0,1\right)$ and upper critical values should be considered. We consider an example below using the SPY dataset and a short sub-sample. Note that the critical value are the upper quantiles of a standardized normal distribution and we use a confidence level of 95% for this example, hence, based on the way the test is conducted, we reject the null of the statistic if it is higher than this value (`qnorm(0.95)`). Based on the example shown, we fail to reject the null of a correctly specified model for this short sub-sample, but caution that using the full sample would have led to an outright rejection, raising questions about parameter stability (next section) and/or structural breaks in the series. ```{r honglitest} spec <- garch_modelspec(spyr[1:1000], model = "egarch", order = c(1,1), constant = TRUE, distribution = "jsu") mod <- estimate(spec) z <- residuals(mod, standardize = TRUE) p <- pdist("jsu", z, mu = 0, sigma = 1, skew = coef(mod)["skew"], shape = coef(mod)["shape"]) as_flextable(hongli_test(as.numeric(p), lags = 4, conf_level = 0.95), include.decision = T, footnote.reference = TRUE) ``` ## Parameter Constancy The parameter constancy tests of @Nyblom1989 and @Hansen1992 are Lagrange multiplier parameter stability tests. The test of @Nyblom1989 tests the null hypothesis that all parameters are constant against the alternative that they follow a martingale process, whilst that of @Hansen1992 tests the null hypothesis of constancy for individual parameters. The distribution of the statistic is non-standard, and related to the distribution of squares of independent Brownian bridges which has the following series expansion^[From equation 3.3 of @Nyblom1989.]: $$ \sum^{\infty}_{k=1}\frac{1}{\left(\pi k\right)^2}\chi^2_{k}\left(p\right) $$ where $p$ are the number of parameters. @Hansen1992 provides a table with critical values for up to 20 parameters based on simulation which has been used extensively as a reference point when conducting this test. Instead, the `tstests` package uses an internal dataset generated by simulation for up to 40 parameters in addition to a kernel smoothed density in order to generate the p-values directly. It should be noted that neither the individual nor joint tests provide any information about the location of breaks, and the distribution is only valid for stationary data. To illustrate the use of the test, we continue with the same model as in the previous section, and turn on the argument to provide guidance on whether to reject the null hypothesis at the 5% level of significance. ```{r} spec <- garch_modelspec(spyr, model = "egarch", order = c(2,1), constant = TRUE, distribution = "jsu") mod <- estimate(spec) test <- nyblom_test(residuals(mod, standardize = TRUE), scores = estfun(mod), parameter_names = names(coef(mod)), parameter_symbols = mod$parmatrix[estimate == 1]$symbol) as_flextable(test, use.symbols = TRUE, footnote.reference = TRUE, include.decision = TRUE) ``` The table indicates that individually we can reject the null of parameter constancy on most of the parameters at the 5% level as well as jointly (**J**). ## Sign Bias The sign bias test of @Engleng1993 evaluates the presence of leverage effects in the standardized residuals (to capture possible misspecification of a GARCH model), by regressing the squared standardized residuals on lagged negative and positive shocks as follows: $$ z^2_t = c_0 + S^{-}_{t-1} + c_2 S^{-}_{t-1}\varepsilon_{t-1} + c_3 S^{+}_{t-1}\varepsilon_{t-1} + u_t $$ where $S^{-}_{t-1} = I_{\varepsilon_{t-1}<0}$, $S^{+}_{t-1} = I_{\varepsilon_{t-1}\ge0}$. and $\varepsilon_t$ the model residuals. The null hypothesis is that all coefficients $\{c_1,c_2,c_3\}$ are individually and jointly zero. The joint test is a Wald test distributed as $\chi^2(3)$. The table below shows the output of this test on the residuals of the model used so far, where the p-values indicate that we cannot reject the null of no sign bias either individually or jointly. ```{r} test <- signbias_test(residuals(mod), sigma = sigma(mod)) as_flextable(test, use.symbols = TRUE, footnote.reference = TRUE) ``` # Forecast Evaluation Tests ## Density Forecast A novel method to analyze how well a conditional density fits the underlying data is through the probability integral transformation (**PIT**) discussed in @Rosenblatt1952 and defined as: $$ {x_t} = \int\limits_{ - \infty }^{{y_t}} {\hat f\left( u \right)du = \hat F\left( {{y_t}} \right)} $$ which transforms the data $y_t$, using the estimated distribution $\hat F$ into i.i.d. $U(0,1)$ under the correctly specified model. Because of the difficulty in testing a $U(0,1)$ sequence, the PIT data ($x_t$) is transformed into $N(0,1)$ by @Berkowitz2001 using the normal quantile ($\Phi^{-1}$) function: $$ z_t = \Phi^{-1}\left(x_t\right) $$ Under the null of a correctly forecast density the values ($z_t$) should be independent across observations, with mean zero, unit variance and non-significant autoregressive terms (zero). For example, the following first order autoregressive dynamics can be tested: $$ z_t = \mu + \rho\left(z_{t-1} - \mu\right) + \varepsilon_t $$ as the unrestricted equation in a likelihood ratio (LR) test against the restricted equation where $\mu=0$, $\sigma=1$ and $\rho=0$. The LR test is then: $$ \mathrm{LR} = -2\left(L\left(0,1,0\right) - L\left(\hat\mu,\hat\sigma,\hat\rho\right)\right) $$ where $L$ is the log-likelihood. The LR test statistic is asymptotically distributed as $\chi^2(2+m)$ where $m$ are the number of autoregressive lags. A final testing step can also be included which looks at the goodness of fit of the transformed series into the Normal. Following @Dowd2004, the output includes the Normality test of @Jarque1987. We use the GARCH based pre-computed backtest^[Based on the SPY log returns data using a GARCH(1,1)-JSU model (see documentation for details and replication code).] in the package for the example: ```{r} p <- pdist('jsu', q = garch_forecast$actual, mu = garch_forecast$forecast, sigma = garch_forecast$sigma, skew = garch_forecast$skew, shape = garch_forecast$shape) as_flextable(berkowitz_test(p)) ``` Both the p-values of the LR and Jarque-Bera tests are greater than 5% so at this level we fail to reject the null hypothesis, and find the density forecast adequately meets the requirements of these tests. ## Directional Accuracy High and significant Directional Accuracy could imply either an ability to predict the sign of the mean forecast or could merely be the result of volatility dependence in the absence of mean predictability as argued by @Christoffersen2006. @Pesaran1992 provide a non-parametric test to evaluate the magnitude and significance of the forecast's directional accuracy. Let $y_t$ denote the actual series and $x_t$ the forecast of that series, then the null hypothesis is that the two are independent (the forecast cannot predict the actual). This is a Hausman type test with statistic: $$ S_n = \frac{ P - P_*}{\sqrt{\left[V\left(P\right) - V\left(P_*\right)\right]}} $$ where: $$ P = \frac{1}{n} \sum^n_{t=1}H\left(y_t x_t\right) $$ representing the proportion of times that x correctly predicts y, with $H(.)$ being the Heaviside function taking values of 1 if the signs of the forecast and actual agree and 0 otherwise. Further, $$ P_* = P_y P_x + \left(1 - P_y\right)\left(1 - P_x\right) $$ with $P_y$ and $P_x$ the proportion of positive cases in y and x respectively. The equation represents the expected proportion of times that x would correctly predict y, under the assumption that they are independently distributed (the null). This difference between realized proportion and expected proportion under the null forms the basis of the Hausman test statistic. The denominator standardizes this difference in proportions by the difference in the variance of the two proportions, with: $$ V\left(P\right) = \frac{1}{n}P_*\left(1 - P_*\right) $$ and $$ \begin{aligned} V\left(P_*\right) =& \frac{1}{n}\left(2 P_y-1\right)^2 P_x\left(1- P_x\right) + \\ & \frac{1}{n}\left(2 P_x-1\right)^2 P_y\left(1- P_y\right)+\\ & \frac{4}{n^2} P_y P_x\left(1 - P_y\right)\left(1 - P_x\right) \end{aligned} $$ The statistic $S_n$ is asymptotically distributed as $N\left(0,1\right)$. While @Pesaran1992 test for sign predictability, @Anatolyev2005 test for mean predictability based on normalized excess profitability implied by a directional strategy which goes long or short depending on the sign of the forecast. One of the key differences between these two tests relates to the observed asymmetric nature of $y_t$, which means that it may matter when $x_t$ fails to predict $y_t$ if during those times the loss (or missed gain) is high enough to make the strategy relatively unprofitable despite having significant directional accuracy. Consider the 1-period return of the trading strategy: $$ r_t = \textrm{sign}\left(x_t\right)y_t $$ with expected return $$ A_n = \frac{1}{n}\sum^n_{t=1}r_t $$ A benchmark strategy which issues buy and sell signals at random with proportions of buys and sells the same as in the trading strategy, has the following expected return: $$ B_n = \left(\frac{1}{n}\sum^n_{t=1}\textrm{sign}\left(x_t\right)\right)\left(\frac{1}{n}\sum^n_{t=1}y_t\right) $$ Under the null of conditional mean independence, then $E\left[y_t|I_{t-1}\right] = \mathrm{constant}$. If the strategy has predictive power, it will generate higher returns on average than the benchmark, and $A_n - B_n$ will be greater than zero and significant. To test the significance of this difference, it needs to be normalized by the the variance of this difference ($V$): $$ V = \frac{4}{n^2} p_y\left(1 - p_y\right)\sum^n_{t=1}\left(y_t- y\right)^2 $$ where $p_y=\frac{1}{2}\left(1 + \frac{1}{n}\sum^n_{t=1}\mathrm{sign}\left(y_t\right)\right)$. Similar to the test of @Pesaran1992, this is also a Hausman test with statistic $\frac{A_n - B_n}{\sqrt(V)}$, asymptotically distribution as $N\left(0,1\right)$. To illustrate we use an ARMA(1,1) pre-computed backtest forecast available in the package: ```{r} as_flextable(dac_test(arma_forecast$actual, arma_forecast$forecast)) ``` The p-values for both tests in the table above suggest that we fail to reject the null of no predictability, with high confidence. ## Forecast Unbiasedness To understand the properties of a good forecast, we start by considering what an optimal forecast should look like. Consider the Wald representation of $y$, at horizon h: $$ y_{n+h} = \mu + \epsilon_{n+h} + b_1\epsilon_{n+h-1} + b_2\epsilon_{n+h-2} + \ldots $$ The optimal h-step forecast is then $$ y_{n+h|n} = \mu + b_h\epsilon_n + b_{h+1}\epsilon_{n-1} + \ldots $$ with optimal forecast error $$ \epsilon_{n+h|n} = \epsilon_{n+h} + b_1\epsilon_{n+h-1} + b_2\epsilon_{n+h-2} + \ldots + b_{h-1}\epsilon_{n+1} $$ and variance of the h-step forecast error increasing in h. The optimal forecast error is unbiased with $E\left[\epsilon_{n+h|n}\right] = 0$, with the h-step forecast errors having at most an MA(h-1) structure and the 1-step ahead forecast error $\epsilon_{n+1|n} = \epsilon_{n+h}$ being white noise. This implies that : $$ \epsilon_{n+h|n} = \alpha + \beta y_{n+h|n} + \varepsilon_{n+h} $$ will have $\alpha=0$ and $\beta=0$. Since $$ \epsilon_{n+h|n} = y_{n+h} - y_{n+h|n} $$ then, $$ y_{n+h|n} = \alpha + \beta x_{n+h|n} + \varepsilon_{n+h|n} $$ can be tested under the null of unbiasedness with $\alpha=0$ and $\beta=1$ using a Wald test. This is the essence of the regression test by @Mincer1969. It effectively tests for forecast bias, though it does not say anything about forecast variance. The table below shows the output of the test using a 15-step ahead forecast on a chosen subsample of SPY log returns from an ARMA(15,0) model. We fail to reject the NULL of $\alpha=0$ and $\beta=1$ given the results of the Wald test with Pr(>Chisq) = 0.96. ```{r,warning=FALSE,message=FALSE} spyr <- na.omit(diff(log(spy))) mod <- arima(as.numeric(spyr[2000:2500]), order = c(15,0,0), transform.pars = TRUE, include.mean = TRUE) p <- predict(mod, n.ahead = 15) test <- minzar_test(actual = as.numeric(spyr[2501:2515]), forecast = as.numeric(p$pred)) as_flextable(test, footnote.reference = T, digits = 2) ``` ## Expected Shortfall The test of @Du2017 combines ideas from @Berkowitz2001 and @Christoffersen1998 to create an unconditional and conditional shortfall test based on the probability integral transformed realizations, conditioned on the forecast distribution, to evaluate the severity and independence of the shortfall residuals. ## The Unconditional Test The unconditional test assesses the expected value of the cumulative violations beyond the Value at Risk (VaR) threshold, using a two-sided t-test, with the following statistic: $$ U_{ES} = \frac{\sqrt{n}\left(\bar H\left(\alpha\right) - \frac{\alpha}{2}\right)}{\sqrt{\alpha\left(\frac{1}{3} - \frac{\alpha}{4}\right)}} $$ where the term $\frac{\alpha}{2}$ is the expected value under a correctly calibrated risk model. $\bar H\left(\alpha\right)$ denotes the sample mean of $H_t\left(\alpha\right)$: $$ \bar H\left(\alpha\right)=\frac{1}{n}\sum^n_{t=1}H_t\left(\alpha\right) $$ with $H_t\left(\alpha\right)$ the cumulative failures (violations beyond VaR) such that: $$ H_t\left(\alpha\right) = \frac{1}{\alpha}\left(\alpha - u_t\right)\mathrm{I}\left(u_t\le\alpha\right). $$ The vector $u$ is the probability integral transformation of the future realization given the forecast distribution. The distribution of the test statistic $U_{ES}$ is asymptotically distributed as $N(0,1)$. Since the statistic is bounded in the unit interval, the confidence intervals need to be constrained to the be between [0,1]. Therefore, the p-value will take the following form: $$ Pr(>|t|) = 2 \min{\left(\mathrm{Pr}\left[|U_{ES}|\le x\right],1-\mathrm{Pr}\left[|U_{ES}|\le x\right]\right)} $$ ## The Conditional Test The conditional test assesses not only the correctness and significance of the expected value of the cumulative failures but also their independence. Consider the auto-covariance of the cumulative violations for lag j: $$ \gamma_j = \frac{1}{n-j}\sum^n_{t=j+1} \left(H_t - \frac{\alpha}{2}\right)\left(H_{t-j}-\frac{\alpha}{2}\right). $$ The autocorrelation is then equal to: $$ \rho_j = \frac{\gamma_j}{\gamma_0} $$ and the test statistic for m lags is: $$ C_{ES} = n\sum^m_{j=1}\rho^2_j $$ which is asymptotically distributed as $\chi^2_m$, and is independent of $\alpha$. ## Example The following example, using the pre-computed GARCH backtest data, highlights the approach to using the `shortfall_de_test` function. ```{r,warning=FALSE,message=FALSE} # the pit data x <- pdist("jsu", q = garch_forecast$actual, mu = garch_forecast$forecast, sigma = garch_forecast$sigma, skew = garch_forecast$skew, shape = garch_forecast$shape) test <- shortfall_de_test(x, alpha = 0.05, lags = 4) as_flextable(test, footnote.reference = T) |> width(j = 1:3,width = 1.2) ``` ## Value at Risk: Coverage and Duration The Value at Risk (VaR) tests in the `tstests` package are based on the proportion of failures test of @Kupiec1995, the conditional coverage test of @Christoffersen1998, and the duration between failures test of @Christoffersen2004. These are summarized in the next sections. ## Proportion of Failures (`UC`) The test of @Kupiec1995 assesses whether the proportion of failures is consistent with the expected proportion of failures at a given level of VaR. This is done by summing the number of violation (binary) and dividing by the length of the forecasts.^[Under the assumption of i.i.d observations, the sequence of failures is distributed as Bernoulli($\alpha$).] Under the null hypothesis that the proportion of failures ($\pi$) is equal to the VaR level ($\alpha$), then the restricted model likelihood is: $$ \mathrm{L}_r = \left(1 - \alpha\right)^{n}\alpha^{k} $$ and the unrestricted (observed) model likelihood: $$ \mathrm{L}_u = \left(1 - \pi\right)^{n}\pi^{k} $$ where $\pi=\frac{n}{\left(n + k\right)}$, $n$ are the number of observations, and $k$ the number of failures. A likelihood ratio test can then be conducted with statistic: $$ \mathrm{LR}_{uc} = -2\log{\frac{\mathrm{L}_r}{\mathrm{L}_u}} $$ which is distributed as $\chi^2_1$. ## Independence of Failures (`CCI`) The proportion of failures (or unconditional) test of @Kupiec1995 tests the coverage of the interval but has no power in detecting whether they are independent. @Christoffersen1998 proposed a test to explicitly check the independence assumption against a first order Markov alternative. Consider a binary, first order Markov chain with transition probability matrix: $$ \Pi_1 = \begin{bmatrix} 1-\pi_{01} & \pi_{01}\\ 1-\pi_{11} & p_{11} \end{bmatrix} $$ where $\pi_{ij}=Pr\left(I_t=j|I_{t-1}-i\right)$, and $I_t$ is the indicator function denoting failures. The approximate likelihood of this process is then $$ \mathrm{L}_u = \left(1-\pi_{01}\right)^{n_{00}}\pi_{01}^{n_{01}}\left(1-\pi_{11}\right)^{n_{10}}\pi_{11}^{n_{11}} $$ where $n_{ij}$ is the number of observation with value i followed by j. For example, $n_{10}$ represents the number of periods with failures followed by periods with no failures. For the restricted model under the null of independence, the first order Markov chain has the following transition probability matrix: $$ \Pi_1 = \begin{bmatrix} 1-\pi_2 & \pi_2\\ 1-\pi_2 & \pi_2\\ \end{bmatrix} $$ with the following likelihood: $$ \mathrm{L}_r = \left(1 - \pi_2\right)^{n_{00} + n_{10}}\pi_2^{n_{01} + n_{11}} $$ Finally, the likelihood ratio for the test of independence can be expressed as: $$ \mathrm{LR}_{cci} = -2\log\frac{L_r}{L_u} $$ which is asymptotically distributed as $\chi^2\left(1\right)$. ## Conditional Coverage (`CC`) The likelihood ratio for the joint test of coverage and independence is simply the sum of the Kupiec coverage and independence likelihood ratios: $$ \mathrm{LR}_{cc} = \mathrm{LR}_{uc} + \mathrm{LR}_{cci} $$ which is asymptotically distributed as $\chi^2\left(2\right)$. ## Duration between Failures (`D`) The conditional coverage independence test in the previous section has limited power in detecting temporal dependence beyond the simple first order Markov structure. To provide a more powerful test @Christoffersen2004 proposed a more general structure using the duration of time between VaR failures (the no-hit duration), defined as : $$ D_i = t_i - t_{i-1} $$ where $t_i$ denotes the time index of violation number $i$. Under the null hypothesis of a correctly specified risk model, this no-hit duration ($D$) should have no memory with a mean duration of $\frac{1}{\alpha}$ periods. Since the only continuous memory free random distribution is the exponential, then under the null hypothesis the distribution of the no-hit duration should be: $$ f_{\mathrm{exp}}\left(D;\alpha\right) = p\exp\left(-\alpha D\right). $$ In order to test for this, an encompassing distribution which allows for duration dependence and also embeds the exponential is required. The Weibull distribution offers one such example, with distribution: $$ f_{\mathrm{W}}\left(D;a,b\right) = a^bbD^{b-1}\exp\left(-\left(aD\right)^b\right). $$ The exponential is a special case when $b=1$. Therefore, the null hypothesis of a memory-less duration process corresponds to $b=1$ and $a=\alpha$, which can be tested using a likelihood ratio test distributed as $\chi^2\left(1\right)$.^[The actual implementation requires calculating the duration of the hit sequence, the censored observations and combining all this to construct the log-likelihood which needs to be solved using numerical methods for the unrestricted likelihood.] ## Example The table below shows the 4 tests for VaR, with p-values well above 10% indicating a correctly specified model for this series during the out of sample period tested. ```{r,warning=FALSE,message=FALSE} q <- qdist("jsu", p = 0.05, mu = garch_forecast$forecast, sigma = garch_forecast$sigma, skew = garch_forecast$skew, shape = garch_forecast$shape) test <- var_cp_test(actual = garch_forecast$actual, forecast = q, alpha = 0.05) as_flextable(test, footnote.reference = T) |> width(j = 1:3,width = 1.2) ``` # References