--- title: "4. Logit models relaxing the iid hypothesis" bibliography: ../inst/REFERENCES.bib format: html: toc: true html-math-method: mathjax vignette: > %\VignetteEngine{quarto::html} %\VignetteIndexEntry{4. Logit models relaxing the iid hypothesis} %\VignetteEncoding{UTF-8} --- ```{r } #| echo: false oopts <-options(width = 80) ``` In the previous section, we assumed that the error terms were iid (identically and independently distributed), i.e., uncorrelated and homoscedastic. Extensions of the basic multinomial logit model have been proposed by relaxing one of these two hypothesis while maintaining the hypothesis of a Gumbel distribution. ## The heteroskedastic logit model The heteroskedastic logit model was proposed by @BHAT:95. The probability that $U_l>U_j$ is: $$ P(\epsilon_j gaze() waldtest(hl.MC) |> gaze() ``` The Wald test can also be computed using the `lht` function from the `car` package: ```{r} #| label: "homoscedasticity tests: Wald test" #| collapse: true car::lht(hl.MC, c('sp.air = 1', 'sp.train = 1')) |> gaze() ``` For the score test, we provide the constrained model as argument, which is the standard multinomial logit model and the supplementary argument which defines the unconstrained model, which is in this case `heterosc = TRUE`. ```{r} #| label: "homoscedasticity tests: score test" #| collapse: true scoretest(ml.MC, heterosc = TRUE) |> gaze() ``` The homoscedasticity hypothesis is strongly rejected using the Wald test, but only at the 1 and 5% level for, respectively, the score and the likelihood ratio tests. ### JapaneseFDI @HEAD:MAYE:04 analyzed the choice of one of the 57 European regions belonging to 9 countries by Japenese firms to implement a new production unit. ```{r} #| label: "loading the JapaneseFDI data set" jfdi <- dfidx(JapaneseFDI, idx = c("firm", country = "region"), drop.index = FALSE) ``` There are two sets of covariates: the wage rate `wage`, the unemployment rate `unemp`, a dummy indicating that the region is eligible to European funds `elig` and the area `area` are observed at the regional level and are therefore relevant for the estimation of the lower model, whereas the social cotisation rate `scrate` and the corporate tax rate `ctaxrate` are observed at the country level and are therefore suitable for the upper model. We first estimate a multinomial logit model: ```{r} #| label: "multinomial logit for JapaneseFDI" ml.fdi <- mlogit(choice ~ log(wage) + unemp + elig + log(area) + scrate + ctaxrate | 0, data = jfdi) ``` Note that, as the covariates are only alternative specific, the intercepts are not identified and therefore have been removed. We next estimate the lower model, which analyses the choice of a region within a given country. Therefore, for each choice situation, we estimate the choice of a region on the subset of regions of the country which has been chosen. Moreover, observations concerning Portugal and Ireland are removed as these two countries are mono-region. ```{r} #| label: "lower model estimation" lm.fdi <- mlogit(choice ~ log(wage) + unemp + elig + log(area) | 0, data = jfdi, subset = country == choice.c & ! country %in% c("PT", "IE")) ``` We next use the fitted lower model in order to compute the inclusive value, at the country level: $$ \mbox{iv}_{ig} = \ln \sum_{j \in B_g} e^{\beta^\top x_{ij}}, $$ where $B_g$ is the set of regions for country $g$. When a grouping variable is provided in the `dfidx` function, inclusive values are by default computed for every group $g$ (global inclusive values are obtained by setting the `type` argument to `"global"`). By default, `output` is set to `"chid"` and the results is a vector (if `type = "global"`) or a matrix (if `type = "region"`) with row number equal to the number of choice situations. If `output` is set to `"obs"`, a vector of length equal to the number of lines of the data in long format is returned. The following code indicates different ways to use the `logsum` function: ```{r} #| label: "use of the logsum function" #| collapse: true lmformula <- formula(lm.fdi) head(logsum(ml.fdi, data = jfdi, formula = lmformula, type = "group"), 2) head(logsum(ml.fdi, data = jfdi, formula = lmformula, type = "global")) head(logsum(ml.fdi, data = jfdi, formula = lmformula, output = "obs")) head(logsum(ml.fdi, data = jfdi, formula = lmformula, type = "global", output = "obs")) ``` To add the inclusive values in the original `data.frame`, we use `output = "obs"` and the `type` argument can be omitted as its default value is `"group"`: ```{r} #| label: "adding the logsum to the data" JapaneseFDI$iv <- logsum(lm.fdi, data = jfdi, formula = lmformula, output = "obs") ``` We next select the relevant variables for the estimation of the upper model, select unique lines in order to keep only one observation for every choice situation / country combination and finally we coerce the response (`choice.c`) to a logical for the chosen country. ```{r} #| label: "data suitable for the upper model" JapaneseFDI.c <- subset(JapaneseFDI, select = c("firm", "country", "choice.c", "scrate", "ctaxrate", "iv")) JapaneseFDI.c <- unique(JapaneseFDI.c) JapaneseFDI.c$choice.c <- with(JapaneseFDI.c, choice.c == country) ``` Finally, we estimate the upper model, using the previously computed inclusive value as a covariate. ```{r} #| label: "estimation of the upper model" jfdi.c <- dfidx(JapaneseFDI.c, choice = "choice.c", idnames = c("chid", "alt")) um.fdi <- mlogit(choice.c ~ scrate + ctaxrate + iv | 0, data = jfdi.c) ``` If one wants to obtain different `iv` coefficients for different countries, the `iv` covariate should be introduced in the 3th part of the formula and the coefficients for the two mono-region countries (Ireland and Portugal) should be set to 1, using the `constPar` argument. ```{r} #| label: "upper model with different iv coefficients" um2.fdi <- mlogit(choice.c ~ scrate + ctaxrate | 0 | iv, data = jfdi.c, constPar = c("iv:PT" = 1, "iv:IE" = 1)) ``` We next estimate the full-information maximum likelihood nested model. It is obtained by adding a `nests` argument to the `mlogit` function. This should be a named list of alternatives (here regions), the names being the nests (here the countries). More simply, if a group variable has been indicated while using `dfidx`, `nests` can be a boolean. Two flavors of nested models can be estimated, using the `un.nest.el` argument which is a boolean. If `TRUE`, one imposes that the coefficient associated with the inclusive utility is the same for every nest, which means that the degree of correlation inside each nest is the same. If `FALSE`, a different coefficient is estimated for every nest. ```{r} #| label: "nested logit models" nl.fdi <- mlogit(choice ~ log(wage) + unemp + elig + log(area) + scrate + ctaxrate | 0, data = jfdi, nests = TRUE, un.nest.el = TRUE) nl2.fdi <- update(nl.fdi, un.nest.el = FALSE, constPar = c('iv:PT' = 1, 'iv:IE' = 1)) ``` Results are presented below: ```{r} #| label: nlogit #| echo: false models <- list('Mult. logit' = ml.fdi, 'Upper model' = um.fdi, 'Upper model' = um2.fdi, 'Nested logit' = nl.fdi, 'Nested logit' = nl2.fdi) trms <- unique(Reduce("c", lapply(models, function(x) names(coef(x))))) z <- Reduce("cbind", lapply(models, function(x) coef(x)[trms])) dimnames(z) <- list(trms, names(models)) print(z, digits = 3, na.print = "-") ``` For the nested logit models, two tests are of particular interest: - the test of no nests, which means that all the nest elasticities are equal to 1, - the test of unique nest elasticities, which means that all the nest elasticities are equal to each other. y.paras For the test of no nests, the nested model is provided as the unique argument for the `lrtest` and the `waldtest` function. For the `scoretest`, the constrained model (i.e., the multinomial logit model) is provided as the first argument and the second argument is `nests`, which describes the nesting structure that one wants to test. ```{r} #| label: "test of no nests" #| collapse: true lrtest(nl2.fdi) |> gaze() waldtest(nl2.fdi) |> gaze() scoretest(ml.fdi, nests = TRUE, constPar = c('iv:PT' = 1, 'iv:IE' = 1)) |> gaze() ``` The Wald test can also be performed using the `lht` function: ```{r} #| label: "test of no nests with linhyp" #| collapse: true car::lht(nl2.fdi, c("iv:BE = 1", "iv:DE = 1", "iv:ES = 1", "iv:FR = 1", "iv:IT = 1", "iv:NL = 1", "iv:UK = 1")) |> gaze() ``` The three tests reject the null hypothesis of no correlation. We next test the hypothesis that all the nest elasticities are equal. ```{r} #| label: "computing the test for equal iv coefficients" #| collapse: true lrtest(nl2.fdi, nl.fdi) |> gaze() waldtest(nl2.fdi, un.nest.el = TRUE) |> gaze() scoretest(ml.fdi, nests = TRUE, un.nest.el = FALSE, constPar = c('iv:PT' = 1, 'iv:IE' = 1)) |> gaze() car::lht(nl2.fdi, c("iv:BE = iv:DE", "iv:BE = iv:ES", "iv:BE = iv:FR", "iv:BE = iv:IT", "iv:BE = iv:NL", "iv:BE = iv:UK")) |> gaze() ``` Once again, the three tests strongly reject the hypothesis. ## Bibliography ```{r } #| echo: false options(oopts) ```