--- title: "Genotype by Environment analysis using statgenGxE" author: "Bart-Jan van Rossum" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: false bibliography: bibliography.bib vignette: > %\VignetteIndexEntry{Genotype by Environment analysis using statgenGxE} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.dim = c(7, 4) ) library(statgenGxE) op <- options(width = 90) options("statgen.trialColors" = c("#9E0142FF", "#35B779FF", "#B4DE2CFF", "#006837FF", "#D53E4FFF")) ``` ## The statgenGxE package {-} The statgenGxE package is developed as an easy-to-use package for Genotype by Environment (GxE) analysis for data of plant breeding experiments with many options for plotting and reporting the results of the analyses. This vignette describes how to perform the different types of analysis that are available in the package. The availability of functions in the package is based on the analyses described in @Malosetti2013. Further suggested reading is @vanEeuwijk2016. The following types of analysis can be done using statgenGxE: * [Mixed model analysis of GxE table of means](#vcomp) * [Finlay-Wilkinson Analysis](#fw) * [AMMI Analysis](#am) * [GGE Analysis](#gge) * [Identifying mega environments](#me) * [Stability measures](#st) * [Modeling of heterogeneity of genetic variances and correlations](#vcov) For most of the analyses a pdf report can be created automatically, see [Reporting](#rep). Note that due to technical restrictions the number of significant digits printed in tables throughout this vignette is not always optimal. In practice precision of the output can always be specified by the user. --- ## Data preparation {#dp} The use of the package is demonstrated using maize data from the European Union project DROPS (https://cordis.europa.eu/project/id/244374). The data is available from https://doi.org/10.15454/IASSTN [@Millet2019] and the relevant data set is included as data.frame in the statgenGxE package. The first step is loading the data into R. ```{r loadData} data(dropsPheno) ``` dropsPheno contains data for the genotypic means (Best Linear Unbiased Estimators, BLUEs), with one value per genotype per experiment, for a selection of 10 experiments from the full data set and eight traits. These 10 experiments form a good representation of the full set of experiments covering the five scenarios described in @Millet2016. Throughout this vignette in all examples the trait grain.yield will be analyzed. For a more detailed description of the contents of the data see `help(dropsPheno)`. The input for GxE analysis in the statgenGxE package is an object of class TD (Trial Data). For a detailed description on how to construct such an object see the vignette of the statgenSTA package (`vignette("statgenSTA")`). For GxE analysis it is enough to specify the genotype and trial option of the `createTD` function. ```{r createTD} ## Create a TD object from dropsPheno. dropsTD <- statgenSTA::createTD(data = dropsPheno, genotype = "Variety_ID", trial = "Experiment") ``` Before doing any analysis, we can first have a look at the contents of the data. To explore heterogeneity of genetic variance we can create a box plot. Coloring the boxes by environmental scenario will provide valuable extra information. ```{r TDbox} ## Create a box plot of dropsTD. ## Color the boxes based on the variable scenarioFull. ## Plot in descending order. plot(dropsTD, plotType = "box", traits = "grain.yield", colorTrialBy = "scenarioFull", orderBy = "descending") ``` From the plot it is clear that the trials in the hot, water deficient environments have a lower median and range than the other trials. For further insight into the correlation structure between trials a scatter plot matrix can be made. ```{r TDscatter, fig.dim = c(8.5, 8.5)} ## Create a scatter plot of dropsTD. ## Color the genotypes based on the variable geneticGroup. ## Color the histograms for trials based on the variable scenarioFull. plot(dropsTD, plotType = "scatter", traits = "grain.yield", colorGenoBy = "geneticGroup", colorTrialBy = "scenarioFull", trialOrder = c("Gai12W", "Kar13R", "Kar12W", "Kar13W", "Mar13R", "Mur13W", "Mur13R", "Ner12R", "Cam12R", "Cra12R")) ``` ### Coloring of genotypes and trials In all plots the default colors for both genotype groups and trial groups are chosen from a predefined color palette. For genotype groups the color palette is "Dark 2", for trial groups it is "Alphabet". See [here](https://developer.r-project.org/Blog/public/2019/11/21/a-new-palette-for-r/index.html) for an overview of these colors. It is possible to specify different colors for genotype groups and trial groups per plot using the options `colGeno` and `colTrial` respectively. Also, more conveniently, the default colors can be set using the options statgen.genoColors and statgen.trialColors. ```{r colorOpts, eval=FALSE} ## Set default colors for genotypes and trials. options("statgen.genoColors" = c("blue", "green", "yellow")) options("statgen.trialColors" = c("red", "brown", "purple")) ``` If a plot has more genotype groups than the number of colors specified as default colors, the default colors will be ignored and `topo.colors` will be used instead. For trial groups this is done in a similar way. --- ## Mixed model analysis of GxE table of means {#vcomp} To investigate the structure of the genotype by environment data various mixed models can be fitted. In the statgenGxE package this can be done using the `gxeVarComp` function. Six different types of models can be fitted depending on the structure of the environments in the data. These models are described in the table below, together with the function parameters used in `gxeVarComp` to fit the model. | Structure of environments | Model | Function parameters | |:-----------|:-------------------|:------------| | Environments correspond to trials | **trait** = trial + **genotype + genotype:trial** | | Trials form a factorial structure of locations x years | **trait** = year + location + year:location + **genotype + genotype:year + genotype:location + genotype:year:location** | `locationYear = TRUE` | | Trials are nested within year | **trait** = year + year:trial + **genotype + genotype:year + genotype:year:trial** | `nestingFactor = "year"` | | Trials are nested within locations | **trait** = location + location:trial + **genotype + genotype:location + genotype:location:trial** | `nestingFactor = "loc"` | | Trials correspond to locations within regions across years | **trait** = region + region:location + year + region:year + region:location:year + **genotype + genotype:region + genotype:region:location + genotype:year + genotype:region:year + genotype:region:location:year** | `regionLocationYear = TRUE` | | Trials are nested within scenarios | **trait** = scenario + scenario:trial + **genotype + genotype:scenario + genotype:scenario:trial** | `nestingFactor = "scenario"` | For data in the form of GxE means, the last random term in all models above will become a residual term. If the GxE means are provided together with weights, then a residual term will be added to the models above. Be aware that when plot data are provided as input data, the mixed model analysis will be based on the assumption of completely randomized trials, which in almost all cases will not be appropriate for multi-environment trials. All models can be fitted using either `lme4` or `asreml`. This can be specified using the `engine` parameter. For diagnostic purposes and to identify the main sources of variation, all models are fitted three times: 1. With all terms fixed 2. With all terms random 3. As a mixed model, with the **bold** terms as random and the others as fixed ### Model fitted with all terms as fixed {-} The function first fits a model where all model terms are included as fixed terms. Based on the ANOVA table of this model, terms in the fixed part of the model that are likely to give a problem when fitting the mixed model are removed because of the reduced connectivity and number of available observations to estimate that model term. Also a warning is printed if the mean sum of squares for a model term points to a possible zero variance component in the mixed model. ### Model fitted with all terms as random {-} Then a model is fitted where all model terms are included as random terms. Based on the variance components in this model the percentage of variance explained by each of the model components is determined. The percentages of variance are printed in the model summary, together with the variance components. The latter are presented on a standard deviation scale. ### Mixed model {-} Finally a mixed model is fitted as specified in the table above. Based on this model, variance components can be extracted, heritabilities on a line mean basis (across all trials) can be computed and predictions can be made. It is also possible to plot the results. ```{r geVarComp, message=FALSE, eval='requireNamespace("Matrix)&&packageVersion("Matrix")>"1.3.0"'} ## Fit a model where trials are nested within scenarios. dropsVarComp <- gxeVarComp(TD = dropsTD, trait = "grain.yield", nestingFactor = "scenarioFull") summary(dropsVarComp) ``` Note that because the model is fitted with lme4, in this and further tables no standard errors are outputted. When using asreml for modeling standard errors will be available. The diagnostics for the fitted model can be printed using the `diagnostics` function. This will print an incidence matrix of missing values for each of the terms in the random part of the fitted model. ```{r diag, eval=FALSE} ## Print diagnostics - output suppressed because of the large number of rows. diagnostics(dropsVarComp) ``` Extracting the variance components from the fitted model can be done using the `vc` function. The heritability (across trials) is computed using `herit`. ```{r vcHerit, R.options=list(digits=4), eval='requireNamespace("Matrix)&&packageVersion("Matrix")>"1.3.0"'} ## Extract variance components. vc(dropsVarComp) ## Compute heritability. herit(dropsVarComp) ``` A plot can be made of the square roots of the variance component estimates. These are based on the fully random model. ```{r VarCompPlot, eval='requireNamespace("Matrix)&&packageVersion("Matrix")>"1.3.0"'} ## Plot the results of the fitted model. plot(dropsVarComp) ``` Predictions for the mixed model can be made for model terms that represent different levels of data aggregation; for the genotype main effect or for genotypic performance in grouped environments, or for genotypic performance in individual trials. These levels can be specified by the parameter `predictLevel`. ```{r predict, R.options=list(digits=4), eval='requireNamespace("Matrix)&&packageVersion("Matrix")>"1.3.0"'} ## Predictions of the genotype main effect. predGeno <- predict(dropsVarComp) head(predGeno) ## predictions at the level of genotype x scenarioFull. predGenoTrial <- predict(dropsVarComp, predictLevel = "scenarioFull") head(predGenoTrial) ``` --- ## Finlay-Wilkinson Analysis {#fw} With the Finlay-Wilkinson Analysis [@Finlay1963] we describe genotype by environment interaction by the heterogeneity of the slopes of a regression of individual genotypic performance on an environmental index. The environmental index is the average of all genotypes in an environment. The intercept expresses general performance across all environments, the slope represents adaptability, and the residuals may indicate a measure for stability. The model fitted in the analysis is $y_{ij} = \mu + G_i + \beta_iE_j + \epsilon_{ij}$, where $y_{ij}$ is the phenotypic value of genotype $i$ in environment $j$, $\mu$ is the general mean, $G_i$ is the genotypic effect, $\beta_i$ a sensitivity parameters, $E_j$ the environment effect and $\epsilon_{ij}$ a residual. In the statgenGxE package this analysis can be done using the `gxeFW` function. The model described above is fitted using an alternating regression algorithm. First, using starting values for $\beta_i$ and $G_i$, $E_j$ is estimated. Next, $E_j$ is assumed known and $\beta_i$ and $G_i$ are estimated. This process is continued until convergence, i.e. until the change in $\beta_i$ between iterations is less then a specified tolerance (default 0.001). When estimating parameters, missing observations are estimated as well. By default all trials in the `TD` object are used in the analysis, but this can be restricted using the parameter `trials`. The genotypes included in the analysis can be restricted using `genotypes`. ```{r geFW, R.options=list(digits=6)} ## Perform a Finlay-Wilkinson analysis for all trials. dropsFW <- gxeFw(TD = dropsTD, trait = "grain.yield") summary(dropsFW) ``` Four types of plots can be made to investigate the output from the analysis. `plotType = "scatter"` creates three scatter plots where genotypic mean, square root of the mean squared deviation and sensitivity are plotted against each other. ```{r plotFWScatter, fig.width=5, fig.height=4} ## Create scatter plot for Finlay Wilkinson analysis. ## Color genotypes by geneticGroup. plot(dropsFW, plotType = "scatter", colorGenoBy = "geneticGroup") ``` With `plotType = "line"` a plot with fitted lines for all genotypes in the analysis is created. ```{r plotFWLine, fig.width=5, fig.height=4} ## Create line plot for Finlay Wilkinson analysis. ## Color genotypes by geneticGroup. plot(dropsFW, plotType = "line", colorGenoBy = "geneticGroup") ``` `plotType = "trellis"` creates a trellis plot with observations and slopes per genotype. At most 64 genotypes are plotted. It is possible to select a subset of genotypes for plotting using the parameter `genotypes`. ```{r plotFWTrellis, fig.width=5, fig.height=4} ## Create trellis plot for Finlay Wilkinson analysis. ## Restrict to first 5 genotypes. plot(dropsFW, plotType = "trellis", genotypes = c("11430", "A3", "A310", "A347", "A374")) ``` `plottType = "scatterFit"` creates a scatter plot of fitted values in the trial with the highest environmental effect against the fitted values in the trial with the lowest environmental effect. ```{r plotFWScatterFit, fig.width=5, fig.height=4} ## Create scatter plot of fitted values for Finlay Wilkinson analysis. ## Color genotypes by geneticGroup. plot(dropsFW, plotType = "scatterFit", colorGenoBy = "geneticGroup") ``` --- ## AMMI Analysis {#am} The Additive Main Effects and Multiplicative Interaction (AMMI) model fits a model which involves the Additive Main effects (i.e. genotype and trial) along with Multiplicative Interaction effects. The additive effects are the classical ANOVA main effects for genotype and environment, the multiplicative effects follow from a principal component analysis on the interaction residuals (= genotype by environment means after adjustment for additive genotype and environment effects). This results in an interaction characterized by Interaction Principal Components (IPCA) enabling simultaneous plotting of genotypes and trials. If the data contains missing values, those are imputed first using an iterative regression algorithm. This algorithm regresses each environment in turn on all others. This process is repeated until the difference between the fitted values in subsequent iterations becomes sufficiently small (see `help(multMissing)` for further details). After imputation the model fitted in the analysis is $y_{ij} = \mu + G_i + E_j + \Sigma_{m =1}^M\gamma_{mi}\delta_{mj} + \epsilon_{ij}$, where $M$ is the number of principal components, $y_{ij}$ is the phenotypic value of genotype $i$ in environment $j$, $\mu$ is the general mean, $G_i$ is the genotypic effect, $E_j$ the environmental effect, $\gamma_{mi}$ the genotypic scores, $\delta_{mj}$ the environmental scores, and $\epsilon_{ij}$ a residual. The AMMI analysis can be performed with the statgenGxE package using the function `gxeAmmi`. ```{r geAmmi, R.options=list(digits=6)} ## Run gxeAmmi for grain.yield. dropsAm <- gxeAmmi(TD = dropsTD, trait = "grain.yield") summary(dropsAm) ``` With the default settings in the AMMI analysis a maximum of two principal components are used. This can be changed using the parameter `nPC` in the function. The number of principal components should be lower than the smallest of the number of genotypes or environments minus one. By specifying `nPC = NULL` the algorithm will determine the number of principal components by a method of forward selection. ```{r geAmmi2, R.options=list(digits=6)} ## Run gxeAmmi. Let algorithm determine number of principal components. dropsAm2 <- gxeAmmi(TD = dropsTD, trait = "grain.yield", nPC = NULL) summary(dropsAm2) ``` From the summary it is clear that the final number of principal components, determined by forward selection, is seven. It is possible to exclude certain genotypes, e.g. outliers, from the analysis using the parameter `excludeGeno`. ```{r geAmmi3} ## Run gxeAmmi with three principal components. ## Exclude genotypes 11430 and A3. dropsAm3 <- gxeAmmi(TD = dropsTD, trait = "grain.yield", nPC = 3, excludeGeno = c("11430", "A3")) ``` If the data contains a column year, it is possible to fit a separate AMMI model for each year. This can be done by specifying the parameter `byYear = TRUE`. This will run an AMMI analysis for all years in the data without taking into account the data for the other years. ```{r geAmmiYear} ## Run gxeAmmi per year in the data. dropsAmYear <- gxeAmmi(TD = dropsTD, trait = "grain.yield", byYear = TRUE) ``` The results of the AMMI analysis can be displayed in two types of plot. "AMMI1" plots the first principal component against the main effects. "AMMI2" plots the second against the first principal component. ```{r plotAmmi1, fig.width=5, fig.height=5, out.width="75%"} ## Create an AMMI1 plot. plot(dropsAm, plotType = "AMMI1") ``` AMMI biplots appear in three forms [@Gower1996]: 1) genotype scaling, with emphasis on approximation of relations between genotypes (scale = 0) 2) environment scaling, with emphasis on approximation of relations between environments (scale = 1) 3) symmetric scaling, a compromise (scale = 0.5) ```{r plotAmmi2, fig.width=5, fig.height=5, out.width="75%"} ## Create an AMMI2 biplot with symmetric scaling. plot(dropsAm, scale = 0.5, plotType = "AMMI2") ``` The AMMI plot function has many options to customize the plot. It is possible to plot different principal components on the axes specifying `primAxis` and `secAxis`. Genotypes can be grouped and colored by a variable in the data using `colorGenoBy`. Specify custom colors in `colGeno`. Similarly for coloring environments use `colorEnvBy` and `colEnv`. ```{r plotAmmiCol, fig.width=5, fig.height=5, out.width="75%"} ## Create an AMMI2 biplot. ## Color genotypes based on variable geneticGroup. Use custom colors. ## Color environments based on variable scenarioFull. plot(dropsAm, scale = 0.4, plotType = "AMMI2", colorGenoBy = "geneticGroup", colGeno = c("red", "blue", "green", "yellow"), colorEnvBy = "scenarioFull") ``` A convex hull can be plotted around the genotypes in an AMMI2 biplot. ```{r plotAmmiConvHull, fig.width=5, fig.height=5, out.width="75%"} ## Create an AMMI2 biplot with convex hull around the genotypes. plot(dropsAm, scale = 0.4, plotType = "AMMI2", plotConvHull = TRUE, colorEnvBy = "scenarioFull") ``` For easier interpretation of the results shown in the biplots, it is possible to rotate the whole plot to a configuration specified by the user. This can be done by setting `rotatePC` to an environment or a genotype that should be aligned with the positive x-axis in the plot. ```{r plotAmmiRotatePC, fig.width=5, fig.height=5, out.width="75%"} ## Create an AMMI2 biplot. ## Align environment Mur13W with the positive x-axis. plot(dropsAm, scale = 0.4, plotType = "AMMI2", colorEnvBy = "scenarioFull", rotatePC = "Mur13W") ``` Genotypes can be left out of the plot completely by setting `plotGeno = FALSE` and similarly `plotEnv = FALSE` assures no environments are plotted. For displaying genotypes by their names instead of points, use `sizeGeno` with a size larger than zero to indicate their text size. `envFactor` can be used to blow up the environmental scores in the plot. A value for `envFactor` between zero and one effectively blows up the genotypic scores. See `help(plot.AMMI)` for full details on all plotting options. --- ## GGE Analysis {#gge} A Genotype plus Genotype by Environment (GGE) analysis is very similar to an AMMI analysis. The difference is in the first step where, instead of genotype and environment, only environment is fitted as a main effect in the model. Therefore, the principal component analysis is performed on the genotype main effect and the interaction jointly. In essence, the GGE analysis fits a principal components model with two components to the two-way genotype by environment table with data centered per environment. The model fitted in the GGE analysis is $y_{ij} = \mu + E_j + \gamma_{1i}\delta_{1j} + \gamma_{2i}\delta_{2j} + \epsilon_{ij}$, where the parameters are as described in the AMMI analysis. The GGE analysis is done in statgenGxE by running the function `gxeGGE` which has identical options as the `gxeAmmi` function. ```{r geGGE, R.options=list(digits=6)} ## Run gxeGGE with default settings. dropsGGE <- gxeGGE(TD = dropsTD, trait = "grain.yield") summary(dropsGGE) ``` Options for plotting results of a GGE analysis are identical to those for an AMMI analysis. When a convex hull is plotted, lines from the origin perpendicular to the edges of the hull are added as well. With the default settings the plot is rotated such that the positive x-axis aligns with the genotypic main effect, the average of the directions of all environments. By doing this the y-axis shows the GxE. ```{r plotGGE, fig.width=5, fig.height=5, out.width="75%"} ## Create a GGE2 biplot. plot(dropsGGE, plotType = "GGE2") ``` --- ## Identifying mega environments {#me} For the computation of mega environments, an AMMI-2 model is fitted and then, using the fitted values from this model, the environments are clustered. Mega environments are created by grouping environments based on their best performing genotype. Environments that share the same best genotype belong to the same mega environment. ```{r geMegaEnv, R.options=list(digits=5)} ## Compute mega environments. dropsMegaEnv <- gxeMegaEnv(TD = dropsTD, trait = "grain.yield") ## Summarize results. summary(dropsMegaEnv) ``` As can be seen in the column Mega factor in the summary, four mega environments have been created, one for each of the winning genotypes "Lo1087", "Lo1251", "Lo1270" and "PHG35". The values for the Best Linear Unbiased Predictors (BLUPs) and associated standard errors for the genotypes based on the calculated mega environments, can be computed using the function `predict` on the output. This can be done using either lme4 [@Bates2015] or asreml [@Gilmour2017] as an engine for fitting the model. ```{r geMegaEnvPred, R.options=list(digits=5)} if (requireNamespace(package = "asreml", quietly = TRUE)) { ## Compute BLUPs. ## Use asreml as engine for fitting model. geMegaEnvPred <- predict(dropsMegaEnv, engine = "asreml") ## Display BLUPs. head(geMegaEnvPred$predictedValue) } if (requireNamespace(package = "asreml", quietly = TRUE)) { ## Display standard errors of the BLUPs. head(geMegaEnvPred$standardError) } ``` The results of the predictions can be visualized in a scatter plot using the `plot` functions. This creates a similar plot to the one described in the [data preparation](#dp) section earlier in the vignette. However, now the predicted values for the genotypes in the different mega environments are displayed. The predictions are calculated inside the plot functions. By default lme4 is used for this. To change to asreml specify `engine = asreml`. Genotypes can be colored by the value of a variable in the TD object used for calculating the mega environments by specifying `colorGenoBy`. ```{r scatterMegaEnv} if (requireNamespace(package = "asreml", quietly = TRUE)) { ## Create a scatter plot of predictions in mega environments. ## Color genotypes based on geneticGroup. plot(dropsMegaEnv, engine = "asreml", colorGenoBy = "geneticGroup") } ``` --- ## Stability measures {#st} Different measures of stability can be calculated using the statgenGxE package, the cultivar-superiority measure of Lin & Binns [@Lin1988], Shukla's stability variance [@Shukla1972] and Wricke's ecovalence [@Wricke1962]. The cultivar-superiority is a function of the sum of the squared differences between a cultivar's mean and the best cultivar's mean, where the sum is across trials.. Genotypes with the smallest values of the superiority tend to be more stable, and closer to the best genotype in each environment. Shukla's stability variance (static stability) is defined as the variance around the genotype's phenotypic mean across all environments. This provides a measure of the consistency of the genotype, without accounting for performance. Wricke's Ecovalence Stability Coefficient is the contribution of each genotype to the GxE sum of squares, in an unweighted analysis of the GxE means. A low value indicates that the genotype responds in a consistent manner to changes in environment; i.e. is stable from a dynamic point of view. To compute the stability measures use `gxeStability`. This will compute all three stability measures described above. ```{r geStab, R.options=list(digits=6)} ## Compute stability measures for dropsTD. dropsStab <- gxeStability(TD = dropsTD, trait = "grain.yield") ## In the summary print the top two percent of the genotypes. summary(dropsStab, pctGeno = 2) ``` Plotting the results yields a scatter plot for the square root of each stability measure, plotted against the genotypic mean. ```{r plotStab} ## Create plots for the different stability measures. ## Color genotypes by geneticGroup. plot(dropsStab, colorGenoBy = "geneticGroup") ``` --- ## Modeling of heterogeneity of genetic variances and correlations{#vcov} The function `gxeVarCov` allows to fit linear mixed models of varying complexity to a set of trials without clear structure. When the trials can be organized by locations, years, regions or scenarios, then the mixed models in section [Mixed model analysis of GxE table of means](#vcomp) are more appropriate. The objective of this function is to identify a model for the genetic variances and correlations across the trials that will help in quantifying genetic similarities between trials and in understanding the structure of genotype by environment interactions. The identified variance-covariance structure can be used as the random background in a multi-environment QTL model. Models can be fitted using either lme4 or asreml as modeling engine. If the engine is set to 'lme4', only a model considering multiple variance components, but with a compound symmetry variance-covariance structure can be fitted. If the engine is set to 'asreml', complex variance-covariance structures will be fitted sequentially to the genotype by environment data. The best fitting model is then identified. The models fitted are of the form $y_{ij} = \mu_j + \epsilon_{ij}$, where $y_{ij}$ is the phenotypic value of genotype $i$ in environment $j$, $\mu_j$ is the environmental mean, and $\epsilon_{ij}$ represents mainly genetic variation, although some non-genetic variation may be included as well. The random term $\epsilon_{ij}$ is modeled in eight ways as described in the table below (see [@Boer2007]). | Model | Description | var($g_{ij}$) | cov($g_{ij}$;$g_{ik}$) | Number of parameters| |:-----------|:-------------------|:------------|:------------|:-----------------| | identity | identity | $\sigma_{GE}^2$ | 0 | 1 | | cs | compound symmetry | $\sigma_G^2 + \sigma_{GE}^2$ | $\sigma_G^2$ | 2 | | diagonal | diagonal matrix (heteroscedastic) | $\sigma_{GE_j}^2$ | 0 | $J$ | | hcs | heterogeneous compound symmetry | $\sigma_G^2+\sigma_{GE_j}^2$ | $\sigma_G^2$ | $J+1$ | | outside | heterogeneity outside | $\sigma_{G_j}^2$ | $\theta$ | $J+1$ | | fa | first order factor analytic | $\lambda_{1j}^2+\sigma_{GE_j}^2$ | $\lambda_{1j}\lambda_{1k}$ | $2J$ | | fa2 | second order factor analytic | $\lambda_{1j}^2+\lambda_{2j}^2+\sigma_{GE_j}^2$ | $\lambda_{1j}\lambda_{1k}+\lambda_{2j}\lambda_{2k}$ | $3J-1$ | | unstructured | unstructured | $\sigma_{G_j}^2$ | $\sigma_{G_{j,k}}^2$ | $J(J+1)/2$ | In this table $J$ is the number of environments, $\sigma_G^2$ the variance component for the genotype main effects, $\sigma_{GE}^2$ the variance component for GxE interactions. $\sigma_{G_j}^2$ and $\sigma_{GE_j}^2$ are the environment specific variance components for the genotype main effects and GxE interaction in environment $j$. $\sigma_{G_{j,k}}^2$ is the genetic covariance between environments $j$ and $k$. $\theta$ is the common correlation between environments and $\lambda_{1j}$ and $\lambda_{2j}$ are environment specific multiplicative parameters. It is possible to to select a subset of the models that will be fitted by specifying the `models` parameter in the `gxeVarCov` function. The best model for the data is selected based on either the Akaike Information Criterion (AIC [@Akaike1974]) or the Baysian Information Criterion (BIC [@Schwarz1978]). Which criterion is used is determined by the parameter `criterion` in the function `gxeVarCov`. ```{r geVClme} ## Use lme4 for fitting the models - only compound symmetry. dropsVC <- gxeVarCov(TD = dropsTD, trait = "grain.yield") summary(dropsVC) ``` ```{r geVCasreml} ## Use asreml for fitting the models - eight models fitted. ## Use AIC as criterion for determining the best model. if (requireNamespace("asreml", quietly = TRUE)) { dropsVC2 <- gxeVarCov(TD = dropsTD, trait = "grain.yield", engine = "asreml", criterion = "AIC") summary(dropsVC2) } ``` `r if (exists("dropsVC2")) {"As becomes clear from the summary, the best model based on AIC is the model with an unstructured variance-covariance structure. This indicates that GxE is present in the form of heterogeneity of genetic variances and correlations.." }` Note that for both factor analytic models to be fitted a minimum of five environments is needed. If the data contains less environments, those two models are skipped. A heat map of the correlation between the environments based on the best fitted model can be plotted. It will be instructive to compare the fitted correlations of this section with the 'observed' correlations in section [Data preparation](#dp) and the GGE biplot in section [GGE analysis](#gge). Comparisons with the mixed models in section [Mixed model analysis of GxE table of means](#vcomp) are also relevant. ```{r geVCPlot} if (requireNamespace("asreml", quietly = TRUE)) { plot(dropsVC2) } ``` In the plot correlations between environments are shown. A dark red color indicates a strong positive correlation between environments, a dark blue color a strong negative correlation. Environments are clustered by their correlations and ordered according to the results of the clustering. --- ## Fitted values and residuals Fitted values and residuals for the models fitted using AMMI, GGE, Finlay-Wilkinson, and the selected variance-covariance model can be extracted using the functions `fitted` and `residuals` respectively. These can be useful for investigating the quality of the fitted model by e.g. creating diagnostic plots of fitted values against residuals. ```{r fitRes} ## Extract the fitted values and residuals for the Finlay-Wilkinson model. fitFW <- fitted(dropsFW) resFW <- residuals(dropsFW) ## Create a diagnostic plot of fitted values against residuals. fitResFW <- merge(fitFW, resFW, by = c("trial", "genotype")) ggplot2::ggplot(fitResFW, ggplot2::aes(x = fittedValue, y = residual, color = trial)) + ggplot2::geom_point() ``` --- ## Reporting{#rep} For most of the analyses described in this vignette a pdf report can be made using the `report` function. Such a report typically consists of an overview of the analysis and parameters used and the most interesting results and plots for the analysis. Except for this pdf report also the .tex file used to generate this report and a folder containing the figures in the report, are generated. This will make it easy to adapt the report to the user's specific needs. Note that all reports are created using pdflatex. If no installation is found by `Sys.which("pdflatex")` the report function will generate an error. For the analysis done in the vignette the reports below can be made. Specify a file with .pdf extension as `outfile` to set the location where the report is written. ```{r reports, eval=FALSE} ## Create a report for the Finlay Wilkinson analysis. report(dropsFW, outfile = "./myReports/FWReport.pdf") ## Create a report for the AMMI analysis. report(dropsAm, outfile = "./myReports/AMMIReport.pdf") ## Create a report for the GGE analysis. report(dropsGGE, outfile = "./myReports/GGEReport.pdf") ## Create a report for the stability analysis. report(dropsStab, outfile = "./myReports/stabReport.pdf") ## Create a report for the analysis of two-way GxE tables. report(dropsVC2, outfile = "./myReports/varCompReport.pdf") ``` ```{r winddown, include = FALSE} options(op) ``` ---- ## References