Site Regression model — GGEmodel • geneticae

The Site Regression model (also called genotype + genotype-by-environment (GGE) model) is a powerful tool for effective analysis and interpretation of data from multi-environment trials in breeding programs. There are different functions in R to fit the SREG model, such as the GGEModel from the GGEBiplots package. However, this function has the following improvements:

Includes recently published robust versions of the SREG model (Angelini et al., 2022).
It can be used for data from trials with repetitions (there is no need to calculate means beforehand).
Other variables not used in the analysis can be present in the dataset.

GGEmodel(
  Data,
  genotype = "gen",
  environment = "env",
  response = "yield",
  rep = NULL,
  model = "SREG",
  SVP = "symmetrical"
)

Arguments

Data: dataframe with genotypes, environments, repetitions (if any) and the phenotypic trait of interest. Additional variables that will not be used in the model may be present in the data.
genotype: column name for genotypes.
environment: column name for environments.
response: column name for the phenotypic trait.
rep: column name for replications. If this argument is NULL, there are no replications in the data. Defaults to NULL.
model: method for fitting the SREG model: `"SREG"`,`"CovSREG"`,`"hSREG"` or `"ppSREG"` (see References). Defaults to `"SREG"`.
SVP: method for singular value partitioning. Either `"row"`, `"column"`, or `"symmetrical"`. Defaults to `"symmetrical"`.

Value

A list of class GGE_Model containing:

model: SREG model version.
coordgenotype: plotting coordinates for each genotype in every component.
coordenviroment: plotting coordinates for each environment in every component.
eigenvalues: vector of eigenvalues for each component.
vartotal: overall variance.
varexpl: percentage of variance explained by each component.
labelgen: genotype names.
labelenv: environment names.
axes: axis labels.
Data: scaled and centered input data.
SVP: name of SVP method.

A biplot of class ggplot

Details

A linear model by robust regression using an M estimator proposed by Huber (1964, 1973) fitted by iterated re-weighted least squares, in combination with three robust SVD/PCA procedures, resulted in a total of three robust SREG alternatives. The robust SVD/PCA considered were:

CovSREG: robust PCA that is obtained by replacing the classical estimates of location and covariance by their robust analogues using Minimum Regularized Covariance Determinant (MRCD) approach;
hSREG: robust PCA method that tries to combine the advantages of both approaches, PCA based on a robust covariance matrix and based on projection pursuit;
ppSREG: robust PCA that uses the projection pursuit and directly calculates the robust estimates of the eigenvalues and eigenvectors without going through robust covariance estimation. It is a very attractive method for bigdata situations, which are very common in METs (a few genotypes tested in a large number of environments), as the principal components can be calculated sequentially.

References

Julia Angelini, Gabriela Faviere, Eugenia Bortolotto, Gerardo Domingo Lucio Cervigni & Marta Beatriz Quaglino (2022) Handling outliers in multi-environment trial data analysis: in the direction of robust SREG model, Journal of Crop Improvement, DOI: 10.1080/15427528.2022.2051217

Examples

 library(geneticae)

 # Data without replication
 library(agridat)
 data(yan.winterwheat)
 GGE1 <- GGEmodel(yan.winterwheat, genotype="gen", environment="env", response="yield")

 # Data with replication
 data(plrv)
 GGE2 <- GGEmodel(plrv, genotype = "Genotype", environment = "Locality",
                  response = "Yield", rep = "Rep")