Imaginings of a Livestock Geneticist

Using Preconditioned Conjugate Gradient (PCG)
to Predict Breeding Values

The ability to generate breeding values for large animal models in a timely manner is of paramount importance to a breeding program. As described in a previous section, a fast method to create the inverse of the numerator relationship matrix was developed in the 1970s, which made it possible to use a traditional animal model in genetic evaluations. Outlined below is a traditional linear mixed model: $$ \mathbf{y} = \mathbf{Xb} + \mathbf{Za} + \mathbf{e}, $$ where y is a vector of phenotypes, b is a vector of fixed effects, a is a vector of random animal effects, e is a vector of random residuals, X is an incidence matrix for the fixed effects, and Z is an incidence matrix for the random effects. In the model outlined above, four classes of parameters, collectively referred to as \(\theta\), need to be estimated: b and a, which pertain to the fixed and random effects, along with the animal (\(\sigma_a^2\)) and residual (\(\sigma_e^2\)) variances. In the current setting we will assume the variance components are already estimated, so we only need to estimate the fixed effects (e.g., systematic environmental effects) and predict the random effects (i.e., the estimated breeding values). To generate solutions for b and a, the mixed model equations (MME), with \(k_a = \sigma_e^2 / \sigma_a^2\), need to be set up, as outlined below:

$$ \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A^{-1}k_a \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix} $$ $$ LHS \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = RHS $$
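
To make the structure of the MME concrete, below is a minimal Python/NumPy sketch that assembles the LHS and RHS from the incidence matrices, the inverse of the numerator relationship matrix, and the variance ratio. The function name `build_mme` and the dense-matrix representation are illustrative choices for this post, not the storage scheme a production evaluation would use.

```python
import numpy as np

def build_mme(X, Z, Ainv, k_a, y):
    """Assemble the mixed model equations (LHS and RHS).

    X    : (n x p) incidence matrix for fixed effects
    Z    : (n x q) incidence matrix for random animal effects
    Ainv : (q x q) inverse of the numerator relationship matrix
    k_a  : variance ratio sigma_e^2 / sigma_a^2
    y    : (n,)   vector of phenotypes
    """
    lhs = np.block([
        [X.T @ X,            X.T @ Z          ],
        [Z.T @ X,  Z.T @ Z + Ainv * k_a       ],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    return lhs, rhs
```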

One could generate the solutions by brute force, taking the inverse of the LHS and multiplying it by the RHS. Direct inversion, however, becomes intractable for routine evaluations with millions of animals in the pedigree. It should be emphasized that setting up the mixed model equations doesn't require any brute-force matrix inversions. Furthermore, in the current implementation the matrix is stored in memory, and no advantage is taken of the fact that the matrix is symmetric and sparse (i.e., lots of zeros) when a pedigree-based relationship matrix is used. In routine evaluations carried out by breeding programs, a technique called 'iteration on data' is used. The iteration-on-data technique doesn't set up and store the MME in memory, which makes multiple-trait and other complex models feasible for routine evaluations. Nevertheless, a solver that is surprisingly simple to code, called the preconditioned conjugate gradient (PCG), can be used to efficiently generate solutions. Once you understand PCG, all the tricks outlined previously can be incorporated into the algorithm.
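
Using the `build_mme` sketch above, the brute-force route is then a single dense solve of the assembled system, shown below; it is only practical for small examples, but it provides the reference answer that PCG should reproduce. The arrays `X`, `Z`, `Ainv`, `k_a`, and `y` are assumed to already exist (a toy construction appears later in this post).

```python
# Brute-force solution of the MME: only practical for small systems.
lhs, rhs = build_mme(X, Z, Ainv, k_a, y)
solutions = np.linalg.solve(lhs, rhs)   # preferable to forming np.linalg.inv(lhs) explicitly
b_hat = solutions[:X.shape[1]]          # fixed-effect estimates
a_hat = solutions[X.shape[1]:]          # estimated breeding values
```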

The algorithm consists of storing four vectors, each with length equal to the number of unknown parameters to solve for (i.e., the fixed and random solutions). The four vectors are the residuals, the solutions to the unknown parameters, a search-direction vector, and a working vector. The algorithm's name comes from the "preconditioner matrix", which in most cases is just the inverse of the diagonal of the LHS. More complicated preconditioner matrices can be used to make convergence faster, but in most cases a simple diagonal preconditioner is satisfactory. Outlined below is the code for the PCG algorithm. The algorithm stops when the change from the current to the previous iteration is smaller than a certain value.
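
The sketch below is one way to implement the PCG solver in Python/NumPy, using the inverse of the diagonal of the LHS as the preconditioner. The stopping rule here compares the preconditioned residual against a tolerance, which is one common way to measure how much the solutions are still changing between rounds.

```python
import numpy as np

def pcg_solve(lhs, rhs, tol=1e-10, max_iter=10000):
    """Solve LHS * x = RHS with the preconditioned conjugate gradient.

    The preconditioner M is the diagonal of LHS, so applying M^-1 is
    just an element-wise division.
    """
    n = rhs.shape[0]
    m_inv = 1.0 / np.diag(lhs)           # inverse of the diagonal preconditioner
    x = np.zeros(n)                      # solutions to the unknown parameters
    r = rhs - lhs @ x                    # residual vector
    d = m_inv * r                        # search-direction vector
    delta_new = r @ d
    for it in range(max_iter):
        w = lhs @ d                      # working vector
        alpha = delta_new / (d @ w)      # step size along the search direction
        x += alpha * d                   # update the solutions
        r -= alpha * w                   # update the residuals
        s = m_inv * r                    # preconditioned residual
        delta_old = delta_new
        delta_new = r @ s
        # stop when the scaled residual falls below the tolerance
        if np.sqrt(delta_new) < tol * np.sqrt(rhs @ (m_inv * rhs)):
            break
        beta = delta_new / delta_old
        d = s + beta * d                 # new search direction
    return x, it + 1
```

Because the LHS only enters the loop through the product `lhs @ d`, the same algorithm carries over unchanged when that product is formed by iteration on data instead of from a matrix stored in memory.
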

The following data files, which contain pedigree and phenotype information, can be used to confirm that the PCG solutions are equivalent to those from brute-force direct inversion. The information was taken from the Geno-Diver software, and the simulated additive and residual variances were 0.25 and 0.75, respectively.
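
As an illustration of that check on a small toy system (random incidence matrices, an identity relationship matrix standing in for the real pedigree, and \(k_a = 0.75/0.25 = 3\) from the simulated variances), the snippet below verifies that the PCG and direct-inversion solutions agree; it reuses `build_mme` and `pcg_solve` from the sketches above.

```python
rng = np.random.default_rng(42)
n_animals, n_records, n_fixed = 50, 200, 3

X = np.zeros((n_records, n_fixed))
X[np.arange(n_records), rng.integers(0, n_fixed, n_records)] = 1.0    # fixed-effect level per record
Z = np.zeros((n_records, n_animals))
Z[np.arange(n_records), rng.integers(0, n_animals, n_records)] = 1.0  # animal per record
Ainv = np.eye(n_animals)            # unrelated animals as a stand-in for the real pedigree
y = rng.normal(size=n_records)
k_a = 0.75 / 0.25                   # sigma_e^2 / sigma_a^2 from the simulated variances

lhs, rhs = build_mme(X, Z, Ainv, k_a, y)
direct = np.linalg.solve(lhs, rhs)
pcg, n_iter = pcg_solve(lhs, rhs)
print(n_iter, np.max(np.abs(direct - pcg)))   # maximum difference should be near zero
```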

References