Imaginings of a Livestock Geneticist

Combined Genomic and Pedigree Matrix (H matrix)

Prior to the mid-2000s, pedigree information served as the traditional method to estimate relationships, and the resulting relationship matrices were in turn used to predict the breeding values of individuals. Because pedigree information can be traced back many generations, the majority of animals can be connected through the relationship matrix, and it is not uncommon for the pedigree of a specific breed or line to contain hundreds of thousands to millions of animals. However, only a fraction of the animals in the pedigree are genotyped (although the rate of genotyping continues to increase), which complicates the comparison of breeding values between genotyped and non-genotyped animals. The current section therefore describes how to generate a joint pedigree and genomic relationship matrix, referred to as H, as described in Legarra et al. (2009). The H matrix can be seen as a projection of G onto the rest of the individuals (i.e. non-genotyped), such that if the parents of two animals are related through G, the two animals are also related in H. The H matrix is outlined below; subscripts with a 1 refer to non-genotyped animals and subscripts with a 2 refer to genotyped animals.

$$ H = \begin{bmatrix} A_{11} + A_{12}A_{22}^{-1}(G-A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\ GA_{22}^{-1}A_{21} & G \end{bmatrix} $$

When generating H, it is important to note that genotyped animals are usually spread across generations, so they do not occupy the bottom-right corner of the A matrix. The A matrix can therefore be generated using the algorithm outlined in the Recursive Method to Create A section and then reordered so that genotyped animals come after all non-genotyped animals. Once the A matrix is created, any type of G matrix can be generated as outlined in the Methods to Create G section. Outlined below is a function that takes as input a sorted pedigree (i.e. animal, sire and dam), a vector indicating whether each animal has a genotype and, if so, the column in which it appears in G, and the associated G matrix. The function outputs the H matrix ordered based on pedigree information.
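The original function is not reproduced here, so the following is only a minimal sketch of how such a function could look. The names `make_A`, `make_H` and `geno_col` are illustrative, unknown parents are assumed to be coded as 0, and animals are assumed to be numbered 1 to n with parents listed before progeny. Rather than physically reordering A, the sketch indexes the genotyped and non-genotyped blocks directly, which gives the same result and returns H in pedigree order. The example pedigree and G matrix (1 on the diagonal, 0.7 elsewhere, animals 9 to 12 genotyped) are assumptions chosen to be consistent with the A and H values tabulated in this section.

```r
# Tabular (recursive) method for A. Assumes animals are coded 1..n,
# parents appear before progeny and unknown parents are coded 0.
make_A <- function(ped) {
  n <- nrow(ped)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    s <- ped$sire[i]
    d <- ped$dam[i]
    # Diagonal: 1 plus half the relationship between the two parents
    A[i, i] <- if (s > 0 && d > 0) 1 + 0.5 * A[s, d] else 1
    for (j in seq_len(i - 1)) {
      # Off-diagonal: average relationship of j with the known parents of i
      a_s <- if (s > 0) A[j, s] else 0
      a_d <- if (d > 0) A[j, d] else 0
      A[i, j] <- A[j, i] <- 0.5 * (a_s + a_d)
    }
  }
  A
}

# geno_col[i] is 0 if animal i is not genotyped, otherwise the row/column
# of animal i in G (hypothetical input layout).
make_H <- function(ped, geno_col, G) {
  A   <- make_A(ped)
  gen <- which(geno_col > 0)    # group 2: genotyped
  non <- which(geno_col == 0)   # group 1: non-genotyped
  G   <- G[geno_col[gen], geno_col[gen], drop = FALSE]  # G in pedigree order
  A22 <- A[gen, gen, drop = FALSE]
  P   <- A[non, gen, drop = FALSE] %*% solve(A22)       # A12 A22^-1
  H   <- matrix(0, nrow(A), nrow(A))
  H[non, non] <- A[non, non] + P %*% (G - A22) %*% t(P)
  H[non, gen] <- P %*% G
  H[gen, non] <- t(H[non, gen])
  H[gen, gen] <- G
  H
}

# Example inputs (assumed, see the text above)
ped <- data.frame(
  animal = 1:17,
  sire   = c(rep(0, 8), 1, 3, 5, 7,  9, 11, 11, 13, 13),
  dam    = c(rep(0, 8), 2, 4, 6, 8, 10, 12,  4, 15, 14)
)
geno_col <- c(rep(0, 8), 1:4, rep(0, 5))
G <- matrix(0.7, 4, 4)
diag(G) <- 1

H <- make_H(ped, geno_col, G)
round(H, 2)
```

For large pedigrees, forming the dense A matrix and inverting A22 with `solve()` becomes expensive, so this sketch is intended for small worked examples only.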

The pedigree file and genomic file from Legarra (2006) can be utilized with the R code above. Outlined below are the A and H relationship matrices based on that pedigree and genotype matrix.

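A Matrix

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | | | | | | | | 0.50 | | | | 0.25 | | | 0.12 | 0.12 |
| 2 | - | 1.00 | | | | | | | 0.50 | | | | 0.25 | | | 0.12 | 0.12 |
| 3 | - | - | 1.00 | | | | | | | 0.50 | | | 0.25 | | | 0.12 | 0.12 |
| 4 | - | - | - | 1.00 | | | | | | 0.50 | | | 0.25 | | 0.50 | 0.38 | 0.12 |
| 5 | - | - | - | - | 1.00 | | | | | | 0.50 | | | 0.25 | 0.25 | 0.12 | 0.12 |
| 6 | - | - | - | - | - | 1.00 | | | | | 0.50 | | | 0.25 | 0.25 | 0.12 | 0.12 |
| 7 | - | - | - | - | - | - | 1.00 | | | | | 0.50 | | 0.25 | | | 0.12 |
| 8 | - | - | - | - | - | - | - | 1.00 | | | | 0.50 | | 0.25 | | | 0.12 |
| 9 | - | - | - | - | - | - | - | - | 1.00 | | | | 0.50 | | | 0.25 | 0.25 |
| 10 | - | - | - | - | - | - | - | - | - | 1.00 | | | 0.50 | | 0.25 | 0.38 | 0.25 |
| 11 | - | - | - | - | - | - | - | - | - | - | 1.00 | | | 0.50 | 0.50 | 0.25 | 0.25 |
| 12 | - | - | - | - | - | - | - | - | - | - | - | 1.00 | | 0.50 | | | 0.25 |
| 13 | - | - | - | - | - | - | - | - | - | - | - | - | 1.00 | | 0.12 | 0.56 | 0.50 |
| 14 | - | - | - | - | - | - | - | - | - | - | - | - | - | 1.00 | 0.25 | 0.12 | 0.50 |
| 15 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1.00 | 0.56 | 0.19 |
| 16 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1.06 | 0.34 |
| 17 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1.00 |

H Matrix

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 | 0.50 | 0.35 | 0.35 | 0.35 | 0.43 | 0.35 | 0.26 | 0.34 | 0.39 |
| 2 | - | 1.00 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 | 0.50 | 0.35 | 0.35 | 0.35 | 0.43 | 0.35 | 0.26 | 0.34 | 0.39 |
| 3 | - | - | 1.00 | | 0.18 | 0.18 | 0.18 | 0.18 | 0.35 | 0.50 | 0.35 | 0.35 | 0.43 | 0.35 | 0.18 | 0.30 | 0.39 |
| 4 | - | - | - | 1.00 | 0.18 | 0.18 | 0.18 | 0.18 | 0.35 | 0.50 | 0.35 | 0.35 | 0.43 | 0.35 | 0.68 | 0.55 | 0.39 |
| 5 | - | - | - | - | 1.00 | | 0.18 | 0.18 | 0.35 | 0.35 | 0.50 | 0.35 | 0.35 | 0.43 | 0.34 | 0.34 | 0.39 |
| 6 | - | - | - | - | - | 1.00 | 0.18 | 0.18 | 0.35 | 0.35 | 0.50 | 0.35 | 0.35 | 0.43 | 0.34 | 0.34 | 0.39 |
| 7 | - | - | - | - | - | - | 1.00 | | 0.35 | 0.35 | 0.35 | 0.50 | 0.35 | 0.43 | 0.26 | 0.31 | 0.39 |
| 8 | - | - | - | - | - | - | - | 1.00 | 0.35 | 0.35 | 0.35 | 0.50 | 0.35 | 0.43 | 0.26 | 0.31 | 0.39 |
| 9 | - | - | - | - | - | - | - | - | 1.00 | 0.70 | 0.70 | 0.70 | 0.85 | 0.70 | 0.53 | 0.69 | 0.78 |
| 10 | - | - | - | - | - | - | - | - | - | 1.00 | 0.70 | 0.70 | 0.85 | 0.70 | 0.60 | 0.73 | 0.78 |
| 11 | - | - | - | - | - | - | - | - | - | - | 1.00 | 0.70 | 0.70 | 0.85 | 0.68 | 0.69 | 0.78 |
| 12 | - | - | - | - | - | - | - | - | - | - | - | 1.00 | 0.70 | 0.85 | 0.53 | 0.61 | 0.78 |
| 13 | - | - | - | - | - | - | - | - | - | - | - | - | 1.35 | 0.70 | 0.56 | 0.96 | 1.03 |
| 14 | - | - | - | - | - | - | - | - | - | - | - | - | - | 1.35 | 0.60 | 0.65 | 1.03 |
| 15 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1.18 | 0.87 | 0.58 |
| 16 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1.41 | 0.80 |
| 17 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1.53 |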

Compatibility Between G and A22

When generating the A matrix, the base population is assumed to have a mean breeding value of 0.0 and consists of the animals at the beginning of the pedigree (e.g. parents are unknown). Similarly, when generating the G matrix, the mean breeding value of the genotyped population is set to zero if the allele frequencies are based on the observed set of genotypes. As a result, the base populations of G and A differ, which gives rise to the so-called compatibility differences between G and A22. Parentage errors in the pedigree can introduce further differences between the two relationship matrices. The compatibility of G and A22 can be improved by creating a weighted G that is a function of the original G and A22, and by rescaling G so that its diagonal and off-diagonal elements equal, on average, those of A22 (Vitezica et al. 2011; Christensen et al. 2012). This is shown in the associated R code for the section.
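Because the associated code is not reproduced here, the two adjustments can be sketched as follows. This is one possible reading of the approach and not necessarily the implementation used in the cited papers or in the original code: `tune_G` rescales G linearly (a + bG) so that its mean diagonal and mean off-diagonal match those of A22, and `blend_G` forms a weighted combination of G and A22. The function names, the blending weight of 0.95 and the small 3 x 3 matrices are hypothetical values used purely for illustration.

```r
# Mean of the off-diagonal elements of a square matrix
mean_offdiag <- function(M) mean(M[row(M) != col(M)])

# Rescale G as a + b * G so that the mean diagonal and the mean
# off-diagonal of the result equal those of A22.
tune_G <- function(G, A22) {
  b <- (mean(diag(A22)) - mean_offdiag(A22)) /
       (mean(diag(G))   - mean_offdiag(G))
  a <- mean(diag(A22)) - b * mean(diag(G))
  a + b * G
}

# Weighted combination of G and A22 (the weight w is an assumed value)
blend_G <- function(G, A22, w = 0.95) {
  w * G + (1 - w) * A22
}

# Hypothetical pedigree and genomic relationships for three genotyped animals
A22 <- matrix(c(1.00, 0.25, 0.00,
                0.25, 1.00, 0.50,
                0.00, 0.50, 1.00), nrow = 3, byrow = TRUE)
G   <- matrix(c( 0.90, 0.10, -0.20,
                 0.10, 0.85,  0.30,
                -0.20, 0.30,  0.95), nrow = 3, byrow = TRUE)

G_tuned <- tune_G(G, A22)
G_w     <- blend_G(G_tuned, A22)

# Compare the averages before and after tuning
c(diag_G = mean(diag(G)), diag_tuned = mean(diag(G_tuned)), diag_A22 = mean(diag(A22)))
c(off_G = mean_offdiag(G), off_tuned = mean_offdiag(G_tuned), off_A22 = mean_offdiag(A22))
```

Blending with A22 also tends to make the resulting matrix better conditioned for inversion, which is one practical reason a small weight on A22 is often used.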

References