The introduction of high-throughput genotyping platforms of varying densities (e.g., 384, 50k, and 770k markers; Fan et al., 2010) that evenly cover the entire genome
across all major livestock species has greatly changed the landscape of animal breeding. As outlined in the previous section, pedigree information
can be used to estimate the expected relatedness between individuals, referred to as the coancestry coefficient, or within an individual, referred to as
inbreeding. Since the introduction of inexpensive genotype arrays, multiple methods have been developed to generate similarity [i.e., identical-by-state
(IBS)]-based genomic relationship metrics (G) from marker genotypes (Leutenegger et al., 2003; Amin et al., 2007; VanRaden, 2008; Yang et al., 2010).
The methods implemented in the function below differ in the weight that each marker receives in the final G matrix. Markers are weighted
differently because their information content varies. This variation is due in part to differences in allele frequency within the population and
in part to whether a marker is in linkage disequilibrium (LD) with a mutation that affects the trait of interest. For example,
the weighting method described by VanRaden (2008) results in rare alleles receiving a higher weight in G than common alleles. One drawback of using all
markers to construct G is the implicit assumption, when predicting breeding values, that every marker contributes to genetic variation for a given trait.
Therefore, as an alternative, methods that weight each marker's contribution to the relationships by its expected variance, effect, or statistical
significance from a prior analysis have been developed (Leutenegger et al., 2003; Amin et al., 2007; Zhang et al., 2010; de Los Campos et al., 2013).
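The two frequency-based scalings implemented in the function below (options 1 and 5) can be written compactly as follows. This is a sketch based on the comments in the code, not a full derivation. The symbol M for the 0/1/2 genotype matrix (animals by SNP) is an assumption made here for notation, as are n for the number of animals and m for the number of SNP; p_k is the allele frequency at SNP k.

```latex
% Centered genotypes (M is n x m, coded 0/1/2; notation assumed for illustration)
Z_{ik} = M_{ik} - 2p_k

% Option 1: every SNP shares one overall scaling factor
G = \frac{Z Z'}{2 \sum_{k=1}^{m} p_k (1 - p_k)}

% Option 5: each SNP is scaled by its own expected variance
G = \frac{Z D Z'}{m}, \qquad D = \mathrm{diag}\!\left( \frac{1}{2 p_k (1 - p_k)} \right)

% Example weights in D: a rare allele is up-weighted relative to a common one
p_k = 0.1:\ \frac{1}{2(0.1)(0.9)} \approx 5.56
\qquad
p_k = 0.5:\ \frac{1}{2(0.5)(0.5)} = 2
```

In the second form a SNP with an allele frequency of 0.1 carries roughly 2.8 times the weight of a SNP with a frequency of 0.5, which is why rare alleles have a larger influence on G under that scaling.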
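Outlined below is a function that takes as input a SNP genotype matrix (genotypes coded as 0, 1, or 2) with dimension equal to the number of animals by
the number of SNP. The option argument selects the method: options 1, 2, and 3 compute ZZ' / 2sum(pq) with allele frequencies that are observed,
fixed at 0.5, or supplied through inputfreq, respectively; option 4 follows the Yang et al. estimator; and option 5 computes ZDZ' divided by the number
of SNP, with weights based on the observed frequencies.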
GenomicRel = function(option,snp,inputfreq=c(),inputeffect=c())
{
### need to ensure all snp are either 0 1 or 2 ###
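## Missing genotypes (NA) are flagged as well; otherwise they would pass this check silently ##
X <- which(is.na(snp) | (snp != 0 & snp != 1 & snp != 2))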
if (length(X) > 0){stop("The snp matrix contains values that are not 0 1 or 2.")}
if (length(inputfreq) == 0 & option == 3){stop("Need to provide freq to generate G.")}
if (length(inputfreq) > 0 & option != 3){stop("Don't need to provide freq to generate G.")}
if (length(inputfreq) > 0 & option == 3 & length(inputfreq) != ncol(snp))
{
stop("Dimension of snp and inputfreq size don't match.")
}
## If a SNP is fixed (i.e. freq = 0.0 or 1.0) remove it from the matrix ##
p <- colSums(snp) / (2*nrow(snp))
X <- which(p == 0.0 | p == 1.0)
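if(length(X) > 0)
{
## Keep inputfreq aligned with the SNP columns that remain ##
if(length(inputfreq) > 0){inputfreq <- inputfreq[-X]}
## drop=FALSE keeps snp a matrix even if a single column remains ##
snp <- snp[,-X,drop=FALSE]
}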
if(option==1)
{
#################################################
## ZZ' / 2sum(pq); Frequency is numerator ##
## and denominator based on observed frequency ##
#################################################
## Calculate Frequency ##
p <- colSums(snp) / (2*nrow(snp))
P = matrix((2*p),byrow=T,nrow=nrow(snp),ncol=ncol(snp))
## Calculate centered Z matrix (snp - P)
Z = as.matrix(snp-P)
ZZt = Z %*% t(Z)
## Scaler
den <- 2*(sum(p*(1-p)))
## Generate Final G
G <- ZZt/den; rm(p,P,Z,ZZt,den)
#diag(G) <- diag(G) + 0.0001
}
if(option==2)
{
#######################################################################
## ZZ' / 2sum(pq); Frequency in numerator and denominator set at 0.5 ##
#######################################################################
p <- c(rep(0.5,ncol(snp)))
P = matrix((2*p),byrow=T,nrow=nrow(snp),ncol=ncol(snp))
## Calculate centered Z matrix (snp - P)
Z = as.matrix(snp-P)
ZZt = Z %*% t(Z)
## Scaler
den <- 2*(sum(p*(1-p)))
## Generate Final G
G <- ZZt/den; rm(p,P,Z,ZZt,den)
#diag(G) <- diag(G) + 0.0001
}
if(option==3)
{
##################################################################################
## ZZ' / 2sum(pq); Frequency in numerator and denominator based on input values ##
##################################################################################
p <- inputfreq
P = matrix((2*p),byrow=T,nrow=nrow(snp),ncol=ncol(snp))
## Calculate centered Z matrix (snp - P)
Z = as.matrix(snp-P)
ZZt = Z %*% t(Z)
## Scaler
den <- 2*(sum(p*(1-p)))
## Generate Final G
G <- ZZt/den; rm(p,P,Z,ZZt,den)
#diag(G) <- diag(G) + 0.0001
}
if(option==5)
{
#############################################################################
## ZDZ' / n; Frequency in numerator and in D based on observed frequencies ##
#############################################################################
p <- colSums(snp) / (2*nrow(snp))
P = matrix((2*p),byrow=T,nrow=nrow(snp),ncol=ncol(snp))
## Calculate centered Z matrix (snp - P)
Z = as.matrix(snp-P);
d <- 1 / (2*p*(1-p)) ## base G scaled based on variance in frequencies ##
## Don't want to set up D(m x m)
Zstar2 <- t(t(Z)*sqrt(d));
G <- Zstar2%*%t(Zstar2)
G <- G / ncol(snp)
#diag(G) <- diag(G) + 0.0001
}
if(option==4)
{
p = colSums(snp) / (2*nrow(snp))
G <- matrix(data=NA,nrow=nrow(snp),ncol=nrow(snp),byrow=TRUE)
for(i in 1:nrow(snp))
{
for(j in i:nrow(snp))
{
S = 0;
for(k in 1:ncol(snp))
{
denom = 0.0;
denom = (2 * p[k] * (1- p[k]));
## part 6 of Yang et al. 2011
if(i != j)
{
S = S + ((snp[i,k] - 2 * p[k]) * (snp[j,k] - 2 * p[k])) / denom
}
if(i == j)
{
S = S+((snp[i,k]*snp[i,k])-((1+2*p[k])*snp[i,k])+2*(p[k]*p[k])) / denom
}
}
if(i == j){G[i,j] = 1 + S / ncol(snp);}
if(i != j){G[i,j] = S / ncol(snp); G[j,i] = G[i,j];}
}
}
}
return(G)
}
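The triple loop in option 4 is easy to follow but becomes slow at realistic marker counts. As a minimal sketch (not part of the original function), the same quantities can be obtained with matrix operations. It relies on the off-diagonal elements of option 4 being identical to those of option 5, so only the diagonal needs separate treatment. The toy genotype values and the helper names m, w, P, and W are made up for illustration.

```r
## Toy genotype matrix (hypothetical values): 4 animals by 5 SNP, coded 0/1/2
snp <- matrix(c(0, 1, 2, 1, 0,
                1, 1, 0, 2, 0,
                2, 0, 1, 2, 1,
                1, 2, 2, 0, 0), nrow = 4, byrow = TRUE)
m <- ncol(snp)

## Observed allele frequencies and per-SNP weights 1 / (2pq)
p <- colSums(snp) / (2 * nrow(snp))
w <- 1 / (2 * p * (1 - p))

## Center each SNP column by 2p
Z <- sweep(snp, 2, 2 * p)

## Off-diagonals: sum over SNP of z_ik * z_jk * w_k, divided by m (same as option 5)
G <- (Z %*% (t(Z) * w)) / m

## Diagonals: 1 + sum over SNP of (x^2 - (1 + 2p)x + 2p^2) * w_k, divided by m,
## which is the expression used inside the option 4 loop
P <- matrix(p, nrow = nrow(snp), ncol = m, byrow = TRUE)
W <- matrix(w, nrow = nrow(snp), ncol = m, byrow = TRUE)
diag(G) <- 1 + rowSums((snp^2 - (1 + 2 * P) * snp + 2 * P^2) * W) / m

print(round(G, 3))
```

The weights are applied to t(Z) instead of building the full m by m diagonal matrix D, following the same idea as the "Don't want to set up D" comment in option 5.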
- Amin, N., C. M. van Duijn, and Y. S. Aulchenko. 2007. A genomic background based method for association analysis in related individuals.
PLoS One 2:e1274.
- de Los Campos, G., A. I. Vazquez, R. Fernando, Y. C. Klimentidis, and D. Sorensen. 2013. Prediction of complex human traits using the genomic best
linear unbiased predictor. PLoS Genet. 9:e1003608.
- Fan, B., Z. Du, D. M. Gorbach, and M. F. Rothschild. 2010. Development and application of high-density SNP arrays in genomic
studies of domestic animals. Asian-Aust. J. Anim. Sci. 23(7): 833–847.
- Leutenegger, A.L., B. Prum, E. Génin, C. Verny, A. Lemainque, F. Clerget-Darpoux, and E. A. Thompson. 2003. Estimation of the inbreeding
coefficient through use of genomic data. Am. J. Hum. Genet. 73:516–523.
- VanRaden, P. M. 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91:4414–4423.
- Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders, D. R. Nyholt, P. A. Madden, A. C. Heath, N. G. Martin, G. W. Montgomery,
M. E. Goddard, and P. M. Visscher. 2010. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42:565–569.
- Zhang, Z., J. Liu, X. Ding, P. Bijma, D.-J. de Koning, and Q. Zhang. 2010. Best linear unbiased prediction of genomic breeding values using a trait-specific
marker-derived relationship matrix. PLoS One 5:e12648.