Methods to Create a Genomic Relationship Matrix

The introduction of high-throughput genotyping platforms of varying sizes (i.e., 384, 50k, 770k; Fan et al., 2010) that evenly cover the entire genome across all major livestock species has greatly changed the landscape of animal breeding. As outlined in the previous section, pedigree information can be used to estimate the expected relatedness between individuals, referred to as the coancestry coefficient, or within an individual, referred to as inbreeding. Since the introduction of inexpensive genotype arrays, multiple methods have been developed to generate similarity [i.e., identical-by-state (IBS)]-based genomic relationship metrics (G) from marker genotypes (Leutenegger et al., 2003; Amin et al., 2007; VanRaden, 2008; Yang et al., 2010).

The methods outlined in the function below differ in the weight that each marker receives in the final G matrix. Markers are weighted differently because they differ in information content. These differences arise in part from differences in allele frequency within the population and from whether a marker is in linkage disequilibrium (LD) with a mutation that affects the trait of interest. For example, the weighting method described by VanRaden (2008) gives rare alleles a higher weight in G than common alleles. One drawback of using all markers to construct G is that, in the context of predicting breeding values, it implies that all markers contribute to genetic variation for a given trait. As an alternative, methods that weight relationships by their expected variance, effect, or statistical significance from a prior analysis have been developed (Leutenegger et al., 2003; Amin et al., 2007; Zhang et al., 2010; de los Campos et al., 2013).
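As a compact summary of the weighting schemes, the two marker-based forms used by the function that follows can be written as shown here. The notation (M, Z, D, n, m) is assumed for this summary and follows the comments in the code rather than any one cited paper.

```latex
% M: n x m genotype matrix coded 0/1/2; p_j: frequency of the counted allele at marker j
Z = M - \mathbf{1}\,(2\mathbf{p})^{\top}

% Single scaling constant shared by all markers
G = \frac{Z Z^{\top}}{2 \sum_{j=1}^{m} p_j (1 - p_j)}

% Marker-specific weights
G = \frac{Z D Z^{\top}}{m}, \qquad
D = \operatorname{diag}\!\left( \frac{1}{2 p_j (1 - p_j)} \right)
```

In the second form each marker is divided by its own expected variance, 2p_j(1 - p_j), which is smallest for rare alleles, so markers with rare alleles receive the larger weights.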

Outlined below is a function that takes as input a SNP genotype matrix with dimensions equal to the number of animals by the number of SNPs, with genotypes coded as 0, 1, or 2.
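To make the expected input concrete, the following minimal sketch simulates a small genotype matrix and derives the allele frequencies and centered matrix that the relationship calculations build on. The genotypes are simulated, and the names `toy_snp`, `p_obs`, and `Z` are illustrative assumptions, not part of the original function.

```r
set.seed(42)
n_animals <- 6
n_snp <- 10

## Simulated genotypes coded as allele counts (0, 1 or 2); rows = animals, columns = SNP
toy_snp <- matrix(rbinom(n_animals * n_snp, size = 2, prob = 0.4),
                  nrow = n_animals, ncol = n_snp)

## Observed allele frequency for each SNP
p_obs <- colSums(toy_snp) / (2 * nrow(toy_snp))

## Center each SNP column by twice its allele frequency
Z <- sweep(toy_snp, 2, 2 * p_obs, "-")
```

Because each column is centered on its own mean, the columns of `Z` sum to zero (up to rounding), which is why a G built from observed frequencies has rows that sum to approximately zero.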

GenomicRel = function(option,snp,inputfreq=c(),inputeffect=c())
{
### need to ensure all snp are either 0 1 or 2 ###
X <- which(snp != 0 & snp != 1 & snp != 2)
## which() silently skips NA, so missing genotypes are checked explicitly ##
if (anyNA(snp) || length(X) > 0){stop("The snp matrix contains values that are not 0 1 or 2.")}
## an unrecognized option would otherwise fail at return(G) with an unhelpful error ##
if (!(option %in% 1:5)){stop("option must be 1, 2, 3, 4 or 5.")}
if (length(inputfreq) == 0 && option == 3){stop("Need to provide freq to generate G.")}
if (length(inputfreq) > 0 && option != 3){stop("Don't need to provide freq to generate G.")}
if (length(inputfreq) > 0 && option == 3 && length(inputfreq) != ncol(snp))
{
stop("Dimension of snp and inputfreq size don't match.")
}
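## If a SNP is fixed (i.e. freq = 0.0 or 1.0) remove it from the matrix ##
p <- colSums(snp) / (2*nrow(snp))
X <- which(p == 0.0 | p == 1.0)
## drop=FALSE keeps snp a matrix; inputfreq is subset so it stays aligned with snp ##
if(length(X) > 0){snp <- snp[,-X,drop=FALSE]}
if(length(X) > 0 && length(inputfreq) > 0){inputfreq <- inputfreq[-X]}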
if(option==1)
{
#################################################
## ZZ' / 2sum(pq); Frequency is numerator ##
## and denominator based on observed frequency ##
#################################################
## Calculate Frequency ##
p <- colSums(snp) / (2*nrow(snp))
P = matrix((2*p),byrow=T,nrow=nrow(snp),ncol=ncol(snp))
## Calculate centered Z matrix (snp - P)
Z = as.matrix(snp-P)
ZZt = Z %*% t(Z)
## Scaler
den <- 2*(sum(p*(1-p)))
## Generate Final G
G <- ZZt/den; rm(p,P,Z,ZZt,den)
#diag(G) <- diag(G) + 0.0001
}
if(option==2)
{
#######################################################################
## ZZ' / 2sum(pq); Frequency in numerator and denominator set at 0.5 ##
#######################################################################
p <- c(rep(0.5,ncol(snp)))
P = matrix((2*p),byrow=T,nrow=nrow(snp),ncol=ncol(snp))
## Calculate centered Z matrix (snp - P)
Z = as.matrix(snp-P)
ZZt = Z %*% t(Z)
## Scaler
den <- 2*(sum(p*(1-p)))
## Generate Final G
G <- ZZt/den; rm(p,P,Z,ZZt,den)
#diag(G) <- diag(G) + 0.0001
}
if(option==3)
{
##################################################################################
## ZZ' / 2sum(pq); Frequency in numerator and denominator based on input values ##
##################################################################################
p <- inputfreq
P = matrix((2*p),byrow=T,nrow=nrow(snp),ncol=ncol(snp))
## Calculate centered Z matrix (snp - P)
Z = as.matrix(snp-P)
ZZt = Z %*% t(Z)
## Scaler
den <- 2*(sum(p*(1-p)))
## Generate Final G
G <- ZZt/den; rm(p,P,Z,ZZt,den)
#diag(G) <- diag(G) + 0.0001
}
if(option==5)
{
#############################################################################
## ZDZ' / n; Frequency in numerator and in D based on observed frequencies ##
#############################################################################
p <- colSums(snp) / (2*nrow(snp))
P = matrix((2*p),byrow=T,nrow=nrow(snp),ncol=ncol(snp))
## Calculate centered Z matrix (snp - P)
Z = as.matrix(snp-P);
d <- 1 / (2*p*(1-p)) ## base G scaled based on variance in frequencies ##
## Don't want to set up D(m x m)
Zstar2 <- t(t(Z)*sqrt(d));
G <- Zstar2%*%t(Zstar2)
G <- G / ncol(snp)
#diag(G) <- diag(G) + 0.0001
}
if(option==4)
{
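## Off-diagonals: standardized cross-products averaged across SNP;       ##
## diagonals: 1 + separate within-animal term; observed frequencies used ##
p = colSums(snp) / (2*nrow(snp))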
G <- matrix(data=NA,nrow=nrow(snp),ncol=nrow(snp),byrow=TRUE)
for(i in 1:nrow(snp))
{
for(j in i:nrow(snp))
{
S = 0;
for(k in 1:ncol(snp))
{
denom = 0.0;
denom = (2 * p[k] * (1- p[k]));
## part 6 of Yang et al. 2011
if(i != j)
{
S = S + ((snp[i,k] - 2 * p[k]) * (snp[j,k] - 2 * p[k])) / denom
}
if(i == j)
{
S = S+((snp[i,k]*snp[i,k])-((1+2*p[k])*snp[i,k])+2*(p[k]*p[k])) / denom
}
}
if(i == j){G[i,j] = 1 + S / ncol(snp);}
if(i != j){G[i,j] = S / ncol(snp); G[j,i] = G[i,j];}
}
}
}
return(G)
}
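The triple loop used for option 4 can be slow for large marker panels. The following is a minimal vectorized sketch of the same off-diagonal and diagonal formulas, assuming genotypes coded 0/1/2 with no missing values. The helper name `yang_grm` and the simulated `toy` data are hypothetical and are not part of the original function.

```r
## Hypothetical helper: vectorized version of the option 4 formulas
yang_grm <- function(snp) {
  snp <- as.matrix(snp)
  p <- colMeans(snp) / 2                 # observed allele frequencies
  keep <- p > 0 & p < 1                  # drop fixed SNP, as the loop version does
  snp <- snp[, keep, drop = FALSE]
  p <- p[keep]
  v <- 2 * p * (1 - p)                   # expected genotype variance per SNP
  m <- ncol(snp)

  ## Off-diagonals: mean over SNP of (x_i - 2p)(x_j - 2p) / (2p(1 - p))
  W <- sweep(sweep(snp, 2, 2 * p, "-"), 2, sqrt(v), "/")
  G <- tcrossprod(W) / m

  ## Diagonals: 1 + mean over SNP of (x^2 - (1 + 2p)x + 2p^2) / (2p(1 - p))
  num <- snp^2 - sweep(snp, 2, 1 + 2 * p, "*") +
    matrix(2 * p^2, nrow = nrow(snp), ncol = m, byrow = TRUE)
  diag(G) <- 1 + rowMeans(sweep(num, 2, v, "/"))
  G
}

## Simulated toy data: 5 animals by 20 SNP
set.seed(1)
toy <- matrix(rbinom(5 * 20, size = 2, prob = 0.3), nrow = 5)
G_toy <- yang_grm(toy)
```

The off-diagonal elements coincide with those of the ZDZ'/n form (option 5); only the diagonal is computed differently, so the sketch reuses one cross-product and then overwrites the diagonal.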