Genomic Selection in R

Giovanny Covarrubias-Pazaran
Department of Horticulture, University of Wisconsin, Madison, Wisconsin, United States of America
E-mail: covarrubiasp@wisc.edu

Most traits of agronomic importance are quantitative in nature, and genetic markers have been used for decades to dissect such traits. Recently, genomic selection has gained attention as next-generation sequencing technologies have become feasible for major and minor crops. Mixed models have become a key tool for fitting genomic selection models, but most current genomic selection software can only include a single variance component other than the error, which makes hybrid prediction using additive, dominance and epistatic effects unfeasible for species displaying heterotic effects. Likelihood-based software for fitting mixed models with multiple random effects, in which the user can specify the variance-covariance structure of the random effects, has not been fully exploited.

The R package sommer facilitates the use of mixed models for genomic selection and hybrid prediction by allowing more than one variance component and the specification of covariance structures. The package contains four algorithms for estimating variance components: Average Information (AI), Newton-Raphson (NR), Expectation-Maximization (EM) and Efficient Mixed Model Association (EMMA; ridge regression). Kernels for calculating the additive, dominance and epistatic relationship matrices are included, along with other useful functions for genomic analysis. sommer can handle more complex problems than regular genomic selection software, is faster than its Bayesian counterparts by hours to days, and can deal with missing data, all within a user-friendly environment such as R.

Other software available for genomic selection includes:

a) rrBLUP (only functional for a single random effect)
b) regress (only the NR algorithm is available; can return negative variance components)
c) ASReml (not free)
d) BGLR (Bayesian; can take a long time)
e) MCMCglmm (a better Bayesian implementation, but it takes a very long time)

Four scenarios of genomic selection are highlighted in this document:

PREDICTION OF GENERAL PERFORMANCE OF CROSSES

1) Genotypic and phenotypic data for the parents are available, and we want to predict the performance of the possible crosses assuming a purely additive model (species with no heterosis).
2) Genotypic data for the parents are available, phenotypic data are available for some of the possible crosses (~10%), and we want to predict the performance of the remaining crosses (~90%) assuming an additive-dominance model (species with heterosis).

PREDICTION OF SPECIFIC PERFORMANCE OF INDIVIDUALS WITHIN POPULATIONS

3) Genotypic data for a population of individuals are available, phenotypic data are available only for some of them (e.g. because phenotyping is very expensive), and you aim to predict the rest of the population using a purely additive model.
4) Genotypic data for a population of individuals are available, phenotypic data are available only for some of them (e.g. because phenotyping is very expensive), and you aim to predict the rest of the population using an additive-dominance-epistatic model.

Situation 1) arises when you work with a species that is mainly self-pollinated, so heterosis is rarely encountered. The performance of a cross can then be estimated as the average of the parental breeding values (BV). To obtain the breeding values of the parents, the materials are tested across locations and years, and a mixed model is fitted to obtain the genotypic BLUPs. Henderson realized that when some data are missing, the pedigree relationships among individuals can be used to predict the performance of the individuals without data. Keeping track of pedigrees is difficult in all breeding programs, which has led to the estimation of relationships using markers. The resulting marker-based matrix of relationships is called the genomic relationship matrix, and it is the counterpart of the additive relationship matrix based on pedigrees.

Assume you work at CIMMYT and have genomic information for 599 lines with 1279 SNP markers each. Because they are lines, you expect only the additive variance to be significant. Now you want to predict the performance of all possible crosses among those 599 lines. Using sommer you would do it this way:

    #### load the phenotypic and genotypic information
    library(sommer)
    data(wheatLines)
    X <- wheatLines$wheatGeno; X[1:5,1:5]; dim(X)
    Y <- wheatLines$wheatPheno
    rownames(X) <- rownames(Y)
    #### select environment 1 and create the incidence and additive
    #### relationship matrices
    y <- Y[,1] # response: grain yield
    Z1 <- diag(length(y)) # incidence matrix
    K <- A.mat(X) # additive relationship matrix
    #### perform GBLUP (the kinship-based approach) by
    #### specifying your random effects (ETA) in a 2-level list
    #### structure and run it using the mmer function
    ETA <- list(add=list(Z=Z1, K=K))
    ans <- mmer(y=y, Z=ETA, method="EMMA") # kinship based
    summary(ans)
    #### predict the progeny by extracting the BV for the lines
    #### and get the average BV for all possible combinations
    GEBV.pb <- ans$u.hat # these are the BV

    rownames(GEBV.pb) <- rownames(Y)
    crosses <- do.call(expand.grid, list(rownames(Y), rownames(Y))); dim(crosses)
    # flag the second copy of each unordered pair (this also drops the selfs)
    cross2 <- duplicated(t(apply(crosses, 1, sort)))
    crosses2 <- crosses[cross2,]; head(crosses2); dim(crosses2)
    # get GCA1 and GCA2 of each hybrid
    GCA1 <- GEBV.pb[match(crosses2[,1], rownames(GEBV.pb))]
    GCA2 <- GEBV.pb[match(crosses2[,2], rownames(GEBV.pb))]
    #### join everything and get the mean BV for each combination
    BV <- data.frame(crosses2, GCA1, GCA2); head(BV)
    BV$BVcross <- apply(BV[,c(3:4)], 1, mean); head(BV)
    plot(BV$BVcross)

This gives the GEBV for the 179,101 possible crosses among these 599 wheat lines. You can sort them by predicted performance, and in a real program you would then make the best predicted crosses.

Situation 2) arises when you work with an outcrossing species, where heterosis is usually encountered. In this example we show how to perform genomic prediction for single crosses that have not been made yet, using information from the single crosses that are available. Assume you work in a corn breeding program and have 40 parents from 2 heterotic groups, 20 in each (Dent and Flint). You have genotypic data for the 40 parents and phenotypic information for 100 of the 400 possible crosses, evaluated in four environments. You can use this information to predict the other 300 crosses.

    data(cornHybrid)
    hybrid2 <- cornHybrid$hybrid # extract cross data
    A <- cornHybrid$K # additive relationship matrix for all parents
    y <- hybrid2$Yield # response
    ### incidence matrices
    X1 <- model.matrix(~ Location, data = hybrid2); dim(X1)
    Z1 <- model.matrix(~ GCA1 - 1, data = hybrid2); dim(Z1)
    Z2 <- model.matrix(~ GCA2 - 1, data = hybrid2); dim(Z2)
    Z3 <- model.matrix(~ SCA - 1, data = hybrid2); dim(Z3)
    #### realized IBS relationships for each effect
    K1 <- A[levels(hybrid2$GCA1), levels(hybrid2$GCA1)]; dim(K1)
    K2 <- A[levels(hybrid2$GCA2), levels(hybrid2$GCA2)]; dim(K2)
    S <- kronecker(K1, K2); dim(S)
    rownames(S) <- colnames(S) <- levels(hybrid2$SCA)
    ### specify the random component
    ETA <- list(list(Z=Z1, K=K1), list(Z=Z2, K=K2), list(Z=Z3, K=S))
    ans <- mmer(y=y, X=X1, Z=ETA)
    summary(ans)

You now have fitted values for all 400 possible single-cross hybrids, including those without phenotypes, along with BLUPs for the general (GCA) and specific (SCA) combining abilities.
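The SCA covariance in this example is the Kronecker product of the two parental relationship matrices. The following is a minimal base-R sketch of that step using invented toy matrices; the parent names (D1, D2, F1 to F3), the hybrid labels and all numeric values are illustrative assumptions, not the corn data. It shows how each entry of the product arises and why the hybrid labels must follow the row order of the product:

```r
# Toy additive relationship matrices (values invented for illustration)
K1 <- matrix(c(1.0, 0.2,
               0.2, 1.0), nrow = 2, byrow = TRUE,
             dimnames = list(c("D1", "D2"), c("D1", "D2")))
K2 <- matrix(c(1.0, 0.3, 0.1,
               0.3, 1.0, 0.4,
               0.1, 0.4, 1.0), nrow = 3, byrow = TRUE,
             dimnames = list(c("F1", "F2", "F3"), c("F1", "F2", "F3")))

# Covariance among the 2 x 3 = 6 possible hybrids
S <- kronecker(K1, K2)

# Label the hybrids in the order used by the Kronecker product:
# the K1 parent changes slowest, the K2 parent changes fastest
hyb <- as.vector(t(outer(rownames(K1), rownames(K2), paste, sep = ":")))
dimnames(S) <- list(hyb, hyb)
dim(S)

# Each entry is the product of the corresponding parental relationships
S["D1:F1", "D2:F2"]
K1["D1", "D2"] * K2["F1", "F2"]
```

If the levels of the SCA factor are not in this order, the labels assigned to the product will not match the hybrids they describe, so it is worth checking the ordering before fitting the model.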

Situation 3) arises when you want to predict the performance of specific individuals for which you have genotypes but no phenotypes. This usually happens when phenotyping is expensive and only some individuals can be phenotyped, while genotyping is not limiting. You can then use all of the available information to predict the performance of the individuals that are genotyped but not phenotyped. We will predict color for individuals in a full-sib family using a purely additive model.

    data(CPdata)
    CPpheno <- CPdata$pheno
    CPgeno <- CPdata$geno
    ### look at the data
    head(CPpheno)
    CPgeno[1:5,1:5]
    ## set up the response and the incidence matrix
    y <- CPpheno$color
    Za <- diag(length(y))

    A <- A.mat(CPgeno) # additive relationship matrix
    y.trn <- y # copy the response to test prediction accuracy
    ### delete the data for 1/5 of the population
    ww <- sample(c(1:dim(Za)[1]), 72)
    y.trn[ww] <- NA
    ETA.A <- list(add=list(Z=Za, K=A))
    ans.A <- mmer(y=y.trn, Z=ETA.A)
    cor(ans.A$fitted.y[ww], y[ww], use="pairwise.complete.obs")

Situation 4) is the same as the previous example, but adds the dominance and epistatic relationships to the model. Because this is a full-sib family, and individuals in this type of family share one quarter of the dominance variance (σ²_D), we include these effects expecting to gain prediction accuracy.

    Zd <- diag(length(y))
    Ze <- diag(length(y))
    D <- D.mat(CPgeno) # dominance relationship matrix
    E <- E.mat(CPgeno) # epistatic relationship matrix
    ETA.ADE <- list(add=list(Z=Za, K=A), dom=list(Z=Zd, K=D), epi=list(Z=Ze, K=E))
    ans.ADE <- mmer(y=y.trn, Z=ETA.ADE)
    cor(ans.ADE$fitted.y[ww], y[ww], use="pairwise.complete.obs")
    summary(ans.ADE)

As you can see, the epistatic variance is usually zero or too small to make a difference in the prediction, but adding the dominance relationships increased the prediction accuracy in this full-sib family, as theory predicts. It is worth checking for dominance effects and including them for this kind of family or for polyploid organisms.
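The masking scheme used in the last two examples (hide part of the phenotypes, predict them, then correlate predictions with the hidden values) can be illustrated without any package. The sketch below is a base-R illustration on simulated data, not the algorithm sommer runs: the marker matrix, the trait, and the variance ratio `lambda` are all assumptions made for the example, and `lambda` is fixed by hand instead of being estimated by REML.

```r
set.seed(1)
n <- 100; m <- 500                          # toy numbers of individuals and markers
M <- matrix(rbinom(n * m, 2, 0.5), n, m)    # simulated genotypes coded 0/1/2

# Additive relationship matrix from centered markers (VanRaden-type scaling)
p <- colMeans(M) / 2
W <- sweep(M, 2, 2 * p)
K <- tcrossprod(W) / (2 * sum(p * (1 - p)))

# Simulate a purely additive trait with heritability of roughly 0.5
g <- as.vector(W %*% rnorm(m))
g <- g / sd(g)
y <- 10 + g + rnorm(n)

# Mask 1/5 of the phenotypes
ww  <- sample(n, n / 5)
trn <- setdiff(seq_len(n), ww)

# BLUP with a fixed variance ratio lambda = sigma2_e / sigma2_u (assumed known)
lambda <- 1
V    <- K[trn, trn] + lambda * diag(length(trn))
Vinv <- solve(V)
mu    <- sum(Vinv %*% y[trn]) / sum(Vinv)       # GLS estimate of the overall mean
u.hat <- K[, trn] %*% Vinv %*% (y[trn] - mu)    # predicted genetic values, all n individuals

# Prediction accuracy in the masked set
cor(u.hat[ww], y[ww])
```

Because `ww` is drawn at random, the correlation changes from one draw to the next; repeating the masking several times and averaging gives a more stable estimate of prediction accuracy.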

Figure. Comparison of the purely additive and additive+dominance models, showing an increase in prediction accuracy in a full-sib family where dominance relationships are important.