Genomic Prediction and Selection for Multi-Environments

J. Crossa (1), j.crossa@cgiar.org
P. Pérez (2), perpdgo@gmail.com
G. de los Campos (3), gcampos@gmail.com

(1) CIMMyT-México; (2) ColPos-México; (3) Michigan-USA

June, 2015. CIMMYT, México-SAGPDB

Contents

1. The problem
2. Models
3. Model fitting
4. Cross validation
5. Application examples (Part 1)
6. Model extensions with environmental covariates

The problem

- In most agronomic traits, the effects of genes are modulated by environmental conditions, generating genotype × environment interaction (G×E).
- Researchers working in plant breeding have developed multiple methods for accounting for, and exploiting, G×E in multi-environment trials.
- Genomic selection is gaining ground in plant breeding. Most applications so far are based on single-environment, single-trait models.
- Preliminary evidence (e.g., Burgueño et al., 2012) suggests that there is great scope for improving prediction accuracy using multi-environment models.
- The ideas can be taken one step further by incorporating information on environmental covariates.


Models

Model 1 (EL, Environment + Line, no pedigree):

$$y_{ij} = \mu + E_i + L_j + e_{ij}$$

Model 2 (EA, Environment + Line, with markers):

$$y_{ij} = \mu + E_i + g_j + e_{ij}$$

Model 3 (Environment + Line + marker × environment interaction):

$$y_{ij} = \mu + E_i + g_j + Eg_{ij} + e_{ij}$$

Here $y_{ij}$ is the response of line $j$ in environment $i$.

Assumptions

It is assumed that $E_i \sim N(0, \sigma^2_E)$ and $g \sim N(0, \sigma^2_g G)$, with $G$ being the genomic relationship matrix. $Eg_{ij}$ is the interaction term between genotypes and environments, with

$$Eg \sim N\left(0, \left[(Z_g G Z_g^T) \circ (Z_E Z_E^T)\right] \sigma^2_{Eg}\right),$$

where $Z_g$ connects genotypes with phenotypes, $Z_E$ connects phenotypes with environments, $\circ$ stands for the Hadamard (element-wise) product between two matrices, and $\sigma^2_{Eg}$ is the variance of the interaction term.
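To make the interaction covariance concrete, the following base-R sketch builds it for a toy layout of 3 lines in 2 environments. The relationship matrix `G` and the object names are made up for illustration; they are not the wheat data used later. Because $Z_E Z_E^T$ is 1 for records from the same environment and 0 otherwise, the Hadamard product keeps the genomic covariance within an environment and sets it to zero across environments.

```r
# Toy example: 3 lines evaluated in 2 environments (6 records).
lines <- factor(rep(c("L1", "L2", "L3"), times = 2))
envs  <- factor(rep(c("E1", "E2"), each = 3))

# Hypothetical genomic relationship matrix (symmetric, positive definite).
G <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0), nrow = 3,
            dimnames = list(levels(lines), levels(lines)))

Zg <- model.matrix(~ lines - 1)  # records x lines
ZE <- model.matrix(~ envs - 1)   # records x environments

ZGZ  <- Zg %*% G %*% t(Zg)       # genomic covariance between records
ZEZE <- tcrossprod(ZE)           # 1 if two records share an environment, else 0
K    <- ZGZ * ZEZE               # Hadamard (element-wise) product

round(K, 2)
```

The result is block diagonal: each within-environment block equals `G`, and the blocks linking different environments are zero.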

Model fitting

Description of data objects

- Y, a data frame containing the elements described below;
- Y$yield (n x 1), a numeric vector with centered and standardized yield;
- Y$VAR (n x 1), a factor giving the IDs for the varieties;
- Y$ENV (n x 1), a factor giving the IDs for the environments;
- A, a symmetric positive semi-definite matrix containing the pedigree or marker-based relationships (dimensions equal to number of lines by number of lines). We assume that rownames(A) = colnames(A) gives the IDs of the lines.

Model 1 (EL, Environment + Line, no pedigree)

    library(BGLR)

    # Incidence matrix for the main effects of environments
    ZE <- model.matrix(~factor(Y$ENV) - 1)

    # Incidence matrix for the main effects of lines
    # (factor levels follow the order of rownames(A))
    Y$VAR <- factor(x = Y$VAR, levels = rownames(A), ordered = TRUE)
    ZVAR <- model.matrix(~Y$VAR - 1)

    # Model fitting
    ETA <- list(ENV = list(X = ZE, model = "BRR"),
                VAR = list(X = ZVAR, model = "BRR"))
    fm1 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M1_", nIter = 6000, burnIn = 1000)
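As a small illustration of what the two incidence matrices look like, here is a sketch on a made-up data frame with four records (the values are hypothetical, only the column names follow the slides). Each row has a single 1 marking the environment, or the line, that the record belongs to.

```r
# Toy records: which line was grown in which environment.
Y <- data.frame(VAR = c("b", "a", "c", "a"),
                ENV = c("1", "1", "2", "2"))

# One column per level, one row per record; "- 1" drops the intercept
# so every level keeps its own indicator column.
ZE   <- model.matrix(~ factor(Y$ENV) - 1)
ZVAR <- model.matrix(~ factor(Y$VAR, levels = c("a", "b", "c")) - 1)

unname(ZVAR)
```

Fixing the factor levels explicitly, as the slide does with `rownames(A)`, guarantees that the columns of `ZVAR` are in the same order as the rows of the relationship matrix.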

Model 2 (EA, Environment + Line, with markers)

    # Genomic relationship matrix from centered and scaled markers
    X <- scale(X, center = TRUE, scale = TRUE)
    G <- tcrossprod(X) / ncol(X)
    G <- G / mean(diag(G))

    # Cholesky factor L with G = L L'; chol() requires G to be positive definite
    L <- t(chol(G))
    ZL <- ZVAR %*% L

    ETA <- list(ENV = list(X = ZE, model = "BRR"),
                Grm = list(X = ZL, model = "BRR"))
    fm2 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M2_", nIter = 6000, burnIn = 1000)
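The reason a ridge-type (BRR) term on `ZL` matches the assumption $g \sim N(0, \sigma^2_g G)$ can be checked numerically. If the regression coefficients $u$ are independent $N(0, \sigma^2)$, then `ZL %*% u` has covariance $\sigma^2\, Z L L^T Z^T = \sigma^2\, Z G Z^T$. The sketch below verifies the matrix identity on a small, hypothetical `G`; it does not run BGLR.

```r
# Hypothetical positive definite relationship matrix for 3 lines.
G <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0), nrow = 3)

# Line index for each of 5 phenotypic records.
line_of_record <- factor(c(1, 2, 3, 1, 3))
ZVAR <- model.matrix(~ line_of_record - 1)

L  <- t(chol(G))   # lower-triangular factor, G = L %*% t(L)
ZL <- ZVAR %*% L

# ZL ZL' should equal ZVAR G ZVAR' up to rounding error.
same <- all.equal(tcrossprod(ZL), ZVAR %*% G %*% t(ZVAR),
                  check.attributes = FALSE)
same
```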

Model 3 (Environment + Line + marker × environment interaction)

    # Covariance structure of the interaction term:
    # Hadamard product of Z_g G Z_g' and Z_E Z_E'
    ZGZ <- tcrossprod(ZL)
    ZEZE <- tcrossprod(ZE)
    K <- ZGZ * ZEZE
    diag(K) <- diag(K) + 1/200
    K <- K / mean(diag(K))

    ETA <- list(ENV = list(X = ZE, model = "BRR"),
                Grm = list(X = ZL, model = "BRR"),
                EGrm = list(K = K, model = "RKHS"))
    fm3 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M3_", nIter = 6000, burnIn = 1000)

Cross validation

1. CV1: Prediction of performance of newly developed lines (i.e., lines that have not been evaluated in any field trials).
2. CV2: Prediction in incomplete field trials; here the aim is to predict the performance of lines that have been evaluated in some environments but not in others.

See Figure 1.

Figure 1: Two hypothetical cross-validation schemes (CV1 and CV2) for five lines (Lines 1-5) and five environments (E1-E5). Source: Jarquín et al. (2014).

Application examples (Part 1)

Example: Wheat dataset (CIMMyT)

Data for n = 599 wheat lines evaluated in 4 environments, wheat improvement program, CIMMyT. The dataset includes p = 1279 molecular markers ($x_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, p$), coded as 0/1. The pedigree information is also available.

[Figure 2: Grain yield by environment. Panels: histogram of Y$yield (frequency) and Y$yield plotted by environment (environments 1, 2, 4 and 5).]

Data preparation

    # Load genotypic data (the code below expects this file to provide A and X)
    load("pedigree_markers.rdata")

    # Load phenotypic data
    pheno = read.table(file = "599_yield_raw-1.prn", header = TRUE)
    pheno = pheno[, c(2, 5, 6)]

    # Average yield for each environment-by-line combination
    # (column names must match the header of the .prn file)
    index = paste(pheno$env, pheno$gen1, sep = "@")
    yavg = tapply(pheno$gy, index, "mean")

    # Recover environment and line IDs from the "env@line" names
    tmp = names(yavg)
    tmp2 = strsplit(tmp, "@")
    gen = character()
    env = character()
    for (i in 1:length(tmp2)) {
      env[i] = tmp2[[i]][1]
      gen[i] = tmp2[[i]][2]
    }

    # Phenotype table, sorted by environment and then by line
    Y = data.frame(yield = yavg, VAR = gen, ENV = env)
    index = order(as.character(Y$ENV), as.character(Y$VAR))
    Y = Y[index, ]

    # Sort the relationship and marker matrices by line ID
    index = order(colnames(A))
    A = A[index, index]
    X = X[index, ]
    save(Y, A, X, file = "standarized_data.rdata")

Code for cross-validation schemes

    # CV = 1: assigns lines to folds
    # CV = 2: assigns entries of a line to folds
    CV <- 2
    nfolds <- 5
    sets <- rep(NA, nrow(Y))
    set.seed(123)
    IDs <- as.character(unique(Y$VAR))

    if (CV == 1) {
      folds <- sample(1:nfolds, size = length(IDs), replace = TRUE)
      for (i in 1:nrow(Y)) {
        sets[i] <- folds[which(IDs == Y$VAR[i])]
      }
    }

    if (CV == 2) {
      IDy <- as.character(Y$VAR)
      for (i in IDs) {
        tmp <- which(IDy == i)
        ni <- length(tmp)
        tmpFold <- sample(1:nfolds, size = ni, replace = ni > nfolds)
        sets[tmp] <- tmpFold
      }
    }
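The difference between the two schemes can be seen on a toy layout. The sketch below uses a hypothetical grid of 5 lines by 4 environments rather than the wheat data, and a vectorised `match()` in place of the loop for CV1. It counts how many distinct folds each line ends up in: one fold under CV1 (the whole line is held out together), several under CV2 (only some of its records are held out).

```r
# Toy layout: 5 lines x 4 environments, one record per combination.
Y <- expand.grid(VAR = paste0("line", 1:5), ENV = c("1", "2", "4", "5"),
                 stringsAsFactors = FALSE)
nfolds <- 5
set.seed(123)
IDs <- unique(Y$VAR)

# CV1: draw one fold per line, then copy it to all records of that line.
folds_by_line <- sample(1:nfolds, size = length(IDs), replace = TRUE)
sets_cv1 <- folds_by_line[match(Y$VAR, IDs)]

# CV2: draw folds separately within each line, so its records are spread out.
sets_cv2 <- rep(NA_integer_, nrow(Y))
for (id in IDs) {
  rows <- which(Y$VAR == id)
  sets_cv2[rows] <- sample(1:nfolds, size = length(rows),
                           replace = length(rows) > nfolds)
}

# Number of distinct folds each line falls into under each scheme.
n_folds_cv1 <- tapply(sets_cv1, Y$VAR, function(s) length(unique(s)))
n_folds_cv2 <- tapply(sets_cv2, Y$VAR, function(s) length(unique(s)))
n_folds_cv1
n_folds_cv2
```

With 4 records per line and 5 folds, CV2 samples without replacement, so each record of a line lands in a different fold.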

Fitting model and extracting results

    ###################################################
    # Model 1
    ###################################################
    library(BGLR)
    library(doBy)

    # Incidence matrix for the main effects of environments
    ZE <- model.matrix(~factor(Y$ENV) - 1)

    # Incidence matrix for the main effects of lines
    Y$VAR <- as.factor(Y$VAR)
    ZVAR <- model.matrix(~Y$VAR - 1)

    # Model fitting: mask the records of fold 1 and predict them
    ETA <- list(ENV = list(X = ZE, model = "BRR"),
                VAR = list(X = ZVAR, model = "BRR"))
    y <- Y$yield
    testing <- (sets == 1)
    y[testing] <- NA
    fm1 <- BGLR(y = y, ETA = ETA, saveAt = "M1_", nIter = 6000, burnIn = 1000)
    unlink("*.dat")

    # Extract the predictions
    predictions <- data.frame(env = Y$ENV[testing],
                              Individual = Y$VAR[testing],
                              y = Y$yield[testing],
                              yhat = fm1$yHat[testing])

    # write.table(predictions, file = paste("predictions.csv", sep = ""),
    #             row.names = FALSE, sep = ",")

    # doBy version: correlation between predicted and observed yield by environment
    predictions <- orderBy(~env, data = predictions)
    lapplyBy(~env, data = predictions, function(x) { cor(x$yhat, x$y) })

Output:

    $`1`
    [1] 0.01630911

    $`2`
    [1] 0.6108203

    $`4`
    [1] 0.564435

    $`5`
    [1] 0.289207
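The per-environment correlations can also be computed without the doBy package. The following is a base-R sketch on a made-up `predictions` table (the numbers are invented and unrelated to the output above); `split()` and `sapply()` are swapped in for `orderBy()` and `lapplyBy()`.

```r
# Toy predictions table with the same column names as in the slides.
predictions <- data.frame(
  env  = c("1", "1", "1", "2", "2", "2"),
  y    = c(1.0, 2.0, 3.0, 1.0, 2.0, 3.0),
  yhat = c(1.1, 1.9, 3.2, 3.0, 2.0, 1.0)
)

# Split the rows by environment and compute one correlation per group.
cor_by_env <- sapply(split(predictions, predictions$env),
                     function(d) cor(d$yhat, d$y))
cor_by_env
```

The result is a named numeric vector with one entry per environment, which is convenient for collecting results across folds and models.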

Results for one fold

[Figure 3: Results from CV1. Correlation for models M1, M2 and M3.]

[Figure 4: Results from CV2. Correlation for models M1, M2 and M3.]

Model extensions with environmental covariates

This model is obtained by extending model EA to incorporate environmental covariates (ECs).

Model 4 (EAW):

$$y_{ij} = \mu + E_i + a_j + t_{ij} + e_{ij},$$

where $a_j$ is the line effect (denoted $g_j$ in model EA), $t_{ij} = \sum_{q=1}^{Q} W_{ijq} \gamma_q$ represents a regression on the ECs, $W_{ijq}$ is the value of the $q$-th EC for the $ij$-th environment-line combination, and $\gamma_q$ represents the effect of the $q$-th EC.

Assumptions: $\gamma_q \sim N(0, \sigma^2_\gamma)$, so that $t = W\gamma \sim N(0, \sigma^2_t W W^T)$.
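The covariance of $t$ follows in one step from the prior on $\gamma$. The derivation below assumes that the $\gamma_q$ are mutually independent with a common variance; the slide states only their marginal distribution, so independence is an assumption made here.

```latex
\begin{aligned}
\operatorname{Cov}(t) &= \operatorname{Cov}(W\gamma) = W \operatorname{Cov}(\gamma)\, W^T \\
                      &= W \left(\sigma^2_\gamma I_Q\right) W^T = \sigma^2_\gamma\, W W^T .
\end{aligned}
```

Under this parameterization $\sigma^2_t$ coincides with $\sigma^2_\gamma$. If $W W^T$ is rescaled, for example divided by $Q$, the scaling constant is absorbed into $\sigma^2_t$.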

Model 5 (EAW-A×W):

$$y_{ij} = \mu + E_i + a_j + t_{ij} + at_{ij} + e_{ij}$$

Assumptions: $at \sim N\left(0, \left[(Z_p G Z_p^T) \circ (W W^T)\right] \sigma^2_{at}\right)$, where $\circ$ is the Hadamard product.
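A minimal base-R sketch of how the covariance structure of the $at$ term might be assembled. Everything here is illustrative: the relationship matrix, the two covariates (`temp`, `rain`) and their values are invented, and centering, scaling and dividing by the number of covariates are assumed choices that the slides do not specify.

```r
# Toy layout: 3 lines x 2 environments (6 records), 2 environmental covariates.
lines <- factor(rep(c("L1", "L2", "L3"), times = 2))
G <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0), nrow = 3)   # hypothetical relationship matrix
Zp <- model.matrix(~ lines - 1)           # records x lines

# Hypothetical EC values, one row per record.
W <- cbind(temp = c(20, 20, 20, 28, 28, 28),
           rain = c(5, 6, 5, 1, 2, 1))
W <- scale(W)                             # assumed: center and scale each EC

Omega <- tcrossprod(W) / ncol(W)          # assumed scaling of W W'
K_at  <- (Zp %*% G %*% t(Zp)) * Omega     # Hadamard product, as in Model 5

dim(K_at)
```

A kernel built this way could enter the model as an RKHS term, `list(K = K_at, model = "RKHS")`, mirroring the `EGrm` term of Model 3. Unlike the block-diagonal kernel of Model 3, records from different environments remain correlated whenever their covariate profiles are similar.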

References

Burgueño, J., G. de los Campos, K. Weigel, and J. Crossa. (2012). Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Science, 52 (2): 707-719.

Jarquín, D., J. Crossa, X. Lacaze, P. Du Cheyron, J. Daucourt, J. Lorgeou, F. Piraux, et al. (2014). A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theoretical and Applied Genetics, 127 (3): 595-607.