Genomic Selection: A Step Change in Plant Breeding. Mark E. Sorrells



People who contributed to research in this presentation
Jean-Luc Jannink, USDA/ARS, Cornell University
Elliot Heffner, Pioneer Hi-Bred International
Nicolas Heslot, Ph.D. student, Cornell University
Jessica Rutkoski, Ph.D. student, Cornell University
Julie Dawson, postdoc, Cornell University
Jeff Endelman, postdoc, Cornell University

Presentation Overview
Genomic (genome-wide) selection
Characterizing GxE using marker effects
Integration in a breeding program
Biparental and multi-family predictions
Recurrent genomic selection

Molecular Breeding Goals
Allele discovery
Allele characterization and validation
Parental and progeny selection for superior alleles at multiple loci to generate transgressive segregation
Increase response (R) from selection: $R = r_A^2 S / t$
  1) More accurate selection ($r_A$)
  2) Larger selection differential ($S$): increase selection intensity by reducing costs
  3) Shorter selection cycle time ($t$)

GS in a Plant Breeding Program (Heffner, Sorrells & Jannink. Crop Science 49:1-12)
Model Training Cycle
  Training population: elite lines informative for model improvement
  Phenotype (lines have already been genotyped)
  Train prediction model
Genomic Selection Line Development Cycle
  Breeding population: make crosses and advance generations
  Genotype new germplasm
  Genomic selection: advance lines with highest GEBV
  Test varieties and release
Genomic selection reduces cycle time and cost by reducing the frequency of phenotyping.

Factors Affecting the Accuracy of GEBVs
Level and distribution of LD between markers and QTL
  Meuwissen 2009: minimum number of markers for across-family prediction = Ne*L, where Ne is the effective population size and L is the genome size in Morgans
  E.g. wheat: 50 x 30 = 1500
  Marker number must scale with effective population size.
Size of training population
  Larger is better, and models will need to be re-trained over time.
  Meuwissen 2009: minimum number of records for across-family prediction = 2*Ne*L
  E.g. wheat: 2 x 50 x 30 = 3000
Heritability of the trait
  More records are required for low-heritability traits.
Distribution of QTL effects
  Many small-effect QTL or low LD favor BLUP for capturing small-effect QTL that may not be in LD with a marker.
  Prediction based on relationship decays faster than prediction based on LD.
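The two rules of thumb attributed to Meuwissen 2009 on this slide are simple products, so the wheat figures can be checked directly. The sketch below is only an illustration; the function names `min_markers` and `min_records` are hypothetical and do not come from the presentation.

```python
def min_markers(ne: float, genome_morgans: float) -> float:
    """Minimum marker number for across-family prediction: Ne * L."""
    return ne * genome_morgans


def min_records(ne: float, genome_morgans: float) -> float:
    """Minimum number of training records for across-family prediction: 2 * Ne * L."""
    return 2 * ne * genome_morgans


# Wheat example from the slide: Ne = 50, L = 30 Morgans
wheat_markers = min_markers(50, 30)
wheat_records = min_records(50, 30)
print(wheat_markers, wheat_records)
```

Because both quantities are linear in Ne, doubling the effective population size doubles both the marker and the record requirement.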

Choosing a Statistical Model for GS
Problem: many QTL effects must be estimated from a limited number of phenotypes or records (the large p, small n problem).
Previous approach: least squares regression for variable selection
  Markers are fixed effects, and an arbitrary significance threshold is used to decide which markers to fit.
  Results in overestimation of significant effects and loss of small effects.
Options: variable selection or shrinkage estimation can be used to deal with oversaturated regression models.
Current methods: linear mixed models, Bayesian estimation, machine learning
  No testing for significance
  Many QTL effects, set as random effects, can be estimated simultaneously.
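To make the shrinkage idea concrete, here is a minimal ridge regression sketch that estimates all marker effects at once when markers far outnumber lines. It is not the presenters' code: the data are simulated, the penalty `lam` is an arbitrary assumed value (in RR-BLUP it would be derived from variance components), and the function name `ridge_marker_effects` is hypothetical.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Shrinkage estimates of all marker effects simultaneously (no significance test)."""
    Xc = X - X.mean(axis=0)          # center marker scores
    yc = y - y.mean()                # center phenotypes
    n = Xc.shape[0]
    # Dual form: solve an n x n system instead of p x p, since p >> n
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), yc)
    return Xc.T @ alpha


# Simulated example (assumption): 100 lines, 1000 biallelic markers coded 0/1
rng = np.random.default_rng(0)
n_lines, n_markers = 100, 1000
X = rng.integers(0, 2, size=(n_lines, n_markers)).astype(float)
true_effects = rng.normal(0, 0.1, n_markers)
y = X @ true_effects + rng.normal(0, 1.0, n_lines)

effects = ridge_marker_effects(X, y, lam=50.0)
print(effects.shape)
```

Ordinary least squares has no unique solution here because there are more markers than records; the penalty is what makes the simultaneous estimate well defined.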

Choosing a Statistical Model for GS
Model performance is based on the correlation between GEBV and true breeding value (TBV).
Accuracy (r) = Pearson's correlation coefficient between:
  Breeding values based on phenotype
  Breeding values estimated from GS prediction models
r is analogous to the square root of the narrow-sense heritability.
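The accuracy measure and its analogy with heritability can be illustrated with a small simulation. This is a sketch under stated assumptions: breeding values and phenotypes are simulated with a narrow-sense heritability of 0.36, and `prediction_accuracy` is a hypothetical helper name.

```python
import numpy as np


def prediction_accuracy(predicted, reference):
    """Pearson correlation between predicted and reference breeding values."""
    return float(np.corrcoef(predicted, reference)[0, 1])


# Simulated example (assumption): h2 = 0.36, phenotypic variance scaled to 1
rng = np.random.default_rng(42)
h2 = 0.36
n = 20000
tbv = rng.normal(0, np.sqrt(h2), n)                     # true breeding values
phenotype = tbv + rng.normal(0, np.sqrt(1 - h2), n)     # add non-genetic noise

# Using the phenotype itself as the predictor, accuracy approaches sqrt(h2)
r_pheno = prediction_accuracy(phenotype, tbv)
print(round(r_pheno, 2), round(float(np.sqrt(h2)), 2))
```

In this setting the phenotype predicts the true breeding value with an accuracy close to the square root of the heritability, which is the sense in which r is analogous to it.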

Optimizing Prediction using Marker Effects (Nicolas Heslot)
Limagrain Europe barley: 996 F6 and F7 lines, 58 environments, 13,682 plots
Unbalanced: only 18 genotypes present in >50% of the environments
Bayesian LASSO GS model
Characterize allele-effect estimates for each test location
Identify outlier environments
Group relevant mega-environments

Use of Marker Effects to Cluster Environments
Marker effects for all lines in each environment formed a balanced dataset for computing Euclidean distances.
[Figure: clustering of environments by marker effects, on a scale from dissimilar to similar]
4 clear subgroups, but not related to region or year
Outlier environments were identified, but there was no appreciable gain in accuracy.
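The distance step can be sketched as follows: each environment is represented by its vector of estimated marker effects, and environments are compared pairwise. The effect values below are simulated placeholders, and `environment_distances` is a hypothetical name; the clustering method applied to these distances is not stated on the slide and is not shown.

```python
import numpy as np


def environment_distances(effects):
    """Pairwise Euclidean distances between rows (environments) of a
    (n_environments, n_markers) matrix of marker-effect estimates."""
    diff = effects[:, None, :] - effects[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


# Simulated example (assumption): 6 environments, 50 markers,
# environments 0-2 share one effect profile and 3-5 another
rng = np.random.default_rng(3)
profile_a = rng.normal(0, 1, 50)
profile_b = rng.normal(0, 1, 50)
effects = np.vstack([profile_a + rng.normal(0, 0.1, 50) for _ in range(3)] +
                    [profile_b + rng.normal(0, 0.1, 50) for _ in range(3)])

D = environment_distances(effects)
print(np.round(D, 1))
```

Because every environment has an estimate for every marker, the matrix is balanced even when the lines tested in each environment differ, which is the property the slide relies on.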

Optimizing the Composition of the Training Population (Heslot et al TAG in review)
Uses the predictive ability of an environment to optimize the composition of the training population derived from the complete dataset.
Train a GS model in each environment.
Compute the mean accuracy of each training environment for predicting line performance in the other environments, and rank the environments.
Remove environments one at a time, starting from the least predictive.
Train and cross-validate a GS model on the remaining training population: predictive set.
Predict the removed data with the GS model: unpredictive set.
Use both accuracy measurements to decide the cut-off point.
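The elimination procedure can be sketched as below. This is a simplified illustration, not the method as published: it uses ridge regression rather than the Bayesian LASSO named earlier, assumes a balanced simulated dataset in which every line is observed in every environment, and averages phenotypes across environments to form each set. All function names are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam=10.0):
    """Ridge regression; returns (marker effects, intercept)."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), y - mu_y)
    beta = Xc.T @ alpha
    return beta, mu_y - mu_x @ beta


def ridge_predict(model, X):
    beta, intercept = model
    return X @ beta + intercept


def accuracy(pred, obs):
    return float(np.corrcoef(pred, obs)[0, 1])


def rank_environments(X, Y):
    """Environment indices ordered from least to most predictive of the others."""
    n_env = Y.shape[1]
    mean_acc = []
    for e in range(n_env):
        pred = ridge_predict(ridge_fit(X, Y[:, e]), X)
        mean_acc.append(np.mean([accuracy(pred, Y[:, o])
                                 for o in range(n_env) if o != e]))
    return [int(i) for i in np.argsort(mean_acc)]


def cv_accuracy(X, y, k=5, seed=0):
    """Mean k-fold cross-validated accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = ridge_fit(X[train_idx], y[train_idx])
        accs.append(accuracy(ridge_predict(model, X[test_idx]), y[test_idx]))
    return float(np.mean(accs))


def backward_elimination(X, Y):
    """For each number of removed environments, return
    (n_removed, predictive-set CV accuracy, unpredictive-set accuracy)."""
    order = rank_environments(X, Y)
    results = []
    for n_removed in range(1, Y.shape[1] - 1):
        removed, kept = order[:n_removed], order[n_removed:]
        y_kept = Y[:, kept].mean(axis=1)
        y_removed = Y[:, removed].mean(axis=1)
        predictive = cv_accuracy(X, y_kept)
        model = ridge_fit(X, y_kept)
        unpredictive = accuracy(ridge_predict(model, X), y_removed)
        results.append((n_removed, predictive, unpredictive))
    return results


# Simulated example (assumption): 4 informative environments, 2 pure-noise ones
rng = np.random.default_rng(1)
n_lines, n_markers = 80, 200
X = rng.integers(0, 2, size=(n_lines, n_markers)).astype(float)
g = X @ rng.normal(0, 0.3, n_markers)
Y = np.column_stack([g + rng.normal(0, 1.0, n_lines) for _ in range(4)] +
                    [rng.normal(0, 3.0, n_lines) for _ in range(2)])

for row in backward_elimination(X, Y):
    print(row)
```

The cut-off would be chosen by inspecting how the two accuracy columns change as environments are removed; the sketch leaves that decision to the reader, as the slide does.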

Optimizing the Composition of the Training Population (Heslot et al TAG in review)
[Figure: accuracy (0.0 to 0.8) against number of environments excluded from the training population (0 to 60); series: predictive set, unpredictive set prediction, validation]
Remove the least predictive environments one at a time, then cross-validate.
Prediction accuracy rose from 0.54 to 0.61, with no change in heritability.
Some outlier environments were included in the optimal model.
Accuracy in the validation set increased from 0.279 to 0.292.
Only 1 of the 996 barley lines was excluded from the optimal model training set.

Genomic Selection Experiments in the Cornell Wheat Breeding Program
Elliot Heffner
  Genomic Selection in Biparental Populations: Heffner et al. 2011. Crop Science 51: doi: 10.2135
  Genomic Selection Across Multiple Families in the Breeding Program: Heffner et al. 2011. The Plant Genome 4: 65
Jessica Rutkoski
  Genomic Selection for Adult Plant Stem Rust Resistance
  Genomic Selection for Fusarium Head Blight Resistance
  Recurrent Genomic Selection

Genomic Selection Experiments in the Cornell Wheat Breeding Program (Elliot Heffner)
Biparental populations
  Cayuga x Caledonia (soft white winter)
    209 DH lines; 399 DArT markers
    Preharvest sprouting: 16 environments (6 years)
    9 milling and baking quality traits: 3 years, one location per year
  Foster x KanQueen (soft red winter)
    175 DH lines; 574 DArT markers
    9 milling and baking quality traits: 3 years, one location per year
Across multiple families: Master Nursery
  400 advanced breeding lines (F7+)
  Augmented field design
  Three locations over 3 years, 2007-2009
  DArT markers: ~1500 polymorphisms
  13 agronomic traits

Evaluation Methods for GS in Biparental Populations
Cross validation
  Model training based on one year, validation on the remaining (different) years
  Lines in the validation population (VP) are not included in the training population (TP)
  Training population sizes = 24, 48, 96
  Marker number = 64, 128, 256, 384, 399, 574
Prediction models
  MLR: multiple linear regression, forward/backward, p<0.2
  RR: ridge regression BLUP, equal variance, all markers
  BayesCpi: equal variance, optimized pi for non-zero markers
Phenotypic prediction accuracy
  Based on the correlation of phenotypes from the TP year with the VP phenotypes in the validation years

Relative Marker-Based Prediction Accuracy, TP=96, Biparental Populations
[Figure: marker PA / phenotypic PA (0.00 to 0.90) for the MLR, RR and BayesCpi models, by trait: PHS, flour yield, test weight, protein]
Bar labels: 0.84, 0.82, 0.73, 0.72, 0.61, 0.62, 0.60, 0.58, 0.57, 0.50, 0.45, 0.45
Phenotypic PA: 0.82, 0.73, 0.85, 0.84 (traits: PHS, flour yield, test weight, protein)

Accuracy in Biparental Populations
GS accuracy was 47% greater than MLR accuracy.

Biparental vs. Multi-Family GS
Biparental
  1) Population specific
  2) Reduced epistasis
  3) Reduced number of markers required
  4) Smaller training populations required
  5) Balanced allele frequencies
  6) Best for introgression of exotic germplasm
Multi-Family
  1) Allows prediction across a broader range of adapted germplasm
  2) Allows sampling of more environments
  3) Cycle duration is reduced because retraining of the model is ongoing
  4) Allows larger training populations
  5) Greater genetic diversity

Multi-Family GS Models
Two conventional MAS models
  With or without the kinship matrix (K) as a covariate
  Two-stage analysis
    Association analysis: test for significant loci
    Multiple linear regression: estimate effects of significant markers
Four GS models
Model | Marker effects | All markers have a non-zero effect | Markers have equal variances
Association w/o K | Fixed | NA | NA
Association w/ K | Fixed | NA | NA
Ridge Regression | Random | Yes | Yes
BayesA | Random | Yes | No
BayesB | Random | No | No
BayesCπ | Random | No | Yes

Results
MAS model accuracy: the model without kinship (0.48) was similar to the model with kinship (0.44).
Small range in GS model accuracy: 0.60 (BA); 0.59 (RR and BB); 0.58 (BC)
Method | Low | High | Mean | MarkerSel/PhenoSel
MAS (AA) | 0.19 | 0.65 | 0.48 | 0.75
GS (BA) | 0.22 | 0.76 | 0.60 | 0.95
Pheno | 0.21 | 0.89 | 0.64 | -------
r_GS was 25% greater than r_MAS.
r_PS was 33% greater than r_MAS.
r_PS was 7% greater than r_GS.
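The percentage comparisons on this slide follow from the mean accuracies, and the arithmetic can be reproduced in a few lines. The dictionary and the helper `pct_greater` are hypothetical names used only for this check.

```python
# Mean accuracies reported on the slide
mean_accuracy = {"MAS": 0.48, "GS": 0.60, "PS": 0.64}


def pct_greater(a: float, b: float) -> float:
    """Percentage by which a exceeds b."""
    return 100 * (a / b - 1)


gs_vs_mas = pct_greater(mean_accuracy["GS"], mean_accuracy["MAS"])
ps_vs_mas = pct_greater(mean_accuracy["PS"], mean_accuracy["MAS"])
ps_vs_gs = pct_greater(mean_accuracy["PS"], mean_accuracy["GS"])
print(round(gs_vs_mas), round(ps_vs_mas), round(ps_vs_gs))

# Marker-based accuracy relative to phenotypic accuracy
print(round(mean_accuracy["MAS"] / mean_accuracy["PS"], 2))
```

From the rounded means, the GS to PS ratio works out to about 0.94 rather than the 0.95 shown on the slide; the slide value was presumably computed from unrounded accuracies.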

Relative GS and PS Prediction Accuracy: Multi-Family
TP size = 288; marker number = 1158
[Figure: r_GS / r_PS by trait, axis 0.00 to 1.20]
Trait | r_GS / r_PS | GS/PS accuracies
Head date | 0.84 | 0.75/0.89
Height | 0.88 | 0.75/0.85
Flour yield | 0.92 | 0.76/0.83
Protein | 0.92 | 0.45/0.49
Preharvest sprouting | 0.96 | 0.59/0.61
Test weight | 1.01 (NS) | 0.57/0.57
Lodging (*) | 1.08 | 0.28/0.26
Yield (NS) | 1.09 | 0.22/0.21

Cornell Wheat Breeding Program
Recurrent Selection Stage: GS models emphasizing additive genetic variation
  GH crossing and selfing cycle: winter crossing, S0 self, spring plant and harvest; summer crossing, S0 self, fall plant and harvest
Evaluation Stage: GS models capturing non-additive genetic variation; GEBV + phenotype
  F2: GH/field advance, SSD/bulk, MAS
  F3: field advance, bulk or row-pheno, MAS
  F4: field advance, bulk or plot-pheno, MAS
  Training population: F4:5, Master Nursery, phenotype, select uniform spikes, genotype; 1 or 3 locations, GS+PS
  Advanced regional testing; inbreeding; seed increase
Planting GS-selected individuals (S0), intermating, self-pollination and harvesting S1 seed occur twice a year.
F2s can be field or GH planted depending on space available. MAS and/or GS can be applied.
F3s will begin seed increases of selected individuals in single rows.
F4s will be phenotyped, and 50 F5 spikes selected for uniformity and for GBS genotyping.
Selected lines enter the Master Training Nursery.
Each year selected lines are entered in the regional trials.

Genomic selection for quantitative disease resistance in wheat (Jessica Rutkoski)
Stem rust of wheat, Puccinia graminis f.sp. tritici
Fusarium head blight of wheat, Fusarium

Evaluation of GS Models for Fusarium Head Blight Resistance (Rutkoski et al The Plant Genome In press)
Cooperative nursery data: 3 years, 3 nurseries, 60 environments
Traits: incidence (Inc), severity (Sev), Fusarium damaged kernels (FDK), incidence/severity/kernel quality index (ISK), deoxynivalenol (DON), heading date (HD)
GS models: ridge regression, Bayesian LASSO, reproducing kernel Hilbert spaces, random forest, and association mapping with a Q+K matrix followed by multiple linear regression with markers as fixed effects
Population size: training 132, predicted 33
Cross validation: 5-fold, repeated 10 times
Photos: M. McMullen, J. Miller
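The cross-validation layout can be sketched as follows. A total of 165 lines is an inference from the slide (132 training plus 33 predicted per fold), not a number it states, and `repeated_kfold` is a hypothetical helper written with NumPy only.

```python
import numpy as np


def repeated_kfold(n, k=5, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV repeated with fresh shuffles."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n), k)
        for i in range(k):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            yield train_idx, test_idx


# Assumed total of 165 lines: each fold trains on 132 and predicts 33
splits = list(repeated_kfold(165, k=5, repeats=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each of the 50 splits would be used to fit a model on the training indices and correlate its predictions with the observed values at the test indices; averaging those correlations gives one accuracy per model and trait.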

Model Accuracies for Fusarium Head Blight Resistance
[Figure: accuracy (0 to 0.7) of the RF, RKHS, RR, BL and MLR models, by trait: HD, FDK, DON, Inc, Sev]

Comparison of Prediction Accuracies for DON (Rutkoski et al The Plant Genome In press)
[Figure: accuracy (0 to 0.7) by prediction method]
Bar labels: 0.479 a, 0.522 b, 0.483 a, 0.467, 0.534 b, 0.545 b, 0.571, 0.641, 0.594, 0.293
DArT markers = 2,400
QTL = SSR markers = 8 (30) linked to 5 QTL
Correlated traits: incidence (INC), Fusarium damaged kernels (FDK), severity (SEV), and the index (ISK)
Most accurate GS model: random forest (RF)
RF prediction model combining markers and correlated traits

Durable Rust Resistance in Wheat
Funded by the Bill & Melinda Gates Foundation
Since the 1999 discovery of virulence on Sr31 in Uganda, Ug99 races have overcome Sr36 and Sr24 and spread north to Iran and south to South Africa.

Stem Rust Resistance: Two Types
All-stage resistance: conferred by major R-genes that provide complete resistance
vs.
Adult plant resistance: conferred by additive loci that alone do not provide complete resistance

Recurrent Genomic Selection for Adult Plant Resistance to Stem Rust (Jessica Rutkoski)

Summary: Genomic Selection
GS differs from MAS and Association Breeding in that the underlying genetic control and biological function are not known. Breeders can implement GS without the upfront cost of obtaining that knowledge.
GS preserves the creative nature of phenotypic selection to sometimes arrive at solutions outside the engineer's scope.
The most important advantages are reductions in the length of the selection cycle and in the associated phenotyping cost, resulting in greater genetic gain per year.

Acknowledgements
Special thanks to Pioneer Hi-Bred for supporting these symposia and graduate training
Bill & Melinda Gates Foundation grant to Cornell University for the Borlaug Global Rust Initiative, Durable Rust Resistance in Wheat
Bill & Melinda Gates Foundation grant to Cornell University & CIMMYT to implement a pilot project on genomic selection in wheat and maize breeding programs
USDA Cooperative State Research, Education and Extension Service, Coordinated Agricultural Project 2005-05130: Wheat Applied Genomics
USDA National Institute of Food and Agriculture, NRI Triticeae Coordinated Agricultural Project 2011-68002-30029: Improving Barley and Wheat Germplasm for Changing Environments
USDA National Needs Fellowship Grant 2005-38420-15785: provided fellowship for Elliot Heffner
USDA National Needs Fellowship Grant 2008-38420-04755: provided fellowship for Jessica Rutkoski
Limagrain Europe (France): provided support for Nicolas Heslot
US Wheat & Barley Scab Initiative: provided funding for the Fusarium head blight research

Questions?