OPTIMIZATION OF BREEDING SCHEMES USING GENOMIC PREDICTIONS AND SIMULATIONS

Sophie Bouchet, Stéphane Lemarie, Aline Fugeray-Scarbel, Jérôme Auzanneau, Gilles Charmet
INRA GDEC unit: Genetics, Diversity and Ecophysiology of Crops (mainly bread wheat)
DGS group: Genetic Diversity and Breeding

Breeding objectives: pyramid favorable alleles for several traits in one elite line:
yield, optimal heading date, disease tolerance, drought and heat tolerance, cold tolerance, lodging tolerance, nitrogen (N) use efficiency, high protein content for export, good baking ability for the domestic market.
Rank the lines based on phenotypic and/or molecular scores.
Balance genetic gain against economic gain.

BREEDING AND REGISTRATION AT INRA-AGRI-OBTENTIONS
INRA: National Institute for Research in Agronomy

Classical wheat breeding scheme (10 years)
300 crosses → F1
F2: 600 000 plants (order of 10^6)
F3: 10^5 plants
F4: 10^4 plants, 1 site
F5: 10^3 plants, 3 sites
F6: 100 lines, 5 sites
F7 to F9: 20 lines at 5 sites, then 3 lines at 15 sites → elite line
Registration (10 000 euros)

Trial network INRA-Agri-Obtentions
3 breeders at INRA, 1 breeder at AO, 15 sites
Sites for generations F1-F5, F6-F7 and F8-F9 (F8-F9 for registration); AB sites: trials for the organic label
Map: soft winter wheat, South, 1999 harvest; area (ha): 100 000 to 192 000 (17), 50 000 to 100 000 (22), 20 000 to 50 000 (17), 3 000 to 20 000 (18), 0 to 3 000 (20)
Check for homozygosity
3 lines registered each year, including one with the organic label (low inputs)

Registration
Seed multiplication by Agri-Obtentions.
Ranking of lines by a public organization (GEVES) and technical experts (CTPS) (http://www.geves.fr/):
check novelty compared with registered lines, homogeneity and stability;
quality and environmental criteria.

Bread quality class   Bread quality score   Yield
Superior (BPS)        Score > 250           > 102% of controls (elite references)
Good (BP)             220 < score < 250     > 104% of controls
Intermediate (BAU)    Score < 220           > 107% of controls

GENETIC RESOURCES

20-year database of phenotypes and genotypes from the breeding program (F8-F9)
Number of lines by year, 2000 to 2013 (diagonal: lines in each year; below the diagonal: lines common to two years)

Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
2000   15
2001   13   28
2002    7   13   22
2003    2    3   10   37
2004    1    2    4    8   32
2005    1    1    2    3    9   24
2006    1    1    1    1    6   12   38
2007    1    1    2    2    5    6   17   58
2008    1    1    1    1    3    2    6   21   47
2009    1    1    1    1    2    1    3    5   18   44
2010    1    1    1    1    3    2    3    4    8   16   55
2011    0    0    0    0    1    0    1    2    4    5   17   48
2012    0    0    0    0    1    0    0    1    2    3    6   19   55
2013    0    0    0    0    1    0    0    1    2    3    3   11   29   75

MAXIMIZE GENETIC GAIN

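Breeder's equation: $\Delta G = \frac{i \, r \, \sigma_A}{t}$, with $\Delta G$ the genetic gain per unit of time, $i$ the intensity of selection, $r$ the accuracy of selection, $\sigma_A$ the additive genetic standard deviation (square root of the genetic variance) and $t$ the generation interval. GS can act on every term:
optimize the accuracy of selection (number of markers, calibration population);
monitor the intensity of selection and the introduction of genetic diversity (genetic variance);
decrease generation intervals.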
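As a minimal sketch, the breeder's equation can be evaluated in Python, together with the standard link between a selection rate (such as α2, α3 or α4 in the schemes below) and the selection intensity i under truncation selection, i = φ(x)/p. The normality of the trait and the function names are assumptions made for illustration only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def selection_intensity(p: float) -> float:
    """Selection intensity i when the best fraction p of candidates is kept
    (truncation selection, trait assumed normally distributed):
    i = phi(x) / p, where x is the threshold with P(Z > x) = p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    x = STD_NORMAL.inv_cdf(1.0 - p)
    return STD_NORMAL.pdf(x) / p


def genetic_gain(i: float, r: float, sigma_a: float, t: float) -> float:
    """Breeder's equation: delta_G = i * r * sigma_A / t
    (gain per unit of time when t is the generation interval)."""
    return i * r * sigma_a / t


if __name__ == "__main__":
    # Illustrative values only, not results from the study.
    i = selection_intensity(0.02)
    for t in (10.0, 5.0):
        print(f"t = {t:>4} years: delta_G = {genetic_gain(i, 0.5, 1.0, t):.3f}")
```

Halving the generation interval doubles ΔG at constant i, r and σ_A, which is why shortening cycles is listed alongside accuracy.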

EXPONENTIAL INCREASE OF COMPUTING AND DATABASE CAPACITY

EXPONENTIAL GROWTH IN ACCESS TO GENOTYPE AND SEQUENCE DATA

Low-cost genome-wide characterization

BREAK DOWN STATISTICAL BARRIERS: high-dimensional statistics

Genomic prediction
Calibration step: the GS model $y = Z\beta + e$ is calibrated on a calibration population with phenotypes $y$ and genotypes $Z$, estimating the vector $\beta$ of marker effects.
Validation step: the calibrated model predicts $\mathrm{GEBV}_{validation} = Z_{validation}\hat{\beta}$ from genotypes only, and predictive ability $= \mathrm{cor}(y, \mathrm{GEBV})$. The validation population is either one fold of a cross-validation within the calibration panel (calibrate on k-1 folds, predict the remaining fold), a different panel, or the same panel phenotyped in a different environment.
Breeding step: $\mathrm{GEBV}_{candidate} = Z_{candidate}\hat{\beta}$ is computed from the genotypes of the population of candidates, which are then ranked on their GEBV.
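The calibration and cross-validation steps can be sketched in plain Python. The slides do not name the GS model; this sketch swaps in ridge regression on column-centred markers (close in spirit to RR-BLUP with a fixed shrinkage parameter `lam`), and all function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(z, y, lam):
    """Calibration step (ridge regression stand-in for the GS model):
    y = mu + Zc beta + e with column-centred markers Zc, solved as
    (Zc'Zc + lam I) beta = Zc'(y - mu)."""
    n, p = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(p)]
    zc = [[row[j] - means[j] for j in range(p)] for row in z]
    mu = sum(y) / n
    lhs = [[sum(zc[k][i] * zc[k][j] for k in range(n)) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    rhs = [sum(zc[k][i] * (y[k] - mu) for k in range(n)) for i in range(p)]
    return mu, means, solve(lhs, rhs)


def predict(model, z):
    """GEBV = mu + (Z - calibration column means) beta, from genotypes only."""
    mu, means, beta = model
    return [mu + sum((row[j] - means[j]) * b for j, b in enumerate(beta)) for row in z]


def pearson(a, b):
    """Pearson correlation cor(a, b)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def kfold_predictive_ability(z, y, k=5, lam=1.0, seed=0):
    """Calibrate on k-1 folds, predict the left-out fold, and return the
    mean predictive ability cor(y, GEBV) over the k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    abilities = []
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_ridge([z[i] for i in train], [y[i] for i in train], lam)
        gebv = predict(model, [z[i] for i in test])
        abilities.append(pearson([y[i] for i in test], gebv))
    return sum(abilities) / k


if __name__ == "__main__":
    # Hypothetical simulated data: 150 DH lines, 30 markers coded -1/+1,
    # the first 5 markers carry effects, plus environmental noise.
    rng = random.Random(42)
    z = [[rng.choice((-1.0, 1.0)) for _ in range(30)] for _ in range(150)]
    y = [sum(row[:5]) + rng.gauss(0.0, 2.0) for row in z]
    print(f"predictive ability: {kfold_predictive_ability(z, y, lam=5.0):.2f}")
```

A larger `lam` shrinks marker effects more strongly; how it is tuned is outside this sketch.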

NEEDS FOR NEW METHODOLOGIES FOR BREEDING
Figure: 40-year evolution of yield in the EU (France, Germany, United Kingdom); yield (t/ha) versus harvest year, 1975 to 2015.

At which step is it relevant to use molecular markers and genomic predictions?
Figure: breeding pipeline with the tools usable at each step. Germplasm and elite lines: sequencing, QTL detection/GWAS, detection of traces of selection, high-throughput genotyping. Selection of candidate parents and virtual crosses. Crosses, doubled haploids and selfing (recombinations). Selection: preliminary field experiments with high-throughput phenotyping and GS, advanced experiments with GS, then elite trials (G × E) with precision genotyping/phenotyping, MAS and GS. Along the pipeline the number of lines decreases and the number of sites increases.

Comparison of two virtual breeding schemes: PS (phenotypic selection) and GPS (genomic selection in year 3)
Year  PS scheme                                     GPS scheme
1     Cross and DH (NP parents, NC crosses) → N2    Cross and DH (NP parents, NC crosses) → N2
2     Multiplication, selection α2 → N3             Multiplication, selection α2 → N3
3     Plots + multiplication, PS α3 → N4            Plots + multiplication, GS α3 → N4
4     Plots + multiplication, PS α4 → NP            Plots + multiplication, PS α4 → NP

Evaluation of breeding cost
Step                     Abbreviation   Cost (euros)
Cross                    CC             100
DH                       DH             100
Multiplication, year 2   CM2            10
Multiplication, year 3   CM3            50
Multiplication, year 4   CM4            100
Plot, year 3             CP3            100
Plot, year 4             CP4            500
Genotyping               CG             50
Cost per year            CY             to be fixed

Parameters of the breeding scheme to adjust
Parameter                  Abbreviation
Number of cycles           K
Total cost for K cycles    CT
Number of parents          NP
Number of crosses          NC
Number of plants, year 2   N2
Number of plants, year 3   N3
Number of plants, year 4   N4
Selection rate, year 2     α2 = N3/N2
Selection rate, year 3     α3 = N4/N3
Selection rate, year 4     α4 = NP/N4
Global selection rate      α34 = NP/N3 = α3·α4

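Calculate N2, N3 and N4 for a given breeding strategy and a fixed cost per year: $K \cdot CY = \ldots$, with $\alpha_{34} = \alpha_3 \alpha_4$, $\alpha_3 = \alpha_{34}^{\lambda}$ and $\alpha_4 = \alpha_{34}^{1-\lambda}$. Since $\alpha_{34} < 1$, λ sets how the global selection rate is split between years 3 and 4: λ = 0.75 gives strong selection in year 3 (small α3), λ = 0.25 strong selection in year 4 (small α4).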
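The cost equation after K·CY is not recoverable from the slide. As a hedged reconstruction, not the authors' stated formula, the sketch below assumes that one 4-year cycle of the PS scheme is budgeted at 4·CY, with the cross cost charged per cross, DH plus year-2 multiplication per N2 plant, and plot plus multiplication per N3 and per N4 plant (unit costs from the cost table). Under these assumptions, with CY = 3 750 000 euros, NC = 300, NP = 100 and α2 = 0.2, the solver returns N2, N3 and N4 close to the PS values in the tables below; the GPS scheme would need a different, unknown cost split. All names are hypothetical.

```python
# Hypothetical reconstruction of the PS cost model (not stated on the slides).
# Unit costs in euros, from the cost table.
UNIT_COST = {"CC": 100, "DH": 100, "CM2": 10, "CM3": 50, "CM4": 100,
             "CP3": 100, "CP4": 500}


def plant_numbers(n2, lam, alpha2, n_parents):
    """N3 and N4 implied by N2: N3 = alpha2 * N2, alpha34 = NP / N3,
    alpha4 = alpha34 ** (1 - lam) and N4 = NP / alpha4."""
    n3 = alpha2 * n2
    alpha34 = n_parents / n3
    n4 = n_parents / alpha34 ** (1.0 - lam)
    return n3, n4


def cycle_cost_ps(n2, lam, alpha2, n_parents, n_crosses, c=UNIT_COST):
    """Assumed cost of one PS cycle: crosses, DH + year-2 multiplication
    per N2 plant, plot + multiplication per N3 and per N4 plant."""
    n3, n4 = plant_numbers(n2, lam, alpha2, n_parents)
    return (n_crosses * c["CC"]
            + n2 * (c["DH"] + c["CM2"])
            + n3 * (c["CP3"] + c["CM3"])
            + n4 * (c["CP4"] + c["CM4"]))


def solve_plants(budget, lam, alpha2=0.2, n_parents=100, n_crosses=300):
    """Largest N2 whose assumed cycle cost fits the budget, found by
    bisection (the cost increases with N2); returns (N2, N3, N4)."""
    lo, hi = n_parents / alpha2, 1e9  # lo: N3 = NP, i.e. alpha34 = 1
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if cycle_cost_ps(mid, lam, alpha2, n_parents, n_crosses) > budget:
            hi = mid
        else:
            lo = mid
    return (lo, *plant_numbers(lo, lam, alpha2, n_parents))


if __name__ == "__main__":
    cy = 3_750_000  # cost per year CY (euros)
    for lam in (0.75, 0.25):
        # Assumption: a 4-year cycle is budgeted at 4 * CY.
        n2, n3, n4 = solve_plants(4 * cy, lam)
        print(f"lambda = {lam}: N2 = {n2:.0f}, N3 = {n3:.0f}, N4 = {n4:.0f}")
```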

Simulations
Database: Nref = 700 lines, genotyped and phenotyped
Number of markers: Nmarker = 4000
Number of QTLs: NQTL = 100
Heritability: h² = 0.4 or h² = 0.7
Number of crosses: NC = 300
Number of parents: NP = 100
λ = 0.25 or λ = 0.75
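A minimal sketch of how phenotypes with a target heritability can be simulated from marker genotypes: NQTL markers are drawn as QTLs with normal additive effects, and environmental noise is scaled so that var(g)/var(y) is close to h². The normal effect distribution, the -1/+1 coding of DH lines and the function name are assumptions, not details from the slides.

```python
import random
from statistics import pvariance


def simulate_phenotypes(genotypes, n_qtl, h2, seed=0):
    """Simulate additive genetic values g and phenotypes y = g + e.

    genotypes: list of lines, each a list of marker codes (-1/+1 for DH lines).
    n_qtl markers are drawn at random as QTLs with N(0, 1) effects
    (assumed distribution); the noise variance is set to
    var(g) * (1 - h2) / h2 so that var(g) / var(y) is close to h2."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    rng = random.Random(seed)
    qtl = rng.sample(range(len(genotypes[0])), n_qtl)
    effects = {j: rng.gauss(0.0, 1.0) for j in qtl}
    g = [sum(line[j] * a for j, a in effects.items()) for line in genotypes]
    sd_e = (pvariance(g) * (1.0 - h2) / h2) ** 0.5
    y = [gi + rng.gauss(0.0, sd_e) for gi in g]
    return g, y, sorted(qtl)
```

With the settings above, this would be called as `simulate_phenotypes(geno, n_qtl=100, h2=0.4)` on a 700 × 4000 genotype matrix.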

Simulation parameters
CY (euros)   λ      α2    α3 PS   α3 GPS   α4 PS   α4 GPS   α34 PS   α34 GPS
3 750 000    0.75   0.2   0.02    0.02     0.26    0.25     0.005    0.004
3 750 000    0.25   0.2   0.28    0.27     0.02    0.02     0.006    0.005

Number of plants per generation
λ      PS N2    GPS N2   PS N3   GPS N3   PS N4   GPS N4
0.75   105296   122558   21059   24512    381     396
0.25   86487    96626    17297   19325    4770    5183

Number of plants per cross (n = N/NC, with NC = 300)
λ      PS n2   GPS n2   PS n3   GPS n3   PS n4   GPS n4
0.75   351     409      70      82       1.27    1.32
0.25   288     322      58      64       15.90   17.28

Simulation of crosses and DH progeny with crossovers (CO)

RESULTS

PS: α3 = 0.28, α4 = 0.02 (λ = 0.25); α3 = 0.02, α4 = 0.26 (λ = 0.75)

GPS: α3 = 0.27, α4 = 0.02 (λ = 0.25); α3 = 0.02, α4 = 0.26 (λ = 0.75)

GPS, α3 = 0.27, α4 = 0.02: prediction of the best parent combinations
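One way to score parent combinations, sketched here under stated assumptions, is the usefulness criterion U = μ + i·σ: the predicted mean μ of the DH progeny GEBVs plus the selection intensity times their predicted standard deviation σ. This is a swapped-in, widely used criterion rather than necessarily the authors' method; it assumes additive marker effects and ignores linkage (markers segregating independently), and the function names are hypothetical.

```python
from statistics import NormalDist


def predict_dh_cross(parent1, parent2, effects):
    """Predicted mean and SD of DH progeny GEBVs for parent1 x parent2
    (additive model, marker codes e.g. -1/+1). Assumption: markers
    segregate independently, each DH line taking either parental
    allele with probability 1/2 (linkage ignored)."""
    mean = sum(b * (x1 + x2) / 2.0 for x1, x2, b in zip(parent1, parent2, effects))
    var = sum((b * (x1 - x2) / 2.0) ** 2 for x1, x2, b in zip(parent1, parent2, effects))
    return mean, var ** 0.5


def usefulness(parent1, parent2, effects, p=0.1):
    """Usefulness criterion U = mean + i * SD: expected mean GEBV of the
    best fraction p of the DH progeny under truncation selection."""
    mean, sd = predict_dh_cross(parent1, parent2, effects)
    nd = NormalDist()
    i = nd.pdf(nd.inv_cdf(1.0 - p)) / p  # selection intensity
    return mean + i * sd


def rank_crosses(candidates, effects, p=0.1):
    """Score all pairs of candidate parents (dict name -> marker codes)
    and return [((name1, name2), U), ...] sorted from best to worst."""
    names = sorted(candidates)
    scored = [((a, b), usefulness(candidates[a], candidates[b], effects, p))
              for k, a in enumerate(names) for b in names[k + 1:]]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical marker effects and parents (4 markers coded -1/+1).
    effects = [0.8, -0.3, 0.5, 0.1]
    parents = {"A": [1, 1, -1, 1], "B": [-1, -1, 1, 1], "C": [1, -1, 1, -1]}
    for pair, u in rank_crosses(parents, effects):
        print(pair, round(u, 3))
```

The SD term rewards crosses whose parents differ at markers with large effects, so a cross with a lower mean can still rank first when selection is intense.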

Conclusions
Only 4000 markers were used: the advantage of GS should increase with the number of markers.
The largest genetic gain of GPS over PS comes at the first step, where the best combinations of parents are predicted.
Diversity decreases and genetic gain stagnates after 20 years of selection: some diversity should be re-introduced at some point.
With large numbers of markers and crosses, functions need to be coded in C++ instead of R.

PERSPECTIVES Predict crosses

Challenges: multi-trait selection for uncorrelated traits (yield and quality, protein content); predicting crosses between elite and «exotic» lines; G × E.

Predict crosses: increase genetic gain for complex traits while monitoring diversity.
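Diversity can be monitored with a simple marker statistic. As an illustrative choice (the slides do not specify one), the sketch computes the mean expected heterozygosity 2p(1 - p) over markers for homozygous lines coded -1/+1; tracking it across cycles shows whether selection is fixing alleles. The function name and the data are hypothetical.

```python
def expected_heterozygosity(lines):
    """Mean expected heterozygosity (gene diversity) 2p(1 - p) over markers
    for homozygous lines coded -1/+1, p being the frequency of allele +1.
    0 means every marker is fixed; 0.5 is the maximum for biallelic markers."""
    n, m = len(lines), len(lines[0])
    total = 0.0
    for j in range(m):
        p = sum(1 for line in lines if line[j] == 1) / n
        total += 2.0 * p * (1.0 - p)
    return total / m


if __name__ == "__main__":
    # Hypothetical parent sets from two successive cycles: a drop in the
    # statistic flags loss of diversity under selection.
    cycle1 = [[1, -1, 1, -1], [-1, 1, 1, 1], [1, 1, -1, -1], [-1, -1, 1, 1]]
    cycle2 = [[1, 1, 1, -1], [1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 1, 1]]
    print(expected_heterozygosity(cycle1), expected_heterozygosity(cycle2))
```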

Phenotyping strategy to estimate marker effects for lines with low agronomic value: breeding value per se versus optimal haploid value (OHV; Daetwyler et al., 2015). Bring favorable major-effect alleles from an elite line into the lines before phenotyping, in order to reveal favorable alleles of intermediate effect (Longin et al., 2014).

Can technology (GS) replace breeders? GS + breeder expertise > breeder expertise >> GS

THANKS TO THE BREEDERS FOR SHARING THEIR DATA AND FOR THEIR INTEREST: François-Xavier Oury, Emmanuel Heumez, Bernard Rolland, Jérôme Auzanneau

Website: https://symposium.inra.fr/eucarpia-cereal2018

2nd International Wheat Innovation Workshop (IWIW)

THANK YOU FOR YOUR ATTENTION