Mapping and selection of bacterial spot resistance in complex populations. David Francis, Sung-Chur Sim, Hui Wang, Matt Robbins, Wencai Yang.

- Background
- Populations
- Analysis of phenotype (2007 and 2008)
- Association mapping
- Single-marker analysis of variance
- Two changes to the model used for analysis:
  - Account for population structure
  - Use linked marker haplotypes to gain information (identity by state, IBS, and identity by descent, IBD)

Take-home messages:
A) Genotyping throughput and reagent packaging favor working with very large populations (~480).
B) Measuring traits (phenotyping) is the limiting factor.
C) For elite populations, marker number and the ability to distinguish IBD from IBS (linkage phase and LD) are limitations.
D) Incorporating pedigree data or population structure data into the analysis improves QTL detection and the efficiency of MAS (defined as relative efficiency of selection).
E) We can detect some, but not all, known QTL in complex populations.
F) Phenotypic selection is effective.

Bacterial spot is a disease complex caused by ~4 species of Xanthomonas bacteria. There are physiological races. Sources of resistance are mostly close relatives of cultivated tomato, Solanum lycopersicum or Solanum pimpinellifolium:
- Hawaii 7998 (T1)
- Hawaii 7981 (T3)
- PI 128216 (T3)
- PI 114490 (T1, T2, T3, T4)

Bacterial spot QTL discovery in IBC (inbred backcross) populations
- Ohio: T2 and T1
- FL: T3 and T4
- Brazil? In 2001 T3 2002-2004

We have IBC lines and IBC x elite lines that look good, and we want to integrate them with the elite breeding program. The project was designed to:
- Develop populations to combine QTL for resistance to multiple races
- Validate marker-QTL associations in order to assess the feasibility of MAS

[Crossing diagram. Genes: Rx-3, Rx-4(11), QTL11, QTL11??. Parents: OH75, FL82, K64, OH86, OH74, MR13. F1s: F1-1 through F1-9, intercrossed as marked (X) in a crossing grid.]
Population consisting of 11 independent crosses; progeny segregate.

First segregating generation: grow ~100 plants per cross in the field (total population size 1,100) and select plants from each extreme (n = 110).
[Figure: axis ticks 0 to 12.]

Following year: evaluate plots in a randomized complete block (RCB) design with two replicates. Ratings are based on the plot (not a single plant), on a scale of 1-12.

Phenotypic evaluation (focus on T1)
- Selection conducted in 2007 was predictive of plot performance in 2008, based on both nonparametric analysis and analysis of variance (p < 0.0001).
- Parent-offspring regression suggests a narrow-sense heritability of 0.32.
- Plants rated as resistant in 2007 produced plots with an average disease rating of 4.02 in 2008; plants rated as susceptible produced plots with an average disease rating of 5.16 in 2008 (LSD 0.39).
- Realized gain under selection: ~13% decrease in disease.
- Checks: OH75 rated 3.5; OH88119 rated 9.0.
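The comparison of the two selected groups against the LSD can be checked with a few lines of arithmetic. This is a minimal sketch using only the three numbers quoted on the slide; the variable names are illustrative, and it does not attempt to reproduce the ~13% realized gain, whose baseline is not stated here.

```python
# 2008 plot means for plants selected in 2007 (1-12 disease scale), as quoted.
mean_resistant = 4.02
mean_susceptible = 5.16
lsd = 0.39  # least significant difference reported with the means

# Difference between the two selected groups, in rating units.
difference = mean_susceptible - mean_resistant

# The groups differ significantly if the difference exceeds the LSD.
exceeds_lsd = difference > lsd
lsd_multiples = difference / lsd

print(f"difference = {difference:.2f} rating units")
print(f"exceeds LSD ({lsd}): {exceeds_lsd}")
print(f"difference is {lsd_multiples:.1f} x LSD")
```

The difference between the groups is several times the LSD, which is consistent with the statement that 2007 selection predicted 2008 plot performance.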

Marker analysis using the unified mixed model (Buckler Lab, TASSEL):
$Y = \mu + \mathrm{REP}\,y + Qw + \mathrm{Marker}\,\alpha + Zv + \mathrm{Error}$
Sequence variation linked to traits

Single-marker model in SAS (the marker is the fixed effect, Markerα):

    %macro Mol(mark);
      proc mixed data = three;
        class &mark gen rep;
        model T1 = &mark / solution;   /* Marker alpha term */
        random gen rep;
      run;
    %mend;

    %Mol(TOM144); %Mol(CT10737I); %Mol(CT20244I);
    %Mol(pto); %Mol(SL10526); %Mol(rx3);

[Figure: single-point analysis across the Rx-3 region; x-axis positions 30 to 90, y-axis 0.00 to 8.00.]

Adding a matrix of population structure can correct for background effects and can show which crosses, pedigrees, or subpopulations have the highest breeding value.
$Y = \mu + \mathrm{REP}\,y + Qw + \mathrm{Marker}\,\alpha + Zv + \mathrm{Error}$

Qw can be built from:
- Pedigree information
- Proportion of genome from a parent (pedigree)
- Designation of cross (0/1)
- Q matrix from Structure

| gen | subpop1 | subpop2 | subpop3 | subpop4 | subpop5 | subpop6 |
|---|---|---|---|---|---|---|
| 6111R1 | 0.129 | 0.128 | 0.016 | 0.696 | 0.016 | 0.015 |
| 6111R2 | 0.671 | 0.088 | 0.016 | 0.184 | 0.015 | 0.026 |
| 6111R3 | 0.934 | 0.013 | 0.011 | 0.015 | 0.007 | 0.019 |
| 6111S1 | 0.88 | 0.051 | 0.009 | 0.019 | 0.009 | 0.032 |
| 6111S2 | 0.456 | 0.213 | 0.048 | 0.22 | 0.014 | 0.049 |
| 6115S3 | 0.077 | 0.018 | 0.53 | 0.027 | 0.008 | 0.341 |
| 6115S4 | 0.018 | 0.016 | 0.908 | 0.024 | 0.008 | 0.026 |
| 6117R1 | 0.86 | 0.01 | 0.012 | 0.1 | 0.006 | 0.012 |
| 6117R2 | 0.392 | 0.011 | 0.264 | 0.055 | 0.011 | 0.267 |
| 6117S1 | 0.205 | 0.016 | 0.481 | 0.227 | 0.008 | 0.063 |
| 6117S2 | 0.156 | 0.035 | 0.193 | 0.426 | 0.011 | 0.179 |
| 6117S3 | 0.016 | 0.009 | 0.922 | 0.029 | 0.014 | 0.011 |
| 6117S4 | 0.227 | 0.015 | 0.317 | 0.28 | 0.009 | 0.152 |
| 6124R1 | 0.015 | 0.079 | 0.766 | 0.063 | 0.008 | 0.069 |
| 6124R2 | 0.016 | 0.033 | 0.526 | 0.4 | 0.01 | 0.014 |
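As a reading aid for the Q matrix, the sketch below takes three rows copied from the table, checks that each row of membership proportions sums to roughly 1, and reports the subpopulation with the largest share. The helper name `summarize` and the 0.01 rounding tolerance are assumptions made for illustration, not part of the original analysis.

```python
# Three rows copied from the Q matrix (columns subpop1..subpop6).
q_rows = {
    "6111R1": [0.129, 0.128, 0.016, 0.696, 0.016, 0.015],
    "6111R3": [0.934, 0.013, 0.011, 0.015, 0.007, 0.019],
    "6115S4": [0.018, 0.016, 0.908, 0.024, 0.008, 0.026],
}


def summarize(memberships, tolerance=0.01):
    """Return (label, share) for the largest membership after a sum check."""
    total = sum(memberships)
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"memberships sum to {total:.3f}, expected about 1")
    best = max(range(len(memberships)), key=memberships.__getitem__)
    return f"subpop{best + 1}", memberships[best]


summary = {line: summarize(q) for line, q in q_rows.items()}
for line, (subpop, share) in summary.items():
    print(f"{line}: mostly {subpop} ({share:.3f})")
```

Because each row sums to 1, the columns are linearly dependent together with an intercept, so a common practice is to leave one column out when the proportions are used as fixed covariates.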

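Single-marker model corrected for population structure (Qw covariates plus Markerα):

    %macro Mol(mark);
      proc mixed data = three;
        class &mark gen rep;
        /* Qw: OH75 FL82 K64 OH86 OH74; Marker alpha: &mark */
        model T1 = OH75 FL82 K64 OH86 OH74 &mark / solution;
        random gen rep;
      run;
    %mend;

    %Mol(TOM144); %Mol(CT10737I); %Mol(CT20244I);
    %Mol(pto); %Mol(SL10526); %Mol(rx3);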

[Figure: Rx-3 region; x-axis positions 30 to 90, y-axis 0.00 to 8.00. Series: single-point analysis; single-point analysis corrected for population structure.]

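Marker alleles flanking Rx-3 in three parents (M1, Rx-3 status, M2):

| Parent | M1 | Locus | Phenotype | M2 |
|---|---|---|---|---|
| OH75 | 1 | Rx-3 | R | 1 |
| OH86 | 0 | rx-3 | S | 1 |
| FL82 | 1 | rx-3 | S | 0 |

Reality check: markers are identical by state but not by descent (presumably because of LD decay). The solution is to use haplotypes.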
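The identity-by-state problem can be made concrete with the three parents listed on the slide. The sketch below asks whether a single marker, or the two-marker haplotype, ever shares a value between the resistant and the susceptible parents; the dictionary layout and the helper `separates` are illustrative assumptions.

```python
# Marker scores and Rx-3 status for the three parents, as given on the slide.
parents = {
    "OH75": {"M1": 1, "M2": 1, "resistant": True},
    "OH86": {"M1": 0, "M2": 1, "resistant": False},
    "FL82": {"M1": 1, "M2": 0, "resistant": False},
}


def separates(key_fn):
    """True if no key value is shared by a resistant and a susceptible parent."""
    resistant = {key_fn(p) for p in parents.values() if p["resistant"]}
    susceptible = {key_fn(p) for p in parents.values() if not p["resistant"]}
    return resistant.isdisjoint(susceptible)


m1_separates = separates(lambda p: p["M1"])               # allele 1 also in FL82
m2_separates = separates(lambda p: p["M2"])               # allele 1 also in OH86
haplotype_separates = separates(lambda p: (p["M1"], p["M2"]))

print("M1 alone:", m1_separates)
print("M2 alone:", m2_separates)
print("M1-M2 haplotype:", haplotype_separates)
```

Each marker on its own carries the "resistant" allele in a susceptible parent, while the 1-1 haplotype is found only in OH75, which is the motivation for fitting marker pairs jointly.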

Haplotype model in SAS:

    proc mixed data = three;
      class mark1 mark2 gen rep;
      model T1 = mark1*mark2 OH75 FL82 K64 OH86 OH74 / solution;
      random gen rep;
    run;

Markers M1 to M6 are analyzed as linked pairs: M1*M2, M2*M3, M3*M4, M5*M6. The interaction term defines the haplotypes.

[Figure: Rx-3 region; x-axis positions 30 to 90, y-axis 0.00 to 8.00. Series: single-point analysis; single-point analysis corrected for population structure; haplotype analysis; haplotype analysis corrected for population structure.]

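Interval Pto to SL10526

| Effect | Pto | SL10526 | Estimate | SD | DF | t value | Pr > t |
|---|---|---|---|---|---|---|---|
| Pto*SL10526 | 0 | 0 | 3.76 | 0.531 | 96 | 7.09 | <.0001 |
| Pto*SL10526 | 0 | 2 | 3.99 | 0.624 | 96 | 6.41 | <.0001 |
| Pto*SL10526 | 2 | 0 | 3.22 | 0.375 | 96 | 8.59 | <.0001 |
| Pto*SL10526 | 2 | 2 | 6.14 | 0.501 | 96 | 12.26 | <.0001 |
| Pto*SL10526 | 1 | 0 | 4.35 | 0.395 | 96 | 11.01 | <.0001 |
| Pto*SL10526 | 1 | 2 | 5.48 | 0.470 | 96 | 11.65 | <.0001 |
| Pto*SL10526 | 1 | 1 | 7.39 | 0.975 | 96 | 7.59 | <.0001 |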
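A quick consistency check on the haplotype estimates: if the "SD" column is read as the standard error of each estimate (an assumption, though the reported t values suggest it), then each t value should be close to estimate / SD. The sketch below recomputes the ratios from the table and also picks out the haplotype classes with the lowest and highest estimated disease rating.

```python
# (Pto genotype, SL10526 genotype, estimate, SD, reported t value) from the table.
rows = [
    (0, 0, 3.76, 0.531, 7.09),
    (0, 2, 3.99, 0.624, 6.41),
    (2, 0, 3.22, 0.375, 8.59),
    (2, 2, 6.14, 0.501, 12.26),
    (1, 0, 4.35, 0.395, 11.01),
    (1, 2, 5.48, 0.470, 11.65),
    (1, 1, 7.39, 0.975, 7.59),
]

# Largest gap between the recomputed ratio and the reported t value.
max_gap = max(abs(est / sd - t) for _, _, est, sd, t in rows)

# Haplotype classes with the lowest and highest estimated rating.
lowest = min(rows, key=lambda r: r[2])
highest = max(rows, key=lambda r: r[2])

print(f"largest |estimate/SD - t| = {max_gap:.3f}")
print(f"lowest rating:  Pto={lowest[0]}, SL10526={lowest[1]} ({lowest[2]})")
print(f"highest rating: Pto={highest[0]}, SL10526={highest[1]} ({highest[2]})")
```

The small gaps are what rounding of the published estimates would produce. Note that every row tests the class mean against zero, so the t values say little about differences between haplotype classes; the spread of the estimates is the informative part.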

Genome-wide scan

| Chr. | Marker | F value | Pr > F |
|---|---|---|---|
| 1 | SL10945 | 2.48 | 0.0894 |
| 2 | SL10649 | 0.03 | 0.974 |
| 2 | SL10771 | 0.14 | 0.869 |
| 3 | SL10910 | 0.37 | 0.6908 |
| 3 | SL10736 | 1.33 | 0.2515 |
| 3 | SL10494 | 0.22 | 0.6385 |
| 3 | SL10425 | 1.17 | 0.3161 |
| 3 | SSR601 | 0.05 | 0.828 |
| 4 | SL10322 | 6.03 | 0.0034 |
| 4 | SL10888 | 1.29 | 0.2803 |
| 6 | SL10401 | 0.18 | 0.8362 |
| 6 | SL10187 | 0.11 | 0.8935 |
| 7 | SL20017 | 0.62 | 0.5377 |
| 9 | SL10024 | 1.02 | 0.3651 |
| 9 | LEOH8.4 | 0.25 | 0.779 |
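To read the scan at a glance, the sketch below filters the table for markers with a nominal p-value below 0.05. It also applies a Bonferroni threshold for the 15 tests; that correction is not mentioned in the slides and is included only as an assumed, conservative illustration of how a multiple-testing adjustment would change the picture.

```python
# (chromosome, marker, F value, p-value) copied from the genome-wide scan table.
scan = [
    (1, "SL10945", 2.48, 0.0894), (2, "SL10649", 0.03, 0.974),
    (2, "SL10771", 0.14, 0.869), (3, "SL10910", 0.37, 0.6908),
    (3, "SL10736", 1.33, 0.2515), (3, "SL10494", 0.22, 0.6385),
    (3, "SL10425", 1.17, 0.3161), (3, "SSR601", 0.05, 0.828),
    (4, "SL10322", 6.03, 0.0034), (4, "SL10888", 1.29, 0.2803),
    (6, "SL10401", 0.18, 0.8362), (6, "SL10187", 0.11, 0.8935),
    (7, "SL20017", 0.62, 0.5377), (9, "SL10024", 1.02, 0.3651),
    (9, "LEOH8.4", 0.25, 0.779),
]

alpha = 0.05
# Markers significant at the nominal level.
nominal_hits = [marker for _, marker, _, p in scan if p < alpha]

# Bonferroni threshold (an assumption, not part of the original analysis).
bonferroni_alpha = alpha / len(scan)
bonferroni_hits = [marker for _, marker, _, p in scan if p < bonferroni_alpha]

print("nominal p < 0.05:", nominal_hits)
print(f"Bonferroni threshold = {bonferroni_alpha:.4f}:", bonferroni_hits)
```

Only SL10322 on chromosome 4 passes the nominal threshold, and it falls just short of the Bonferroni threshold, which illustrates how limited detection power is with this marker number and population size.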

- We can detect resistance conferred by the Rx-3 locus on chromosome 5.
- We cannot detect QTL on chromosome 11.
- We can detect a strong interaction between the Ha7998 QTL on 11 and Rx-3 on 5 (data not shown).

What needs to happen to improve prospects for whole-genome discovery and/or selection?
- More markers
- Larger populations

[Figure: scale from Best to Worst. Labels: Worst (breeding pop); Worst (genetic pop); F = Gen/Error (non-replicated); F = Gen/Gen(Marker) (replicated).]

[Figure: two panels; y-axis 0 to 7, x-axis 1 to 12.]

Discovery populations:
- Magnitude of difference between R and S is large
- Gen(Marker) variation is moderate

Breeding populations:
- Difference between R and S is moderate
- Gen(Marker) variation is moderate

Detecting significant marker-trait associations is more difficult when the magnitude of the difference between genotypic classes is reduced.

Take-home messages:
A) Genotyping throughput and reagent packaging favor working with very large populations (~480).
B) Measuring traits (phenotyping) is the limiting factor (scoring larger populations will minimize Gen(Marker) error).
C) For elite populations, marker number and the ability to distinguish IBD from IBS (linkage phase and LD) are limitations (haplotypes).
D) Incorporating pedigree data or population structure data into the analysis improves QTL detection and the efficiency of MAS (defined as relative efficiency of selection).
E) We can detect some, but not all, known QTL in complex populations (marker analysis is still more descriptive than predictive).
F) Phenotypic selection is effective.

Acknowledgments
- Francis Group: Matt Robbins, Sung-Chur Sim, Troy Aldrich
- Collaborators, CAU: Hui Wang, Wencai Yang
- Collaborators, UFL: Jay Scott, Sam Hutton
- Collaborators, OSU: Esther van der Knaap, Bert Bishop, Tea Meulia, Sally Miller, Melanie Lewis Ivey
- Collaborators, UCD: Allen Van Deynze, Kevin Stoffel, Alex Kozic
- Funding: USDA/NRI; OARDC RECGP matching funds grant; MAFPA