INTEGRATED METHODOLOGIES FOR THE GENOMIC SELECTION OF VEGETABLE CROPS: GENOMIC SELECTION

1 GENHORT COURSE: INTEGRATED METHODOLOGIES FOR THE GENOMIC SELECTION OF VEGETABLE CROPS. GENOMIC SELECTION. March 2014. Instructor: Pasquale Termolino

2 Assisted selection (MAS-GWAS) Marker-assisted selection (MAS) refers to the use of DNA markers that are tightly linked to target loci as a substitute for, or to assist, phenotypic screening

3 CONVENTIONAL PLANT BREEDING P 1 x P 2 Recipient Donor F 1 F 2 large populations consisting of thousands of plants PHENOTYPIC SELECTION Salinity screening in phytotron Glasshouse trials Bacterial blight screening Field trials Phosphorus deficiency plot

4 MARKER-ASSISTED BREEDING Susceptible P 1 x P 2 Resistant F 1 F 2 large populations consisting of thousands of plants MARKER-ASSISTED SELECTION (MAS) Method whereby phenotypic selection is based on DNA markers

5 Advantages of MAS Simpler method compared to phenotypic screening Especially for traits with laborious screening May save time and resources Selection at seedling stage Important for traits such as grain quality Can select before transplanting in rice Increased reliability No environmental effects Can discriminate between homozygotes and heterozygotes and select single plants

6 Potential benefits from MAS more accurate and efficient selection of specific genotypes May lead to accelerated variety development more efficient use of resources Especially field trials Crossing house Backcross nursery

7 Overview of marker genotyping (1) LEAF TISSUE SAMPLING (2) DNA EXTRACTION (3) PCR (4) GEL ELECTROPHORESIS (5) MARKER ANALYSIS

8 Considerations for using DNA markers in plant breeding Technical methodology simple or complicated? Reliability Degree of polymorphism DNA quality and quantity required Cost** Available resources Equipment, technical expertise

9 Markers must be tightly linked to target loci! Ideally markers should be <5 cM from a gene or QTL
RELIABILITY FOR SELECTION ($r$ = recombination fraction between marker and QTL)
Marker A, 5 cM from the QTL. Using marker A only: $1 - r_A \approx 95\%$
Markers A and B flanking the QTL, each 5 cM away. Using markers A and B: $1 - 2 r_A r_B \approx 99.5\%$
Using a pair of flanking markers can greatly improve reliability but increases time and cost
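The two reliability figures on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming that a map distance of 5 cM corresponds to a recombination fraction of roughly 0.05; the function names are illustrative and do not come from the course material.

```python
def single_marker_reliability(r_a):
    """Probability that one marker correctly tracks the QTL: 1 - r_A."""
    return 1.0 - r_a


def flanking_marker_reliability(r_a, r_b):
    """Reliability with two flanking markers, as on the slide: 1 - 2 * r_A * r_B."""
    return 1.0 - 2.0 * r_a * r_b


# Assumption: 5 cM is treated as a recombination fraction of 0.05.
r = 0.05
print(f"marker A only:   {single_marker_reliability(r):.3f}")
print(f"markers A and B: {flanking_marker_reliability(r, r):.3f}")
```

The gain from the second marker comes from the product of two small recombination fractions, which is why it shrinks quickly as the markers get closer to the QTL.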

10 Markers must be polymorphic RM84 RM P 1 P 2 P 1 P 2 Not polymorphic Polymorphic!

11 DNA extractions Mortar and pestles Porcelain grinding plates LEAF SAMPLING Wheat seedling tissue sampling in Southern Queensland, Australia. High throughput DNA extractions Geno-Grinder DNA EXTRACTIONS

12 DNA markers Generated by using the Polymerase Chain Reaction (PCR) Preferred markers due to technical simplicity and cost PCR Buffer + MgCl2 + dNTPs + Taq + Primers + DNA template PCR THERMAL CYCLING GEL ELECTROPHORESIS Agarose or Acrylamide gels

13 MAS BREEDING SCHEMES 1. Marker-assisted backcrossing 2. Pyramiding 3. Early generation selection 4. Combined approaches

14 Marker-assisted backcrossing (MAB) MAB has several advantages over conventional backcrossing: Effective selection of target loci Minimize linkage drag Accelerated recovery of recurrent parent Target locus TARGET LOCUS SELECTION RECOMBINANT SELECTION BACKGROUND SELECTION FOREGROUND SELECTION BACKGROUND SELECTION

15 Pyramiding Widely used for combining multiple disease resistance genes for specific races of a pathogen Pyramiding is extremely difficult to achieve using conventional methods Consider: phenotyping a single plant for multiple forms of seedling resistance almost impossible Important to develop durable disease resistance against different races

16 Process of combining several genes, usually from 2 different parents, together into a single genotype
Breeding plan and genotypes:
P1 (Gene A) x P2 (Gene B): AAbb x aaBB
F1 (Gene A + B): AaBb
F2 (MAS): select F2 plants that have Gene A and Gene B
F2 genotypes (gametes of one F1 parent in rows, of the other in columns):
      AB    Ab    aB    ab
AB    AABB  AABb  AaBB  AaBb
Ab    AABb  AAbb  AaBb  Aabb
aB    AaBB  AaBb  aaBB  aaBb
ab    AaBb  Aabb  aaBb  aabb
Hittalmani et al. (2000). Fine mapping and DNA marker-assisted pyramiding of the three major genes for blast resistance in rice. Theor. Appl. Genet. 100:
Liu et al. (2000). Molecular marker-facilitated pyramiding of different genes for powdery mildew resistance in wheat. Plant Breeding 119:
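The F2 genotype table for this pyramiding scheme can be enumerated programmatically. The sketch below assumes two unlinked loci and an F1 of genotype AaBb; the helper names `gametes` and `combine` are hypothetical and exist only for this illustration.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Gametes of a two-locus genotype such as 'AaBb' (unlinked loci assumed)."""
    locus_a, locus_b = genotype[:2], genotype[2:]
    return [a + b for a, b in product(locus_a, locus_b)]


def combine(g1, g2):
    """Join two gametes into a genotype, upper-case allele first at each locus."""
    locus_a = "".join(sorted(g1[0] + g2[0]))
    locus_b = "".join(sorted(g1[1] + g2[1]))
    return locus_a + locus_b


f1_gametes = gametes("AaBb")  # AB, Ab, aB, ab
f2 = Counter(combine(g1, g2) for g1, g2 in product(f1_gametes, repeat=2))

# Plants carrying at least one copy of both Gene A and Gene B.
carriers = sum(n for g, n in f2.items() if "A" in g and "B" in g)

for genotype, count in sorted(f2.items()):
    print(f"{genotype}: {count}/16")
print(f"carry A and B: {carriers}/16")
```

Only one cell in sixteen is the fixed AABB genotype, which is why markers that distinguish homozygotes from heterozygotes are useful at this step.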

17 Early generation MAS MAS conducted at the F2 or F3 stage. Plants with desirable genes/QTLs are selected and alleles can be fixed in the homozygous state; plants with undesirable gene combinations can be discarded. Advantage for later stages of the breeding program because resources can be focused on fewer lines. References: Ribaut & Betran (1999). Single large-scale marker assisted selection (SLS-MAS). Mol Breeding 5:

18 Susceptible P 1 x P 2 Resistant F 1 F 2 large populations (e.g plants) MAS for 1 QTL 75% elimination of (3/4) unwanted genotypes MAS for 2 QTLs 94% elimination of (15/16) unwanted genotypes
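The 3/4 and 15/16 figures follow from simple Mendelian arithmetic. A minimal sketch, assuming unlinked QTL and that only F2 plants homozygous for the favourable allele at every QTL are kept (one plant in four per locus); the function name is illustrative.

```python
from fractions import Fraction


def fraction_eliminated(n_qtl):
    """Share of F2 plants discarded when keeping only favourable homozygotes
    at each of n_qtl unlinked loci (1/4 retained per locus)."""
    retained = Fraction(1, 4) ** n_qtl
    return 1 - retained


for n in (1, 2, 3):
    share = fraction_eliminated(n)
    print(f"{n} QTL: {share} eliminated ({float(share):.1%})")
```

The retained fraction falls geometrically with the number of QTL, so the F2 population has to be large enough that some plants survive selection.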

19 PEDIGREE METHOD compared with SINGLE LARGE-SCALE MARKER-ASSISTED SELECTION (SLS-MAS)
Pedigree method:
  P1 x P2, F1
  F2: Phenotypic screening
  F3: Plants space-planted in rows for individual plant selection
  F4: Families grown in progeny rows for selection
SLS-MAS:
  P1 x P2, F1
  F2: MAS
  F3: Only desirable F3 lines planted in field
  F4: Families grown in progeny rows for selection
Both schemes:
  F5: Pedigree selection based on local needs
  F6: Preliminary yield trials. Select single plants.
  F7: Further yield trials
  F8 to F12: Multi-location testing, licensing, seed increase and cultivar release
Benefits: breeding program can be efficiently scaled down to focus on fewer lines

20 Combined approaches In some cases, a combination of phenotypic screening and MAS approach may be useful 1. To maximize genetic gain (when some QTLs have been unidentified from QTL mapping) 2. Level of recombination between marker and QTL (in other words marker is not 100% accurate) 3. To reduce population sizes for traits where marker genotyping is cheaper or easier than phenotypic screening

21 Marker-directed phenotyping (Also called tandem selection ) Recurrent Parent P 1 (S) x P 2 (R) F 1 (R) x P 1 (S) Donor Parent BC 1 F 1 phenotypes: R and S Use when markers are not 100% accurate or when phenotypic screening is more expensive compared to marker genotyping MARKER-ASSISTED SELECTION (MAS) PHENOTYPIC SELECTION SAVE TIME & REDUCE COSTS *Especially for quality traits* References: Han et al (1997). Molecular marker-assisted selection for malting quality traits in barley. Mol Breeding 6:

22 Assisted selection (MAS-GWAS) Genome-wide association studies (GWAS) refers to genome-scale marker scanning for phenotypes

23 GWAS Geno Pheno & Geno Pheno Genome-wide association study (GWAS) Gene identification Finding association signals with a large set of SNPs across diverse germplasm Array-based genotyping & resequencing

24 Opportunities of GWAS
An additional strategy/tool in gene identification: initiation/validation of QTL cloning; hypotheses of new functions of known genes; new genes/pathways
The ability to rapidly nail down genes for human diseases with relatively simple inheritance is impressive!
Appreciation of the complexity and beauty of natural variation
Has the potential to offer a global view of the genetic architecture of complex traits
[Figure: GWAS linking genetic diversity, methodology development ($y = Xb + Sa + Qv + Zu + e$), genetics of complex traits and genomic technology; labels include p = 1e-7, r2 and "Many groups"]

25 Challenges of GWAS
Population structure and relative kinship (human genetics, plant and animal genetics)
Structure; mixed model QK; dimension determination/model testing; accuracy of the variance-covariance matrix; R2LR for the mixed model
Yu et al., 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics 38,
QK model: $y = X\beta + S\alpha + Qv + Zu + e$
  $y$: trait values
  $X\beta$: environments, etc.
  $S\alpha$: candidate SNP effects
  $Qv$: subpopulation effects (Q model)
  $Zu$: background genetic effects (K model), $\mathrm{Var}(u) = 2KV_A$
[Figure: (a) false positive rate and (b) average power for sample types I to V, comparing the Simple, nMDS, PCA, K, PCA+K and nMDS+K models]
Zhu and Yu 2009, Genetics 182:

26 Challenges of GWAS
Computational demand for the mixed model: P3D, Compression, EMMA, EMMAX, Fast-LMM, GEMMA, MLMM (Online first Nature Genetics)
Multiple testing issue and significance threshold
Missing heritability: rare alleles and epistasis
Genome, genetics, gene, coding region, allele, haplotype, SNP, structural variation, etc.
Blame complex biology and genetics
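To make the multiple-testing issue concrete, the sketch below computes a Bonferroni-corrected threshold. Bonferroni is one common and conservative choice; the slide does not say which correction was intended, and the SNP count used here is a hypothetical value.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_snps = 500_000  # hypothetical number of SNPs tested
threshold = bonferroni_threshold(0.05, n_snps)
print(f"per-SNP threshold: {threshold:.1e}")
print(f"-log10(threshold): {-math.log10(threshold):.1f}")
```

Because neighbouring SNPs are correlated through LD, the effective number of independent tests is usually smaller than the SNP count, so this threshold tends to be stricter than necessary.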

27 GWAS in Plants Genotyping:Atwell et al. Nature : Resequencing: Zhao et al. PLoS Genet :e3 Resequencing: Huang et al. Nat Genet : Genetic design + Resequencing: Tian et al. Nat Genet : Genetic design + Resequencing: Kump et al. Nat Genet :

28 Critical Advances With next generation sequencing technologies, exome sequencing or whole genome resequencing is now possible (Shendure and Ji, 2008; Ansorge, 2009; Ng et al., 2010). Biological functions of nucleotide polymorphisms can be predicted with the context sequence of genes (Ramensky et al., 2002; Kumar et al., 2009). Attention has been given to the rare allele issue (Cohen et al., 2004; Bodmer and Bonilla, 2008; Nejentsev et al., 2009) and some specific statistics have been developed to assess the significance of rare variants (Morgenthaler and Thilly, 2007; Li and Leal, 2008; Madsen and Browning, 2009; Morris and Zeggini, 2010). Genome databases and gene networks have been developed to aid the search and confirmation processes of gene-trait associations (Lee et al., 2008; Lee et al., 2010; Lee et al., 2010).

29

30

31 Marker Assisted Selection
QTL analysis has produced great advances in plant breeding
Rex Bernardo, Crop Sci. 48: (2008): at least 10,000 marker-trait associations in different plant species have been reported; exploiting the QTL that have been mapped has not been routinely done
$R = i h \sigma_g$

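32 The breeder's equation
$R = h^2 S$, equivalently $R = i h \sigma_g$
$R$: response to selection
$h^2$: heritability (really just a regression coefficient)
$S$: selection differential
$i$: standardised selection differential
$\sigma_g$: additive genetic standard deviation (the square root of the additive genetic variance)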
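The two forms of the breeder's equation give the same response because $S = i \sigma_p$ and $\sigma_g = h \sigma_p$. A minimal numerical check, using hypothetical values for heritability, phenotypic variance and selection intensity:

```python
import math


def response_from_differential(h2, s):
    """R = h^2 * S."""
    return h2 * s


def response_from_intensity(i, h2, var_p):
    """R = i * h * sigma_g, with sigma_g = sqrt(h^2 * phenotypic variance)."""
    h = math.sqrt(h2)
    sigma_g = math.sqrt(h2 * var_p)
    return i * h * sigma_g


# Hypothetical values chosen only for illustration.
h2, var_p, i = 0.4, 25.0, 1.4
s = i * math.sqrt(var_p)  # selection differential S = i * sigma_p

print(response_from_differential(h2, s))
print(response_from_intensity(i, h2, var_p))
```

Writing the response in terms of $i$, $h$ and $\sigma_g$ separates the three levers a breeder can act on, which is how the following slides use it.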

33 Ways to increase response to selection:
Targets: increase $i$; reduce time; increase $h$; reduce costs; increase $\sigma_g$
Methods: test more lines; out of season; test more plots per line; indirect selection (markers or traits); indirect selection (e.g. grain quality); wide crosses / mutation

34 Size isn't everything #1: increasing $i$ is not cost effective.
[Figure: response to selection against log10 population size]
10 → 100 increases response by 63%
100 → 1,000: 29%
1,000 → 10,000: 19%
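The diminishing returns can be reproduced under one specific assumption that the slide does not state: a single best individual is kept out of n candidates, so the selection intensity is the expected maximum of n standard normal values. The sketch below evaluates that expectation by numerical integration; the function names are illustrative.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_best_of(n, lo=-10.0, hi=10.0, steps=20000):
    """E[max of n independent standard normals], by the trapezoid rule.

    The density of the maximum is n * pdf(x) * cdf(x)**(n - 1).
    """
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * dx
        value = x * n * normal_pdf(x) * normal_cdf(x) ** (n - 1)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * value
    return total * dx


sizes = [10, 100, 1000, 10000]
intensity = {n: expected_best_of(n) for n in sizes}
for small, large in zip(sizes, sizes[1:]):
    gain = intensity[large] / intensity[small] - 1.0
    print(f"{small} -> {large}: response up by {gain:.0%}")
```

Under this assumption the gains should come out close to the percentages on the slide: each tenfold increase in population size buys a smaller relative increase in response.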

35 Size isn't everything #2: increasing scale is not cost effective.
Response to selection: Vg = 0.1, Ve =
[Figure: response to selection against no. of replicates]

36 Double the speed, double the response Methods: 1. Out of season nurseries. 2. Make crosses in advance of results. Breeders already do this, implicitly. Explicit schemes: accelerated recurrent selection 3. Marker assisted breeding Not for polygenic traits so far. Genomic selection?

37 Marker Assisted Selection
Select on phenotype alone: $R = i h_p^2 \sigma_p$
Select on markers alone: $R = i r_g h_m h_p \sigma_p$
For MAS to give a greater response than phenotypic selection:
$i r_g h_m h_p \sigma_p > i h_p^2 \sigma_p$
$r_g h_m > h_p$
but since $h_m^2 = 1$ (assuming no genotype errors):
$r_g > h_p$
$r_g^2 > h_p^2$
The squared genetic correlation coefficient between marker index and genotype must be higher than the heritability of the phenotype.
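The condition can be checked numerically by computing both responses. This is a small sketch with hypothetical parameter values; the marker heritability defaults to 1, matching the assumption of no genotyping errors.

```python
import math


def response_phenotypic(i, h2_p, sigma_p):
    """Response when selecting on phenotype alone: i * h_p^2 * sigma_p."""
    return i * h2_p * sigma_p


def response_markers(i, r_g, h2_p, sigma_p, h2_m=1.0):
    """Response when selecting on markers alone: i * r_g * h_m * h_p * sigma_p."""
    return i * r_g * math.sqrt(h2_m) * math.sqrt(h2_p) * sigma_p


# Hypothetical trait with heritability 0.25, unit intensity and unit sigma_p.
for r_g in (0.4, 0.6):
    mas = response_markers(1.0, r_g, 0.25, 1.0)
    pheno = response_phenotypic(1.0, 0.25, 1.0)
    verdict = "better" if mas > pheno else "worse"
    print(f"r_g^2 = {r_g ** 2:.2f}: MAS {mas:.2f} vs phenotype {pheno:.2f} ({verdict})")
```

With a heritability of 0.25, markers only win once the squared correlation between marker index and genotype exceeds 0.25.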

38 When does selection on markers work well? $r_g^2 > h_p^2$
Classic MAS selects for specific tagged loci. Good if:
  most Vg controlled by a very small number of tagged QTL
  low heritability
  trait expensive to score
  trait scored post reproduction
  quicker
GS is more flexible:
  no requirement for large gene effects
  no requirement to tag individual QTL
  but requires many cheap markers

39 Marker Assisted Selection Index selection Combine markers and phenotype information. molecular score Still need accurate assessment of markers or can make things worse. genomic selection. The next thing.

40 MAS on quantitative traits Lande & Thompson. Efficiency of Marker-Assisted Selection in the Improvement of Quantitative Traits. Benchmark treatment of MAS for quantitative traits in the context of quantitative genetics and selection theory. Proposed method never caught on: marker density? problems in selecting associated markers?

41 Marker Selection: The winner's curse. (The Beavis effect in linkage analysis.) With multiple QTL of small effect, some get lucky and are detected. These are genuine QTL, but their effect is overestimated. E.g. A mapping experiment with 101 genes & $h^2$ = Standardised difference between homozygotes Mean of those detected as sig (p<0.05) = = 100%
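The winner's curse is easy to reproduce by simulation. The sketch below is not the experiment on the slide, whose numbers are not preserved in this transcript: it assumes many QTL sharing one small true effect, estimated with normally distributed error, and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.2, se=0.15, n_qtl=10000,
                           z_crit=1.96, seed=1):
    """Estimate the same true effect n_qtl times with sampling error and
    compare the mean of all estimates with the mean of the significant ones."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_qtl)]
    # A QTL counts as 'detected' when |estimate / se| exceeds the critical value.
    detected = [e for e in estimates if abs(e) / se > z_crit]
    return statistics.mean(estimates), statistics.mean(detected), len(detected)


mean_all, mean_detected, n_detected = simulate_winners_curse()
print(f"true effect:               0.200")
print(f"mean of all estimates:     {mean_all:.3f}")
print(f"mean of detected QTL only: {mean_detected:.3f} ({n_detected} detected)")
```

The estimates are unbiased as a whole, but conditioning on significance keeps only the draws that happened to be large, so the detected effects are inflated.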

42 Genomic (or genome-wide) Selection Genome-wide selection (GS) Prediction, breeding, genetic gain Selection of individuals based on predicted phenotypic values using all markers, rather than only significant markers linked to QTL Integration of genomic technology with plant breeding Bernardo and Yu 2007, Crop Science 47:

43 Genomic selection Selection based on the prediction of breeding values from the information of dense markers covering the whole genome

44 Genomic selection
Trait effects of all genes or chromosomal positions are estimated simultaneously without significance testing (eliminates bias).
High marker density.
Estimate a trait effect for every marker or interval.
Statistical problem: more markers than individuals.
Predicting Unobserved Phenotypes for Complex Traits from Whole-Genome SNP Data. PLoS Genetics, Lee et al.: correlations between predicted and actual phenotypes are in the range of 0.4 to 0.9. The prediction of unobserved phenotypes from high-density SNP data and appropriate statistical methodology is feasible and can be applied in human medicine, forensics, or artificial breeding programs.

45 Genomic selection Proposed 2001: Meuwissen et al., Genetics. Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps. Trait effects of all genes or chromosomal positions are estimated simultaneously without significance testing, so there is no bias. Requires high marker density. Statistical problem: more markers than individuals. Estimate a trait effect for every marker or interval.

46 GS Traditional plant breeding programs rely mainly on phenotypes being evaluated in several environments; selection and recombination are based solely on the resulting data plus pedigree information, when available. Marker assisted selection (MAS) uses molecular markers in linkage disequilibrium (LD) with QTL. Genomic selection (GS) is a new approach for improving quantitative traits in large plant breeding populations that uses whole genome molecular markers (high density markers and high throughput genotyping). Genomic prediction combines marker data with phenotypic and pedigree data (when available) in an attempt to increase the accuracy of the prediction of breeding and genotypic values.

47 GS In practice, GS is applied in a population that is different from the reference population in which the marker effects were estimated. Genomic selection uses two types of datasets: a training set and a validation set. The training set is the reference population in which the marker effects were estimated; it contains: (1) phenotypic information from relevant breeding germplasm evaluated over a range of environmental conditions; (2) molecular marker scores; and (3) pedigree information or kinship. Hence, marker effects are estimated based on the training set using certain statistical methods to incorporate this information; the genomic breeding value or genetic values of new genotypes are predicted based only on the marker effect. The validation set contains the selection candidates (derived from the reference population) that have been genotyped (but not phenotyped) and selected based on marker effects estimated in the training set.

48 GS For quantitative traits, selection based on marker effects alone has dramatically changed standard practices used in plant and animal breeding. However, in public plant breeding programs, the benefits of GS have been studied only through computer simulation. Since marker technology is continuously reducing the cost per data point and increasing the number of available markers, genotyping is currently less costly than phenotyping in an applied plant breeding program.

49 How does it work?
- Reference population
  - Population with both phenotypes and genotypes
  - Analysis of the genotype-phenotype associations
  - Estimation of marker effects
- Population of candidates for selection
  - Population with the same associations
  - Genotypes
  - Prediction of the breeding value by using marker effects estimated in the reference population

50 Factors affecting GS efficiency
Two big factors:
Accuracy of SNP effect estimation:
  size of the reference population
  heritability
  LD between markers and QTL
  marker density
  effective size of the population => number of independent segments (Hayes et al. 2009)
Relationship between the candidates and the reference population
Statistical methods

51 A number of methods are used
1. G-BLUP: in the conventional BLUP (Best Linear Unbiased Prediction), replace the pedigree-based relationships with marker-based relationships
2. Bayesian methods (Bayes B, C, R): try to find the SNPs associated with QTL and assign a zero effect to most SNPs, which have no effect
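The marker-based relationships that G-BLUP substitutes for the pedigree can be built in several ways. The sketch below uses one widely used construction (often attributed to VanRaden, 2008), which is an assumption here because the slide does not name a formula. Genotypes are coded 0, 1 or 2 copies of an allele, and the tiny data set is invented.

```python
def genomic_relationship_matrix(genotypes):
    """G = Z Z' / (2 * sum p_j (1 - p_j)), with Z the genotypes centred on 2 * p_j.

    genotypes: one row per individual, one 0/1/2 allele count per marker.
    At least one marker must be polymorphic, otherwise the scale is zero.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    centred = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
            for zi in centred]


# Invented example: three individuals scored at two markers.
example = [[0, 2], [1, 1], [2, 0]]
for row in genomic_relationship_matrix(example):
    print([round(v, 2) for v in row])
```

Individuals with opposite genotypes receive a negative relationship, while the heterozygote sits at the population average and gets zero, which is the information a pedigree alone cannot provide for unrelated founders.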

52 Best linear unbiased prediction BLUP best linear unbiased prediction (BLUP) is used in linear mixed models for the estimation of random effects

53 Bayesian inference BAYES Bayesian inference is a method of inference in which Bayes' rule is used to update the probability estimate for a hypothesis as additional evidence is acquired.

54 Comparison of methods
Marker density:
  low => little difference between GBLUP and Bayesian methods, accuracy low to moderate
  high => saturation of efficiency of GBLUP, whereas Bayesian approaches increase in accuracy
Genetic determinism:
  polygenic: some advantage to GBLUP
  at least partially oligogenic: advantage to Bayesian methods

55 haplotype A haplotype (from the Greek: ἁπλοῦς, haploûs, "onefold, single, simple") in genetics is a combination of alleles (DNA sequences) at adjacent locations (loci) on a chromosome that are inherited together. A haplotype may be one locus, several loci, or an entire chromosome depending on the number of recombination events that have occurred between a given set of loci, if any occurred.

56 SNP or Haplotypes? Most work with individual SNP Loss of efficiency due to incomplete Linkage Disequilibrium Personal point of view : a haplotype with 8-15 alleles is much more informative QTL SNP1 SNP2 + - Haplotype + 0 -

57 SNP or Haplotypes?
In France, a method based on haplotypes of 3-6 markers, in regions targeted on the genome, for each trait
QTL-BLUP, including a residual polygenic effect:
$y_i = \mu + u_i + \sum_j (h_{ij1} + h_{ij2}) + e_i$
[Table: BLUP, GBLUP and QTL-BLUP compared for Milk, Protein, Fat, Protein content, Fat content and Fertility]

58 Major consequences of genomic selection
High reliability ($R^2$ = 0.5 to 0.7)
At an early age, before any performance of the candidate
For all traits (depends only on the reference population) => more balanced genetic trend
A fantastic opportunity to improve functional traits
Use of bulls without progeny test (note: they will get a progeny-based EBV, but later)
=> Maintaining performance recording is essential!!!

59 Major consequences of genomic selection A nearly doubled potential genetic trend due to a reduced generation interval, combined with a good accuracy and an increased selection intensity A more balanced genetic trend due to a rather homogeneous reliability across traits and EBV available for all animals due to an increase in weight for functional traits in the breeding objective (no increased selection pressure for production) Possibly, a lower inbreeding trend, if many young bulls are used

60 Another challenge: select for new traits
Generate the corresponding reference populations
Several (tens of) thousand animals
Female population, with own performances
Taking advantage of large-scale genotyping
Economic model: who pays for these data?
New consortia: breeding companies, performance recording organizations, farmers

61 Training set (population): markers + phenotypes. How to do it:
Regress phenotypes on markers in the training set
Use regression equations to predict phenotypes from markers in novel germplasm
Select, cross and repeat
After a few generations, derive a new training set and start again
How many depends on LD, population structure, $h^2$.

62 The calibration phase questions What population to use? How many individuals? How many markers? variables LD h 2 Allele frequencies recalibration interval breeding methods Extreme examples illustrate problems: Use lines from one cross to predict in another? p(locus segregating in both crosses) is ¼ Prediction within a single cross: many linked loci Some loci linked in dispersion, some in repulsion. Calibration will work on the net effect. Validity after recombination?

63 Calibration: statistical methods
BLUP (Best Linear Unbiased Prediction) on trait (selection index): predicts performance of individuals with no trait data from their genetic relationship with individuals with trait data. Used in animal breeding for decades (using pedigree relationships).
Ridge regression (BLUP on markers): add a common penalty to each marker to reduce its effect. By reducing the influence of every marker, all markers can be fitted.
Bayesian & other methods.
All methods predict on the basis of kinship to some extent.
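Ridge regression on markers can be sketched without any external library. The version below uses the identity $\hat\beta = X^\top (X X^\top + \lambda I)^{-1} y$, which only requires solving a system as large as the number of individuals, so it still works with more markers than individuals. It assumes phenotypes and marker codes are already centred (no intercept), the penalty `lam` is chosen by the user, and all data are invented.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ridge_marker_effects(x, y, lam):
    """Marker effects beta = X' (X X' + lam I)^-1 y, with the same penalty on every marker."""
    n, m = len(x), len(x[0])
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) + (lam if i == k else 0.0)
             for k in range(n)] for i in range(n)]
    alpha = solve(gram, y)
    return [sum(alpha[i] * x[i][j] for i in range(n)) for j in range(m)]


def predict(x_new, beta):
    """Predicted genetic values for genotyped, unphenotyped candidates."""
    return [sum(v * b for v, b in zip(row, beta)) for row in x_new]


# Invented training set: 3 individuals, 5 markers coded -1/0/1, centred phenotypes.
x_train = [[1, 0, -1, 1, 0], [0, 1, 1, -1, 0], [-1, -1, 0, 0, 1]]
y_train = [1.2, -0.3, -0.9]
beta = ridge_marker_effects(x_train, y_train, lam=1.0)
print(predict([[1, 1, 0, 0, -1]], beta))
```

The penalty shrinks every marker effect towards zero by the same amount, which is what allows all markers to be fitted at once.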

64 Predictions from kinship are never better than the best observation
Suppose $h^2 = 1$ for a polygenic trait. Predicted breeding value is a weighted mean of the phenotyped relatives:
Source of prediction and $r^2$:
  from parents: ½
  from grandparents (gp): ¼
  from great-grandparents (ggp): ⅛
We need methods which escape the gravitational pull of kinship. This is not just to do with algorithms. Small training sets, low numbers of markers: the kinship signal is the only thing the markers can hook.

65 Recalibration
As cycles of selection proceed:
  allele frequencies change
  recombination acts to reduce LD
  selection acts to increase LD
Minimum no. of generations before recalibration will depend on:
  initial LD
  allele frequencies
  intensity of selection
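The effect of recombination alone can be illustrated with the standard expectation $D_t = D_0 (1 - c)^t$ for a random-mating population without selection. This formula is not on the slide, it ignores the selection effect mentioned above, and the numbers below are hypothetical.

```python
def ld_after_generations(d0, c, generations):
    """Expected LD coefficient after t generations: D_t = D_0 * (1 - c)^t."""
    return d0 * (1.0 - c) ** generations


def generations_until(d0, c, threshold):
    """Number of generations before expected LD falls to the threshold or below."""
    if c <= 0.0 or threshold <= 0.0:
        raise ValueError("c and threshold must be positive")
    d, t = d0, 0
    while d > threshold:
        d *= 1.0 - c
        t += 1
    return t


# Hypothetical marker-QTL pairs at different recombination fractions.
for c in (0.01, 0.05, 0.5):
    print(f"c = {c}: LD halves after {generations_until(0.2, c, 0.1)} generations")
```

Tightly linked markers keep their association with the QTL for many cycles, which is one reason dense marker coverage lengthens the interval between recalibrations.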

66 Select within crosses or select between crosses? Between crosses: we will be selecting mainly on kinship nothing wrong with this, but we do it already Within crosses: higher LD, fewer markers, smaller training sets. cannot select on pedigree estimates of kinship But: Only ½ Va is available within crosses. Only ¼ Va is available for GS. High LD in early generations: how long will the predictions last?

67 Breeding wheat / barley with genomic selection. Finding the right balance:
Year  F2          4-way cross  8-way cross
1     Cross + DH  Cross        Cross
2     Bulk        Cross + DH   Cross
3     Trial       Bulk         Cross + DH
4     Trial       Trial        Bulk
5 to 8: Trial and GS / crosses entries (column alignment not preserved)
Trade-offs across the schemes: LD: fewer markers (←); Diversity: more response (→); Time: lower resp/year (→); Population size: more power (→)

68 Conclusions In breeding, speed is more important than size Genomic selection will reduce cycle time.

70 Conclusion
$R = h^2 S$
$R$: response to selection
$h^2$: heritability
$S$: selection differential