High density genotyping and phenotyping data: challenges of leveraging novel technologies for the valorization of PGR

1 High density genotyping and phenotyping data: challenges of leveraging novel technologies for the valorization of PGR. Nils Stein, Joel Kuon, Uwe Scholz, Martin Mascher, Benjamin Kilian, Kerstin Neumann, Christian Klukas, Jens Keilwagen & Andreas Graner

2 Conservation of Plant Genetic Resources: mission accomplished. Genebanks worldwide, 7.4 million accessions

3 Ex situ accessions of the genus Hordeum: Hordeum accessions in 204 collections

4 Physical/Genetic Map + Integrated Gene Space. Integrated resources: 4,556 BAC contigs (3.9 Gbp); 374 Mbp BES; 6,322 sequenced BACs; 312 Mbp WGS contigs; 3,276 directly anchored markers; 498,165 marker sequences anchored; 26,256 high confidence genes; 57% anchored. Intl. Barley Sequencing Consortium (2012) Nature 491:711-

5 Towards a barley reference map: POPSEQ (Mascher et al., Plant J, 76) and MTP-sequencing

6 Map-based gene isolation (Ariyadasa R et al., Plant Physiol 2014;164)

7 Map-based cloning: mnd gene. Cross: MHOR474 x Barke. F2 (100 individuals): 79 wild-type, 21 mnd mutants; 3:1 segregation, so mnd is a recessive gene
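The 3:1 claim on this slide can be checked with a chi-square goodness-of-fit test. The sketch below is not from the talk; it uses only the two counts shown (79 wild-type, 21 mutants), and the function name and the 5% critical value for one degree of freedom (3.841) are choices made here for illustration.

```python
def chi_square_3_to_1(wild_type: int, mutant: int) -> float:
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = wild_type + mutant
    expected_wild_type = 0.75 * total
    expected_mutant = 0.25 * total
    return ((wild_type - expected_wild_type) ** 2 / expected_wild_type
            + (mutant - expected_mutant) ** 2 / expected_mutant)


# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_5PCT_DF1 = 3.841

chi2 = chi_square_3_to_1(79, 21)
fits_3_to_1 = chi2 < CRITICAL_5PCT_DF1
print(f"chi-square = {chi2:.3f}; consistent with 3:1: {fits_3_to_1}")
```

With 100 F2 plants the expectation is 75:25, so the observed 79:21 deviates by only four plants per class and the statistic stays well below the critical value.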

8 Bulked segregant analysis (exome capture). [Figure: allele frequency along chromosomes 1H to 7H, mnd-mutant pool vs. wild-type pool]

9 From map to gene. [Figure panels: A, initial low resolution map (con17813, con2679, mnd, con2786, con2697, con55926); B, bulked segregant analysis, present/absent (Bradi4g genes); C, contigs on physical map (FPC); D, sequence of BAC clones representing mnd region, 1540 kb; E, confirmation by analysis of (TILLING) mutants, gene model ATG to TGA: BW519 Ala/Gly, BW520 Ala/Lys, deleted: BW522]

10 Automated phenotyping: imaging. 520 carriers with RFID chip; automated imaging and watering; conveyor belt system for transport; imaging: VIS, NIR, Fluo

11 Correlation of fresh weight vs. digital biomass. 100 genotypes, 5 plants each, 3 reps. Time point: 58 days after sowing
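The comparison on this slide comes down to a correlation coefficient between a manual measurement and an image-derived one. The sketch below computes Pearson's r in plain Python; the slide does not say which coefficient was used, and the five value pairs are invented toy numbers, not data from the experiment.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


# Toy values (hypothetical): fresh weight in g and digital biomass in voxels.
fresh_weight_g = [12.1, 15.4, 9.8, 20.3, 17.6]
digital_biomass = [1.1e7, 1.5e7, 0.9e7, 2.1e7, 1.7e7]
r = pearson_r(fresh_weight_g, digital_biomass)
print(f"r = {r:.3f}")
```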

12 GWAS for final biomass. [Figure: association peaks labelled Sdw 1 (denso), tillering, PpdH1]

13 Valorizing legacy data (since 1946): normalized rank product

14 Re-sequencing: GBS of 165 barley cultivars Released Objective: reveal trends in genetic diversity

15 Re-sequencing: exome capture. 61 Mbp of exome space based on physical map genome assembly; 250,000-1,000,000 SNPs between Morex and other cultivars; re-sequencing of 250 landraces and wild barley in progress. Mascher et al., Plant J, 76

16 A peek into the future: how far can we reach to catalogue genebank collections? Andreas Graner. GBS, WGS; exome capture; imaging

17 Re-sequencing: characterization of IPK barley collection ( accessions), Illumina HiSeq2000 technology. *) 200 CPU cores; 50 TB disk space; storage costs: 500 /TB*yr

18 Phenotyping bottlenecks. Imaging system, capacity 1000 pots. Use case: regular experiment, daily imaging (visible / fluorescence / near-infrared), use of full capacity of the system. 1 experiment: 4000 plants (= 800 accessions), ~240 days (= 4 runs p.a.), ~4 TB, ~2,500,000 files. IPK barley collection: 100,000 plants (= 22,000 accessions), ~25 years, ~100 TB, ~62,500,000 files
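The collection-wide figures on this slide follow from scaling the single-experiment numbers linearly. The sketch below reproduces that arithmetic; the constant and function names are made up here, and it assumes that one 4000-plant experiment (4 runs of 1000 pots) fills one year of system time.

```python
# Figures read off the slide for one full-capacity experiment.
PLANTS_PER_EXPERIMENT = 4_000      # 4 runs x 1000 pots, about one year
STORAGE_TB_PER_EXPERIMENT = 4      # ~4 TB of image data
FILES_PER_EXPERIMENT = 2_500_000   # ~2.5 million files


def scale_to_collection(collection_plants: int) -> dict:
    """Linear extrapolation from one experiment to a whole collection."""
    factor = collection_plants / PLANTS_PER_EXPERIMENT
    return {
        "years": factor,  # assumes one experiment per year
        "storage_tb": factor * STORAGE_TB_PER_EXPERIMENT,
        "files": int(factor * FILES_PER_EXPERIMENT),
    }


estimate = scale_to_collection(100_000)
print(estimate)
```

For 100,000 plants the factor is 25, which gives the ~25 years, ~100 TB and ~62,500,000 files quoted on the slide.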

19 From challenge to strategy: customized resequencing. 1. GBS of complete collection: catalogue diversity; identify subgroups (high LD); duplicate accessions. 2. Select representative members of subgroups (Subpopulation 1, 2, 3, ... n): deep sequencing (EC); genome evolution; allele mining; targeted phenotyping. 3. Data warehouse: passport data; conservation management; sequences; phenotypic data. 4. Use cases: analysis tools; visualization

20 Conclusions. Value of PGR = f (availability, information phenotype/genotype). Genebanks have done an almost perfect job on seed conservation. Genomic sequence is a perfect resource for trait mapping and gene cloning, but the reference sequence is yet to be completed. Resequencing and phenomics of entire collections require adequate IT resources. Need for customized approaches: technology continuously develops; there is no one-size-fits-all solution.

22 Basic approach: bulk segregant analysis. Combination of bulk segregant technique with next generation sequencing technology: 1. Bulk segregant analysis; 2. Read mapping & SNP calling. [Figure: SNP frequency (0.5 to 1) against position in Mb, reads aligned to REF, peak at mnd; Schneeberger and Weigel] Prerequisite: deep sequencing. Limitation: barley (5.1 Gb) + high coverage
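The two steps on this slide end in a per-SNP allele frequency for each pool. For a recessive mutant, SNPs linked to the causal gene approach a frequency of 1 in the mutant pool, while the wild-type pool (one third homozygous wild type, two thirds heterozygous) is expected near 1/3; unlinked SNPs sit near 0.5 in both. The sketch below filters on that contrast. The read counts, thresholds and function names are hypothetical, and it assumes the alternative allele comes from the mutant parent.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]                     # (chromosome, position)
PoolCounts = Dict[Site, Tuple[int, int]]   # site -> (reference reads, alternative reads)


def alt_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the alternative allele."""
    return alt_reads / (ref_reads + alt_reads)


def linked_sites(mutant_pool: PoolCounts, wild_pool: PoolCounts,
                 min_depth: int = 20, min_mutant_freq: float = 0.95,
                 min_delta: float = 0.4) -> List[Site]:
    """Sites near fixation in the mutant pool but not in the wild-type pool."""
    hits = []
    for site in sorted(mutant_pool.keys() & wild_pool.keys()):
        m_ref, m_alt = mutant_pool[site]
        w_ref, w_alt = wild_pool[site]
        depth = min(m_ref + m_alt, w_ref + w_alt)
        if depth == 0 or depth < min_depth:
            continue  # too few reads for a reliable frequency
        m_freq = alt_frequency(m_ref, m_alt)
        w_freq = alt_frequency(w_ref, w_alt)
        if m_freq >= min_mutant_freq and m_freq - w_freq >= min_delta:
            hits.append(site)
    return hits


# Toy read counts (hypothetical), not data from the talk.
mutant_pool = {("1H", 100): (20, 22), ("1H", 200): (1, 39),
               ("3H", 50): (18, 21), ("3H", 60): (2, 3)}
wild_pool = {("1H", 100): (21, 19), ("1H", 200): (27, 13),
             ("3H", 50): (20, 20), ("3H", 60): (3, 2)}
print(linked_sites(mutant_pool, wild_pool))
```

The depth filter reflects the prerequisite named on the slide: pool frequencies are only meaningful with deep coverage, which is costly for a 5.1 Gb genome and is the motivation for exome capture.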

23 Collection management: legacy data since 1946. Observations/measurements during seed multiplication, on average every 20 years. Unreplicated and non-orthogonal "experiments": environmental cues; agricultural practices; disease pressure; climate change. Jens Keilwagen et al. (submitted)

24 Legacy data of IPK genebank
1. Accession number
2. Year of multiplication
3. Life form (spring/winter)
4. Sowing date
5. Emergence date
6. State before winter (+ date of scoring)
7. State after winter (+ date of scoring)
8. Growth habit (e.g. erect, prostrate)
9. Heading date
10. Flowering date
11. Wax cover of plant (+ date of scoring)
12. Tillering (+ date of scoring)
13. Plant height (in cm)
14. Culm length (in cm)
15. Lodging (+ date of scoring), scored up to 2 times during one season
16. Milk ripeness date
17. Yellow ripeness date
18. Full maturity date
19. Begin of harvest date
20. End of harvest date
21. Mildew (+ date of scoring), scored up to 3 times during one season
22. Yellow rust (+ date of scoring), scored up to 3 times during one season
23. Leaf rust (+ date of scoring), scored up to 3 times during one season
24. Leaf sheath pubescence (+ date of scoring)
25. Thousand-grain weight

25 Trait analysis by "normalized rank product": ranking of 6,959 wheat accessions by legacy data ( datapoints); validation by 2011 field trial with 4 contrasting groups of 15 accessions each. Traits: plant height, flowering time
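The slide names the method but does not define it. One plausible reading, sketched below, ranks accessions within each multiplication year, divides each rank by the number of accessions scored that year, and takes the geometric mean of an accession's normalized ranks across the years in which it was grown. This is an assumption for illustration, not necessarily the procedure of Keilwagen et al.; the function name and the toy plant heights are invented.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def normalized_rank_product(
        observations: Iterable[Tuple[int, str, float]]) -> Dict[str, float]:
    """Map accession -> geometric mean of its within-year normalized ranks.

    observations: (year, accession, trait value) triples. Ties are not
    handled specially in this sketch (they are broken by accession name).
    """
    by_year = defaultdict(list)
    for year, accession, value in observations:
        by_year[year].append((value, accession))

    log_sum = defaultdict(float)
    n_years = defaultdict(int)
    for entries in by_year.values():
        entries.sort()
        n = len(entries)
        for rank, (_, accession) in enumerate(entries, start=1):
            log_sum[accession] += math.log(rank / n)  # normalized rank in (0, 1]
            n_years[accession] += 1
    return {acc: math.exp(log_sum[acc] / n_years[acc]) for acc in log_sum}


# Toy plant heights in cm (hypothetical); accession B was not grown in 1970.
toy = [(1950, "A", 90.0), (1950, "B", 100.0), (1950, "C", 120.0),
       (1970, "A", 85.0), (1970, "C", 130.0),
       (1990, "A", 92.0), (1990, "B", 95.0), (1990, "C", 110.0)]
scores = normalized_rank_product(toy)
print(sorted(scores, key=scores.get))
```

Ranking within a year sidesteps the year-to-year differences in environment and practice that make the raw unreplicated values incomparable, and the normalization keeps years with different numbers of accessions on the same scale.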

26 Variation in flowering time: re-sequencing of Ppd-D1. Flowering time: 60 contrasting accessions; 8 haplotypes; 3 novel alleles

27 Ex situ resources of selected cereal crops (Poaceae), adapted from Gaut 2002, New Phytol. and FAO WIEWS, 2014. [Figure: phylogeny including Joinvillea, Anomochloa and Brachypodium, with ex situ accessions per genus]
Oryza (rice): 774,
Avena (oats): 148,000
Hordeum (barley): 470,000
Triticum (wheat): 858,
Setaria (foxtail millet): 47,
Pennisetum (pearl millet): 65,000
Sorghum: 236,000
Zea (maize): 328,000
3 millions

28 Broad sense heritabilities. [Table: Trait, H2, Mean, Min - Max; rows K max (x 10^7), r, IP (DAS) and three DB rows (x 10^7)] K max: maximum biomass (modelled); r: average growth rate (modelled); IP: inflexion point (modelled); DB: digital biomass (estimated)

29 Biomass accumulation. Screening: 100 two-rowed spring barley genotypes, 5 plants per genotype = 500 plants; 3 independent experiments with 15 h light. [Table: Experiment, Date of sowing, DAS] Daily imaging and watering; watering to target weight = 90% field capacity (FC); last imaging on DAS 58; manual measurement of fresh and dry weight

30 Image analysis based on Integrative Analysis Platform IAP (Klukas et al. 2012, Journal of Integrative Bioinformatics 9, e191). 3 side views and one top view. Side view and top view images: digital biomass; plant height and width; compactness; parameters of convex hull; colour classification. Focus: digital biomass

31 Composition of diverse barley collection. Region (No. of GT, No. of countries): EU; WANA 15, 5; AM 6, 5; EA 2, 2; Total. GT advanced cultivars; 8 GT breeding material; 14 GT landraces. [Figures: neighbour joining tree; split tree, uncorrected P]

32 Using daily imaging data to model biomass growth. Single plant modelling revealed logistic growth (always best fit); three traits extracted from logistic growth model: K max, maximum (vegetative) biomass; inflexion point I (t<I: increase of growth; t=I: maximal growth; t>I: slow down of growth); intrinsic growth rate r, speed of growth. [Table: Model fit R, Mean, Min - Max] Example: single plant of cv. Apex
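The three traits on this slide map onto the parameters of a logistic curve. The slide does not give the equation, so the sketch below assumes the common three-parameter form B(t) = K / (1 + exp(-r (t - I))); the parameter values are hypothetical and the function names are made up for illustration.

```python
import math


def logistic_biomass(t: float, k_max: float, r: float, inflexion: float) -> float:
    """Digital biomass at day t under a three-parameter logistic model."""
    return k_max / (1.0 + math.exp(-r * (t - inflexion)))


def growth_rate(t: float, k_max: float, r: float, inflexion: float) -> float:
    """Absolute growth per day, dB/dt = r * B * (1 - B / K)."""
    biomass = logistic_biomass(t, k_max, r, inflexion)
    return r * biomass * (1.0 - biomass / k_max)


# Hypothetical parameters: K max in voxels, r per day, I in days after sowing.
K_MAX, R, I = 2.0e7, 0.15, 40.0

for day in (30, 40, 50):
    b = logistic_biomass(day, K_MAX, R, I)
    g = growth_rate(day, K_MAX, R, I)
    print(f"DAS {day}: biomass = {b:.3e}, growth = {g:.3e} per day")
```

Under this form the plant reaches half of K max at t = I, and the growth rate peaks there at r * K / 4, which matches the slide's description of acceleration before I and slow down after it.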

33 GWAS for tiller number

34 3.9% (growth rate); 12.3% (DB27). Indicates the marker with the highest additive effect (in %) for that trait

36 18.4% (DB58); 18.4% (Kmax); 16.3% (DB45); 1.1% (IP). Indicates the marker with the highest additive effect (in %) for that trait