Possibilities & Limitations of Plant Genome Sequences for Plant Breeding


1 Possibilities & Limitations of Plant Genome Sequences for Plant Breeding. Richard GF Visser, Wageningen UR Plant Breeding. NGGI-Breeding for Crop Improvement, February 2014

2 Overview: Sequencing and breeding; Genotyping by Sequencing in Tomato; Genome re-sequencing in Tomato; Genomics meets genebanks; Virtual Lab of Plant Breeding; BreeDB; Precision Breeding

3 THE MAIZE GENOME; THE BRASSICA GENOME; THE MELON GENOME; THE CUCUMBER GENOME; THE TOMATO GENOME. And now what?

4 Is the genome sequence finished? No, because not everything of the chosen genotype has been sequenced. Yes, because the available sequence gives sufficient leads to sequence Your Favourite Region: design primers and start. To get a better idea of the differences between genotypes, or rather haplotypes, more sequencing is required: deep versus broad sequencing.

5 Genomics and (orphan) crops. All kinds of genomics technologies are great for crops with little research effort and/or large genomes, e.g. ornamentals like tulip, chrysanthemum and lily, and orphan crops like yam and papaya: finding and using SNPs. Comparative genomics enables the search for known pathways in under-researched crops, e.g. colour and size of fruits and/or flowers.

6 Polyploid crops aided by sequencing...

7 What to do with the sequence / how useful is the sequence in breeding? Next to genomics, other ~omics techniques are and can be deployed in plant research as well: transcriptomics, metabolomics, proteomics. Storage of data in databases, and tools (software) to use the data.

8 What needs to be done further? Phenotype your genotypes extremely well for traits of interest: descriptive traits, metabolomics, proteomics, etc. Automation! Prediction from ~omics data.

9 Breeding is an art and a science. Genetic improvement by phenotypic selection: progress is slow, needs green fingers & a breeder's eye; excellent varieties arise by chance. Change this into targeted breeding for specific markets with a higher (predictable) chance of excellent varieties; genomics enables this. Breeding: more technology based... Precision Breeding

10 Phenotype ánd genotype your crossing parents!

11 Genotyping by Sequencing (in Tomato)

12 S. lycopersicum Moneymaker S. pimpinellifolium G1.1554

13 Recombinant Inbred Lines. Via single seed descent, starting with 100 F2 plants: 100 unique, homozygous genotypes.
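
As a rough illustration of why single seed descent ends in (nearly) homozygous lines: under repeated selfing, the expected fraction of heterozygous loci halves each generation. The function name and the generations printed below are illustrative assumptions; the slides do not state how many generations were used.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected fraction of heterozygous loci in generation F_n under selfing.

    Assumes two fully inbred parents, so the F1 is heterozygous at every
    polymorphic locus (1.0); each round of selfing halves that fraction.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1 (F1)")
    return 0.5 ** (generation - 1)


# Illustrative generations only (not taken from the slides).
for n in (2, 4, 6, 8):
    print(f"F{n}: {expected_heterozygosity(n):.4f}")
```

This is an expectation per locus; individual lines will keep some residual heterozygous segments, which is one reason to genotype the RILs rather than assume homozygosity.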

14 A first look at the sequence of RILs. RILs from Moneymaker x S. pimpinellifolium. Question: is it possible to determine the boundaries of the S. pimpinellifolium introgressions? Challenges: relatively low coverage data (from ~3x); low-confidence or missing calls. Therefore more advanced filtering: variable coverage along the chromosome; variable coverage between genotypes; (partial) inference of missing calls; thresholding.
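
The filtering steps listed above (thresholding, partial inference of missing calls, then reading off introgression boundaries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the talk: the per-bin read counts, the labels "M" (Moneymaker-like) and "P" (S. pimpinellifolium-like), the function names, and the `min_depth` and `threshold` defaults are all hypothetical.

```python
from itertools import groupby


def call_bin(mm_reads, pimp_reads, min_depth=2, threshold=0.8):
    """Threshold one genomic bin into 'M', 'P' or None (no confident call)."""
    depth = mm_reads + pimp_reads
    if depth < min_depth:          # too little coverage: leave missing
        return None
    frac = pimp_reads / depth
    if frac >= threshold:
        return "P"
    if frac <= 1 - threshold:
        return "M"
    return None                    # mixed signal: low confidence


def infer_missing(calls):
    """Fill a run of missing calls only when both flanking calls agree."""
    filled = list(calls)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            j = i
            while j < n and filled[j] is None:
                j += 1
            left = filled[i - 1] if i > 0 else None
            right = filled[j] if j < n else None
            if left is not None and left == right:
                filled[i:j] = [left] * (j - i)
            i = j
        else:
            i += 1
    return filled


def segments(calls):
    """Collapse consecutive identical calls into (first_bin, last_bin, call)."""
    out, pos = [], 0
    for call, run in groupby(calls):
        length = len(list(run))
        out.append((pos, pos + length - 1, call))
        pos += length
    return out


# Hypothetical (Moneymaker-like, pimpinellifolium-like) read counts per bin.
bins = [(4, 0), (3, 0), (1, 0), (0, 3), (0, 1), (0, 4), (2, 2), (3, 0)]
calls = [call_bin(m, p) for m, p in bins]
print(segments(infer_missing(calls)))
```

Bins that stay `None` sit between segments of different origin, so they mark the interval within which a boundary must lie rather than a single breakpoint.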

15 Results RILs. Red/green for low/high feature rate (Heinz = reference). Red segments derived from Moneymaker. Correlates almost perfectly with markers.

16 Results RILs. Highlights assembly artifacts. Visualizes S. pimpinellifolium introgression(s) in Heinz. General tool for visualizing introgressions.

18 Tomato Genome (Re-)Sequencing Project. The 150 tomato genome sequencing project: identify alleles underpinning phenotypic diversity across the entire genome and the entire tomato clade.

19 (Re-)sequencing collection: Lycopersicon group, Arcanum group, Eriopersicon group, Neolycopersicon group. Tree according to Anderson et al. (2010), redrawn from Moyle 2008. Average coverage 41x.

20 Initial analysis strategies. Without assumption: k-mer based analysis. With assumption: reference genome of S. lycopersicum cv. Heinz; will yield SNPs, info on evolution, etc.
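
To make the reference-free option concrete, here is a minimal sketch of k-mer counting and a simple k-mer set comparison between two samples. It is an illustration only, not the analysis used in the project: the function names, the tiny reads and `k` are assumptions, and the sketch ignores reverse complements and sequencing errors, which real k-mer tools handle.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in a collection of reads, skipping k-mers with N."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def jaccard(counts_a, counts_b):
    """Jaccard similarity of the distinct k-mer sets of two samples."""
    a, b = set(counts_a), set(counts_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy reads for two accessions.
sample_1 = count_kmers(["ACGTACGT", "CGTACGTA"], k=4)
sample_2 = count_kmers(["ACGTACGA", "CGTACGAA"], k=4)
print(jaccard(sample_1, sample_2))
```

Because no reference is involved, such comparisons are not biased towards sequence that happens to be present in the Heinz assembly.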

21 Number of SNPs

22 Detection of Introgressions

23 haplotype diversity within S. lycopersicum

24 Tomato fruit color
Accession | Color | Allele | Chr | Gene id | Mutation | Effect
Heinz 1706 | red | r | 3 | Psy1 | wt | Lys 389
Galina | yellow | r^y | 3 | psy1 | G>del | Lys 389 > Ser, stop
Taxi | orange | r | 3 | Psy1 | wt | Lys 389
Iidi | yellow | r^y | 3 | psy1 | G>del | Lys 389 > Ser, stop
RF17 | yellow | r^y | 3 | psy1 | G>del | Lys 389 > Ser, stop
Black Cherry | purple | og^c | 6 | b | A>del | Lys 35 > Asn, fs

25 Exploring the FW9-2-5 locus (Lin5). Sucrose synthase gene, cloned from S. pennellii. Amino acid substitutions: 2878 (Asp in LP to Glu in LE), 2932 (Asp to Asn), 2953 (Val to Leu). Fridman et al. Proc Natl Acad Sci U S A Apr 25;97(9):

26 FW9-2-5 variation (Lin5) S. galapagense

27 Genomics Meets Genebank

28 SolRgenes Flow

29 Summary at Accession. Not only info about which accession, but also which crossable accession contains a similar R gene!

30 Summary. Tools, pipelines and strategies for diploid and polyploid crops: allele dosage calling; genotyping-by-sequencing; statistical data integration approaches; database integration approaches; (reduced) data visualization techniques; multi ~omics / multi-trait modeling. Several lines of research are aimed at the development of breeding support systems in which genome sequences play an important role.
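
As a hedged illustration of what allele dosage calling means in a polyploid: the number of copies of the alternative allele (0 to 4 in a tetraploid) can be guessed from the fraction of reads carrying it. The sketch below is a deliberately naive nearest-ratio rule; the function name, the `min_depth` default and the read counts are assumptions, and dedicated dosage callers use statistical models rather than simple rounding.

```python
def call_dosage(ref_reads, alt_reads, ploidy=4, min_depth=10):
    """Return the alt-allele dosage (0..ploidy) closest to the observed
    alt-read fraction, or None when coverage is too low to decide."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    ratio = alt_reads / depth
    # Expected alt fraction for dosage d is d / ploidy; pick the nearest.
    return min(range(ploidy + 1), key=lambda d: abs(d / ploidy - ratio))


# Hypothetical read counts (ref, alt) for one SNP in four tetraploid plants.
for ref, alt in [(40, 0), (30, 10), (19, 21), (3, 2)]:
    print((ref, alt), call_dosage(ref, alt))
```

The depth requirement matters more in polyploids than in diploids, because neighbouring dosage classes differ by only 1/ploidy in expected read fraction.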

31 Virtual Lab of Plant Breeding. A pre-competitive public-private partnership linked to the domains of e-bioscience, bioinformatics and ICT infrastructure. A dry-lab infrastructure securing innovation in the plant breeding industry. Translation of academic knowledge to e-science environments relevant for industrial & academic partners. Secure real-life use by end-users via, among other things, a demand-driven development strategy.

32 Figure 2: VLPB concept. [Diagram; recoverable labels: users from VLPB commercial & academic partners (problems, questions, answers, tools, data); VLPB workshop (R&D, construction, control panel); Tool-a to Tool-z; PSE-1 to PSE-N; VLPB users X, Y and Z, each with BreeDB; company tool; proprietary data repository; public data repository; VLPB data repository; low bandwidth / high bandwidth; BreeDB; OS (e.g. Linux); software environment (e.g. R); support; VLPB-III]

33 Analysis & visualisation tool development. Examples: allelic variation, aligning many genome sequences; haplotype predictor; incorporating IBD, IBG; structural variation / synteny; S. lycopersicum pan-genome; specific tools for cross-pollinating crops and self-pollinating crops, and for polyploids versus diploids; etc.

34 VLPB I, genomes, VLPB II & III. VLPB I: tool development; 150 tomato genomes; +100 Brassica genomes. VLPB II: combination with NL-Escience; >20 cucumber diseases, RNA seq; +100 Brassica genomes. VLPB III: link with iPlant; 500 melon genomes (about to start); 50 potato genomes (just started). Data repository.

35 BreeDB Interface

36 BreeDB Navigation (1)

37 BreeDB Navigation (2)

38 Data exploration

39 Prediction of phenotype from ~omics data. Breeding or systems biology approach: try to incorporate all data and design the best variety. Web application to do that: Omics Fusion. Top metabolites: linked to the carotenoid pathway.

40 Kruskal-Wallis & Interval mapping
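
For readers unfamiliar with the first method named here: a Kruskal-Wallis test asks, marker by marker, whether trait values differ between genotype classes, using ranks instead of raw values. The sketch below computes the H statistic in plain Python (midranks for ties, with the standard tie correction). The function name and trait values are hypothetical, each group is assumed to be non-empty, and no p-value is computed; H would normally be compared against a chi-square distribution with (number of groups - 1) degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of non-empty value lists."""
    values = sorted(v for g in groups for v in g)
    n = len(values)

    # Assign midranks: tied values share the average of their rank positions.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i                           # size of this tie block
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    rank_sums = [sum(rank_of[v] for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)

    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:                     # all values identical
        return 0.0
    return h / correction


# Hypothetical trait values of RILs grouped by genotype at one marker.
moneymaker_like = [5.1, 4.8, 5.5, 5.0]
pimpinellifolium_like = [6.2, 6.8, 5.9, 6.5]
print(kruskal_wallis_h([moneymaker_like, pimpinellifolium_like]))
```

Unlike interval mapping, this test looks only at marker positions and makes no assumption about the trait distribution, which makes it a useful first pass.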

41 QTL region; genome annotation. Chibon et al. (2012). Marker2sequence, mine your QTL regions for candidate genes. Bioinformatics 28: 1921. Chibon et al. (2013). MQ2: Visualize multi-trait mapped QTL results. Mol. Breeding 32: 981.

42 Annotex (Semantic Integration of data)

43 Flowchart Precision Breeding. Field trial; observations; ~omics analysis; estimate genotypic means. Genotyping: genotyping-by-sequencing, SNP arrays; dosage information (fitTetra). Optional: population structure correction. LD analysis / genomic selection; marker selection. Research: in silico cultivar design. BreeDB.

44 Concept of Precision Breeding. Design a high-yielding genotype with excellent quality which gives stable yields under different conditions in a sustainable fashion for: environment X (e.g. South Spain); trait Y (e.g. disease resistance, salt tolerance & good quality). Based on a cross between two parents which are known through and through. Makes use of all available knowledge (in databases and wherever else it is retrievable). Better valorise all investments.

45 Precision Breeding Approach. Well-characterized (elite) germplasm collections. Design the desired genotype behind the computer, with breeding goals as input. Obtain the most effective breeding strategy, starting with your current (elite) germplasm. Cross (and if necessary genotype the offspring). Select candidates similar to the in silico designed line. Characterize and add to the germplasm collection. Re-train predictive algorithms.
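
The step "select candidates similar to the in silico designed line" can be sketched as ranking offspring by how many marker genotypes they share with the designed target. This is a simplified illustration, not the method behind the talk: the 0/1/2 genotype coding, the function names, the line names and the plain matching score are all assumptions; a real system would weight markers by their estimated effects.

```python
def similarity(candidate, ideotype):
    """Fraction of markers where the candidate matches the designed line.

    Genotypes are lists of allele counts (0, 1, 2); None marks a missing
    call, and markers missing in either line are skipped.
    """
    if len(candidate) != len(ideotype):
        raise ValueError("genotypes must cover the same markers")
    scored = [(c, t) for c, t in zip(candidate, ideotype)
              if c is not None and t is not None]
    if not scored:
        return 0.0
    return sum(c == t for c, t in scored) / len(scored)


def rank_candidates(offspring, ideotype, top=3):
    """Return the `top` offspring as (name, similarity), best first."""
    scores = {name: similarity(g, ideotype) for name, g in offspring.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Hypothetical designed line and offspring genotypes at five markers.
ideotype = [2, 2, 0, 0, 2]
offspring = {
    "line_A": [2, 2, 0, 0, 2],
    "line_B": [2, 0, 0, 2, 2],
    "line_C": [0, 0, 2, 2, 0],
    "line_D": [2, 2, 2, None, 2],
}
print(rank_candidates(offspring, ideotype, top=2))
```

The selected lines would then be phenotyped and added back to the collection, which is what makes re-training the predictive models possible.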

46 Components of Precision Breeding. All different types of HTP phenotyping (phenomics, transcriptomics, metabolomics, proteomics). High-throughput genotyping (sequencing-based techniques, especially those which can distinguish between haplotypes, homologs and paralogs). Ontology. (Re-)sequencing of whole genomes (150+ tomato and 100 Brassica genome re-sequencing projects, etc.). Statistical and bioinformatic methods and visualization tools. Data management, storage & retrieval. Biological networks.

47 Acknowledgements. Wageningen UR Plant Breeding: Richard Finkers, Theo Borm, Pierre-Yves Chibon, Heleen de Weerd, Animesh Acharjee, Sjaak van Heusden, Roeland Voorrips, Vivianne Vleeshouwers, Yuling Bai, Chris Maliepaard. BioScience & Bioinformatics WUR: Sander Peters, Andries Koops, Sandra Smit. University of Amsterdam: Han Rouwerda, Timo Breit. Centre for Genetic Resources Netherlands: Theo van Hintum. Many breeding companies.

48 Questions? Sponsors: NWO; Technologiestichting STW; EU SOL; Ministry of Economic Affairs, Agriculture & Innovation; Netherlands Genomics Initiative; many different breeding companies.