Possibilities & Limitations of Plant Genome Sequences for Plant Breeding

Richard GF Visser, Wageningen UR Plant Breeding
NGGI-Breeding for Crop Improvement, February 2014

Overview
Sequencing and breeding; Genotyping by Sequencing in tomato; genome re-sequencing in tomato; Genomics meets genebanks; Virtual Lab of Plant Breeding; BreeDB; Precision Breeding.

Published genomes: maize, Brassica, melon, cucumber, tomato. And now what?

Is the genome sequence finished?
No: not everything of the chosen genotype has been sequenced.
Yes: the available sequence gives sufficient leads to sequence Your Favourite Region; design primers and start.
To get a better idea of the differences between genotypes, or rather haplotypes, more sequencing is required: deep versus broad sequencing.
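The deep-versus-broad trade-off can be made concrete with the Lander-Waterman approximation: mean coverage is c = N·L/G (N reads of length L over a genome of size G), and, assuming reads land uniformly at random, the fraction of bases covered by no read at all is roughly e^(-c). The minimal sketch below splits a fixed sequencing budget evenly over a growing number of genotypes. The budget and genome size are illustrative assumptions, not figures from the projects described here.

```python
import math

def coverage_per_genotype(total_bases: float, genome_size: float, n_genotypes: int) -> float:
    """Mean coverage c = N*L/G when a fixed number of sequenced bases is split evenly."""
    return total_bases / (n_genotypes * genome_size)

def fraction_uncovered(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)

if __name__ == "__main__":
    genome_size = 900e6   # assumed genome size in bp (illustrative only)
    budget = 6e12         # assumed total sequenced bases (illustrative only)
    for n in (10, 50, 150, 500):
        c = coverage_per_genotype(budget, genome_size, n)
        print(f"{n:4d} genotypes: {c:6.1f}x, ~{fraction_uncovered(c):.2%} of bases uncovered")
```

Broad designs quickly reach the low-coverage regime (a few x), where a noticeable share of positions has no data, which is why low-coverage data need the extra filtering and inference of missing calls discussed for the RILs.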

Genomics and (orphan) crops
All kinds of genomics technologies are valuable for crops with little research effort behind them and/or large genomes, e.g. ornamentals such as tulip, chrysanthemum and lily, and orphan crops such as yam and papaya: finding and using SNPs.
Comparative genomics enables the search for known pathways in under-researched crops, e.g. for colour and size of fruits and/or flowers.

Polyploid crops aided by sequencing...

What to do with the sequence, and how useful is it in breeding?
Besides genomics, other ~omics techniques are, and can be, deployed in plant research as well: transcriptomics, metabolomics, proteomics.
Storage of the data in databases, and tools (software) to use the data.

What else needs to be done?
Phenotype your genotypes extremely well for the traits of interest: descriptive traits, metabolomics, proteomics, etc.
Automation!
Prediction from ~omics data.

Breeding is an art and a science
Genetic improvement by phenotypic selection: progress is slow and needs green fingers and a breeder's eye; excellent varieties arise by chance.
Change this into targeted breeding for specific markets, with a higher (predictable) chance of excellent varieties; genomics enables this.
Breeding becomes more technology-based: Precision Breeding.

Phenotype ánd genotype your crossing parents!

Genotyping by Sequencing (in Tomato)

Parents: S. lycopersicum cv. Moneymaker and S. pimpinellifolium G1.1554

Recombinant Inbred Lines (1993-1996)
Developed via single seed descent starting from 100 F2 plants: 100 unique, homozygous genotypes.
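Single seed descent yields homozygous lines because every generation of selfing halves the expected heterozygosity: an F1 is heterozygous at all segregating loci, so a line in generation F_n is expected to remain heterozygous at a fraction (1/2)^(n-1) of them. The sketch below simply tabulates that expectation for a diploid with regular meiosis and no selection; the generations shown are illustrative, not the actual number of selfing rounds used for this population.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected fraction of segregating loci still heterozygous in generation F_n
    under repeated selfing, starting from a fully heterozygous F1."""
    if generation < 1:
        raise ValueError("generation must be >= 1 (F1)")
    return 0.5 ** (generation - 1)

for n in range(1, 9):
    print(f"F{n}: {expected_heterozygosity(n):.4f}")
```

By F8 less than 1% of segregating loci are expected to remain heterozygous, which is why RILs can be treated as (nearly) homozygous genotypes.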

A first look at the sequence of RILs
RILs from Moneymaker × S. pimpinellifolium. Question: is it possible to determine the boundaries of the S. pimpinellifolium introgressions?
Challenges: relatively low-coverage data (from ~3x); low-confidence or missing calls.
Therefore more advanced filtering is needed: variable coverage along the chromosome; variable coverage between genotypes; (partial) inference of missing calls; thresholding.
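Two of the filtering steps listed above can be sketched as follows, assuming a hypothetical encoding in which each SNP call along a chromosome is 'M' (Moneymaker allele), 'P' (S. pimpinellifolium allele) or None (missing). Depth is normalised by the genotype's own median to absorb coverage differences between genotypes, and a gap is filled only when both flanking calls agree, which is one simple form of partial inference. The thresholds are assumptions, and this is not the pipeline used in the study.

```python
from statistics import median
from typing import List, Optional

# Hypothetical call encoding: 'M' = Moneymaker allele, 'P' = S. pimpinellifolium allele, None = missing
Call = Optional[str]

def filter_by_depth(calls: List[Call], depths: List[int],
                    low: float = 0.3, high: float = 3.0) -> List[Call]:
    """Mask calls whose read depth is far from this genotype's median depth.

    Too-low relative depth means an unreliable call; too-high relative depth
    often points at repeats or assembly artifacts. Thresholds are assumptions.
    """
    med = median(depths)
    if med == 0:
        return [None] * len(calls)
    return [c if low <= d / med <= high else None for c, d in zip(calls, depths)]

def fill_missing(calls: List[Call]) -> List[Call]:
    """Partially infer missing calls: fill a gap only when both flanking calls agree."""
    filled = list(calls)
    i, n = 0, len(calls)
    while i < n:
        if calls[i] is not None:
            i += 1
            continue
        j = i
        while j < n and calls[j] is None:   # find the end of this run of missing calls
            j += 1
        left = calls[i - 1] if i > 0 else None
        right = calls[j] if j < n else None
        if left is not None and left == right:
            filled[i:j] = [left] * (j - i)
        i = j
    return filled

calls = ['M', 'M', 'P', 'M', 'P', 'P']
depths = [3, 4, 40, 3, 0, 3]
print(fill_missing(filter_by_depth(calls, depths)))
```

Normalising by a single per-genotype median only handles differences between genotypes; coverage variation along the chromosome would need, for example, a sliding-window median instead.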

Results RILs
Red/green indicates a low/high feature rate (Heinz = reference). Red segments are derived from Moneymaker. The result correlates almost perfectly with the markers.
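A windowed colouring of this kind can be sketched as follows, assuming the "feature rate" is the number of variants called against the Heinz reference per base pair. Moneymaker, like Heinz, is S. lycopersicum, so a low rate (red) suggests Moneymaker-derived sequence and a high rate (green) suggests an S. pimpinellifolium segment. The window size and threshold are hypothetical choices, not the values behind the figure.

```python
from typing import Iterable, List

def colour_windows(variant_positions: Iterable[int], chrom_length: int,
                   window: int = 100_000, threshold: float = 1e-4) -> List[str]:
    """Label each window 'red' (low variant rate vs. the reference) or 'green' (high rate).

    variant_positions: 0-based positions of variants called against the reference.
    threshold: variants per bp above which a window is 'green' (assumed value).
    """
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in variant_positions:
        if 0 <= pos < chrom_length:
            counts[pos // window] += 1
    colours = []
    for i, count in enumerate(counts):
        size = min(window, chrom_length - i * window)   # the last window may be shorter
        colours.append("green" if count / size > threshold else "red")
    return colours

positions = [10, 500] + list(range(100_000, 100_050))
print(colour_windows(positions, 300_000))
```

Transitions between red and green windows then give approximate introgression boundaries, with a resolution no better than the window size.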

Results RILs
The approach highlights assembly artifacts, visualizes the S. pimpinellifolium introgression(s) in Heinz, and is a general tool for visualizing introgressions.

Tomato Genome (Re-)Sequencing Project
The 150 Tomato Genome Sequencing Project: identify the alleles underpinning phenotypic diversity across the entire genome and the entire tomato clade.

(Re-)sequencing collection
[Figure: tree of the (re-)sequenced accessions in the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups; tree according to Anderson et al. (2010), redrawn from Moyle (2008).]
Average coverage 41x.

Initial analysis strategies
Without assumptions: k-mer-based analysis.
With assumptions: reference-based analysis against the genome of S. lycopersicum cv. Heinz, which yields SNPs, information on evolution, etc.
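The assumption-free route can be illustrated with a minimal k-mer sketch: count canonical k-mers (a k-mer and its reverse complement are treated as one, so both DNA strands agree), drop rare ones as likely sequencing errors, and compare accessions by the Jaccard similarity of their k-mer sets. The min_count cut-off and the use of Jaccard similarity are illustrative assumptions, not the analysis actually run on the 150 genomes.

```python
from collections import Counter
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)

def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers in reads, skipping k-mers with non-ACGT bases (e.g. N)."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts

def solid_kmers(counts: Counter, min_count: int = 2) -> Set[str]:
    """Keep k-mers seen at least min_count times (a crude filter against read errors)."""
    return {kmer for kmer, c in counts.items() if c >= min_count}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets: a reference-free measure of relatedness."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

print(kmer_counts(["ACGTAC"], 3))
```

Because no reference is involved, k-mers present in an accession but absent from Heinz can also point at sequence that a reference-based analysis would miss.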

Number of SNPs

Detection of Introgressions

Haplotype diversity within S. lycopersicum

Tomato fruit color (wt = wild type; fs = frameshift)
Accession     Color   Allele  Chr  Gene ID  Mutation  Effect
Heinz 1706    red     r       3    Psy1     wt        Lys389
Galina        yellow  r^y     3    psy1     G>del     Lys389>Ser, stop
Taxi          orange  r       3    Psy1     wt        Lys389
Iidi          yellow  r^y     3    psy1     G>del     Lys389>Ser, stop
RF17          yellow  r^y     3    psy1     G>del     Lys389>Ser, stop
Black Cherry  purple  og^c    6    b        A>del     Lys35>Asn, fs

Exploring the FW9-2-5 locus (Lin5)
Cell-wall invertase gene, cloned from S. pennellii.
Amino acid substitutions (LP = S. pennellii, LE = S. lycopersicum): 2878, Asp in LP to Glu in LE; 2932, Asp to Asn; 2953, Val to Leu.
Fridman et al. (2000). Proc Natl Acad Sci U S A 97(9): 4718-4723.

FW9-2-5 variation (Lin5) S. galapagense

Genomics Meets Genebank

SolRgenes Flow

Summary at accession level
Information not only on which accession, but also on which crossable accession contains a similar R gene!

Summary
Tools, pipelines and strategies for diploid and polyploid crops: allele dosage calling; genotyping-by-sequencing; statistical data integration approaches; database integration approaches; (reduced) data visualization techniques; multi-~omics / multi-trait modelling.
Several lines of research are aimed at developing breeding support systems in which genome sequences play an important role.
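Allele dosage calling in a polyploid asks how many of the homologous copies carry the alternative allele. A minimal sketch for a tetraploid, assuming read counts from sequencing: the expected alternative-read fraction for dosage d is d/4, and the dosage with the highest binomial likelihood is chosen. This is a simplified stand-in for illustration only; fitTetra, named in the Precision Breeding flowchart, instead fits mixture models to SNP-array signal ratios.

```python
from math import comb, log
from typing import List, Tuple

def call_dosage(alt_reads: int, total_reads: int, ploidy: int = 4,
                error: float = 0.01) -> Tuple[int, List[float]]:
    """Alternative-allele dosage (0..ploidy) with the highest binomial likelihood.

    The expected alt-read fraction d/ploidy is clamped to [error, 1 - error] so that
    a stray sequencing error does not make dosage 0 or `ploidy` impossible.
    """
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("need 0 <= alt_reads <= total_reads")
    log_liks = []
    for d in range(ploidy + 1):
        p = min(max(d / ploidy, error), 1 - error)
        log_liks.append(log(comb(total_reads, alt_reads))
                        + alt_reads * log(p)
                        + (total_reads - alt_reads) * log(1 - p))
    best = max(range(ploidy + 1), key=lambda d: log_liks[d])
    return best, log_liks

print(call_dosage(25, 100)[0])   # alt fraction 0.25 in a tetraploid
```

At low depth the log-likelihoods of neighbouring dosages lie close together, so in practice one would also report how much better the best dosage is than the runner-up.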

Virtual Lab of Plant Breeding (VLPB)
A pre-competitive public-private partnership linked to the domains of e-bioscience, bioinformatics and ICT infrastructure.
A dry-lab infrastructure securing innovation in the plant breeding industry.
Translation of academic knowledge to e-science environments relevant for industrial and academic partners.
Secure real-life use by end-users via, among other things, a demand-driven development strategy.
http://vlbp.nl

[Figure 2: VLPB concept. Users from VLPB commercial and academic partners exchange problems, questions, answers, tools and data via PSE-1 to PSE-N, tools (Tool-a to Tool-z), a VLPB workshop, R&D construction, a control panel and per-user BreeDB instances; components include company tools with proprietary data repositories, a low-bandwidth public data repository, a high-bandwidth VLPB data repository, an OS (e.g. Linux) and a software environment (e.g. R).]

Analysis & visualisation tool development
Examples: allelic variation (aligning many genome sequences); haplotype predictor; incorporating IBD and IBG; structural variation / synteny; the S. lycopersicum pan-genome; specific tools for cross-pollinating versus self-pollinating crops, and for polyploids versus diploids; etc.
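A first step towards comparing haplotypes across many genomes is finding long runs of identical alleles. The sketch below reports identity-by-state (IBS) segments between two phased haplotypes, given as hypothetical strings of SNP alleles. IBS is only a proxy: inferring identity by descent (IBD) properly needs allele frequencies, pedigree information or a probabilistic model, which this sketch does not attempt.

```python
from typing import List, Sequence, Tuple

def ibs_segments(hap_a: Sequence[str], hap_b: Sequence[str],
                 min_length: int = 3) -> List[Tuple[int, int]]:
    """Runs of consecutive SNPs (start, end-exclusive) at which two phased haplotypes
    carry identical alleles, keeping only runs of at least min_length SNPs."""
    segments = []
    start = None
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i                      # a new identical run begins
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    end = min(len(hap_a), len(hap_b))
    if start is not None and end - start >= min_length:
        segments.append((start, end))          # run extends to the last SNP
    return segments

print(ibs_segments("AAAATAAA", "AAAACAAA"))
```

Long shared runs between accessions are candidates for common ancestry, which a haplotype predictor could then exploit.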

VLPB I, II & III: 100+ genomes
VLPB I: tool development; 150 tomato genomes; 100+ Brassica genomes.
VLPB II: combination with NL-eScience; >20 cucumber diseases, RNA-seq; 100+ Brassica genomes.
VLPB III: link with iPlant; 500 melon genomes (about to start); 50 potato genomes (just started); data repository.

BreeDB Interface

BreeDB Navigation (1)

BreeDB Navigation (2)

Data exploration

Prediction of phenotype from ~omics data
Breeding or systems biology approach: try to incorporate all data and design the best variety.
Web application to do that: Omics Fusion.
Top metabolites: linked to the carotenoid pathway.
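One simple way to shortlist "top metabolites" is a univariate screen: rank each metabolite by the absolute Pearson correlation between its level and the phenotype across genotypes. The sketch below does only that; the metabolite names and values are hypothetical, and the multivariate prediction methods behind Omics Fusion may differ.

```python
from math import sqrt
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def top_metabolites(levels: Dict[str, List[float]], phenotype: List[float],
                    n: int = 2) -> List[str]:
    """Rank metabolites by absolute correlation with the phenotype across genotypes."""
    scores = {name: abs(pearson(values, phenotype)) for name, values in levels.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical data: one phenotype value and one level per metabolite for four genotypes
phenotype = [1.0, 2.0, 3.0, 4.0]
levels = {"m1": [2.0, 4.0, 6.0, 8.0], "m2": [4.0, 3.0, 2.0, 1.0], "m3": [1.0, 3.0, 2.0, 4.0]}
print(top_metabolites(levels, phenotype))
```

Ranking by the absolute value keeps strongly negative associations, which matter as much for prediction as positive ones.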

Kruskal-Wallis & Interval mapping
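The Kruskal-Wallis test is a common non-parametric single-marker screen for QTLs: at each marker, the phenotypes are grouped by marker genotype and the rank-based statistic H tests whether the groups differ. Under the null hypothesis H approximately follows a chi-square distribution with (number of groups - 1) degrees of freedom. The sketch below computes H with the usual tie correction for one marker; the grouping labels are hypothetical, and interval mapping, which models positions between markers, is not shown.

```python
from typing import Dict, List

def kruskal_wallis_h(groups: Dict[str, List[float]]) -> float:
    """Kruskal-Wallis H (tie-corrected) for phenotypes grouped by marker genotype."""
    if any(len(vs) == 0 for vs in groups.values()) or sum(len(vs) for vs in groups.values()) < 2:
        raise ValueError("need non-empty groups and at least two observations")
    values = sorted((v, g) for g, vs in groups.items() for v in vs)
    n = len(values)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1][0] == values[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # tied values share their mean 1-based rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sums = {g: 0.0 for g in groups}
    for (_, g), r in zip(values, ranks):
        rank_sums[g] += r
    h = 12 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / len(groups[g]) for g in groups) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0

# Hypothetical RIL phenotypes split by the parental allele at one marker
print(kruskal_wallis_h({"moneymaker": [1.0, 2.0, 3.0], "pimpinellifolium": [4.0, 5.0, 6.0]}))
```

Scanning all markers and plotting H along the genome gives a quick, distribution-free first view of where QTLs may lie.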

QTL region / genome annotation
Chibon et al. (2012). Marker2sequence, mine your QTL regions for candidate genes. Bioinformatics 28: 1921.
Chibon et al. (2013). MQ²: visualize multi-trait mapped QTL results. Mol. Breeding 32: 981.
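The core step of mining a QTL region for candidate genes is positional: collect the annotated genes that overlap the interval between the flanking markers. The sketch below shows only that step, with a hypothetical Gene record and made-up IDs and coordinates. Marker2sequence itself goes further by integrating annotation sources, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Gene:
    gene_id: str   # hypothetical identifiers in the example below
    chrom: str
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive

def genes_in_qtl(genes: List[Gene], chrom: str, flank_a: int, flank_b: int) -> List[str]:
    """IDs of genes overlapping the interval between two flanking marker positions."""
    lo, hi = sorted((flank_a, flank_b))
    return [g.gene_id for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]

genes = [
    Gene("geneA", "ch09", 1000, 2000),
    Gene("geneB", "ch09", 5000, 6000),
    Gene("geneC", "ch09", 9000, 9500),
    Gene("geneD", "ch03", 5000, 6000),
]
print(genes_in_qtl(genes, "ch09", 6000, 1500))
```

The resulting list is the starting point for ranking candidates by functional annotation, for example by their link to a trait-related pathway.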

Annotex (Semantic Integration of data)

Flowchart Precision Breeding
Field trial: observations and ~omics analysis; estimate genotypic means.
Genotyping: genotyping-by-sequencing, SNP arrays; dosage information (fitTetra).
Optional: population structure correction.
LD analysis / genomic selection; marker selection.
Research: in silico cultivar design; BreeDB.
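The "estimate genotypic means" step can be illustrated for the simplest case, a balanced trial with the same number of replicates per genotype. The sketch computes genotype means and, from a one-way ANOVA, the broad-sense heritability on an entry-mean basis, H² = σ²g / (σ²g + σ²e/r). This is an assumption-laden simplification: real field trials would normally be analysed with mixed models that include blocks, locations and G×E, and the trial data below are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

def genotypic_means_and_h2(obs: Dict[str, List[float]]) -> Tuple[Dict[str, float], float]:
    """Genotype means and entry-mean broad-sense heritability from a balanced trial.

    obs maps each genotype to its replicate observations; all genotypes must have
    the same number of replicates r >= 2 (one-way ANOVA, no block or GxE terms).
    """
    reps = {len(v) for v in obs.values()}
    if len(obs) < 2 or len(reps) != 1 or min(reps) < 2:
        raise ValueError("need >= 2 genotypes with equal replicate numbers >= 2")
    r = reps.pop()
    n_g = len(obs)
    means = {g: mean(v) for g, v in obs.items()}
    grand = mean(list(means.values()))
    ms_g = r * sum((m - grand) ** 2 for m in means.values()) / (n_g - 1)
    ms_e = sum((x - means[g]) ** 2 for g, v in obs.items() for x in v) / (n_g * (r - 1))
    var_g = max((ms_g - ms_e) / r, 0.0)   # genotypic variance component
    denom = var_g + ms_e / r              # phenotypic variance of a genotype mean
    h2 = var_g / denom if denom > 0 else 0.0
    return means, h2

trial = {"A": [10.0, 12.0], "B": [20.0, 22.0], "C": [30.0, 32.0]}   # hypothetical yields
print(genotypic_means_and_h2(trial))
```

The genotype means, rather than single plot values, are what would then be passed on to LD analysis, genomic selection or marker selection.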

Concept of Precision Breeding
Design a high-yielding genotype with excellent quality, which gives stable yields under different conditions in a sustainable fashion, for environment X (e.g. southern Spain) and trait Y (e.g. disease resistance, salt tolerance and good quality).
Based on a cross between two parents that are known through and through.
Makes use of all available knowledge (in databases and wherever else it can be retrieved).
Makes better use of all investments.

Precision Breeding approach
Start from well-characterized (elite) germplasm collections.
Design the desired genotype at the computer, with the breeding goals as input.
Obtain the most effective breeding strategy, starting from your current (elite) germplasm.
Cross (and, if necessary, genotype the offspring).
Select candidates similar to the in silico designed line.
Characterize them and add them to the germplasm collection.
Re-train the predictive algorithms.
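The selection step, picking offspring that resemble the in silico designed line, can be sketched as a marker-matching score: the fraction of target loci at which a candidate carries the designed allele dosage. Marker names, dosages and plants below are hypothetical, and every locus is weighted equally; a realistic implementation would weight loci by their estimated effects or use genomic prediction instead.

```python
from typing import Dict, List, Tuple

def match_score(candidate: Dict[str, int], ideotype: Dict[str, int]) -> float:
    """Fraction of designed loci at which the candidate carries the target dosage.
    Loci missing from the candidate's genotype count as mismatches."""
    hits = sum(1 for locus, target in ideotype.items() if candidate.get(locus) == target)
    return hits / len(ideotype)

def select_candidates(offspring: Dict[str, Dict[str, int]], ideotype: Dict[str, int],
                      n: int = 2) -> List[Tuple[str, float]]:
    """The n offspring most similar to the in silico designed line."""
    scored = [(name, match_score(geno, ideotype)) for name, geno in offspring.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:n]

# Hypothetical designed line: target dosage of the favourable allele per marker
ideotype = {"m1": 2, "m2": 0, "m3": 2, "m4": 1}
offspring = {
    "plant1": {"m1": 2, "m2": 0, "m3": 2, "m4": 0},
    "plant2": {"m1": 0, "m2": 0, "m3": 1, "m4": 1},
    "plant3": {"m1": 2, "m2": 0, "m3": 2, "m4": 1},
}
print(select_candidates(offspring, ideotype))
```

The selected plants would then be characterized, added to the germplasm collection and used to re-train the predictive models, closing the loop described above.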

Components of Precision Breeding
All types of high-throughput (HTP) phenotyping (phenomics, transcriptomics, metabolomics, proteomics).
High-throughput genotyping (sequencing-based techniques, especially those that can distinguish between haplotypes, homologs and paralogs).
Ontology.
(Re-)sequencing of whole genomes (150+ tomato and 100 Brassica genome re-sequencing projects, etc.).
Statistical and bioinformatic methods and visualization tools.
Data management, storage and retrieval.
Biological networks.

Acknowledgements
Wageningen UR Plant Breeding: Richard Finkers, Theo Borm, Pierre-Yves Chibon, Heleen de Weerd, Animesh Acharjee, Sjaak van Heusden, Roeland Voorrips, Vivianne Vleeshouwers, Yuling Bai, Chris Maliepaard
BioScience & Bioinformatics WUR: Sander Peters, Andries Koops, Sandra Smit
University of Amsterdam: Han Rouwerda, Timo Breit
Centre for Genetic Resources, the Netherlands: Theo van Hintum
Many breeding companies

Questions?
Sponsors: NWO; Technologiestichting STW; EU-SOL; Ministry of Economic Affairs, Agriculture & Innovation; Netherlands Genomics Initiative; many different breeding companies.