The tomato genome re-seq project

The tomato genome re-seq project http://www.tomatogenome.net 5 February 2013, Richard Finkers & Sjaak van Heusden

Rationale
Genetic diversity in commercial tomato germplasm is relatively narrow.
Unexploited genetic diversity available in landraces and old varieties?
Cultivated tomato has lost valuable traits during domestication.
Wild species are a source of genetic diversity: diverse habitats, variation in flowers and fruits, variation in mating systems.
Most wild species can be crossed with cultivated tomato (introgression breeding).

Rationale: Tomato Genome (Re-)Sequencing Project. Identify alleles underpinning phenotypic diversity across the entire genome and the entire tomato clade.

Acknowledgement: Sjaak van Heusden, Paris market

Tomato fruit shape variation. Rodríguez et al. (2011) Plant Physiology 156: 275-85

EU-SOL core collection (https://www.eu-sol.wur.nl)
Selection: > 7000 landraces -> 1000 landraces -> 200 landraces -> selected landraces for (re-)sequencing
Information: marker data, phenotype data, passport data
Markers per selection step: 20 (7000 -> 1000), 384 (1000 -> 200), 7500 (200 -> 34)
Acknowledgement: Dani Zamir et al. & Keygene N.V.

Landraces & old cultivar collection

Fruit phenotypes EU-SOL collection

Improving with exotic genetic libraries. Wild tomato species are valuable candidates for novel alleles. Dani Zamir, Nature Reviews Genetics 2, 983-989 (December 2001)

Improving with exotic genetic libraries. Phylogenetic relationships in the Solanum clade (Moyle 2008)

(Re-)sequencing collection
[Figure: phylogenetic tree of the tomato clade, divided into the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups; numbers shown on the tree: 51, 4, 6, 2, 3, 2, 2, 1, 3, 2, 7, 2]
Tree according to Anderson et al. (2010), redrawn from Moyle 2008

Genome alignment: read mapping to cv. Heinz. Genome structure of wild tomato relatives?

Reference genomes: de novo assembly selection
Heinz 1706 (Lycopersicon group)
LA 2157 (Arcanum group)
LYC 4 (Eriopersicon group)
LA 716 (Neolycopersicon group)

Data production
84 resequenced genomes: 500 bp, 2 x 100 bp paired-end Illumina; average coverage 41x
3 de novo genomes (S. arcanum, S. habrochaites, S. pennellii): 170 bp, 2 x 100 bp paired-end Illumina; 2 kb, 2 x 100 bp mate-pair Illumina; 8 kb mate-pair (454); 20 kb mate-pair (454); average coverage 205x
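The coverage figures on this slide follow from simple arithmetic: mean depth is the number of sequenced bases divided by the genome size. A minimal sketch of that relation is given here; the function names are hypothetical, and the 760 Mb genome size in the usage note is only an illustrative assumption (the Heinz draft size quoted on the genome size slide), not a number taken from the sequencing runs.

```python
import math


def mean_coverage(n_read_pairs, read_length_bp, genome_size_bp):
    """Expected mean depth for paired-end data (two reads per pair)."""
    return 2 * n_read_pairs * read_length_bp / genome_size_bp


def pairs_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of read pairs required to reach a target mean depth."""
    return math.ceil(target_coverage * genome_size_bp / (2 * read_length_bp))


if __name__ == "__main__":
    # Illustrative assumption: a 760 Mb genome sequenced with 2 x 100 bp reads.
    print(pairs_needed(41, 100, 760_000_000))
```

Under these assumptions, 41x coverage with 2 x 100 bp reads corresponds to roughly 156 million read pairs per accession.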

Genomic sequencing libraries

K-mer graph
[Figure: 31-mer histogram; x-axis: 31-mer frequency (0-100), y-axis: 31-mer volume (millions, 0-1000); fitted curves for samples 001, 045, 046, 053, 054, 058, 072, 074]
Data: 500 bp, 2 x 100 bp paired-end Illumina
Acknowledgement: Theo Borm

K-mer exploration
Fitted modes: homozygous, heterozygous, duplicated (2x)
Conclusions: % heterozygosity is negligible; duplicated portion is not negligible
[Figure: 31-mer histogram, zoomed; x-axis: 31-mer frequency (30-90), y-axis: 31-mer volume (millions, 0-300); fitted curves for samples 001, 045, 046, 053, 054, 058, 072, 074]

Genome size estimates
Genomic k-mer based estimate; ignores differences in GC-AT ratio; underestimation

Nr  Species  Est. size (Mb)  Draft size (Mb)  %CP
01  SL       723             760 (Heinz)      1.9
45  SP       749                              1.9
46  SP       775             739 (LA1589)     6.3
53  SG       728                              4.4
54  SC       760                              6.2
58  SA       830                              3.0
72  SH       779                              7.1
74  SP       962                              8.6

Acknowledgement: Theo Borm. The Tomato Genome Consortium, Nature 485, 635-641 (2012)
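A k-mer based genome size estimate divides the total k-mer volume by the depth of the main (homozygous) peak of the histogram. The sketch below uses a plain peak-picking approach, which is simpler than the fitted model shown on the slides and is not the project's own method; the function name, the `min_freq` error cutoff and the toy histogram are all assumptions made for illustration.

```python
def estimate_genome_size(hist, min_freq=5):
    """Estimate genome size from a k-mer histogram.

    hist maps k-mer frequency -> number of distinct k-mers seen that often.
    Frequencies below min_freq are treated as sequencing errors and ignored
    (an assumed cutoff; in practice it is chosen from the histogram valley).
    Returns (estimated size in k-mers, peak frequency).
    """
    usable = {f: n for f, n in hist.items() if f >= min_freq}
    if not usable:
        raise ValueError("no k-mers at or above min_freq")
    # The main peak approximates the k-mer depth of single-copy sequence.
    peak = max(usable, key=usable.get)
    # Total k-mer volume: every occurrence of every retained k-mer.
    volume = sum(f * n for f, n in usable.items())
    return volume / peak, peak


if __name__ == "__main__":
    # Toy histogram, not project data.
    toy = {1: 1000, 2: 100, 19: 50, 20: 100, 21: 50, 40: 10}
    print(estimate_genome_size(toy))
```

Because duplicated sequence contributes to the volume at twice the peak depth, this estimator counts it at its true copy number, whereas collapsed repeats in a draft assembly are counted once.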

Optimizing assembly strategy

Checking assembly integrity
Average completeness per 10 contigs: ALL-PATHS (96.62%), CLC-BIO (74.62%)
Heinz dot plot: SL2.40 ch11 region (1 Mbp)

Status de novo assembly genomes

Status de novo assembly genomes

Assembly                  N50         N90        Longest     Shortest  Mean     Median  N contigs  Total length
Heinz 1706 reference      16,467,796  3,041,128  42,121,211  2000      242,428  2,847   3,223      781,345,411
S. habrochaites_allpaths  90,424      12,290     990,035     902       43,409   20,461  16,935     735,128,396
S. habrochaites_scaf      515,730     104,925    3,252,897   902       130,475  9,758   5,873      766,277,628
S. pennellii_allpaths     64,671      7,460      627,722     887       27,680   11,008  26,589     735,990,792
S. pennellii_scaf         206,135     38,969     1,269,801   887       49,209   5,932   15,886     781,730,072
S. arcanum_clc            18,651      2,524      241,690     200       2,869    428     290,145    832,461,203
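The columns of this table can be derived from a list of contig or scaffold lengths. The sketch below shows one common definition of N50 and N90 (the length of the sequence at which the running total, taken from longest to shortest, first reaches 50% or 90% of the assembly); the function names are hypothetical and the example lengths are toy values, not project data.

```python
import statistics


def nx(lengths, fraction):
    """Length L such that sequences of length >= L cover `fraction` of the total."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    threshold = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # only reached through floating-point rounding


def assembly_stats(lengths):
    """Summary statistics matching the columns of the status table."""
    return {
        "n50": nx(lengths, 0.5),
        "n90": nx(lengths, 0.9),
        "longest": max(lengths),
        "shortest": min(lengths),
        "mean": sum(lengths) / len(lengths),
        "median": statistics.median(lengths),
        "n_contigs": len(lengths),
        "total_length": sum(lengths),
    }


if __name__ == "__main__":
    print(assembly_stats([80, 70, 50, 40, 30, 20]))
```

A large gap between mean and median, as in the Heinz 1706 row, indicates that a few very long sequences carry most of the assembly.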

Conclusions
Sequencing completed
Quality and coverage threshold satisfied
Cleaning of resequencing data completed
De novo assemblies of S. habrochaites and S. pennellii comparable with the tomato reference
De novo assembly of S. arcanum in progress
Read mapping and SNP analysis finished

And now the fun begins...

Average SNP rate/kb (vs. SL2.40)
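A SNP rate per kb is the number of SNPs in a region divided by the region length in kilobases. As a minimal sketch of how such a rate could be computed per window along a chromosome, assuming 1-based SNP positions already extracted from a variant file (the function name, window size and example positions are illustrative assumptions):

```python
from collections import Counter


def snp_rate_per_kb(positions, chrom_length, window=1_000_000):
    """SNPs per kb in consecutive windows along one chromosome.

    positions: 1-based SNP coordinates relative to the reference.
    The last window may be shorter and is normalised by its true length.
    """
    counts = Counter((pos - 1) // window for pos in positions)
    n_windows = -(-chrom_length // window)  # ceiling division
    rates = []
    for i in range(n_windows):
        start = i * window
        end = min(start + window, chrom_length)
        rates.append(counts.get(i, 0) / ((end - start) / 1000))
    return rates


if __name__ == "__main__":
    # Toy positions on a 2.5 kb sequence, 1 kb windows.
    print(snp_rate_per_kb([1, 500, 1000, 1001, 2500], 2500, window=1000))
```

Averaging these window rates over the genome, weighted by window length, gives a single per-accession value of the kind plotted against the SL2.40 reference.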

Homozygous vs Heterozygous feature rate

Exploring the FW9-2-5 locus (Lin5)
Sucrose synthase gene, cloned from S. pennellii
Amino acid substitutions: 2878 (Asp in LP to Glu in LE), 2932 (Asp to Asn), 2953 (Val to Leu)
Fridman et al. Proc Natl Acad Sci U S A. 2000 Apr 25;97(9):4718-23.

FW9-2-5 variation (Lin5) S. galapagense

Needs
Whole genome variant catalogue
Annotation for the three wild species genomes
Pan genome reconstruction
How good is our sampling?

Perspectives
Direct application for reverse genetics studies
Use identified allelic variation
Calculate distance based on all genes?
Better understanding of genome organization
Improve introgression breeding
Homozygous vs. heterozygous features
Scan for inversions
Diamond jewelry?

150 tomato genome consortium

Questions
Project site: http://www.tomatogenome.net
Phenotype data & images: https://www.eu-sol.wur.nl
SOL100: http://solgenomics.net or http://solgenomics.wur.nl

Acknowledgments Data production Elio Schijlen Bas te Lintel Hekkert Quality control Saulo Aflitos Huanwen Zhu Minling Xiao Tao Ma Xiaoli Wang Data management and assembly Sandra Smit Jan van Haarst Henri van de Geest Lars Smits Jiumeng Min Jie Chen Xiaoli Wang Project management Sander Peters Richard Finkers Andries Koops Jianbo Jian Yadan Luo Li Liao Tina(Na) Xu