NGS developments in tomato genome sequencing

NGS developments in tomato genome sequencing, 16-02-2012, Sandra Smit

Solanaceae genome sequencing

Tomato: Solanaceae family; diploid genome; 12 chromosome pairs; genome size: 950 Mb; euchromatin size: 220 Mb; approx. 35,000 genes

Tomato genome sequencing project, 2004: hierarchical BAC-by-BAC approach (The International Tomato Genome Sequencing Consortium)

Tomato genome sequencing project, 2009: NGS approach. Coverage by platform: 454 31x; Sanger 3.6x; Illumina 82x; SOLiD 140x. Library types (in slide order): shotgun, matepair, BACs, paired-end, fosmid ends, BAC ends, paired-end, matepair, shotgun, matepair
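The coverage figures on this slide can be turned into approximate sequencing volumes, since coverage is total sequenced bases divided by genome size. The sketch below is illustrative only: it assumes the 950 Mb genome size from the tomato overview slide, and the talk does not say which genome size its coverage figures were based on. The function names are hypothetical.

```python
# Genome size taken from the tomato overview slide (950 Mb). Whether the
# coverage figures in the talk were computed against this value is an assumption.
GENOME_SIZE_MB = 950

# Coverage per platform as listed on the 2009 NGS approach slide.
COVERAGE = {"454": 31, "Sanger": 3.6, "Illumina": 82, "SOLiD": 140}


def sequenced_gb(coverage, genome_size_mb=GENOME_SIZE_MB):
    """Approximate sequenced bases in Gb: coverage x genome size."""
    return coverage * genome_size_mb / 1000


def total_coverage(coverage_by_platform):
    """Sum of the per-platform coverage values."""
    return sum(coverage_by_platform.values())


if __name__ == "__main__":
    for platform, cov in COVERAGE.items():
        print(f"{platform}: {cov}x is roughly {sequenced_gb(cov):.1f} Gb")
    print(f"Combined coverage: {total_coverage(COVERAGE):.1f}x")
```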

Tomato genome assembly: (1) de novo assembly (newbler, CABOG) of 454 shotgun, 454 matepair and Sanger matepair reads (31x, 22x, 3.3x); (2) base error correction (k-mer correction, read(-pair) alignment) with Illumina paired-end and SOLiD matepair reads (70x, 42x, 118x, 61x); (3) long-range scaffolding with Sanger clone ends (0.3x); (4) gap filling with Sanger BACs (117 Mb)
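The slide names k-mer correction as one of the base error correction steps but gives no detail. As a rough illustration of the general idea only, the sketch below counts k-mers across reads, treats rare k-mers as suspect, and tries single-base substitutions until every k-mer in a read is frequent. This is a toy k-mer spectrum approach, not the method or tooling used by the consortium; the function names, `k` and `min_count` are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def is_solid(read, counts, k, min_count):
    """True if every k-mer in the read occurs at least min_count times."""
    return all(
        counts[read[i:i + k]] >= min_count
        for i in range(len(read) - k + 1)
    )


def correct_read(read, counts, k, min_count):
    """Return the first single-base substitution that makes all k-mers solid.

    The read is returned unchanged if it is already solid or if no
    single substitution fixes it.
    """
    if is_solid(read, counts, k, min_count):
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if is_solid(candidate, counts, k, min_count):
                return candidate
    return read


if __name__ == "__main__":
    # Hypothetical reads: five identical copies and one with a single error.
    reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
    counts = count_kmers(reads, k=4)
    print(correct_read("ACGTTCGT", counts, k=4, min_count=2))
```

Real correctors handle indels, quality values and ambiguous fixes; the brute-force search here is only meant to make the principle concrete.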

Tomato genome build SL2.40: 781 Mb assembled; ~900 Mb genome; 97% anchored; ~7 scaffolds per chromosome; 34,727 genes, 30,855 supported by RNA-seq
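The build statistics can be put in proportion with simple arithmetic. The sketch below uses only the numbers on this slide; treating "~900 Mb" as exactly 900 Mb is a simplification, and the `percent` helper is a hypothetical name.

```python
# Values from the SL2.40 slide; the genome size is approximate ("~900 Mb").
ASSEMBLED_MB = 781
GENOME_MB = 900
GENES_TOTAL = 34727
GENES_RNASEQ = 30855


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


if __name__ == "__main__":
    print(f"Assembled fraction of genome: {percent(ASSEMBLED_MB, GENOME_MB):.1f}%")
    print(f"Genes with RNA-seq support: {percent(GENES_RNASEQ, GENES_TOTAL):.1f}%")
```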

So, are we done? Is it complete? Is it perfect? Genome project standards (Chain et al. 2009): Standard Draft; High-Quality Draft; Improved High-Quality Draft; Annotation-Directed Improvement; Noncontiguous Finished; Finished

BAC sequencing for gap closure: sequencing 1000 BACs (EUSOL)

High-throughput small gap closure. Diagram: Contig A, 120 nt probes, gap, Contig B. Read assembly gives a consensus sequence to fill the gap: CCGATATTTAGCTCTAGGGAA
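The last step on this slide, building a consensus from assembled reads and using it to fill the gap, can be sketched as a per-column majority vote followed by a simple splice. This is a minimal illustration assuming the reads are already aligned to the gap region, with `-` marking uncovered positions. The reads, contig strings and function names are hypothetical and do not describe the pipeline used in the talk.

```python
from collections import Counter


def column_consensus(aligned_reads):
    """Majority base per alignment column.

    '-' marks positions a read does not cover; columns without any
    covering read are reported as 'N'.
    """
    length = max(len(read) for read in aligned_reads)
    consensus = []
    for col in range(length):
        bases = [
            read[col]
            for read in aligned_reads
            if col < len(read) and read[col] != "-"
        ]
        if bases:
            consensus.append(Counter(bases).most_common(1)[0][0])
        else:
            consensus.append("N")
    return "".join(consensus)


def fill_gap(contig_a, gap_consensus, contig_b):
    """Join two contigs with the consensus sequence spanning the gap."""
    return contig_a + gap_consensus + contig_b


if __name__ == "__main__":
    # Hypothetical pre-aligned reads covering a small gap.
    reads = ["CCGATA---", "-CGATATTT", "CCGTTATTT"]
    gap_sequence = column_consensus(reads)
    print(fill_gap("AAAA", gap_sequence, "GGGG"))
```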

A single reference genome is not enough

150 tomato genome project: public-private partnership initiated by TTI Green Genetics and BGI China. (Re)sequencing 150 tomato accessions: cultivated tomatoes, land races, wild tomatoes, herbarium material, RIL population. De novo sequencing of 3 genomes

NGS facility

Acknowledgements: Wageningen UR; International Tomato Genome Sequencing Consortium; SOL; EUSOL; CBSG; 150 tomato genome project; KeyGene and CAT-AgroFood