A near perfect de novo assembly of a eukaryotic genome using sequence reads of greater than 10 kilobases generated by the Pacific Biosciences RS II

Similar documents
De novo assembly of complex genomes using single molecule sequencing

The Resurgence of Reference Quality Genomes

Current'Advances'in'Sequencing' Technology' James'Gurtowski' Schatz'Lab'

Hybrid Error Correction and De Novo Assembly with Oxford Nanopore

The Resurgence of Reference Quality Genome Sequence

Looking Ahead: Improving Workflows for SMRT Sequencing

De Novo and Hybrid Assembly

Exploiting novel rice baseline datasets: WGS, BAC-based platinum genome sequencing and full-length transcriptomics

Single cell & single molecule analysis of cancer

Next Generation Genetics: Using deep sequencing to connect phenotype to genotype

Error correction and assembly complexity of single molecule sequencing reads

Long read sequencing technologies and its applications

Long Read Sequencing Technology

SMRT-assembly Error correction and de novo assembly of complex genomes using single molecule, real-time sequencing

Optimizing Blue Pippin Size- Selection for Increased SubRead Lengths on the RSII Applications in Clinical Genomics

Reference-quality diploid genomes without de novo assembly

SMRT-assembly Error correction and de novo assembly of complex genomes using single molecule, real-time sequencing

Building a platinum human genome assembly from single haplotype human genomes. Karyn Meltz Steinberg PacBio UGM December,

Revolutionize Genomics with SMRT Sequencing. Single Molecule, Real-Time Technology

Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica

RNA sequencing with the MinION at Genoscope

Analysis of Structural Variants using 3 rd generation Sequencing

Genome Assembly. J Fass UCD Genome Center Bioinformatics Core Friday September, 2015

Sequence Assembly and Alignment. Jim Noonan Department of Genetics

The Human Genome and its upcoming Dynamics

A shotgun introduction to sequence assembly (with Velvet) MCB Brem, Eisen and Pachter

Using New ThiNGS on Small Things. Shane Byrne

Structural Variant Detection in SMRT Link 5 with pbsv

Structural Variant Detection in SMRT Link 5 with pbsv

Next Generation Sequencing. Jeroen Van Houdt - Leuven 13/10/2017

Introduction to metagenome assembly. Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014

The Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience

DNA Sequencing and Assembly

Comprehensive Views of Genetic Diversity with Single Molecule, Real-Time (SMRT) Sequencing

Emerging applications of SMRT Sequencing

DE NOVO WHOLE GENOME ASSEMBLY AND SEQUENCING OF THE SUPERB FAIRYWREN. (Malurus cyaneus) JOSHUA PEÑALBA LEO JOSEPH CRAIG MORITZ ANDREW COCKBURN

De novo whole genome assembly

Next Generation Sequences & Chloroplast Assembly. 8 June, 2012 Jongsun Park

De novo assembly of human genomes with massively parallel short read sequencing. Mikk Eelmets Journal Club

Assembly and Validation of Large Genomes from Short Reads Michael Schatz. March 16, 2011 Genome Assembly Workshop / Genome 10k

Applications of PacBio Single Molecule, Real- Time (SMRT) DNA Sequencing

Anker P Sørensen Crop innovation through novel NGS applications

Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz

Heterozygosity, Phased Genomes, and Personalized-omics

Figure S1. Data flow of de novo genome assembly using next generation sequencing data from multiple platforms.

De novo whole genome assembly

BENG 183 Trey Ideker. Genome Assembly and Physical Mapping

100 GENOMES IN 100 DAYS: THE STRUCTURAL VARIANT LANDSCAPE OF TOMATO GENOMES

Genome Assembly With Next Generation Sequencers

Matthew Tinning Australian Genome Research Facility. July 2012

MinION, GridION, how does Nanopore technology meet the needs of our users?

A Roadmap to the De-novo Assembly of the Banana Slug Genome

De Novo Assembly of High-throughput Short Read Sequences

2 nd Genera-on ( NextGen ) Sequencing Technologies

Detecting Structural Variants in PacBio Reads Tools and Applications

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Genome Assembly of the Obligate Crassulacean Acid Metabolism (CAM) Species Kalanchoë laxiflora

Implementation and Evaluation of 10X Genomics Chromium technology

100 Genomes in 100 Days: The Structural Variant Landscape of Tomato Genomes

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

How much sequencing do I need? Emily Crisovan Genomics Core

TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR)

Understanding the science and technology of whole genome sequencing

How much sequencing do I need? Emily Crisovan Genomics Core September 26, 2018

Sequencing techniques

Overview of Next Generation Sequencing technologies. Céline Keime

Genomics and Transcriptomics of Spirodela polyrhiza

SMRT Analysis Barcoding Overview

The international effort to sequence the 17Gb wheat genome: Yes, Wheat can!

NGS technologies approaches, applications and challenges!

De novo Genome Assembly

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Understanding, Curating, and Analyzing your Diploid Genome Assembly

NGS developments in tomato genome sequencing

Supplementary Data 1.

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es

NGS technologies: a user s guide. Karim Gharbi & Mark Blaxter

Toward a better understanding of plant genomes structure: combining NGS and optical mapping technology to improve the sunflower assembly

2nd (Next) Generation Sequencing 2/2/2018

Alignment and Assembly

Wheat CAP Gene Expression with RNA-Seq

Genome Sequencing and Assembly

Taking Advantage of Long RNA-Seq Reads. Vince Magrini Pacific Biosciences User Group Meeting September 18, 2013

De novo genome assembly with next generation sequencing data!! "

Supplementary Data for Hybrid error correction and de novo assembly of single-molecule sequencing reads

SMRT Analysis Barcoding Overview (v6.0.0)

Cod: first version scaffolds. 35% gap bases. < ENSGMOT protein coding. contig > contig > contig >

Supplementary Table 1. Summary of whole genome shotgun sequence used for genome assembly

3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome

Genomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics

Guidelines for Preparing 20 kb SMRTbell Templates

Opportunities offered by new sequencing technologies

Whole Genome Assembly and Alignment Michael Schatz. Nov 5, 2013 SBU Graduate Genetics

The Journey of DNA Sequencing. Chromosomes. What is a genome? Genome size. H. Sunny Sun

The Genome Analysis Centre. Building Excellence in Genomics and Computational Bioscience

Outline. General principles of clonal sequencing Analysis principles Applications CNV analysis Genome architecture

Variation detection based on second generation sequencing data. Xin LIU Department of Science and Technology, BGI

Genome Assembly Workshop Titles and Abstracts

Bionano Access : Assembly Report Guidelines

Transcription:

A near perfect de novo assembly of a eukaryotic genome using sequence reads of greater than 10 kilobases generated by the Pacific Biosciences RS II W. Richard McCombie Disclosures

Introduction to the challenge Short read NGS has revolutionized resequencing De novo assembly is possible but not optimal with short reads Long reads improve the ability do de novo assembly dramatically Even in organisms with a good reference, such as humans, resequencing misses some structural differences relative to the reference Plant genomes are very large in general There are significant structural differences between different strains of the same plant such as rice These structural differences contribute to salient biological differences

Potential uses for rice genomes Understand basis of differences among subpopulations and varieties (duplications, CNVs, etc.)that lead to important phenotypic differences - this requires de novo assemblies - not simple resequencing Many of these differences relate to ability to grow in less than optimum conditions Low phosphorus Submergence Drought Disease exposure

A test genome yeast: S. cerevisiae W303 PacBio RS II sequencing at CSHL Size selection using an 7 Kb elution window on a BluePippin device from Sage Science Over 175x coverage in 16 SMRT cells using P5-C3 Mean: 5910 83x over 10kbp 8.7x over 20kb Max: 36,861bp

S. cerevisiae W303 S288C Reference sequence 12.1Mbp; 16 chromo + mitochondria; N50: 924kbp PacBio assembly using HGAP + Celera Assembler 12.4Mbp; 21 non-redundant contigs; N50: 811kbp; >99.8% id

S. cerevisiae W303 S288C Reference sequence 12.1Mbp; 16 chromo + mitochondria; N50: 924kbp PacBio assembly using HGAP + Celera Assembler 12.4Mbp; 21 non-redundant contigs; N50: 811kbp; >99.9% id 35kbp repeat cluster Near-perfect assembly: All but 1 chromosome assembled as a single contig

S. pombe dg21 PacBio RS II sequencing at CSHL Size selection using an 7 Kb elution window on a BluePippin device from Sage Science Mean: 5170 Over 275x coverage in 5 SMRTcells using P5-C3 103x over 10kbp 7.6x over 20kb Max: 35,415bp

S. pombe dg21 ASM294 Reference sequence 12.6Mbp; 3 chromo + mitochondria; N50: 4.53Mbp PacBio assembly using HGAP + Celera Assembler 12.7Mbp; 13 non-redundant contigs; N50: 3.83Mbp; >99.98% id Near perfect assembly: Chr1: 1 contig Chr2: 2 contigs Chr3: 2 contigs MT: 1 contig +

A. thaliana Ler-0 http://blog.pacificbiosciences.com/2013/08/new-data-release-arabidopsis-assembly.html A. thaliana Ler-0 sequenced at PacBio Sequenced using the previous P4 enzyme and C2 chemistry Size selection using an 8 Kb to 50 Kb elution window on a BluePippin device from Sage Science Total coverage >119x Genome size: 124.6 Mbp Chromosome N50: 23.0 Mbp Corrected coverage: 20x over 10kb Sum of Contig Lengths: 149.5Mb N50 Contig Length: 8.4 Mb Number of Contigs: 1788 High quality assembly of chromosome arms Assembly Performance: 8.4Mbp/23Mbp = 36% MiSeq assembly: 63kbp/23Mbp =.2%

ECTools: Error Correction with pre-assembled reads https://github.com/jgurtowski/ectools Short Reads - > Assemble Uni5gs - > Align & Select - > Error Correct Can Help us overcome: 1. Error Dense Regions Longer sequences have more seeds to match 2. Simple Repeats Longer sequences easier to resolve However, cannot overcome Illumina coverage gaps & other biases

A. thaliana Ler-0 http://blog.pacificbiosciences.com/2013/08/new-data-release-arabidopsis-assembly.html

O. sativa pv Indica (IR64) PacBio RS II sequencing at PacBio Size selection using an 10 Kb elution window on a BluePippin device from Sage Science Over 14.1x coverage in 47 SMRTcells using P5-C3 12.34x over 10kbp Mean: 10,232bp 4.1x over 20kb Max: 54,288bp

O. sativa pv Indica (IR64) Genome size: Chromosome N50: ~370 Mb ~29.7 Mbp Assembly Contig NG50 MiSeq Fragments 19,078 25x 456bp (3 runs 2x300 @ 450 FLASH) ALLPATHS-recipe 18,450 50x 2x100bp @ 180 36x 2x50bp @ 2100 51x 2x50bp @ 4800 ECTools 10.7x @ 10kbp 271,885 ECTools Read Lengths Mean: 9,348 Max: 54,288bp 10.75x over 10kbp

Next steps Optimization of large fragment isolation and purification Optimization of loading SMRT cells efficiency and consistency Target other yeast genomes fermentation strains and various mutation containing strains Complete coverage of rice IR64 Complete coverage of rice DJ123 (aus group)

Summary Long read sequencing of eukaryotic genomes is here Technologies are quickly improving, exciting new scaffolding technologies Recommendations < 100 Mbp: HGAP/PacBio2CA @ 100x PB C3-P5 expect near perfect chromosome arms < 1GB: HGAP/PacBio2CA @ 100x PB C3-P5 expect high quality assembly: contig N50 over 1Mbp > 1GB: hybrid/gap filling expect contig N50 to be 100kbp 1Mbp Ø 5GB: Email mschatz@cshl.edu Ø Poster Schatz Poster #221 @ 5:00pm

Acknowledgements McCombie Lab Panchajanya Deshpande Senem Mavruk Eskipehlivan Melissa Kramer Scott Ethe Sayers Sara Goodwin Eric Antoniou Cornell University Susan McCouch Lyza Maron Jim Hicks CSHL Rob Martienssen - CSHL Pacific Biosciences Schatz Lab James Gurtowski Hayan Lee Shoshana Marcus PacBio Cheryl Heiner Greg Khitrov