Genome Assembly Workshop Titles and Abstracts

TUESDAY, MARCH 15, :15 AM
Richard Durbin, Wellcome Trust Sanger Institute
A generic sequence graph exchange format for assembly and population variation

Although the inputs and standard outputs of assemblers are sets of sequences (reads and contigs, respectively), all modern assemblers internally use a sequence graph representation that allows sequence segments to connect in multiple different ways. Furthermore, assembly is a multi-step process, but because sequence graph representations are private to assemblers, each software package typically must implement all steps. In addition, essentially the same sequence graph representation is natural for representing population variation in a species, with any individual genome composed of a walk through segments representing all the sequence observed in the population.

I will discuss a generic exchange format and interface for sequence graphs, with an initial draft implementation, SQG (for SeQuence Graph), to support development of modular software. Sequence is attached to nodes, and arbitrary additional information can be attached to nodes or edges and carried through processing steps. There is a standard notation for walks that correspond to finite sequences that are consistent with the graph. As well as supporting modular assembly components, an aspiration is that the availability of a toolkit including efficient search will enable users of assemblies to use the resulting graph in place of the set of contigs, since the graph is richer, and will also support a population reference sequence representation that allows more accurate and complete alignment of reads than a single linear reference. If time is available I will show how the FM index used by Bowtie, BWA, and SOAP can be extended to sequence graphs, supporting an efficient search process on SQG graphs.
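The idea of sequence attached to nodes, with a genome read off as a walk, can be illustrated with a small sketch. This is not the SQG format or its interface, which the abstract does not specify; the class name `SequenceGraph`, its methods, and the node labels are hypothetical, and the sketch assumes forward-strand nodes joined with no overlap.

```python
class SequenceGraph:
    """Toy sequence graph: sequence lives on nodes, edges say which node may follow which.

    Assumptions (not taken from SQG): nodes are forward-strand only and
    adjacent nodes are concatenated with no overlap.
    """

    def __init__(self):
        self.sequences = {}   # node id -> sequence string
        self.edges = set()    # (source id, target id)

    def add_node(self, node_id, sequence):
        self.sequences[node_id] = sequence

    def add_edge(self, source, target):
        if source not in self.sequences or target not in self.sequences:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.add((source, target))

    def spell(self, walk):
        """Return the sequence spelled by a walk, or raise ValueError
        if the walk is not consistent with the graph."""
        if not walk:
            raise ValueError("a walk needs at least one node")
        for node_id in walk:
            if node_id not in self.sequences:
                raise ValueError(f"unknown node: {node_id}")
        for source, target in zip(walk, walk[1:]):
            if (source, target) not in self.edges:
                raise ValueError(f"no edge from {source} to {target}")
        return "".join(self.sequences[node_id] for node_id in walk)


def build_snp_example():
    """Two alleles of a single-base variant sharing the same flanking segments."""
    graph = SequenceGraph()
    graph.add_node("left", "ACGT")
    graph.add_node("ref", "A")
    graph.add_node("alt", "G")
    graph.add_node("right", "TTCA")
    for source, target in [("left", "ref"), ("left", "alt"),
                           ("ref", "right"), ("alt", "right")]:
        graph.add_edge(source, target)
    return graph


if __name__ == "__main__":
    graph = build_snp_example()
    print(graph.spell(["left", "ref", "right"]))  # ACGTATTCA
    print(graph.spell(["left", "alt", "right"]))  # ACGTGTTCA
```

Each individual genome in this toy population is one walk; a walk that skips the variant node, such as left then right, is rejected because the graph has no such edge.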
08:45 AM
Ian Korf, University of California, Davis
Dent Earl, University of California, Santa Cruz
Results of the Assemblathon

10:00 AM
David B. Jaffe, Broad Institute
High-quality draft assemblies of a dozen vertebrate genomes from massively parallel sequence data

Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100 base) sequence reads at very low cost. While such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but 1000 times more expensive) capillary-based sequencing approach. We report the development of a new algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from a dozen vertebrate genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high ( 99.95%) and the scaffold sizes (e.g., N50 size = 11.5 Mb for human and 17.4 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of new sequencing technology and new computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at

10:30 AM
Sante Gnerre, Broad Institute
ALLPATHS-LG algorithms for large genome assembly

ALLPATHS-LG is a new algorithm to assemble low-cost ~100 base reads, producing high-quality assemblies of genomes from megabase-sized bacteria to gigabase-sized vertebrates. This process includes a series of innovations and optimizations.

1) Error correction. For each 24-mer, the algorithm examines the stack of reads containing it and then proposes edits to the reads, in cases where individual read bases differ from the overwhelming consensus of the stack. Read bases having conflicting status (from membership in multiple stacks) are not edited, thus avoiding false corrections.
2) Read doubling. Read pairs whose ends overlap or have only a small gap are merged together using a third read from some other pair, thus yielding double reads of size ~180 bases and thereby enabling use of a large minimum overlap K. This increases resilience to repeats.
3) Local assembly of low coverage regions. Despite sequencing at ~100x coverage, parts of the genome recalcitrant to sequencing can be poorly covered. We handle such data by allowing much smaller K in localized regions.
4) Optimized use of jumping reads. Assembly quality depends on linking from jumping libraries, yet their read pairs have artifacts: they contain circularization junctions and are polluted with non-jump pairs in reverse orientation. The algorithm handles these by locating junction points and by treating each pair as belonging to two libraries.
5) Computational performance. Finally, ALLPATHS-LG has been optimized and parallelized to reduce run time and memory usage, and for mammalian-size genomes it can be run on a commercial server (Dell R815, $39,000).
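The error-correction step in item 1 can be sketched in a few lines. This is a simplified illustration of the stack-and-consensus idea only, not the ALLPATHS-LG implementation: the function name `propose_corrections`, the thresholds `min_depth` and `min_fraction`, and the rule used to detect "conflicting status" are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def propose_corrections(reads, k=24, min_depth=3, min_fraction=0.9):
    """Propose single-base edits as {(read index, position): consensus base}.

    Reads sharing a k-mer are stacked on that k-mer. In each column of a
    stack, a base held by at least `min_fraction` of the reads is taken as
    the consensus. A read base is edited only if every stack that reached
    a consensus for it agrees on the same, different base; a base that
    some stack supports and another stack contradicts is left alone.
    """
    # Index every k-mer occurrence: k-mer -> [(read index, offset in read)].
    stacks = defaultdict(list)
    for r, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            stacks[read[i:i + k]].append((r, i))

    verdicts = defaultdict(set)  # (read index, position) -> consensus bases seen
    for members in stacks.values():
        if len(members) < min_depth:
            continue  # too few reads to call a consensus
        # Align the reads on the shared k-mer: column = position - offset.
        columns = defaultdict(list)
        for r, i in members:
            for p, base in enumerate(reads[r]):
                columns[p - i].append((r, p, base))
        for entries in columns.values():
            if len(entries) < min_depth:
                continue
            base, count = Counter(b for _, _, b in entries).most_common(1)[0]
            if count / len(entries) >= min_fraction:
                for r, p, _ in entries:
                    verdicts[(r, p)].add(base)

    edits = {}
    for (r, p), bases in verdicts.items():
        if len(bases) == 1:
            (base,) = bases
            if base != reads[r][p]:
                edits[(r, p)] = base
    return edits


if __name__ == "__main__":
    genome = "ACGTTGCAAGGCTTAC"
    bad = genome[:8] + "T" + genome[9:]   # one substitution at position 8
    reads = [genome] * 4 + [bad]
    print(propose_corrections(reads, k=4, min_fraction=0.75))
```

The toy run uses k=4 and a relaxed `min_fraction` so that five short reads suffice; k-mers that contain the erroneous base form stacks of depth one and are skipped, so the edit is proposed only by stacks anchored on correct k-mers elsewhere in the read.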

11:00 AM
Daniel Zerbino, University of California, Santa Cruz
Columbus: Templated assembly of partially mapped reads

Since the advent of high-throughput sequencing, analysis tools have had to be adapted to deal with the exponential increase in the quantity of data and the greater ambiguity posed by the shorter reads. This led to specialized mapping and de novo assembly tools that are now routinely used in large-scale projects. The mapping tools are extremely efficient computationally and can process large amounts of data, whereas the de novo assemblers are more flexible with respect to new sequence structures. We therefore extended the Velvet de novo assembler with a module named Columbus, which accepts the output of a generic read mapper (using SAM/BAM files) and helps Velvet anchor its assembly onto a known template, while still keeping all of its de novo assembly capacities. Columbus was tested on 17 mouse strain resequencing projects and was shown to resolve many more complex sequences than Velvet alone could. We expect that Columbus will allow users to easily design and implement novel analysis pipelines, which combine the computational efficiency and biological a priori knowledge of read mapping with the flexibility of de novo assembly.

11:30 AM
Yingrui Li, BGI-Shenzhen
NGS de novo assembly: Progresses and challenges

Assembling new genomes from scratch has often been a hotspot for bioinformatics development. The issue becomes especially attractive when Next-Generation Sequencing (NGS) is available to provide large-scale but short-read data at a significantly lower per-base cost compared with Sanger sequencing technology. Several types of algorithms have been applied to prove the concept that NGS can support de novo assembly of large genomes, yet quality and continuity remain a great concern for annotation and follow-up studies. Here we present our progress and discuss explorations of challenging issues in NGS de novo assembly methodologies, especially on genomes with different levels of complexity. We believe that NGS assembly still has large potential to achieve more satisfactory results that can serve as a basis for further studies of any species.

01:00 PM
Aaron Klammer, Pacific Biosciences, Inc., Menlo Park CA
De novo assembly of Vibrio cholerae using Pacific Biosciences SMRT DNA sequencing technology

We present the de novo assembly of the genome of the bacterium Vibrio cholerae using reads from Pacific Biosciences single-molecule real-time (SMRT) DNA sequencing technology combined with Illumina short reads. Our approach uses the open source scaffolder Bambus along with other elements of the AMOS assembly software package and employs several novel algorithms tailored to Pacific Biosciences reads. Using this suite of algorithms we are able to produce an assembly of the V. cholerae genome from 30X sequence coverage of PacBio long reads with significantly longer contig N50s than a comparable assembly using Illumina reads alone. In addition, we scaffolded the V. cholerae assembled contigs using 20X sequence generated by the PacBio strobe sequencing technology, a sequencing protocol that allows the linkage of multiple reads across large distances, in a fashion similar to mate-pair sequencing. The addition of strobe reads further increases the scaffold N50 for the V. cholerae genome by spanning large repeats on the order of several kilobases. The use of PacBio long and strobe reads shows high promise for simplifying the completion of draft and finished bacterial genomes.

01:30 PM
Jared Simpson, Sanger Centre
Efficient assembly algorithms using the FM-index

The assembly of large genomes from short reads remains a computational challenge. Currently available assemblers require either very large amounts of memory, typically in the hundreds of gigabytes, or a large compute cluster to assemble a human genome. To address this challenge, we have developed a set of efficient algorithms based on the FM-index data structure. As the FM-index is compressed, our method has a very low memory footprint. Using this data structure, we have designed parallel algorithms for error correction, read filtering, and string graph construction. We have packaged these algorithms into an assembler called SGA (for String Graph Assembler), which is open source and available at github.com/jts/sga. In our talk, we will present the algorithms and results for a recent human genome assembly from 40X sequence data, which required less than 60GB of memory. We will also discuss our experience with the Assemblathon competition.

WEDNESDAY, MARCH 16, :00 AM
Jason Rafe Miller, J. Craig Venter Institute, Rockville MD
HMP assembly analysis at JCVI

The Human Microbiome Project (HMP) seeks to characterize the microbial load carried by healthy people. Bacterial populations were sampled from multiple individuals at several body sites and at up to three time points. Samples were analyzed by either 16S sequencing or metagenomics sequencing. Select bacterial strains are being cultured and sequenced to generate a reference genome collection. As part of the reference genome effort, our institute has put over 200 bacterial genomes through a high-throughput whole-genome shotgun pipeline based on next-generation sequencing technology. Reference strains are sequenced by Illumina paired end, 454 paired end, 454 unpaired, or some combination of those. Sequence coverage is adjusted using pooled libraries of bar-coded samples. Sequence data is assembled by several assembly programs. Assembled results are reviewed for completeness, accuracy, and signs of contamination, and compared with each other. At most one assembly per genome is submitted to the public databases. We will present analysis that was aimed at optimizing this pipeline. We will rate the utility of various measures of assembly quality and list features that ideal assemblers would self-report so as to facilitate assembly comparison.
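One assembly quality measure quoted in several abstracts in this program is N50. As a reminder of how it is computed, here is a minimal sketch using the common definition: the largest length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembled length. The function name `n50` is chosen for illustration, and the abstract does not say which measures the talk rates.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]   # total 290, half is 145
    print(n50(contigs))                   # 70, since 80 + 70 = 150 >= 145
```

N50 rewards contiguity only: joining contigs incorrectly raises it, which is one reason it is usually reported alongside accuracy and completeness checks.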

08:30 AM
Jay Shendure, University of Washington
Experimental approaches to massively parallel contiguity mapping

Massively parallel technologies have reduced the per-base cost of DNA sequencing by several orders of magnitude. However, limited read lengths and a lack of methods to establish contiguity over even modest distances have prevented these technologies from achieving the high-quality, low-cost de novo assembly of mammalian genomes. Even as revolutionary sequencing technologies further mature, it may continue to be the case that the best technologies in terms of cost-per-base yield reads that are of an insufficient length or quality for the effective de novo assembly of large genomes. To meet this need, we are exploring novel experimental strategies to facilitate the massively parallel recovery of contiguity information at different scales.

09:00 AM
Jim Knight, Roche
Newbler and large genome assembly

This talk describes recent updates to the Newbler assembler for large genomes, including support for FASTQ files and hybrid assemblies of 454, Sanger, and/or Illumina sequences, as well as algorithms for handling diploid genome assembly. Updates on the new 454 long reads will also be presented.

09:30 AM
Graham Ruby, University of California, San Francisco
De novo genome assembly from metagenomic mixtures using PRICE

Many organisms cannot be collected or cultured independently from their ecological surroundings. This is particularly true of disease-causing pathogens that directly depend on host biology to persist and replicate. The presence of large quantities of irrelevant sequence in metagenomic shotgun datasets poses a particular challenge to the assembly of pathogen genomes. In order to address this challenge, we have devised and implemented a strategy for genome assembly using paired-end reads and iterative contig extension (PRICE). We have applied this strategy to the targeted de novo assembly of novel viral genomes from complex metagenomic samples that were sequenced using low-cost, high-throughput, short-read DNA sequencing technology. We have also successfully applied PRICE to conventional (non-meta) genome sequencing and de novo assembly.

10:15 AM
Michael Schatz, Cold Spring Harbor Laboratory
Assembly and validation of large genomes from short reads

During my presentation I'll describe the short-read genome assembly pipeline developed in conjunction with the University of Maryland, the National Biodefense Analysis and Countermeasures Center, and the J. Craig Venter Institute. This pipeline includes the new algorithm Quake for pre-assembly sequence error correction and quality trimming, the Celera Assembler enhanced for Illumina sequences, and other related tools for post-assembly contig and scaffold refinement. I will describe the effectiveness of this pipeline for assembling short reads using four recently sequenced genomes ranging in size from 2 Mbp to 3 Gbp: Staphylococcus aureus, Bombus impatiens (a species of bee), Linepithema humile (the Argentine ant), and human. The results of these assemblies, along with detailed comparisons to the assemblies of these data with other leading assemblers, are posted as part of our Genome Assembly Gold-standard Evaluations (GAGE), available at

It is our hope with GAGE to produce a realistic assessment of the current state of the art in genome assembly software using real data in the rapidly changing field of next-generation sequencing. I will conclude my presentation by describing our genome assembly forensics pipeline for validating assemblies and discovering mis-assemblies. The pipeline includes various statistical tests for recognizing abnormal variations in depth of coverage, read heterogeneity, sequence composition, mate-pair placement, and read-breakpoint analysis. We find these mis-assembly signatures have nearly perfect sensitivity for detecting mis-assemblies, which can be used to guide assembly repair routines or reconcile differences between alternate assemblies.

10:45 AM
Joan Pontius, National Cancer Institute, Frederick, Maryland
A call for standardization of physical markers for use in the analysis of genome assemblies

Physical maps of genomic markers of unique sequence (Sequence Tagged Sites, STSs) allow scaffolds from genome assemblies to be assigned to chromosome positions. The STSs are mapped using radiation hybrid (RH) experiments, analysis of genetic linkage, and cytogenetic analysis of chromosomes using fluorescent in situ hybridization (FISH). Physically mapped markers can also be used to assess the accuracy of the final assembly, for example, by helping to detect chimeric scaffolds, which include markers derived from more than one chromosome. Chimeric scaffolds can also be detected when a scaffold sequence aligns to more than one chromosome of the genome assembly of a closely related species. Although some of these scaffolds may represent real rearrangements that have occurred over the course of evolution, others may uncover assembly artifacts (or physical map inaccuracies) that can be remedied. The importance of physical markers calls for standardization of the data describing them, namely:

1) The primer sequences and the sequence of their PCR products should be documented so that accurate computational mapping to the assembly can be confirmed.
2) Ideally, markers should map to one and only one locus in the genome and also map to unique and orthologous regions in a second genome.
3) The genomics community would benefit by adopting a standard for mapping information, so that efficient computational methods may be developed for their use.

Here we present a QC analysis of several vertebrate species genome sequences, with attention to scaffold chimerism, syntenic orthology, and physical map concordance with other genome assemblies.
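The marker-based chimerism check described in this abstract amounts to grouping mapped markers by scaffold and flagging scaffolds whose markers come from more than one chromosome. The sketch below is an illustration under assumed inputs: the tuple layout (marker, scaffold, chromosome), the function name `find_chimeric_scaffolds`, and the `min_markers` support threshold are hypothetical, not a format proposed in the talk.

```python
from collections import defaultdict


def find_chimeric_scaffolds(marker_hits, min_markers=1):
    """Flag scaffolds carrying markers from more than one chromosome.

    marker_hits: iterable of (marker id, scaffold id, chromosome) tuples,
    one per marker that maps uniquely to the assembly.
    min_markers: markers required before a chromosome counts as supported,
    so that a single mis-mapped marker need not flag a scaffold.
    Returns {scaffold id: sorted list of supported chromosomes}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for _marker, scaffold, chromosome in marker_hits:
        counts[scaffold][chromosome] += 1

    chimeric = {}
    for scaffold, per_chromosome in counts.items():
        supported = sorted(c for c, n in per_chromosome.items() if n >= min_markers)
        if len(supported) > 1:
            chimeric[scaffold] = supported
    return chimeric


if __name__ == "__main__":
    hits = [
        ("m1", "scaffold_1", "chr1"),
        ("m2", "scaffold_1", "chr1"),
        ("m3", "scaffold_2", "chr1"),
        ("m4", "scaffold_2", "chr7"),
        ("m5", "scaffold_3", "chrX"),
    ]
    print(find_chimeric_scaffolds(hits))
```

A flagged scaffold is only a candidate: as the abstract notes, it may reflect a real rearrangement or a physical map error rather than a mis-assembly.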

11:15 AM
David C. Schwartz, University of Wisconsin, Madison
Optical mapping and nanocoding systems for genome assembly and analysis

Modern sequence data acquisition and assembly techniques are rapidly increasing the quality of genome assemblies while decreasing their cost. Consequently, these developments are fueling the basis for efforts like the Genome 10K Project, which will require equally innovative ways to effectively complete and validate genome assemblies as references for comparative studies. This problem becomes more acute as increasingly obscure species are sequenced and analyzed in the absence of associated scientific communities, detailed knowledge of life cycles, and/or genetic resources, venerable elements used for the completion of genomes. Irrespective of any sequencing technology, multiple data types must be used for genome assembly and validation, since all measurements and analysis schemes have errors requiring complementary approaches for accurate, comprehensive mediation. In this regard, the Optical Mapping System, a purely single-molecule platform, has cost-effectively complemented over 80 sequencing efforts through high-resolution physical maps scaffolding nascent sequence assemblies, allowing comprehensive and independent validation across entire genomes. These genomes have included human, mouse, rat, rice, maize, and numerous fungal and bacterial genomes. We are now complementing Optical Mapping with a newer approach, Nanocoding, which is promising higher resolution, higher throughput, and lower costs. These advancements are providing the means for the broad dissemination of this new technology for genome assembly and analysis (structural variation) through greatly simplified systems.

11:45 AM
Can Alkan, University of Washington
"Dark side" of genomes: What's missing in current sequence assemblies?

The advances in genome sequencing technology have opened the way to analyze genomes at a previously unimaginable pace. Building a draft reference genome assembly previously cost billions of dollars and took years. Now this can be done for a fraction of the cost and within a very short time frame. Although it is feasible to construct de novo genome assemblies in a few months, there has been relatively little attention to what is lost by sole application of short sequence reads. We recently compared recent de novo assemblies of human genomes built from short sequence reads generated on the Illumina platform and found that megabase pairs of common repeats and 99.1% of validated duplicated sequences were missing from the genome. Recent improvements in sequence quality, larger insert sizes (or "jump libraries"), and algorithmic innovations promise to ameliorate this effect to generate better assemblies. In this talk, I will present genome quality comparisons, mainly based on the segmental duplication content, and compare clone-by-clone sequencing (NCBI Build 36) with capillary-based WGS assemblies (Celera) and short-read sequencing (YH assembly with SOAP, and NA12878 with ALLPATHS-LG). I will also present similar analyses on non-human genome assemblies such as the bonobo (454), gorilla (capillary and Illumina), and mouse (Illumina), and describe what we can "expect" to miss in our analyses.

01:15 PM
Deanna M. Church, Genome Reference Consortium and NCBI Assembly Groups
Modernizing and managing genome assembly data

As we celebrate the publications of the first draft human assemblies, it is useful to review what we have learned over the last decade. During this time we've seen a dramatic improvement in the quality of the human reference as the public assembly continues to be improved and challenging regions finished. The availability of this data has increased our understanding of genomic biology and caused us to rethink the models we must use to represent an organism's genome. As part of our curation of the human genome, the Genome Reference Consortium (GRC) has helped propose a more robust assembly model that represents complex allelic variants in a way that facilitates annotation. Additionally, we have seen an explosion of genome assemblies from multiple species. With over 2,400 assemblies available in GenBank, robust management of assembly data is needed. While GenBank is well suited for tracking the history of a single sequence, most genome assemblies represent a collection of sequences, and there is a need to track both the relationship of these sequences and any metadata that is associated with the assembly. To this end we have developed an assembly database to manage assembly submission and retrieval. Finally, we are developing tools to allow for assembly comparison and quality assurance.

01:45 PM
Federica DiPalma, Broad Institute
What quality do we need to achieve for Genome 10K genomes?

The Broad Institute has been involved in the sequencing of >30 vertebrate genomes. Our goal has always been to design genome projects of high scientific merit, produce high-quality reference sequence, and ensure that the community's needs are met. Genomes have been sequenced for various scientific reasons, including the generation of reference sequences for biomedical models, to study vertebrate evolution, and to generate a better annotation of the human genome through comparative sequence analysis. Different types of projects require different levels of accuracy and continuity in the assemblies, which in turn require different amounts and quality of DNA input. We will discuss these needs and how to achieve them.

02:15 PM
Laurie Goodman, BGI-Shenzhen


More information

ChIP-seq and RNA-seq

ChIP-seq and RNA-seq ChIP-seq and RNA-seq Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions (ChIPchromatin immunoprecipitation)

More information

PacBio. The world s first single molecule, real-time DNA sequencer

PacBio. The world s first single molecule, real-time DNA sequencer PacBio The world s first single molecule, real-time DNA sequencer A revolutionary third generation DNA sequencing system incorporating novel single molecule sequencing with unprecedented readlengths to

More information

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing technologies Jose Blanca COMAV institute bioinf.comav.upv.es Outline Sequencing technologies: Sanger 2nd generation sequencing: 3er generation sequencing: 454 Illumina SOLiD Ion Torrent PacBio

More information

Alignment and Assembly

Alignment and Assembly Alignment and Assembly Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which

More information

Genome Sequencing-- Strategies

Genome Sequencing-- Strategies Genome Sequencing-- Strategies Bio 4342 Spring 04 What is a genome? A genome can be defined as the entire DNA content of each nucleated cell in an organism Each organism has one or more chromosomes that

More information

Bioinformatics for Genomics

Bioinformatics for Genomics Bioinformatics for Genomics It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material. When I was young my Father

More information

From Infection to Genbank

From Infection to Genbank From Infection to Genbank How a pathogenic bacterium gets its genome to NCBI Torsten Seemann VLSCI - Life Sciences Computation Centre - Genomics Theme - Lab Meeting - Friday 27 April 2012 The steps 1.

More information

Next Gen Sequencing. Expansion of sequencing technology. Contents

Next Gen Sequencing. Expansion of sequencing technology. Contents Next Gen Sequencing Contents 1 Expansion of sequencing technology 2 The Next Generation of Sequencing: High-Throughput Technologies 3 High Throughput Sequencing Applied to Genome Sequencing (TEDed CC BY-NC-ND

More information

Genome Assembly Software for Different Technology Platforms. PacBio Canu Falcon. Illumina Soap Denovo Discovar Platinus MaSuRCA.

Genome Assembly Software for Different Technology Platforms. PacBio Canu Falcon. Illumina Soap Denovo Discovar Platinus MaSuRCA. Genome Assembly Software for Different Technology Platforms PacBio Canu Falcon 10x SuperNova Illumina Soap Denovo Discovar Platinus MaSuRCA Experimental design using Illumina Platform Estimate genome size:

More information

ChIP-seq and RNA-seq. Farhat Habib

ChIP-seq and RNA-seq. Farhat Habib ChIP-seq and RNA-seq Farhat Habib fhabib@iiserpune.ac.in Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions

More information

Next-generation sequencing technologies

Next-generation sequencing technologies Next-generation sequencing technologies NGS applications Illumina sequencing workflow Overview Sequencing by ligation Short-read NGS Sequencing by synthesis Illumina NGS Single-molecule approach Long-read

More information

Lecture 7. Next-generation sequencing technologies

Lecture 7. Next-generation sequencing technologies Lecture 7 Next-generation sequencing technologies Next-generation sequencing technologies General principles of short-read NGS Construct a library of fragments Generate clonal template populations Massively

More information

GENES & GENOME DATABASES

GENES & GENOME DATABASES GENES & GENOME DATABASES BME 110/BIOL 181 Computational Biology Tools Prof. Todd Lowe April 5, 2012 ADMIN Discuss Fun Quiz Readings: Dummies Chapters 1, 2 (pp. 29-56), Ch 3; NYTimes piece on Jim Kent Assigned

More information

TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR)

TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR) tru TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR) Anton Bankevich Center for Algorithmic Biotechnology, SPbSU Sequencing costs 1. Sequencing costs do not follow Moore s law

More information

Microbiome: Metagenomics 4/4/2018

Microbiome: Metagenomics 4/4/2018 Microbiome: Metagenomics 4/4/2018 metagenomics is an extension of many things you have already learned! Genomics used to be computationally difficult, and now that s metagenomics! Still developing tools/algorithms

More information

Using New ThiNGS on Small Things. Shane Byrne

Using New ThiNGS on Small Things. Shane Byrne Using New ThiNGS on Small Things Shane Byrne Next Generation Sequencing New Things Small Things NGS Next Generation Sequencing = 2 nd generation of sequencing 454 GS FLX, SOLiD, GAIIx, HiSeq, MiSeq, Ion

More information

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel. DNA Sequencing T TM variation DNA amplicon mendelian trio genomics NGS bioinformatics tumor-normal custom SNP resequencing target validation de novo prediction personalized comparative genomics exome private

More information

Matthew Tinning Australian Genome Research Facility. July 2012

Matthew Tinning Australian Genome Research Facility. July 2012 Next-Generation Sequencing: an overview of technologies and applications Matthew Tinning Australian Genome Research Facility July 2012 History of Sequencing Where have we been? 1869 Discovery of DNA 1909

More information

Genetics Lecture 21 Recombinant DNA

Genetics Lecture 21 Recombinant DNA Genetics Lecture 21 Recombinant DNA Recombinant DNA In 1971, a paper published by Kathleen Danna and Daniel Nathans marked the beginning of the recombinant DNA era. The paper described the isolation of

More information

Building a platinum human genome assembly from single haplotype human genomes. Karyn Meltz Steinberg PacBio UGM December,

Building a platinum human genome assembly from single haplotype human genomes. Karyn Meltz Steinberg PacBio UGM December, Building a platinum human genome assembly from single haplotype human genomes Karyn Meltz Steinberg PacBio UGM December, 2015 @KMS_Meltzy Single haplotype from hydatidiform mole Enucleated egg (no maternal

More information

Human Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased

Human Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased Human Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased exponentially since the 1990s. In 2005, with the introduction

More information

Pharmacogenetics: A SNPshot of the Future. Ani Khondkaryan Genomics, Bioinformatics, and Medicine Spring 2001

Pharmacogenetics: A SNPshot of the Future. Ani Khondkaryan Genomics, Bioinformatics, and Medicine Spring 2001 Pharmacogenetics: A SNPshot of the Future Ani Khondkaryan Genomics, Bioinformatics, and Medicine Spring 2001 1 I. What is pharmacogenetics? It is the study of how genetic variation affects drug response

More information

BENG 183 Trey Ideker. Genome Assembly and Physical Mapping

BENG 183 Trey Ideker. Genome Assembly and Physical Mapping BENG 183 Trey Ideker Genome Assembly and Physical Mapping Reasons for sequencing Complete genome sequencing!!! Resequencing (Confirmatory) E.g., short regions containing single nucleotide polymorphisms

More information

De novo meta-assembly of ultra-deep sequencing data

De novo meta-assembly of ultra-deep sequencing data De novo meta-assembly of ultra-deep sequencing data Hamid Mirebrahim 1, Timothy J. Close 2 and Stefano Lonardi 1 1 Department of Computer Science and Engineering 2 Department of Botany and Plant Sciences

More information

CloG: a pipeline for closing gaps in a draft assembly using short reads

CloG: a pipeline for closing gaps in a draft assembly using short reads CloG: a pipeline for closing gaps in a draft assembly using short reads Xing Yang, Daniel Medvin, Giri Narasimhan Bioinformatics Research Group (BioRG) School of Computing and Information Sciences Miami,

More information

Basics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility

Basics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility 2018 ABRF Meeting Satellite Workshop 4 Bridging the Gap: Isolation to Translation (Single Cell RNA-Seq) Sunday, April 22 Basics of RNA-Seq (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly,

More information

DNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing

DNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing Plant and animal whole genome re-sequencing (WGRS) involves sequencing the entire genome of a plant or animal and comparing the sequence

More information

CSE182-L16. LW statistics/assembly

CSE182-L16. LW statistics/assembly CSE182-L16 LW statistics/assembly Silly Quiz Who are these people, and what is the occasion? Genome Sequencing and Assembly Sequencing A break at T is shown here. Measuring the lengths using electrophoresis

More information

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis -Seq Analysis Quality Control checks Reproducibility Reliability -seq vs Microarray Higher sensitivity and dynamic range Lower technical variation Available for all species Novel transcript identification

More information

Introduction to Short Read Alignment. UCD Genome Center Bioinformatics Core Tuesday 14 June 2016

Introduction to Short Read Alignment. UCD Genome Center Bioinformatics Core Tuesday 14 June 2016 Introduction to Short Read Alignment UCD Genome Center Bioinformatics Core Tuesday 14 June 2016 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG

More information

short read genome assembly Sorin Istrail CSCI1820 Short-read genome assembly algorithms 3/6/2014

short read genome assembly Sorin Istrail CSCI1820 Short-read genome assembly algorithms 3/6/2014 1 short read genome assembly Sorin Istrail CSCI1820 Short-read genome assembly algorithms 3/6/2014 2 Genomathica Assembler Mathematica notebook for genome assembly simulation Assembler can be found at:

More information

Finishing of Fosmid 1042D14. Project 1042D14 is a roughly 40 kb segment of Drosophila ananassae

Finishing of Fosmid 1042D14. Project 1042D14 is a roughly 40 kb segment of Drosophila ananassae Schefkind 1 Adam Schefkind Bio 434W 03/08/2014 Finishing of Fosmid 1042D14 Abstract Project 1042D14 is a roughly 40 kb segment of Drosophila ananassae genomic DNA. Through a comprehensive analysis of forward-

More information

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

Transcriptomics analysis with RNA seq: an overview Frederik Coppens Transcriptomics analysis with RNA seq: an overview Frederik Coppens Platforms Applications Analysis Quantification RNA content Platforms Platforms Short (few hundred bases) Long reads (multiple kilobases)

More information

Lander-Waterman Statistics for Shotgun Sequencing Math 283: Ewens & Grant 5.1 Math 186: Not in book

Lander-Waterman Statistics for Shotgun Sequencing Math 283: Ewens & Grant 5.1 Math 186: Not in book Lander-Waterman Statistics for Shotgun Sequencing Math 283: Ewens & Grant 5.1 Math 186: Not in book Prof. Tesler Math 186 & 283 Winter 2019 Prof. Tesler 5.1 Shotgun Sequencing Math 186 & 283 / Winter 2019

More information

Workflow of de novo assembly

Workflow of de novo assembly Workflow of de novo assembly Experimental Design Clean sequencing data (trim adapter and low quality sequences) Run assembly software for contiging and scaffolding Evaluation of assembly Several iterations:

More information

Genome Assembly, part II. Tandy Warnow

Genome Assembly, part II. Tandy Warnow Genome Assembly, part II Tandy Warnow How to apply de Bruijn graphs to genome assembly Phillip E C Compeau, Pavel A Pevzner & Glenn Tesler A mathematical concept known as a de Bruijn graph turns the formidable

More information

de novo Transcriptome Assembly Nicole Cloonan 1 st July 2013, Winter School, UQ

de novo Transcriptome Assembly Nicole Cloonan 1 st July 2013, Winter School, UQ de novo Transcriptome Assembly Nicole Cloonan 1 st July 2013, Winter School, UQ de novo transcriptome assembly de novo from the Latin expression meaning from the beginning In bioinformatics, we often use

More information

Analysis of Structural Variants using 3 rd generation Sequencing

Analysis of Structural Variants using 3 rd generation Sequencing Analysis of Structural Variants using 3 rd generation Sequencing Michael Schatz January 12, 2016 Bioinformatics / PAG XXIV @mike_schatz / #PAGXXIV Analysis of Structural Variants using 3 rd generation

More information

NEXT GENERATION SEQUENCING. Farhat Habib

NEXT GENERATION SEQUENCING. Farhat Habib NEXT GENERATION SEQUENCING HISTORY HISTORY Sanger Dominant for last ~30 years 1000bp longest read Based on primers so not good for repetitive or SNPs sites HISTORY Sanger Dominant for last ~30 years 1000bp

More information

Next-Generation Sequencing. Technologies

Next-Generation Sequencing. Technologies Next-Generation Next-Generation Sequencing Technologies Sequencing Technologies Nicholas E. Navin, Ph.D. MD Anderson Cancer Center Dept. Genetics Dept. Bioinformatics Introduction to Bioinformatics GS011062

More information

Representing Errors and Uncertainty in Plasma Proteomics

Representing Errors and Uncertainty in Plasma Proteomics Representing Errors and Uncertainty in Plasma Proteomics David J. States, M.D., Ph.D. University of Michigan Bioinformatics Program Proteomics Alliance for Cancer Genomics vs. Proteomics Genome sequence

More information

Bionano Access : Assembly Report Guidelines

Bionano Access : Assembly Report Guidelines Bionano Access : Assembly Report Guidelines Document Number: 30255 Document Revision: A For Research Use Only. Not for use in diagnostic procedures. Copyright 2018 Bionano Genomics Inc. All Rights Reserved

More information

The Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica

The Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica The Ensembl Database Dott.ssa Inga Prokopenko Corso di Genomica 1 www.ensembl.org Lecture 7.1 2 What is Ensembl? Public annotation of mammalian and other genomes Open source software Relational database

More information

Assemblathon Summary Report

Assemblathon Summary Report Assemblathon Summary Report An overview of UC Davis results from Assemblathon 1: 2010/2011 Written by Keith Bradnam with results, analysis, and other contributions from Ian Korf, Joseph Fass, Aaron Darling,

More information

solid S Y S T E M s e q u e n c i n g See the Difference Discover the Quality Genome

solid S Y S T E M s e q u e n c i n g See the Difference Discover the Quality Genome solid S Y S T E M s e q u e n c i n g See the Difference Discover the Quality Genome See the Difference With a commitment to your peace of mind, Life Technologies provides a portfolio of robust and scalable

More information

Genomic resources. for non-model systems

Genomic resources. for non-model systems Genomic resources for non-model systems 1 Genomic resources Whole genome sequencing reference genome sequence comparisons across species identify signatures of natural selection population-level resequencing

More information

Infectious Disease Omics

Infectious Disease Omics Infectious Disease Omics Metagenomics Ernest Diez Benavente LSHTM ernest.diezbenavente@lshtm.ac.uk Course outline What is metagenomics? In situ, culture-free genomic characterization of the taxonomic and

More information

Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Supplementary Material

Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Supplementary Material Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions Joshua N. Burton 1, Andrew Adey 1, Rupali P. Patwardhan 1, Ruolan Qiu 1, Jacob O. Kitzman 1, Jay Shendure 1 1 Department

More information

Metagenomic 3C, full length 16S amplicon sequencing on Illumina, and the diabetic skin microbiome

Metagenomic 3C, full length 16S amplicon sequencing on Illumina, and the diabetic skin microbiome Also: Sunaina Melissa Gardiner UTS Catherine Burke UTS Michael Liu UTS Chris Beitel UTS, UC Davis Matt DeMaere UTS Metagenomic 3C, full length 16S amplicon sequencing on Illumina, and the diabetic skin

More information

SMRT Analysis Barcoding Overview (v6.0.0)

SMRT Analysis Barcoding Overview (v6.0.0) SMRT Analysis Barcoding Overview (v6.0.0) Introduction This document applies to PacBio RS II and Sequel Systems using SMRT Link v6.0.0. Note: For information on earlier versions of SMRT Link, see the document

More information

De novo assembly in RNA-seq analysis.

De novo assembly in RNA-seq analysis. De novo assembly in RNA-seq analysis. Joachim Bargsten Wageningen UR/PRI/Plant Breeding October 2012 Motivation Transcriptome sequencing (RNA-seq) Gene expression / differential expression Reconstruct

More information

Conditional Random Fields, DNA Sequencing. Armin Pourshafeie. February 10, 2015

Conditional Random Fields, DNA Sequencing. Armin Pourshafeie. February 10, 2015 Conditional Random Fields, DNA Sequencing Armin Pourshafeie February 10, 2015 CRF Continued HMMs represent a distribution for an observed sequence x and a parse P(x, ). However, usually we are interested

More information

A Roadmap to the De-novo Assembly of the Banana Slug Genome

A Roadmap to the De-novo Assembly of the Banana Slug Genome A Roadmap to the De-novo Assembly of the Banana Slug Genome Stefan Prost 1 1 Department of Integrative Biology, University of California, Berkeley, United States of America April 6th-10th, 2015 Outline

More information

N50 must die!? Genome assembly workshop, Santa Cruz, 3/15/11

N50 must die!? Genome assembly workshop, Santa Cruz, 3/15/11 N50 must die!? Genome assembly workshop, Santa Cruz, 3/15/11 twitter: @assemblathon web: assemblathon.org Should N50 die in its role as a frequently used measure of genome assembly quality? Are there other

More information

Genome Sequencing and Assembly

Genome Sequencing and Assembly Genome Sequencing and Assembly History of Sequencing What was the first fully sequenced nucleic acid? Yeast trna (alanine trna) Robert Holley 1965 Image: Wikipedia History of Sequencing Sequencing began

More information

Genome 373: Mapping Short Sequence Reads II. Doug Fowler

Genome 373: Mapping Short Sequence Reads II. Doug Fowler Genome 373: Mapping Short Sequence Reads II Doug Fowler The final Will be in this room on June 6 th at 8:30a Will be focused on the second half of the course, but will include material from the first half

More information

DE NOVO WHOLE GENOME ASSEMBLY AND SEQUENCING OF THE SUPERB FAIRYWREN. (Malurus cyaneus) JOSHUA PEÑALBA LEO JOSEPH CRAIG MORITZ ANDREW COCKBURN

DE NOVO WHOLE GENOME ASSEMBLY AND SEQUENCING OF THE SUPERB FAIRYWREN. (Malurus cyaneus) JOSHUA PEÑALBA LEO JOSEPH CRAIG MORITZ ANDREW COCKBURN DE NOVO WHOLE GENOME ASSEMBLY AND SEQUENCING OF THE SUPERB FAIRYWREN (Malurus cyaneus) JOSHUA PEÑALBA LEO JOSEPH CRAIG MORITZ ANDREW COCKBURN ... 2014 2015 2016 2017 ... 2014 2015 2016 2017 Synthetic

More information

Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation

Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation Assembly of the Human Genome Daniel Huson Informatics Research Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation

More information

NGS in Pathology Webinar

NGS in Pathology Webinar NGS in Pathology Webinar NGS Data Analysis March 10 2016 1 Topics for today s presentation 2 Introduction Next Generation Sequencing (NGS) is becoming a common and versatile tool for biological and medical

More information

De novo whole genome assembly

De novo whole genome assembly De novo whole genome assembly Lecture 1 Qi Sun Minghui Wang Bioinformatics Facility Cornell University DNA Sequencing Platforms Illumina sequencing (100 to 300 bp reads) Overlapping reads ~180bp fragment

More information

Developing Tools for Rapid and Accurate Post-Sequencing Analysis of Foodborne Pathogens. Mitchell Holland, Noblis

Developing Tools for Rapid and Accurate Post-Sequencing Analysis of Foodborne Pathogens. Mitchell Holland, Noblis Developing Tools for Rapid and Accurate Post-Sequencing Analysis of Foodborne Pathogens Mitchell Holland, Noblis Agenda Introduction Whole Genome Sequencing Analysis Pipeline Sequence Alignment SNPs and

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Alla L Lapidus, Ph.D. SPbSU St. Petersburg Term Bioinformatics Term Bioinformatics was invented by Paulien Hogeweg (Полина Хогевег) and Ben Hesper in 1970 as "the study of

More information

COPE: An accurate k-mer based pair-end reads connection tool to facilitate genome assembly

COPE: An accurate k-mer based pair-end reads connection tool to facilitate genome assembly Bioinformatics Advance Access published October 8, 2012 COPE: An accurate k-mer based pair-end reads connection tool to facilitate genome assembly Binghang Liu 1,2,, Jianying Yuan 2,, Siu-Ming Yiu 1,3,

More information

Gap Filling for a Human MHC Haplotype Sequence

Gap Filling for a Human MHC Haplotype Sequence American Journal of Life Sciences 2016; 4(6): 146-151 http://www.sciencepublishinggroup.com/j/ajls doi: 10.11648/j.ajls.20160406.12 ISSN: 2328-5702 (Print); ISSN: 2328-5737 (Online) Gap Filling for a Human

More information

State of the art de novo assembly of human genomes from massively parallel sequencing data

State of the art de novo assembly of human genomes from massively parallel sequencing data State of the art de novo assembly of human genomes from massively parallel sequencing data Yingrui Li, 1 Yujie Hu, 1,2 Lars Bolund 1,3 and Jun Wang 1,2* 1 BGI-Shenzhen, Shenzhen, Guangdong 518083, China

More information

Whole Human Genome Sequencing Report This is a technical summary report for PG DNA

Whole Human Genome Sequencing Report This is a technical summary report for PG DNA Whole Human Genome Sequencing Report This is a technical summary report for PG0002601-DNA Physician and Patient Information Physician name: Vinodh Naraynan Address: Suite 406 222 West Thomas Road Phoenix

More information

Next G eneration Generation Microbial Microbial Genomics : The H uman Human Microbiome P roject Project George Weinstock

Next G eneration Generation Microbial Microbial Genomics : The H uman Human Microbiome P roject Project George Weinstock Next Generation Microbial Genomics: The Human Microbiome Project George Weinstock San Rocco: Protector from Infectious Diseases Large genome centers All have metagenomics programs Baylor College of Medicine

More information

Using LTC Software for Physical Mapping and Assisting in Sequence Assembly

Using LTC Software for Physical Mapping and Assisting in Sequence Assembly Using LTC Software for Physical Mapping and Assisting in Sequence Assembly Z. Frenkel, V. Glikson, A. Korol Institute of Evolution, University of Haifa See also poster P1194 PAGXXIII, San Diego, January

More information