1 Assembly of the Human Genome Daniel Huson Informatics Research
2 Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be found at
4 Overview
- Why Sequence Assembly? A quick look at current sequencing technology
- BAC-by-BAC Sequencing vs Whole Genome Shotgun: brief comparison
- Compartmentalized Assembly: interim approach to get the best of both worlds
- Mate-Pair based Contig Assembly Problem and the Greedy Path-Merging Algorithm
5 Goal: Determine DNA Sequence base pairs of DNA (x150000)
6 DNA Sequencing Technology
7 DNA Sequencing Technology 110 capillary array with detection and load bar interface.
8 DNA Sequencing Technology
10 Sequencing DNA
- Current sequencing technology is highly automated and can determine large numbers of base pairs very quickly
- However, fewer than 900 base pairs can be read consecutively
- Hence, large pieces of contiguous DNA ("contigs") must be assembled from such fragments
11 Three Stages of Genome Sequencing Shotgun Sequence Assembly
12 Human Genome Project
- 18 countries have human genome research programs: Australia, Brazil, Canada, China, Denmark, European Union, France, Germany, Israel, Italy, Japan, Korea, Mexico, The Netherlands, Russia, Sweden, United Kingdom, and the United States
- ~1100 scientists involved
- The 5 major sequencing centers (USA/UK) are: DOE Joint Genome Institute; Baylor College of Medicine; Sanger Centre; Washington University Genome Sequencing Center; Whitehead Institute/MIT Center for Genome Research
13 Clone-by-Clone (Human Genome Project)
- Genome, physical mapping, minimal tiling set of BAC clones (~150kbp)
- For each BAC in tiling (~ for human): shotgun sequence, fragment assembly
14 BAC Clone Data from the HGP
As of August 2000, GenBank contained ~33500 BAC clones in different phases of assembly:
- ~3000 phase-0 data sets (~100 pieces of size 1000)
- ~21000 phase-1/2 data sets (~10-20 pieces of size 5000-25000)
- ~9500 phase-3 data sets ("finished", ~150kbp)
- Full sequence for chromosomes 21 and 22 published
15 Celera's Sequencing Factory
- 300 ABI 3700 DNA sequencers
- 50 production staff
- 40 support staff
- 20,000 sq. ft. of wet lab
- 20,000 sq. ft. of sequencing space
- Large group of computer scientists
- 1000 processors
- Close to 1 Terabyte of main memory
- 100 Terabytes of disk storage
16 Whole Genome Shotgun (Celera) Genome Shotgun Sequencing Assembly
17 Genome Assembly: Two Approaches
- Clone-by-Clone: Take 7 copies of a page of an encyclopedia and randomly shred them into small pieces. Using the overlaps of pieces, reconstruct the page!
- Whole Genome Shotgun: Take 7 copies of a whole encyclopedia and randomly shred them. Using the overlaps of pieces, reconstruct the whole book!
18 Comparison
Clone-by-Clone:
+ Assembly problem easy and well understood
- 2 separate processes (... + Assembly)
- Clone libraries unstable, maps hard to complete
- Sequencing libraries must be made for every clone
Whole Genome Shotgun:
+ Single process, few library constructions
- Computationally harder
Assembly of Drosophila proved the feasibility of WGS
19 Double-Barreled Shotgun
[Figure: genome, shotgun sequence; two ~600bp reads at the ends of an insert of length m]
- Fragments are read in mate pairs of known length m, typically 2k, 10k or 150k
20 Happy Mates
[Figure: contig AGCTTGCATATAGCCATCNNNNCTACAAAC with fragments f and g that hit (align well to) the contig, at distance d]
- Two mates f, g are "happy" if they are facing each other and d lies within library mean +/- 3*stddev
- They are "unhappy" if the distance is not correct, or if they are not facing each other
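The happy-mate test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, its field names, and the choice to measure d between the outer ends of the two fragments are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Placement of one fragment on a contig (hypothetical record type)."""
    start: int     # leftmost contig coordinate covered by the fragment
    end: int       # rightmost contig coordinate (exclusive)
    forward: bool  # True if the fragment reads left-to-right on the contig


def is_happy(f: Hit, g: Hit, mean: float, stddev: float) -> bool:
    """Return True if mates f and g face each other at a plausible distance."""
    left, right = (f, g) if f.start <= g.start else (g, f)
    # Facing each other: the left mate reads forward, the right mate backward.
    facing = left.forward and not right.forward
    # Assumption: d is measured between the outer ends of the two fragments.
    d = right.end - left.start
    return facing and abs(d - mean) <= 3 * stddev
```

For a 2k library with an assumed standard deviation of 200, `is_happy(Hit(0, 600, True), Hit(1400, 2000, False), 2000, 200)` holds, while the same pair with both fragments reading forward does not.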
21 Whole Genome Shotgun Assembly Pipeline Screener Overlapper Unitigger Scaffolder RepeatRez Consensus
22 Whole Genome Shotgun Assembly Pipeline: Screener
- Mask contaminants and known repeats
Repeats in the human genome:
- Microsatellites of the form x^n, where x is 3-6bp long and n is very large, with 1-2% variation (telomeric and centromeric regions)
- 1 million Alus of length ~300 with 5-15% variation
- LINE repeats of length ~
- kbp-long RNA pseudo-gene arrays in tandem
- number of kbp genome duplications
- many genes are present in multiple copies
- Repeats make up to 30% of the whole genome
23 Whole Genome Shotgun Assembly Pipeline: Overlapper
- Find all overlaps of at least 40bp, allowing 6% mismatch (1000X Blast)
- True vs repeat-induced overlaps: an overlap between A and B implies either a true overlap of A and B, or a repeat-induced overlap in which A and B each contain a copy of the same repeat
- Use breakpoint detection to identify repeat-induced overlaps
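To make the overlap criterion concrete, here is a minimal sketch that looks for the longest suffix of one read matching a prefix of another within a mismatch budget. It is not the Celera overlapper: it compares position by position (substitutions only, no insertions or deletions), tries every length by brute force, and the function name and parameters are hypothetical.

```python
def find_overlap(a: str, b: str, min_len: int = 40, max_mismatch_pct: int = 6) -> int:
    """Return the length of the longest suffix(a)/prefix(b) overlap, or 0.

    An overlap of a given length qualifies if the number of mismatching
    positions is at most max_mismatch_pct percent of that length.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if 100 * mismatches <= max_mismatch_pct * length:
            return length
    return 0
```

A production overlapper would use seed-and-extend alignment rather than trying every length, and would still need a separate step to tell true overlaps from repeat-induced ones.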
24-25 Whole Genome Shotgun Assembly Pipeline: Unitigger (Assembler Core)
- Compute all consistent sub-assemblies = unitigs (Uniquely Assemble-able Contigs)
- Identify those that cover unique DNA = U-unitigs
- Typically we see a 30-fold reduction in pieces and a 100-fold reduction in overlaps.
26 Whole Genome Shotgun Assembly Pipeline: Scaffolder
- Build scaffolds from unitigs using mated reads
- Two unitigs are scaffolded if they are joined by at least 2 mates and if one is a U-unitig (error probability is 1 in )
- A scaffold contains sequence gaps or repeat gaps (with estimated distances)
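The scaffolding rule on this slide (at least 2 mates, at least one U-unitig) can be sketched as a simple filter over mate links. The input format and the function name are assumptions made for illustration; orientation and distance consistency, which a real scaffolder must also check, are ignored here.

```python
from collections import Counter


def scaffold_links(mate_links, u_unitigs, min_mates=2):
    """Select unitig pairs that satisfy the scaffolding rule.

    mate_links: iterable of (unitig_a, unitig_b), one entry per mate pair
                whose two reads fall in different unitigs.
    u_unitigs:  set of unitig ids classified as U-unitigs.
    Returns a set of frozenset({unitig_a, unitig_b}).
    """
    counts = Counter(
        frozenset(link) for link in mate_links if link[0] != link[1]
    )
    return {
        pair
        for pair, n in counts.items()
        # Require enough mates and at least one U-unitig in the pair.
        if n >= min_mates and pair & u_unitigs
    }
```

With two mates between U-unitigs `u1` and `u2`, one mate between `u1` and `r3`, and two mates between the non-unique unitigs `r3` and `r4`, only the pair `u1`, `u2` is selected.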
28-33 Whole Genome Shotgun Assembly Pipeline: RepeatRez
Fill gaps using Rocks, Stones and Pebbles:
- [Figure: U-unitig, unitig >0, U-unitig] 2 confirming mates, or 3 confirming and 1 conflicting
- Stones: anchored mate and confirming overlap path (confirming o-path)
- Greedy overlap path between placed pieces, using quality values for scoring
34 Whole Genome Shotgun Assembly Pipeline: RepeatRez
- [Figure: RepeatRez, 50kbp gap illustration]
35 Whole Genome Shotgun Assembly Pipeline: Consensus
- Compute consensus sequence from fragments (using Bayesian SNP consensus)
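As a rough picture of what a consensus step does, the sketch below takes fragments that are already aligned into equal-length rows and calls each column by simple majority vote. This is a deliberately swapped-in technique, not the Bayesian SNP consensus named on the slide, and the row format (with `-` for uncovered positions) is an assumption.

```python
from collections import Counter


def majority_consensus(rows):
    """Call a consensus base per column by majority vote.

    rows: equal-length aligned strings; '-' marks positions that a
          fragment does not cover. Columns with no coverage yield 'N'.
    """
    consensus = []
    for column in zip(*rows):
        counts = Counter(base for base in column if base != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)
```

For the rows `ACGT--`, `ACCTGA` and `-CGTGA`, the third column is decided two to one in favor of `G`. A Bayesian caller would additionally weigh base quality values and the possibility of a true polymorphism.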
36 Whole Genome Shotgun of Drosophila
CPU time per stage (hours:minutes):
- Screener 8:37
- Overlapper 86:25
- Unitigger 38:29
- Scaffolder 4:12
- RepeatRez 30:53
- Consensus 25:00
- Total: ~180 CPU hours
Input:
- millions of Celera reads in 2kbp and 10kbp pairs
- 12,152 BAC-end pairs from the BDGP & EDGP
Output: 114 Mbp of sequence
Published in Science (March 2000)
37 Scaffold Lengths for Drosophila
- 25 scaffolds >100kbp (total = 116 Mbp)
- 813 mini-scafs: hetero-chromatic, breakup, gap-filling
- 97.7% contiguous
38 Contig Lengths for Drosophila
- There are 312 contigs >100kbp totalling 81 Mbp
- Mean contig length is 50kbp
39 Gap Lengths for Drosophila
- 3.04 Mbp in 1645 intra-scaffold gaps
- Only 13 larger than 10kbp
- Only 455 larger than 1kbp
- 75% are less than 1kbp long
- ~75% are sequencing gaps
40 Validation Against STS-map for Drosophila
- 50 scaffolds (114.8 Mbp) were aligned against the BDGP STS-content map
- All scaffolds spanning 2 or more STSs were checked for order discrepancies
- 17 STS sites out of 2167 (0.78%) were out of order, well within the estimated error rate of the STS map
- 9 have been determined to be incorrect
- [Figure: BDGP STS order vs Celera scaffold and STS order, for chromosome arms 4, 3L, 3R, 2L, 2R, X]
41 Whole Genome Shotgun of Human
- Incremental design
- Distributed design
- End-to-end run takes > CPU hours
- High-water mark of 32 GB of main memory
42 Compartmentalized Assembler ("Best of Both Worlds")
- Interim approach using both the public clone-by-clone data and Celera's WGS data
- Produces good results earlier than either individual strategy
- Started in January (2-5 people)
- Reuse of WGS code base (in C)
- Use of LEDA (C++)
43 Compartmentalized Assembler
Inputs:
- HGP BAC data: 1 million clone-by-clone contigs, average 4kbp
- Celera data: 28 million fragments, 70% in pairs (WGS mate pairs)
Steps:
- Fragment recruiter: run overlapper on contigs and fragments, looking for good overlap alignments ("hits"); produces recruited frags and frag-contig hits
- Combine assembler: assemble unfinished BAC clones using fragment mate pairs; produces BAC clone assemblies
- Tiling and WGA on components: curate tiling path and assemble components of tiling
- Unrecruited fragments (5 million fragments): WGS assembler; produces additional assemblies
44 Combining Assembly
- Given a collection of contigs and a collection of recruited fragments
- Use mate-pair information to order and orient contigs into scaffolds
- Use mates to extend and merge contigs
- Main application: Phase-1/2 BAC data
45-50 Path-Merging Algorithm
[Figure build: contigs of a BAC clone (a:5000, b:1000, c:5000, d:1000, e:5000), fragments that hit contigs, and the mate links between them; bundled edges are labeled m:500,w:2 and m:2000,w:4]
- Bundle mate links
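Bundling can be pictured as grouping all mate links that connect the same two contigs into a single edge. The sketch below assumes that the label m is the mean estimated distance between the contigs and w the number of links in the bundle; the slides do not define these labels, so that reading, the input format, and the function name are all assumptions. The link data in the usage note is invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def bundle_mate_links(links):
    """Group mate links by contig pair and summarize each group.

    links: iterable of (contig_a, contig_b, distance_estimate), one per
           mate pair whose fragments hit two different contigs.
    Returns {(contig_a, contig_b): (m, w)} with each key in sorted order,
    where m is the rounded mean distance and w the number of links.
    """
    groups = defaultdict(list)
    for a, b, dist in links:
        groups[tuple(sorted((a, b)))].append(dist)
    return {pair: (round(mean(d)), len(d)) for pair, d in groups.items()}
```

Two links between `a` and `b` with distance estimates 480 and 520 become one edge with m = 500 and w = 2. A real bundler would also require the links in a bundle to agree in orientation and to have mutually compatible distances.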
51-57 Path-Merging Algorithm
[Figure build: the contig graph on a:5000, b:1000, c:5000, d:1000, e:5000 with bundled mate edges m:500,w:2 and m:2000,w:4]
- Bundle mate links
- Merge fragments into contigs
- Greedily consider mate edges
- Scaffolds are represented by yellow-blue paths in contig graph
58 Path-Merging Algorithm Bundle mate links Merge fragments into contigs Greedily consider mate edges Final result is scaffolding of contigs
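The greedy pass over bundled mate edges can be illustrated with a deliberately simplified sketch. It processes edges from heaviest to lightest and joins two scaffolds only end to end; it omits the interleaving of paths that the path-merging algorithm performs, as well as orientation and distance checks. The edge format and the example weights are hypothetical.

```python
def greedy_scaffold(contigs, edges):
    """Greedily chain contigs into scaffolds using weighted mate edges.

    contigs: list of contig ids.
    edges:   iterable of (left, right, weight), meaning that `right`
             is expected to follow `left`.
    Returns a list of scaffolds, each an ordered list of contig ids.
    """
    scaffold_of = {c: [c] for c in contigs}  # contig -> its scaffold (shared list)
    for left, right, _weight in sorted(edges, key=lambda e: -e[2]):
        s_left, s_right = scaffold_of[left], scaffold_of[right]
        if s_left is s_right:
            continue  # both contigs are already in the same scaffold
        if s_left[-1] != left or s_right[0] != right:
            continue  # joining would require interleaving; skipped here
        s_left.extend(s_right)
        for c in s_right:
            scaffold_of[c] = s_left
    # Each scaffold list is shared by its members; report it once.
    unique = {id(s): s for s in scaffold_of.values()}
    return list(unique.values())
```

With heavy edges a to c and c to e plus lighter edges a to b, b to c, c to d and d to e, this sketch first builds the scaffold a, c, e and then cannot place b or d, because neither fits at an end. That is the situation the interleaving step on the following slides is meant to handle.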
59-76 Path-Merging Algorithm: Interleaving
[Figure build over slides 59-76; node labels u, w0, v0, w1, v1, w2, v2 and edge labels e0, e1, e2]
77 When Do We Accept a Merge?
- Increase in happy mates > increase in unhappy mates
- [Figure: fragments f and g that hit (align well to) a contig, at distance d]
- Two mates f, g are "happy" if they are facing each other and d lies within library mean +/- 3*stddev
78 When Do We Accept a Merge?
- All induced must-overlaps are found
- Increase in happiness must be larger than increase in unhappiness, i.e.: $H(P) - H(P_1) - H(P_2) > U(P) - U(P_1) - U(P_2)$, where H and U count happy and unhappy mates and P is the path obtained by merging $P_1$ and $P_2$
- This can be based on bundled mate edges or on individual mates
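The acceptance test is a comparison of two differences and can be written down directly. In this sketch the six counts are assumed to have been computed already, for the merged path and for each of the two paths before merging; the function and parameter names are hypothetical, and the check that all induced must-overlaps are found is not modeled.

```python
def accept_merge(h_merged, u_merged, h1, u1, h2, u2):
    """Accept a merge if happiness grows by more than unhappiness.

    h_merged, u_merged: happy / unhappy mate counts on the merged path P.
    h1, u1 and h2, u2:  the same counts on the two original paths.
    """
    gain_happy = h_merged - h1 - h2
    gain_unhappy = u_merged - u1 - u2
    return gain_happy > gain_unhappy
```

For example, if merging raises the happy count from 5 + 4 to 12 and the unhappy count from 1 + 1 to 3, the gain of 3 happy mates outweighs the gain of 1 unhappy mate and the merge is accepted.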
79 Mate-Pair Based Contig Ordering Problem
- The problem "Does a scaffolding exist that has at least t happy mates?" is NP-complete
- Reduction from BANDWIDTH (D.H.H., Knut Reinert and Gene Myers, RECOMB '01)
80 Tiling Graph
- Definition: nodes are all public BACs and all unrecruited scaffolds
- Two nodes are connected by an edge if they represent overlapping or adjacent sequence
- Manually curate the graph to distinguish between true edges and repeat-induced ones or ones caused by chimeric BACs
81 Tiling Graph
82 Compartmentalized Assembly 8.5mbp scaffold on chromosome 20
83 Reassembly of Chromosomes
- Input: sequence of a whole chromosome, e.g. as assembled by Celera or as assembled by Haussler ("golden path")
- Recruit fragments (whole genome takes weeks on compute farm)
- Reassemble chromosome using the path-merging algorithm (whole genome: overnight on a 16-processor machine)
84 Results
- "The Sequence of the Human Genome" is in press. J. Craig Venter et al. (approx. 280 authors)
85 Assembly Comparison: Chromosome 15 Publicly funded Project Fragment-based dot plot Celera Assembly
86 Assembly Comparison: Chromosome 15 Publicly funded Project 0 Celera Assembly 80 Mbp Fragment-based rearrangement plot
87 Assembly Comparison: Chromosome Mbp Mate pair coverage
88 Assembly Comparison: Chromosome Mbp Unhappy coverage
89 Assembly Comparison: Chromosome Mbp Unhappy coverage
90 Assembly Comparison: Chromosome Mbp Breakpoint detection
91 Assembly Comparison: Chromosome Mbp Close up
92 Chromosome Mbp
93 Chromosome Mbp
94 Assembly Comparison: Chromosome Mbp
95 Assembly Comparison: Chromosome Mbp Finished sequence
96 The Mouse as a Comparative Model
- Celera is currently sequencing mouse (4.5x coverage)
- [Figure labels: syntenic, humanized mouse, mouse WGS, mouse]
- Will reveal many genes not detectable by standard methods
97 PAX 6 affects eye development The fly is blind... The mouse is blind... The child is blind.
98 Sequencing the genome is just the beginning...
- DNA, RNA, Proteins, Modified Proteins, Biological Function
- Steps between them: Transcription, Translation, Post-Translation Modification
- 20-80,000 Genes
- > 1,000,000 Proteins
99 Credits
- WGS Team: Granger Sutton, Randy Bolanos, Ian Dew, Art Delcher, Michael Flanigan, Dan Fasulo, Saul Kravitz, Clark Mobarry, Knut Reinert, Karin Remington, Gene Myers
- Regional Team: Daniel Huson, Knut Reinert, Aaron Halpern, Saul Kravitz, Karin Remington, Art Delcher, Gene Myers
- Map Team: Qing Zhang, Ellen Beasley, Rhonda Brandon, Lin Chen, Pat Dunn, Aaron Halpern, Zhongwu Lai, Yong Liang, Deborah Nusskern, Ming Zhan, Holly Zheng
- and many others...
100
Sequence Assembly and Alignment. Jim Noonan Department of Genetics
Sequence Assembly and Alignment Jim Noonan Department of Genetics james.noonan@yale.edu www.yale.edu/noonanlab The assembly problem >>10 9 sequencing reads 36 bp - 1 kb 3 Gb Outline Basic concepts in genome
More informationSequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es
Sequence assembly Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing project Unknown sequence { experimental evidence result read 1 read 4 read 2 read 5 read 3 read 6 read 7 Computational requirements
More informationGenome Projects. Part III. Assembly and sequencing of human genomes
Genome Projects Part III Assembly and sequencing of human genomes All current genome sequencing strategies are clone-based. 1. ordered clone sequencing e.g., C. elegans well suited for repetitive sequences
More informationAlignment and Assembly
Alignment and Assembly Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which
More informationCourse summary. Today. PCR Polymerase chain reaction. Obtaining molecular data. Sequencing. DNA sequencing. Genome Projects.
Goals Organization Labs Project Reading Course summary DNA sequencing. Genome Projects. Today New DNA sequencing technologies. Obtaining molecular data PCR Typically used in empirical molecular evolution
More informationMate-pair library data improves genome assembly
De Novo Sequencing on the Ion Torrent PGM APPLICATION NOTE Mate-pair library data improves genome assembly Highly accurate PGM data allows for de Novo Sequencing and Assembly For a draft assembly, generate
More information10/20/2009 Comp 590/Comp Fall
Lecture 14: DNA Sequencing Study Chapter 8.9 10/20/2009 Comp 590/Comp 790-90 Fall 2009 1 DNA Sequencing Shear DNA into millions of small fragments Read 500 700 nucleotides at a time from the small fragments
More informationLecture 14: DNA Sequencing
Lecture 14: DNA Sequencing Study Chapter 8.9 10/17/2013 COMP 465 Fall 2013 1 Shear DNA into millions of small fragments Read 500 700 nucleotides at a time from the small fragments (Sanger method) DNA Sequencing
More informationEach cell of a living organism contains chromosomes
COVER FEATURE Genome Sequence Assembly: Algorithms and Issues Algorithms that can assemble millions of small DNA fragments into gene sequences underlie the current revolution in biotechnology, helping
More informationCSE182-L16. LW statistics/assembly
CSE182-L16 LW statistics/assembly Silly Quiz Who are these people, and what is the occasion? Genome Sequencing and Assembly Sequencing A break at T is shown here. Measuring the lengths using electrophoresis
More informationGenome Sequencing-- Strategies
Genome Sequencing-- Strategies Bio 4342 Spring 04 What is a genome? A genome can be defined as the entire DNA content of each nucleated cell in an organism Each organism has one or more chromosomes that
More information3) This diagram represents: (Indicate all correct answers)
Functional Genomics Midterm II (self-questions) 2/4/05 1) One of the obstacles in whole genome assembly is dealing with the repeated portions of DNA within the genome. How do repeats cause complications
More informationThe Diploid Genome Sequence of an Individual Human
The Diploid Genome Sequence of an Individual Human Maido Remm Journal Club 12.02.2008 Outline Background (history, assembling strategies) Who was sequenced in previous projects Genome variations in J.
More informationDe Novo Assembly of High-throughput Short Read Sequences
De Novo Assembly of High-throughput Short Read Sequences Chuming Chen Center for Bioinformatics and Computational Biology (CBCB) University of Delaware NECC Third Skate Genome Annotation Workshop May 23,
More informationChromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Supplementary Material
Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions Joshua N. Burton 1, Andrew Adey 1, Rupali P. Patwardhan 1, Ruolan Qiu 1, Jacob O. Kitzman 1, Jay Shendure 1 1 Department
More informationDe novo genome assembly with next generation sequencing data!! "
De novo genome assembly with next generation sequencing data!! " Jianbin Wang" HMGP 7620 (CPBS 7620, and BMGN 7620)" Genomics lectures" 2/7/12" Outline" The need for de novo genome assembly! The nature
More informationBarnacle: an assembly algorithm for clone-based sequences of whole genomes
Gene 320 (2003) 165 176 www.elsevier.com/locate/gene Barnacle: an assembly algorithm for clone-based sequences of whole genomes Vicky Choi*, Martin Farach-Colton Department of Computer Science, Rutgers
More informationDe novo assembly of human genomes with massively parallel short read sequencing. Mikk Eelmets Journal Club
De novo assembly of human genomes with massively parallel short read sequencing Mikk Eelmets Journal Club 06.04.2010 Problem DNA sequencing technologies: Sanger sequencing (500-1000 bp) Next-generation
More informationFinishing Drosophila ananassae Fosmid 2410F24
Nick Spies Research Explorations in Genomics Finishing Report Elgin, Shaffer and Leung 23 February 2013 Abstract: Finishing Drosophila ananassae Fosmid 2410F24 Finishing Drosophila ananassae fosmid clone
More informationBiol 478/595 Intro to Bioinformatics
Biol 478/595 Intro to Bioinformatics September M 1 Labor Day 4 W 3 MG Database Searching Ch. 6 5 F 5 MG Database Searching Hw1 6 M 8 MG Scoring Matrices Ch 3 and Ch 4 7 W 10 MG Pairwise Alignment 8 F 12
More informationThe Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica
The Ensembl Database Dott.ssa Inga Prokopenko Corso di Genomica 1 www.ensembl.org Lecture 7.1 2 What is Ensembl? Public annotation of mammalian and other genomes Open source software Relational database
More informationImproving Genome Assemblies without Sequencing
Improving Genome Assemblies without Sequencing Michael Schatz April 25, 2005 TIGR Bioinformatics Seminar Assembly Pipeline Overview 1. Sequence shotgun reads 2. Call Bases 3. Trim Reads 4. Assemble phred/tracetuner/kb
More informationAMOS Assembly Validation and Visualization
AMOS Assembly Validation and Visualization Michael Schatz Center for Bioinformatics and Computational Biology University of Maryland August 13, 2006 University of Hawaii Outline AMOS Validation Pipeline
More informationTowards Personal Genomics
Towards Personal Genomics Tools for Navigating the Genome of an Individual Saul A. Kravitz J. Craig Venter Institute Rockville, MD Bio-IT World 2008 Introduce yourself Relate our experience with individual
More informationBiotechnology Explorer
Biotechnology Explorer C. elegans Behavior Kit Bioinformatics Supplement explorer.bio-rad.com Catalog #166-5120EDU This kit contains temperature-sensitive reagents. Open immediately and see individual
More informationTruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR)
tru TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR) Anton Bankevich Center for Algorithmic Biotechnology, SPbSU Sequencing costs 1. Sequencing costs do not follow Moore s law
More informationARACHNE: A Whole-Genome Shotgun Assembler
Methods Serafim Batzoglou, 1,2,3 David B. Jaffe, 2,3,4 Ken Stanley, 2 Jonathan Butler, 2 Sante Gnerre, 2 Evan Mauceli, 2 Bonnie Berger, 1,5 Jill P. Mesirov, 2 and Eric S. Lander 2,6,7 1 Laboratory for
More informationA draft sequence of bread wheat chromosome 7B based on individual MTP BAC sequencing using pair end and mate pair libraries.
A draft sequence of bread wheat chromosome 7B based on individual MTP BAC sequencing using pair end and mate pair libraries. O. A. Olsen, T. Belova, B. Zhan, S. R. Sandve, J. Hu, L. Li, J. Min, J. Chen,
More informationWeb-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide.
Page 1 of 18 Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide. When and Where---Wednesdays 1-2pm Room 438 Library Admin Building Beginning September
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationIntroduction to metagenome assembly. Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014
Introduction to metagenome assembly Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014 Sequencing specs* Method Read length Accuracy Million reads Time Cost per M 454
More informationGenomics AGRY Michael Gribskov Hock 331
Genomics AGRY 60000 Michael Gribskov gribskov@purdue.edu Hock 331 Computing Essentials Resources In this course we will assemble and annotate both genomic and transcriptomic sequence assemblies We will
More informationBioinformatic analysis of Illumina sequencing data for comparative genomics Part I
Bioinformatic analysis of Illumina sequencing data for comparative genomics Part I Dr David Studholme. 18 th February 2014. BIO1033 theme lecture. 1 28 February 2014 @davidjstudholme 28 February 2014 @davidjstudholme
More informationMolecular Biology: DNA sequencing
Molecular Biology: DNA sequencing Author: Prof Marinda Oosthuizen Licensed under a Creative Commons Attribution license. SEQUENCING OF LARGE TEMPLATES As we have seen, we can obtain up to 800 nucleotides
More informationBioinformatics for Genomics
Bioinformatics for Genomics It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material. When I was young my Father
More informationChapter 5. Structural Genomics
Chapter 5. Structural Genomics Contents 5. Structural Genomics 5.1. DNA Sequencing Strategies 5.1.1. Map-based Strategies 5.1.2. Whole Genome Shotgun Sequencing 5.2. Genome Annotation 5.2.1. Using Bioinformatic
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationادخ مانب DNA Sequencing یرهطم لضفلاوبا دیس
بنام خدا DNA Sequencing سید ابوالفضل مطهری Sequencing Problem Sequencing Problem S = X 1 X 2 X 3 X 4...X G 1 X G Sequencing Problem S = X 1 X 2 X 3 X 4...X G 1 X G finding the sequence Sequencing Problem
More informationSupplementary Materials and Methods
Supplementary Materials and Methods Scripts to run VirGA can be downloaded from https://bitbucket.org/szparalab, and documentation for their use is found at http://virga.readthedocs.org/. VirGA outputs
More informationSUPPLEMENTARY MATERIAL FOR THE PAPER: RASCAF: IMPROVING GENOME ASSEMBLY WITH RNA-SEQ DATA
SUPPLEMENTARY MATERIAL FOR THE PAPER: RASCAF: IMPROVING GENOME ASSEMBLY WITH RNA-SEQ DATA Authors: Li Song, Dhruv S. Shankar, Liliana Florea Table of contents: Figure S1. Methods finding contig connections
More informationNGS developments in tomato genome sequencing
NGS developments in tomato genome sequencing 16-02-2012, Sandra Smit TATGTTTTGGAAAACATTGCATGCGGAATTGGGTACTAGGTTGGACCTTAGTACC GCGTTCCATCCTCAGACCGATGGTCAGTCTGAGAGAACGATTCAAGTGTTGGAAG ATATGCTTCGTGCATGTGTGATAGAGTTTGGTGGCCATTGGGATAGCTTCTTACC
More informationBENG 183 Trey Ideker. Genome Assembly and Physical Mapping
BENG 183 Trey Ideker Genome Assembly and Physical Mapping Reasons for sequencing Complete genome sequencing!!! Resequencing (Confirmatory) E.g., short regions containing single nucleotide polymorphisms
More informationPrimePCR Pricing and Bulk Discounts
Assay and Control Bulk Discounts 25% off 5 or more assays Primer Assay Pricing Product Catalog # Description List Price (EUR) Primer Assays (desalted) 10025636 Primer assay desalted, 200 reactions 96 10025637
More informationFinishing of Fosmid 1042D14. Project 1042D14 is a roughly 40 kb segment of Drosophila ananassae
Schefkind 1 Adam Schefkind Bio 434W 03/08/2014 Finishing of Fosmid 1042D14 Abstract Project 1042D14 is a roughly 40 kb segment of Drosophila ananassae genomic DNA. Through a comprehensive analysis of forward-
More informationNext Generation Sequences & Chloroplast Assembly. 8 June, 2012 Jongsun Park
Next Generation Sequences & Chloroplast Assembly 8 June, 2012 Jongsun Park Table of Contents 1 History of Sequencing Technologies 2 Genome Assembly Processes With NGS Sequences 3 How to Assembly Chloroplast
More informationPrimePCR Pricing and Bulk Discounts
Assay and Control Bulk Discounts 25% off 5 or more assays Primer Assay Pricing Product Catalog # Description List Price (CHF) Primer Assays (desalted) 10025636 Primer assay desalted, 200 reactions CHF
More informationPrimePCR Pricing and Bulk Discounts
Assay and Control Bulk Discounts 25% off 5 or more assays Primer Assay Pricing Product Catalog # Description List Price (DKK) Primer Assays (desalted) 10025636 Primer assay desalted, 200 reactions DKK
More informationN50 must die!? Genome assembly workshop, Santa Cruz, 3/15/11
N50 must die!? Genome assembly workshop, Santa Cruz, 3/15/11 twitter: @assemblathon web: assemblathon.org Should N50 die in its role as a frequently used measure of genome assembly quality? Are there other
More informationFinishing Drosophila Ananassae Fosmid 2728G16
Finishing Drosophila Ananassae Fosmid 2728G16 Kyle Jung March 8, 2013 Bio434W Professor Elgin Page 1 Abstract For my finishing project, I chose to finish fosmid 2728G16. This fosmid carries a segment of
More informationGreene 1. Finishing of DEUG The entire genome of Drosophila eugracilis has recently been sequenced using Roche
Greene 1 Harley Greene Bio434W Elgin Finishing of DEUG4927002 Abstract The entire genome of Drosophila eugracilis has recently been sequenced using Roche 454 pyrosequencing and Illumina paired-end reads