Purpose of sequence assembly

1 Sequence Assembly

2 Purpose of sequence assembly: reconstruct long DNA/RNA sequences from short sequence reads. Applications: genome sequencing; RNA sequencing for gene discovery (but not for transcript quantification); variant discovery.

3 Shear genomic DNA [Figure: a chromosome broken into sheared fragments]

4 Sequence both ends of each fragment [Figure: chromosome fragments with their end sequences marked]

6 Align sequence reads to form contigs [Figure: reads aligned along the chromosome]

7 Paired ends allow linking of contigs into scaffolds [Figure: contigs along the chromosome joined across captured gaps into a scaffold]. In the sequence file, gaps are represented with Ns: AGTCCCCTGGGAGATACGNNNNNNNNNNNNNNGATGATCAGCCGCATGAGCAG
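
The N-gap convention can be illustrated with a short Python sketch. It assumes a scaffold is a plain string in which any run of N marks a gap; the function name `split_scaffold` is hypothetical and not part of any assembler.

```python
import re

def split_scaffold(scaffold: str):
    """Split a scaffold into its contigs and report the length of each N gap."""
    # Contigs are the stretches between runs of N.
    contigs = [piece for piece in re.split(r"N+", scaffold) if piece]
    # Each run of N is one captured gap; its length is the estimated gap size.
    gaps = [len(run) for run in re.findall(r"N+", scaffold)]
    return contigs, gaps

# Scaffold string copied from the slide.
scaffold = "AGTCCCCTGGGAGATACGNNNNNNNNNNNNNNGATGATCAGCCGCATGAGCAG"
contigs, gaps = split_scaffold(scaffold)
print(contigs)
print(gaps)
```

The gap length is only an estimate derived from the paired-end insert size, so downstream tools usually treat it as approximate.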

8 Genome Assemblers

9 De Novo Genome Assembly. Two major strategies: (1) Overlap Layout Consensus: long reads (> 250 bp); pairwise comparison of reads to identify overlaps. (2) Eulerian paths/de Bruijn graphs: short reads (< 250 bp); cataloging of subsequences (k-mers); reconstruction of paths through the k-mers.

10 Overlap Layout Consensus: fragment DNA; sequence fragments; compare all sequence reads in pairwise fashion; calculate number of overlapping bases; build a matrix.

11 Overlap matrix
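
A minimal sketch of the overlap matrix in Python, assuming error-free reads and exact suffix/prefix matching (real assemblers tolerate mismatches and use indexes rather than brute force). The function names are illustrative; the reads are four of the 7-mers shown on slide 17.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def overlap_matrix(reads):
    """matrix[i][j] = overlap of read i (left) with read j (right)."""
    size = len(reads)
    return [
        [0 if i == j else overlap(reads[i], reads[j]) for j in range(size)]
        for i in range(size)
    ]

reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA", "TGCAATG"]
matrix = overlap_matrix(reads)
for read, row in zip(reads, matrix):
    print(read, row)
```

The matrix is not symmetric: the overlap of read i followed by read j generally differs from that of read j followed by read i.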

12 Determine Layout of Overlaps. Examine best overlaps and check their layout: GCATCGTG CATCGTGA 12. ATCGTGAT 20. AAGTGAAA 17. AGTGAAAC From: Computational Genome Analysis: An Introduction; Deonier et al.

13 Add new overlaps in a greedy fashion: GACCGCAT ATGCGCAT GCATCGTG CATCGTGA ATCGTGAT GCGCATCG CGCAGCGC From: Computational Genome Analysis: An Introduction; Deonier et al.

14 Determine consensus sequence. Reads: GACCGCAT ATGCGCAT GCATCGTG CATCGTGA ATCGTGAT GCGCATCG CGCAGCGC Consensus: GCGCATCGTGAT From: Computational Genome Analysis: An Introduction; Deonier et al.
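
The greedy layout step can be sketched in Python as follows. This is a toy under stated assumptions: reads are error-free, overlaps are exact, and the merged string stands in for the consensus (no per-column voting is done). The function names are hypothetical, and the reads are taken from slide 17 rather than from the Deonier et al. example.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_assemble(reads):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    n = overlap(left, right)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA", "TGCAATG"]
print(greedy_assemble(reads))  # ['ATGGCGTGCAATG']
```

Greedy merging is a heuristic: with repeats, the largest overlap is not always the correct one, which is one reason the layout problem is hard.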

15 OLC is computationally expensive: 20 reads require (20 × 20) − 20 = 380 comparisons. What about 10 million reads? An NP-complete problem.
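
The arithmetic can be checked with a few lines of Python. The sketch counts ordered read pairs (each read against every other read, in both orientations of the pair), which is the convention behind the figure of 380; the function name is illustrative.

```python
def pairwise_comparisons(n_reads: int) -> int:
    """Ordered pairs of distinct reads: n * n minus the n self-comparisons."""
    return n_reads * n_reads - n_reads

print(pairwise_comparisons(20))                   # 380
print(f"{pairwise_comparisons(10_000_000):,}")    # 99,999,990,000,000
```

For 10 million reads the count is roughly 10^14, which is why all-against-all comparison does not scale to short-read datasets.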

16 De Bruijn Graphs. Break sequence reads into a set of overlapping subsequences of length k (k-mers), e.g. AGTTATCCG can be represented by the overlapping 3-mers AGT, GTT, TTA, TAT, ATC, TCC, CCG. Count how many times each k-mer occurs. Place each k-mer at a node in a graph. Make a path (edge) between nodes if their sequences overlap by k−1 (i.e. AGT ↔ GTT). Assign the merged sequence to the edge (AGTT). Traverse each edge only once (or more if k-mer abundance implies a repeated sequence). Reconstruct the genome from the edge sequences.
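
The construction on this slide can be sketched in Python. The sketch follows the slide's formulation (k-mers as nodes, edges labelled with the merged (k+1)-mer) and assumes error-free reads on a single strand; `kmers` and `kmer_graph` are illustrative names.

```python
from collections import Counter

def kmers(seq: str, k: int):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_graph(reads, k: int):
    """Count k-mers and link any two k-mers that overlap by k-1 bases."""
    counts = Counter(kmer for read in reads for kmer in kmers(read, k))
    edges = {}
    for left in counts:
        for base in "ACGT":
            right = left[1:] + base
            if right in counts:
                edges[(left, right)] = left + base  # merged sequence on the edge
    return counts, edges

print(kmers("AGTTATCCG", 3))
# ['AGT', 'GTT', 'TTA', 'TAT', 'ATC', 'TCC', 'CCG']

counts, edges = kmer_graph(["AGTTATCCG"], 3)
print(edges[("AGT", "GTT")])  # AGTT
```

The k-mer counts are kept because abundance is what later hints at repeats (high counts) or sequencing errors (very low counts).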

17 De Bruijn Graphs [Figure, from Compeau et al., Nature Biotech, 2011: (a) short-read sequencing of a circular genome; (b) graph of the reads ATGGCGT, GGCGTGC, CGTGCAA, TGCAATG, CAATGGC, spelling the genome ATGGCGTGCAATGGCGT; (c) graph with k-mers at the vertices (ATG, TGG, GGC, GCG, CGT, GTG, TGC, GCA, CAA, AAT), spelling the genome ATGGCGTGCAATG; (d) graph with the same k-mers on the edges.] Overlap Layout Consensus: vertices are k-mers, edges are pairwise alignments. De Bruijn graphs: vertices are (k−1)-mers, edges are k-mers. Hamiltonian cycle: visit each vertex once (harder to solve). Eulerian cycle: visit each edge once (easier to solve).

18 Eulerian cycles with sequencing errors [Figure, from Compeau et al., Nature Biotech, 2011: (a) de Bruijn graph with vertices ATG, TGG, GGC, GCG, CGT, GTG, TGC, GCA, CAA, AAT and 4-mer edges ATGG, TGGC, GGCG, GCGT, CGTG, GTGC, TGCA, GCAA, CAAT, AATG; (b) the same graph with an extra path through the vertices GGA, GAG, AGT (edges TGGA, GGAG, GAGT, AGTG) introduced by a sequencing error.]

19 Eulerian cycle with repeated sequences [Figure, from Compeau et al., Nature Biotech, 2011: de Bruijn graph in which repeated k-mers are traversed more than once. Genome as a k-mer path: ATG TGC GCG CGT GTG TGC GCG CGT GTG TGG GGC GCA CAA AAT ATG]
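
A minimal Python sketch of Eulerian-cycle reconstruction, using k-mers as edges between (k−1)-mer vertices. It uses Hierholzer's algorithm, which is one standard choice and not necessarily what any particular assembler does. It assumes error-free k-mers from a circular genome, a connected graph, and equal in- and out-degree at every vertex; repeated k-mers are passed in once per occurrence. The function names are illustrative.

```python
from collections import defaultdict

def eulerian_cycle(kmer_list):
    """Return a cycle of (k-1)-mer vertices that uses every k-mer edge once."""
    graph = defaultdict(list)
    for kmer in kmer_list:
        graph[kmer[:-1]].append(kmer[1:])  # edge: prefix -> suffix
    stack, cycle = [kmer_list[0][:-1]], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())  # follow an unused edge
        else:
            cycle.append(stack.pop())        # dead end: emit vertex
    return cycle[::-1]

def spell_circular(cycle):
    """Spell the circular genome: first base of each vertex, minus the repeat."""
    return "".join(node[0] for node in cycle[:-1])

# 3-mers of the circular genome on slide 17.
slide_kmers = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA", "CAA", "AAT"]
genome = spell_circular(eulerian_cycle(slide_kmers))
print(genome)
```

A graph can contain more than one Eulerian cycle, so the spelled genome may be a different rotation or rearrangement that has exactly the same k-mer content; resolving such ambiguity needs longer k-mers or paired-end information.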

20 de Bruijn Graph Assembly [Figure: graph whose nodes are four-word k-mers: "It was the best", "was the best of", "the best of times,", "best of times, it", "of times, it was", "times, it was the", "it was the worst", "was the worst of", "the worst of times,", "worst of times, it", "it was the age", "was the age of", "the age of foolishness", "the age of wisdom,", "age of wisdom, it", "of wisdom, it was", "wisdom, it was the"]. After graph construction, try to simplify the graph as much as possible.

21 de Bruijn Graph Assembly [Figure: the simplified graph, with nodes "It was the best of times, it", "it was the worst of times, it", "of times, it was the", "it was the age of", "the age of wisdom, it was the", "the age of foolishness"]. After graph construction, try to simplify the graph as much as possible.
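
The simplification from slide 20 to slide 21 amounts to merging unbranched chains of nodes. A hedged Python sketch on the same sentence, using four-word k-mers as nodes: the function names are hypothetical, and the sketch ignores isolated cycles that have no branching node.

```python
from collections import defaultdict

def word_kmers(text: str, k: int):
    words = text.split()
    return [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]

def compact(nodes):
    """Merge every unbranched chain of k-mer nodes into one longer node."""
    nodes = set(nodes)
    by_prefix = defaultdict(set)
    for node in nodes:
        by_prefix[node[:-1]].add(node)
    succ, pred = defaultdict(set), defaultdict(set)
    for left in nodes:
        for right in by_prefix[left[1:]]:  # overlap of k-1 words
            succ[left].add(right)
            pred[right].add(left)

    def starts_chain(node):
        # A node is interior only if its single predecessor has a single successor.
        if len(pred[node]) != 1:
            return True
        (before,) = pred[node]
        return len(succ[before]) != 1

    merged = []
    for node in nodes:
        if not starts_chain(node):
            continue
        chain, current = [node], node
        while len(succ[current]) == 1:
            (nxt,) = succ[current]
            if len(pred[nxt]) != 1:
                break
            chain.append(nxt)
            current = nxt
        merged.append(" ".join(list(chain[0]) + [n[-1] for n in chain[1:]]))
    return sorted(merged)

text = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness")
unitigs = compact(word_kmers(text, 4))
for unitig in unitigs:
    print(unitig)
```

On this input the 17 four-word nodes collapse into the six longer nodes shown on slide 21; the branch points that remain are exactly where the repeated phrases make the path ambiguous.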

22 Reference-based assembly Useful when a high-quality reference genome sequence is available

23 Transcriptome assembly with Trinity. Inchworm: assembles transcripts (dominant isoforms); reports novel portions of alternative transcripts. Chrysalis: clusters Inchworm contigs into groups representing all isoforms for a given gene; builds de Bruijn graphs for each transcript. Butterfly: builds transcripts by using actual reads to trace paths through the graphs.

24 Transcriptome assembly - Inchworm [Figure: contigs for paralogous genes (Gene A, Gene B) and for alternative transcripts of Gene C (Transcript 1, Transcript 2)]

25 Transcriptome assembly - Chrysalis Contigs

26 Transcriptome assembly - Butterfly

27 Assembly metrics. No. of scaffolds/contigs. Largest scaffold/contig. N50 scaffold/contig size: 50% of the genome is contained in scaffolds/contigs of size ≥ N50. L50: minimum number of scaffolds/contigs with summed length ≥ 50% of the genome. Genome coverage (read coverage): each base is represented by an average of X reads.
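
These metrics are simple to compute. The Python sketch below assumes N50 and L50 are taken relative to the total assembly length unless a genome size is supplied, and that coverage is estimated as total sequenced bases divided by genome size; the function names and example numbers are illustrative only.

```python
def n50_l50(lengths, genome_size=None):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # reached at least 50% of the total
            return length, count
    raise ValueError("sequences cover less than half of the genome size")

def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base."""
    return n_reads * read_length / genome_size

contig_lengths = [80, 70, 50, 40, 30, 20, 10]   # made-up example, total 300
print(n50_l50(contig_lengths))                  # (70, 2)
print(mean_coverage(1_000_000, 100, 5_000_000)) # 20.0
```

When comparing assemblies of the same genome, passing the same `genome_size` to every call keeps the N50 values comparable even if the assemblies differ in total length.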

28 This Morning's Exercises. Assemble a bacterial genome sequence. Velvet: generate an interleaved dataset; choose a suitable k-mer range; run assemblies with different k-mer lengths; examine assembly metrics. Discovar de novo: generate assembly; compare assembly metrics with Velvet. Supplemental exercises: run assemblies using quality-trimmed input data; refine Velvet k-mer range for optimal performance.

Mapping strategies for sequence reads

Mapping strategies for sequence reads Mapping strategies for sequence reads Ernest Turro University of Cambridge 21 Oct 2013 Quantification A basic aim in genomics is working out the contents of a biological sample. 1. What distinct elements

More information

Genome Assembly, part II. Tandy Warnow

Genome Assembly, part II. Tandy Warnow Genome Assembly, part II Tandy Warnow How to apply de Bruijn graphs to genome assembly Phillip E C Compeau, Pavel A Pevzner & Glenn Tesler A mathematical concept known as a de Bruijn graph turns the formidable

More information

Outline. The types of Illumina data Methods of assembly Repeats Selecting k-mer size Assembly Tools Assembly Diagnostics Assembly Polishing

Outline. The types of Illumina data Methods of assembly Repeats Selecting k-mer size Assembly Tools Assembly Diagnostics Assembly Polishing Illumina Assembly 1 Outline The types of Illumina data Methods of assembly Repeats Selecting k-mer size Assembly Tools Assembly Diagnostics Assembly Polishing 2 Illumina Sequencing Paired end Illumina

More information

PCR analysis was performed to show the presence and the integrity of the var1csa and var-

PCR analysis was performed to show the presence and the integrity of the var1csa and var- Supplementary information: Methods: Table S1: Primer Name Nucleotide sequence (5-3 ) DBL3-F tcc ccg cgg agt gaa aca tca tgt gac tg DBL3-R gac tag ttt ctt tca ata aat cac tcg c DBL5-F cgc cct agg tgc ttc

More information

Lecture 11: Gene Prediction

Lecture 11: Gene Prediction Lecture 11: Gene Prediction Study Chapter 6.11-6.14 1 Gene: A sequence of nucleotides coding for protein Gene Prediction Problem: Determine the beginning and end positions of genes in a genome Where are

More information

Introduction to metagenome assembly. Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014

Introduction to metagenome assembly. Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014 Introduction to metagenome assembly Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014 Sequencing specs* Method Read length Accuracy Million reads Time Cost per M 454

More information

Supplemental Data Supplemental Figure 1.

Supplemental Data Supplemental Figure 1. Supplemental Data Supplemental Figure 1. Silique arrangement in the wild-type, jhs, and complemented lines. Wild-type (WT) (A), the jhs1 mutant (B,C), and the jhs1 mutant complemented with JHS1 (Com) (D)

More information

short read genome assembly Sorin Istrail CSCI1820 Short-read genome assembly algorithms 3/6/2014

short read genome assembly Sorin Istrail CSCI1820 Short-read genome assembly algorithms 3/6/2014 1 short read genome assembly Sorin Istrail CSCI1820 Short-read genome assembly algorithms 3/6/2014 2 Genomathica Assembler Mathematica notebook for genome assembly simulation Assembler can be found at:

More information

Lecture 10, 20/2/2002: The process of solution development - The CODEHOP strategy for automatic design of consensus-degenerate primers for PCR

Lecture 10, 20/2/2002: The process of solution development - The CODEHOP strategy for automatic design of consensus-degenerate primers for PCR Lecture 10, 20/2/2002: The process of solution development - The CODEHOP strategy for automatic design of consensus-degenerate primers for PCR 1 The problem We wish to clone a yet unknown gene from a known

More information

de novo Transcriptome Assembly Nicole Cloonan 1 st July 2013, Winter School, UQ

de novo Transcriptome Assembly Nicole Cloonan 1 st July 2013, Winter School, UQ de novo Transcriptome Assembly Nicole Cloonan 1 st July 2013, Winter School, UQ de novo transcriptome assembly de novo from the Latin expression meaning from the beginning In bioinformatics, we often use

More information

ChIP-seq and RNA-seq. Farhat Habib

ChIP-seq and RNA-seq. Farhat Habib ChIP-seq and RNA-seq Farhat Habib fhabib@iiserpune.ac.in Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions

More information

ChIP-seq and RNA-seq

ChIP-seq and RNA-seq ChIP-seq and RNA-seq Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions (ChIPchromatin immunoprecipitation)

More information

Supplementary. Table 1: Oligonucleotides and Plasmids. complementary to positions from 77 of the SRα '- GCT CTA GAG AAC TTG AAG TAC AGA CTG C

Supplementary. Table 1: Oligonucleotides and Plasmids. complementary to positions from 77 of the SRα '- GCT CTA GAG AAC TTG AAG TAC AGA CTG C Supplementary Table 1: Oligonucleotides and Plasmids 913954 5'- GCT CTA GAG AAC TTG AAG TAC AGA CTG C 913955 5'- CCC AAG CTT ACA GTG TGG CCA TTC TGC TG 223396 5'- CGA CGC GTA CAG TGT GGC CAT TCT GCT G

More information

De novo assembly in RNA-seq analysis.

De novo assembly in RNA-seq analysis. De novo assembly in RNA-seq analysis. Joachim Bargsten Wageningen UR/PRI/Plant Breeding October 2012 Motivation Transcriptome sequencing (RNA-seq) Gene expression / differential expression Reconstruct

More information

Supporting information for Biochemistry, 1995, 34(34), , DOI: /bi00034a013

Supporting information for Biochemistry, 1995, 34(34), , DOI: /bi00034a013 Supporting information for Biochemistry, 1995, 34(34), 10807 10815, DOI: 10.1021/bi00034a013 LESNIK 10807-1081 Terms & Conditions Electronic Supporting Information files are available without a subscription

More information

NGS part 2: applications. Tobias Österlund

NGS part 2: applications. Tobias Österlund NGS part 2: applications Tobias Österlund tobiaso@chalmers.se NGS part of the course Week 4 Friday 13/2 15.15-17.00 NGS lecture 1: Introduction to NGS, alignment, assembly Week 6 Thursday 26/2 08.00-09.45

More information

Assembly. Ian Misner, Ph.D. Bioinformatics Crash Course. Bioinformatics Core

Assembly. Ian Misner, Ph.D. Bioinformatics Crash Course. Bioinformatics Core Assembly Ian Misner, Ph.D. Bioinformatics Crash Course Multiple flavors to choose from De novo No prior sequence knowledge required Takes what you have and tries to build the best contigs/scaffolds possible

More information

10/20/2009 Comp 590/Comp Fall

10/20/2009 Comp 590/Comp Fall Lecture 14: DNA Sequencing Study Chapter 8.9 10/20/2009 Comp 590/Comp 790-90 Fall 2009 1 DNA Sequencing Shear DNA into millions of small fragments Read 500 700 nucleotides at a time from the small fragments

More information

Lecture 14: DNA Sequencing

Lecture 14: DNA Sequencing Lecture 14: DNA Sequencing Study Chapter 8.9 10/17/2013 COMP 465 Fall 2013 1 Shear DNA into millions of small fragments Read 500 700 nucleotides at a time from the small fragments (Sanger method) DNA Sequencing

More information

Figure S1. Characterization of the irx9l-1 mutant. (A) Diagram of the Arabidopsis IRX9L gene drawn based on information from TAIR (the Arabidopsis

Figure S1. Characterization of the irx9l-1 mutant. (A) Diagram of the Arabidopsis IRX9L gene drawn based on information from TAIR (the Arabidopsis 1 2 3 4 5 6 7 8 9 10 11 12 Figure S1. Characterization of the irx9l-1 mutant. (A) Diagram of the Arabidopsis IRX9L gene drawn based on information from TAIR (the Arabidopsis Information Research). Exons

More information

Introduction to Bioinformatics. Genome sequencing & assembly

Introduction to Bioinformatics. Genome sequencing & assembly Introduction to Bioinformatics Genome sequencing & assembly Genome sequencing & assembly p DNA sequencing How do we obtain DNA sequence information from organisms? p Genome assembly What is needed to put

More information

Genome Assembly CHRIS FIELDS MAYO-ILLINOIS COMPUTATIONAL GENOMICS WORKSHOP, JUNE 19, 2018

Genome Assembly CHRIS FIELDS MAYO-ILLINOIS COMPUTATIONAL GENOMICS WORKSHOP, JUNE 19, 2018 Genome Assembly CHRIS FIELDS MAYO-ILLINOIS COMPUTATIONAL GENOMICS WORKSHOP, JUNE 19, 2018 Overview What is genome assembly? Steps in a genome assembly Planning an assembly project QC assessment of assemblies

More information

Electronic Supplementary Information

Electronic Supplementary Information Electronic Supplementary Material (ESI) for Molecular BioSystems. This journal is The Royal Society of Chemistry 2017 Electronic Supplementary Information Dissecting binding of a β-barrel outer membrane

More information

De novo genome assembly with next generation sequencing data!! "

De novo genome assembly with next generation sequencing data!! De novo genome assembly with next generation sequencing data!! " Jianbin Wang" HMGP 7620 (CPBS 7620, and BMGN 7620)" Genomics lectures" 2/7/12" Outline" The need for de novo genome assembly! The nature

More information

Disease and selection in the human genome 3

Disease and selection in the human genome 3 Disease and selection in the human genome 3 Ka/Ks revisited Please sit in row K or forward RBFD: human populations, adaptation and immunity Neandertal Museum, Mettman Germany Sequence genome Measure expression

More information

De novo sequence assembly

De novo sequence assembly 2015.6.12 De novo sequence assembly 徐唯哲 Paul Wei Che HSU 中央研究院分子生物研究所研究助技師 Assistant Research Specialist Bioinformatics Service Core, Institute of Molecular Biology, Academia Sinica, Taiwan, R.O.C. Bioinformatics

More information

De novo assembly of human genomes with massively parallel short read sequencing. Mikk Eelmets Journal Club

De novo assembly of human genomes with massively parallel short read sequencing. Mikk Eelmets Journal Club De novo assembly of human genomes with massively parallel short read sequencing Mikk Eelmets Journal Club 06.04.2010 Problem DNA sequencing technologies: Sanger sequencing (500-1000 bp) Next-generation

More information

De Novo Assembly of High-throughput Short Read Sequences

De Novo Assembly of High-throughput Short Read Sequences De Novo Assembly of High-throughput Short Read Sequences Chuming Chen Center for Bioinformatics and Computational Biology (CBCB) University of Delaware NECC Third Skate Genome Annotation Workshop May 23,

More information

Supplementary Information. Construction of Lasso Peptide Fusion Proteins

Supplementary Information. Construction of Lasso Peptide Fusion Proteins Supplementary Information Construction of Lasso Peptide Fusion Proteins Chuhan Zong 1, Mikhail O. Maksimov 2, A. James Link 2,3 * Departments of 1 Chemistry, 2 Chemical and Biological Engineering, and

More information

CSCI2950-C DNA Sequencing and Fragment Assembly

CSCI2950-C DNA Sequencing and Fragment Assembly CSCI2950-C DNA Sequencing and Fragment Assembly Lecture 2: Sept. 7, 2010 http://cs.brown.edu/courses/csci2950-c/ DNA sequencing How we obtain the sequence of nucleotides of a species 5 3 ACGTGACTGAGGACCGTG

More information

ORFs and genes. Please sit in row K or forward

ORFs and genes. Please sit in row K or forward ORFs and genes Please sit in row K or forward https://www.flickr.com/photos/teseum/3231682806/in/photostream/ Question: why do some strains of Vibrio cause cholera and others don t? Methods Mechanisms

More information

G+C content. 1 Introduction. 2 Chromosomes Topology & Counts. 3 Genome size. 4 Replichores and gene orientation. 5 Chirochores.

G+C content. 1 Introduction. 2 Chromosomes Topology & Counts. 3 Genome size. 4 Replichores and gene orientation. 5 Chirochores. 1 Introduction 2 Chromosomes Topology & Counts 3 Genome size 4 Replichores and gene orientation 5 Chirochores 6 7 Codon usage 121 marc.bailly-bechet@univ-lyon1.fr Bacterial genome structures Introduction

More information

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es Sequence assembly Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing project Unknown sequence { experimental evidence result read 1 read 4 read 2 read 5 read 3 read 6 read 7 Computational requirements

More information

Transcriptome analysis

Transcriptome analysis Statistical Bioinformatics: Transcriptome analysis Stefan Seemann seemann@rth.dk University of Copenhagen April 11th 2018 Outline: a) How to assess the quality of sequencing reads? b) How to normalize

More information

Sequence Assembly and Alignment. Jim Noonan Department of Genetics

Sequence Assembly and Alignment. Jim Noonan Department of Genetics Sequence Assembly and Alignment Jim Noonan Department of Genetics james.noonan@yale.edu www.yale.edu/noonanlab The assembly problem >>10 9 sequencing reads 36 bp - 1 kb 3 Gb Outline Basic concepts in genome

More information

II 0.95 DM2 (RPP1) DM3 (At3g61540) b

II 0.95 DM2 (RPP1) DM3 (At3g61540) b Table S2. F 2 Segregation Ratios at 16 C, Related to Figure 2 Cross n c Phenotype Model e 2 Locus A Locus B Normal F 1 -like Enhanced d Uk-1/Uk-3 149 64 36 49 DM2 (RPP1) DM1 (SSI4) a Bla-1/Hh-0 F 3 111

More information

PGRP negatively regulates NOD-mediated cytokine production in rainbow trout liver cells

PGRP negatively regulates NOD-mediated cytokine production in rainbow trout liver cells Supplementary Information for: PGRP negatively regulates NOD-mediated cytokine production in rainbow trout liver cells Ju Hye Jang 1, Hyun Kim 2, Mi Jung Jang 2, Ju Hyun Cho 1,2,* 1 Research Institute

More information

Meta-IDBA: A de Novo Assembler for Metagenomic Data

Meta-IDBA: A de Novo Assembler for Metagenomic Data Category Meta-IDBA: A de Novo Assembler for Metagenomic Data Yu Peng 1, Henry C.M. Leung 1, S.M. Yiu 1 and Francis Y.L. Chin 1,* 1 Department of Computer Science, Rm 301 Chow Yei Ching Building, The University

More information

Y-chromosomal haplogroup typing Using SBE reaction

Y-chromosomal haplogroup typing Using SBE reaction Schematic of multiplex PCR followed by SBE reaction Multiplex PCR Exo SAP purification SBE reaction 5 A 3 ddatp ddgtp 3 T 5 A G 3 T 5 3 5 G C 5 3 3 C 5 ddttp ddctp 5 T 3 T C 3 A 5 3 A 5 5 C 3 3 G 5 3 G

More information

Supplemental Data. mir156-regulated SPL Transcription. Factors Define an Endogenous Flowering. Pathway in Arabidopsis thaliana

Supplemental Data. mir156-regulated SPL Transcription. Factors Define an Endogenous Flowering. Pathway in Arabidopsis thaliana Cell, Volume 138 Supplemental Data mir156-regulated SPL Transcription Factors Define an Endogenous Flowering Pathway in Arabidopsis thaliana Jia-Wei Wang, Benjamin Czech, and Detlef Weigel Table S1. Interaction

More information

Supplement 1: Sequences of Capture Probes. Capture probes were /5AmMC6/CTG TAG GTG CGG GTG GAC GTA GTC

Supplement 1: Sequences of Capture Probes. Capture probes were /5AmMC6/CTG TAG GTG CGG GTG GAC GTA GTC Supplementary Appendixes Supplement 1: Sequences of Capture Probes. Capture probes were /5AmMC6/CTG TAG GTG CGG GTG GAC GTA GTC ACG TAG CTC CGG CTG GA-3 for vimentin, /5AmMC6/TCC CTC GCG CGT GGC TTC CGC

More information

SCIENCE CHINA Life Sciences. Comparative analysis of de novo transcriptome assembly

SCIENCE CHINA Life Sciences. Comparative analysis of de novo transcriptome assembly SCIENCE CHINA Life Sciences SPECIAL TOPIC February 2013 Vol.56 No.2: 156 162 RESEARCH PAPER doi: 10.1007/s11427-013-4444-x Comparative analysis of de novo transcriptome assembly CLARKE Kaitlin 1, YANG

More information

Materials Protein synthesis kit. This kit consists of 24 amino acids, 24 transfer RNAs, four messenger RNAs and one ribosome (see below).

Materials Protein synthesis kit. This kit consists of 24 amino acids, 24 transfer RNAs, four messenger RNAs and one ribosome (see below). Protein Synthesis Instructions The purpose of today s lab is to: Understand how a cell manufactures proteins from amino acids, using information stored in the genetic code. Assemble models of four very

More information

de novo metagenome assembly

de novo metagenome assembly 1 de novo metagenome assembly Rayan Chikhi CNRS Univ. Lille 1 Formation metagenomique de novo metagenomics 2 de novo metagenomics Goal: biological sense out of sequencing data Techniques: 1. de novo assembly

More information

De novo whole genome assembly

De novo whole genome assembly De novo whole genome assembly Qi Sun Bioinformatics Facility Cornell University Sequencing platforms Short reads: o Illumina (150 bp, up to 300 bp) Long reads (>10kb): o PacBio SMRT; o Oxford Nanopore

More information

Analysis of RNA-seq Data

Analysis of RNA-seq Data Analysis of RNA-seq Data A physicist and an engineer are in a hot-air balloon. Soon, they find themselves lost in a canyon somewhere. They yell out for help: "Helllloooooo! Where are we?" 15 minutes later,

More information

Next Generation Sequences & Chloroplast Assembly. 8 June, 2012 Jongsun Park

Next Generation Sequences & Chloroplast Assembly. 8 June, 2012 Jongsun Park Next Generation Sequences & Chloroplast Assembly 8 June, 2012 Jongsun Park Table of Contents 1 History of Sequencing Technologies 2 Genome Assembly Processes With NGS Sequences 3 How to Assembly Chloroplast

More information

evaluated with UAS CLB eliciting UAS CIT -N Libraries increase in the

evaluated with UAS CLB eliciting UAS CIT -N Libraries increase in the Supplementary Figures Supplementary Figure 1: Promoter scaffold library assemblies. Many ensembless of libraries were evaluated in this work. As a legend, the box outline color in top half of the figure

More information

Add 5µl of 3N NaOH to DNA sample (final concentration 0.3N NaOH).

Add 5µl of 3N NaOH to DNA sample (final concentration 0.3N NaOH). Bisulfite Treatment of DNA Dilute DNA sample to 2µg DNA in 50µl ddh 2 O. Add 5µl of 3N NaOH to DNA sample (final concentration 0.3N NaOH). Incubate in a 37ºC water bath for 30 minutes. To 55µl samples

More information

Table S1. Bacterial strains (Related to Results and Experimental Procedures)

Table S1. Bacterial strains (Related to Results and Experimental Procedures) Table S1. Bacterial strains (Related to Results and Experimental Procedures) Strain number Relevant genotype Source or reference 1045 AB1157 Graham Walker (Donnelly and Walker, 1989) 2458 3084 (MG1655)

More information

De novo whole genome assembly

De novo whole genome assembly De novo whole genome assembly Lecture 1 Qi Sun Bioinformatics Facility Cornell University Data generation Sequencing Platforms Short reads: Illumina Long reads: PacBio; Oxford Nanopore Contiging/Scaffolding

More information

Supplementary Figure 1A A404 Cells +/- Retinoic Acid

Supplementary Figure 1A A404 Cells +/- Retinoic Acid Supplementary Figure 1A A44 Cells +/- Retinoic Acid 1 1 H3 Lys4 di-methylation SM-actin VEC cfos (-) RA (+) RA 14 1 1 8 6 4 H3 Lys79 di-methylation SM-actin VEC cfos (-) RA (+) RA Supplementary Figure

More information

Bioinformatic analysis of Illumina sequencing data for comparative genomics Part I

Bioinformatic analysis of Illumina sequencing data for comparative genomics Part I Bioinformatic analysis of Illumina sequencing data for comparative genomics Part I Dr David Studholme. 18 th February 2014. BIO1033 theme lecture. 1 28 February 2014 @davidjstudholme 28 February 2014 @davidjstudholme

More information

Hes6. PPARα. PPARγ HNF4 CD36

Hes6. PPARα. PPARγ HNF4 CD36 SUPPLEMENTARY INFORMATION Supplementary Table Positions and Sequences of ChIP primers -63 AGGTCACTGCCA -79 AGGTCTGCTGTG Hes6-0067 GGGCAaAGTTCA ACOT -395 GGGGCAgAGTTCA PPARα -309 GGCTCAaAGTTCAaGTTCA CPTa

More information

Genomics and Gene Recognition Genes and Blue Genes

Genomics and Gene Recognition Genes and Blue Genes Genomics and Gene Recognition Genes and Blue Genes November 1, 2004 Prokaryotic Gene Structure prokaryotes are simplest free-living organisms studying prokaryotes can give us a sense what is the minimum

More information

SAY IT WITH DNA: Protein Synthesis Activity by Larry Flammer

SAY IT WITH DNA: Protein Synthesis Activity by Larry Flammer TEACHER S GUIDE SAY IT WITH DNA: Protein Synthesis Activity by Larry Flammer SYNOPSIS This activity uses the metaphor of decoding a secret message for the Protein Synthesis process. Students teach themselves

More information

ΔPDD1 x ΔPDD1. ΔPDD1 x wild type. 70 kd Pdd1. Pdd3

ΔPDD1 x ΔPDD1. ΔPDD1 x wild type. 70 kd Pdd1. Pdd3 Supplemental Fig. S1 ΔPDD1 x wild type ΔPDD1 x ΔPDD1 70 kd Pdd1 50 kd 37 kd Pdd3 Supplemental Fig. S1. ΔPDD1 strains express no detectable Pdd1 protein. Western blot analysis of whole-protein extracts

More information

RNA-Seq. Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University

RNA-Seq. Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University RNA-Seq Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University joshua.ainsley@tufts.edu Day five Alternative splicing Assembly RNA edits Alternative splicing

More information

Search for and Analysis of Single Nucleotide Polymorphisms (SNPs) in Rice (Oryza sativa, Oryza rufipogon) and Establishment of SNP Markers

Search for and Analysis of Single Nucleotide Polymorphisms (SNPs) in Rice (Oryza sativa, Oryza rufipogon) and Establishment of SNP Markers DNA Research 9, 163 171 (2002) Search for and Analysis of Single Nucleotide Polymorphisms (SNPs) in Rice (Oryza sativa, Oryza rufipogon) and Establishment of SNP Markers Shinobu Nasu, Junko Suzuki, Rieko

More information

Supplemental material

Supplemental material Supplemental material Diversity of O-antigen repeat-unit structures can account for the substantial sequence variation of Wzx translocases Yaoqin Hong and Peter R. Reeves School of Molecular Bioscience,

More information

De novo Genome Assembly

De novo Genome Assembly De novo Genome Assembly A/Prof Torsten Seemann Winter School in Mathematical & Computational Biology - Brisbane, AU - 3 July 2017 Introduction The human genome has 47 pieces MT (or XY) The shortest piece

More information

De novo whole genome assembly

De novo whole genome assembly De novo whole genome assembly Lecture 1 Qi Sun Minghui Wang Bioinformatics Facility Cornell University DNA Sequencing Platforms Illumina sequencing (100 to 300 bp reads) Overlapping reads ~180bp fragment

More information

RNA-sequencing. Next Generation sequencing analysis Anne-Mette Bjerregaard. Center for biological sequence analysis (CBS)

RNA-sequencing. Next Generation sequencing analysis Anne-Mette Bjerregaard. Center for biological sequence analysis (CBS) RNA-sequencing Next Generation sequencing analysis 2016 Anne-Mette Bjerregaard Center for biological sequence analysis (CBS) Terms and definitions TRANSCRIPTOME The full set of RNA transcripts and their

More information

Supporting Online Information

Supporting Online Information Supporting Online Information Isolation of Human Genomic DNA Sequences with Expanded Nucleobase Selectivity Preeti Rathi, Sara Maurer, Grzegorz Kubik and Daniel Summerer* Department of Chemistry and Chemical

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION 1. RNA/DNA sequences used in this study 2. Height and stiffness measurements on hybridized molecules 3. Stiffness maps at varying concentrations of target DNA 4. Stiffness measurements on RNA/DNA hybrids.

More information

RNA-Seq with the Tuxedo Suite

RNA-Seq with the Tuxedo Suite RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop The Basic Tuxedo Suite References Trapnell C, et al. 2009 TopHat: discovering splice junctions with

More information

De novo genome assembly. Dr Torsten Seemann

De novo genome assembly. Dr Torsten Seemann De novo genome assembly Dr Torsten Seemann IMB Winter School - Brisbane Mon 1 July 2013 Introduction Ideal world I would not need to give this talk! Human DNA Non-existent USB3 device AGTCTAGGATTCGCTA

More information

SUPPLEMENTARY MATERIALS AND METHODS. E. coli strains, plasmids, and growth conditions. Escherichia coli strain P90C (1)

SUPPLEMENTARY MATERIALS AND METHODS. E. coli strains, plasmids, and growth conditions. Escherichia coli strain P90C (1) SUPPLEMENTARY MATERIALS AND METHODS E. coli strains, plasmids, and growth conditions. Escherichia coli strain P90C (1) dinb::kan (lab stock) derivative was used as wild-type. MG1655 alka tag dinb (2) is

More information

Supplementary Materials for
