CSCI2950-C DNA Sequencing and Fragment Assembly
|
|
- Gregory Cook
- 5 years ago
- Views:
Transcription
1 CSCI2950-C DNA Sequencing and Fragment Assembly Lecture 2: Sept. 7, DNA sequencing How we obtain the sequence of nucleotides of a species 5 3 ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT 3 5 1
2 Outline DNA Sequencing Technology: Old and New Fragment Assembly Problem Overlap-Layout-Consensus Sequencing by Hybridization Eulerian and Hamiltonian Graphs Goal: DNA Sequencing Find the complete sequence of A, C, G, T s in DNA molecule. Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence: letters at a time (Sanger sequencing) letters (Next-gen sequencing) 2
3 How to Sequence DNA? Co-opt machinery of the cell. Every time a cell divides it copies its DNA sequence. DNA Replication One strand used as template to make a copy of other strand. $!! $! $ " # $! # " # "! $ " # $! " # $!! # " # " # " # "!! $! $ " " # " #! $ # " " # " # # $ $! " # " $! " # # " $! 3
4 DNA Replication DNA Replication Suppose we could: 1) Stop this reaction and measure the last nucleotide added. 2) Measure the length of DNA molecules. 4
5 DNA Sequencing 1. Start at fixed location (primer) 2. Grow DNA chain 3. Include mixture of normal nucleotides (A,C,T,G) and modified (fluorescent/ radioactive) nucleotides. 4. Modified nucleotides stop reaction at all possible points 5. Separate products with length, using gel electrophoresis Sanger sequencing (F. Sanger Nobel prize 1980) Technical Limitations 1. Need a lot of DNA 2. Reaction only works for bp 1. Biology 2. Computer Science Solutions 5
6 DNA Sequencing Cloning Many host cells DNA amplification Different types of vectors VECTOR Plasmid (bacteria) Size of insert 2,000-10,000 Can control the size Fosmid (bacteria) 40,000 BAC (Bacterial Artificial Chromosome) YAC (Yeast Artificial Chromosome) 70, ,000 > 300,000 Not used much recently 6
7 Disadvantages of Traditional (Sanger) Sequencing 1. Cloning process is laborious. Grow (bacterial) cells, each containing a single fragment. 2. Only one fragment sequenced at a time. Next-generation DNA sequencing 1. Massively parallel sequencing reactions. 2. In situ amplification of DNA. No cloning required. 7
8 1. PREPARE GENOMIC DNA SAMPLE 2. ATTACH DNA TO SURFACE 3. BRIDGE AMPLIFICATION Illumina Sequencing DNA Adapters Adapter Adapter DNA fragment Dense lawn of primers Randomly fragment genomic DNA and ligate adapters to both ends of the fragments. Bind single-stranded fragments randomly to the inside surface of the flow cell channels. Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification. 4. FRAGMENTS BECOME DOUBLE-STRANDED 5. DENATURE THE DOUBLE- STRANDED MOLECULES 6. COMPLETE AMPLIFICATION Attached Free terminus terminus Attached terminus Attached Attached Clusters The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate. Denaturation leaves single-stranded templates anchored to the substrate. Several million dense clusters of double-stranded DNA are generated in each channel of the flow cell. Illumina Sequencing 7. DETERMINE FIRST BASE 8. IMAGE FIRST BASE 9. DETERMINE SECOND BASE Laser Laser The first sequencing cycle begins by adding four labeled reversible terminators, primers, and DNA polymerase. After laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified. The next cycle repeats the incorporation of four labeled reversible terminators, primers, and DNA polymerase. 10. I MAGE SECOND CHEMISTRY CYCLE 11. SEQUENCING OVER MUL- TIPLE CHEMISTRY CYCLES 12. A LIGN DATA GCTGA... After laser excitation, the image is captured as before, and the identity of the second base is recorded. The sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time. The data are aligned and compared to a reference, and sequencing differences are identified. 8
9 Goal: DNA Sequencing Find the complete sequence of A, C, G, T s in a DNA molecule. Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence: letters at a time (Sanger sequencing) letters (Next-gen sequencing) Issues 1. How to process data from sequencing machine? Base-calling ACTCGT 2. How to obtain sequence for longer DNA molecules? De novo assembly. 3. How to find differences from a reference genome: resequencing. 9
10 Sequence Trace Challenging to read answer 10
11 Challenging to read answer Challenging to read answer 11
12 Issues 1. How to process data from sequencing machine? Base-calling ACTCGT 2. How to obtain sequence for longer DNA molecules? De novo assembly. 3. How to find differences from a reference genome: resequencing. Sequencing Longer Regions 12
13 Fragment Assembly Fragment Assembly Computational Challenge: assemble individual short sequences (reads) into a single genomic sequence ( superstring ). Objective function? 13
14 Shortest Common Superstring Problem (SCS) Problem: Given a set of strings, find a shortest string that contains all of them Input: Strings s 1, s 2,., s n Output: A string s that Contains all strings s 1, s 2,., s n as substrings Has minimum length. Note: this formulation does not take into account errors in strings s i Shortest Common Superstring Problem: Example 14
15 Greedy Algorithm for SCS 1. Find pair of strings s i, s j with longest overlap. 2. Merge(s i, s j ) 3. Recurse How good is the greedy algorithm? Algorithm Analysis S = {s 1, s 2,., s n } Opt(S ) = length of SCS Claim: Greedy solution 4 Opt(S ). Conjecture: Greedy solution 2 Opt(S ). [Best known bound 2.5] Theorem: SCS is NP complete 15
16 SCS and Overlap Graph Build directed graph G = (V,E) V = {s 1, s 2,., s n } e = (s i, s j ) if prefix of s j matches suffix of s i w(s i, s j ) = length of overlap b/w s i, s j Goal: Find a maximum weight path visiting every VERTEX exactly once in the OVERLAP graph: Travelling Salesman (Hamiltonian path) problem Convert max to min by w -w SCS to TSP: An Example S = { ATC, CCA, CAG, TCC, AGT } SCS AGT CCA ATC ATCCAGT TCC CAG TSP 0 AGT 1 2 CAG ATC TCC CCA ATCCAGT 16
17 Problems 1. What if there is not a solution to SCS for real data? No superstring due to missing overlaps. 2. Is shortest string the correct objective function? Repeated sequences common in most genomes. Questions 1. How many reads to sequence? 2. How to assemble in presence of missing data and/or repeated sequences? 17
18 Definition of Coverage Length of genomic segment: G Number of reads: N Length of each read: L Definition: Coverage c = n L / G How much coverage is enough? Lander-Waterman model: Assuming uniform distribution of reads, C=10 results in 1 gapped region /1,000,000 nucleotides Lander-Waterman Statistics Given: N reads of length L from a genome of size G. Assume left end of reads uniformly distributed on genome. P(read R starts at ζ) = 1/G P(ζ covered by read R) = L/G ζ 18
19 Lander-Waterman Statistics Given: N reads of length L from a genome of size G P(ζ covered by read) = 1 (1 L/G) N 1 e -c, where c = N L / G is coverage P(ζ covered by read) 1 e -c Poisson approximation Rescale g = G/L, l = L/L. Left ends of reads are Poisson process with rate c = N L / G. Pr[ k reads in interval of length t] = e -ct (ct) k / k! c P(covered) Questions 1. How many reads to sequence? 2. How to assemble in presence of missing data and/or repeated sequences? 19
20 Triazzle: A Fun Example Repeats make it very difficult! Try it (even on your IPhone): Repeats and Fragment Assembly Repeats a major problem for fragment assembly >50% of human (mammalian) genomes are repeats. ~5% for most bacterial genomes. 20
21 Repeat Types in Human Low-Complexity DNA (e.g. ATATATATACATA ) Microsatellite repeats (a 1 a k ) N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG) Transposons SINE (Short Interspersed Nuclear Elements) e.g., ALU: ~300-long, 10 6 copies LINE (Long Interspersed Nuclear Elements) ~4000-long, 200,000 copies LTR retroposons (Long Terminal Repeats (~700 bp) at each end) cousins of HIV Gene Families genes duplicate & then diverge (paralogs) Recent duplications ~100,000-long, very similar copies Repeats and Fragment Assembly Repeats: A major problem for fragment assembly Repeat Repeat Repeat Green and yellow fragments are interchangeable when assembling repetitive DNA 21
22 Questions 1. How many reads to sequence? 2. How to assemble in presence of missing data and/or repeated sequences? Dominant paradigms 1. Overlap-Layout-Consensus 2. debruijn graphs Overlap-Layout-Consensus Assemblers: ARACHNE, PHRAP, CAP, TIGR, CELERA Overlap: find potentially overlapping reads Layout: Merge reads into contigs and contigs into supercontigs Consensus: derive the DNA sequence and correct read errors 22
23 Overlap Find best match between suffix of one read and the prefix of another r suffix(r) prefix(r ) r' Due to sequencing errors, need to use dynamic programming to find the optimal overlap alignment Given 2 DNA sequences v and w: v : w : ATGTTAT ATCGTAC m = 7 n = 7 Alignment : 2 * k matrix ( k > m, n ) letters of v letters of w A T -- G T T A T A T C G T -- A C 5 matches 1 insertions 1 deletions 1 mismatch 23
24 Edit Distance Levenshtein (1966) introduced edit distance between two strings as the minimum number of elementary operations (insertions, deletions, and substitutions) to transform one string into the other d(v,w) = MIN number of elementary operations to transform v w Computing Edit Distance Let v i = prefix of v of length i: v 1 v i w j = prefix of w of length j: w 1 w j Let a i,j = edit distance between v i and w j a i, j = min a i-1, j + 1 a i, j a i-1, j if v i w j Insert Delete Mismatch a i-1, j-1 if v i = w j Match Can replace +1 with different penalties/scores for different types of changes 24
25 Computing Alignment i-1,j -1 i-1,j 0 or 1 1 i,j -1 1 i,j a i, j = min a i-1, j + 1 a i, j Insert Delete a i-1, j if v i w j Mismatch a i-1, j-1 if v i = w j Match Every Path in the Grid Corresponds to an Alignment W A T C G V A T G V = A T - G T W= A T C G T 4 25
26 Edit Graph Weighted graph G with two distinct vertices, one labeled source one labeled sink Every alignment is path from source to sink. Alignment as a Path in the Edit Graph - Corresponding path - 26
27 Alignments in Edit Graph (cont d) and represent indels in v and w. represent matches or mismatches. Every path in the edit graph corresponds to an alignment: 27
28 Alignment Runtime O(nm) time to fill in the mxn dynamic programming matrix. Overlap Goal: find best match between the suffix of one read and the prefix of another Thus far: global alignment. How to fix? 28
29 Overlap Goal: find best match between the suffix of one read and the prefix of another Finding optimal overlap alignment for all pairs of N reads of length L: O(N 2 L 2 ) Improve efficiency: Apply a filtration method to filter out pairs of fragments that do not share a significantly long common substring Overlapping Reads Identify all k-mers in reads (k ~ 24) TACA TAGATTACACAGATTAC T GA TAGT TAGATTACACAGATTAC TAGA 29
30 Sources Genomes sequenced: Serafim Batzoglou CS262_2006/ (Sequencing slides) 30
10/20/2009 Comp 590/Comp Fall
Lecture 14: DNA Sequencing Study Chapter 8.9 10/20/2009 Comp 590/Comp 790-90 Fall 2009 1 DNA Sequencing Shear DNA into millions of small fragments Read 500 700 nucleotides at a time from the small fragments
More informationLecture 14: DNA Sequencing
Lecture 14: DNA Sequencing Study Chapter 8.9 10/17/2013 COMP 465 Fall 2013 1 Shear DNA into millions of small fragments Read 500 700 nucleotides at a time from the small fragments (Sanger method) DNA Sequencing
More informationDNA sequencing. Course Info
DNA sequencing EECS 458 CWRU Fall 2004 Readings: Pevzner Ch1-4 Adams, Fields & Venter (ISBN:0127170103) Serafim Batzoglou s slides Course Info Instructor: Jing Li 509 Olin Bldg Phone: X0356 Email: jingli@eecs.cwru.edu
More informationThe first generation DNA Sequencing
The first generation DNA Sequencing Slides 3 17 are modified from faperta.ugm.ac.id/newbie/download/pak_tar/.../instrument20072.ppt slides 18 43 are from Chengxiang Zhai at UIUC. The strand direction http://en.wikipedia.org/wiki/dna
More informationOutline. DNA Sequencing. Whole Genome Shotgun Sequencing. Sequencing Coverage. Whole Genome Shotgun Sequencing 3/28/15
Outline Introduction Lectures 22, 23: Sequence Assembly Spring 2015 March 27, 30, 2015 Sequence Assembly Problem Different Solutions: Overlap-Layout-Consensus Assembly Algorithms De Bruijn Graph Based
More informationSequence Assembly and Alignment. Jim Noonan Department of Genetics
Sequence Assembly and Alignment Jim Noonan Department of Genetics james.noonan@yale.edu www.yale.edu/noonanlab The assembly problem >>10 9 sequencing reads 36 bp - 1 kb 3 Gb Outline Basic concepts in genome
More informationWe begin with a high-level overview of sequencing. There are three stages in this process.
Lecture 11 Sequence Assembly February 10, 1998 Lecturer: Phil Green Notes: Kavita Garg 11.1. Introduction This is the first of two lectures by Phil Green on Sequence Assembly. Yeast and some of the bacterial
More information1. A brief overview of sequencing biochemistry
Supplementary reading materials on Genome sequencing (optional) The materials are from Mark Blaxter s lecture notes on Sequencing strategies and Primary Analysis 1. A brief overview of sequencing biochemistry
More informationCSE182-L16. LW statistics/assembly
CSE182-L16 LW statistics/assembly Silly Quiz Who are these people, and what is the occasion? Genome Sequencing and Assembly Sequencing A break at T is shown here. Measuring the lengths using electrophoresis
More informationBiol 478/595 Intro to Bioinformatics
Biol 478/595 Intro to Bioinformatics September M 1 Labor Day 4 W 3 MG Database Searching Ch. 6 5 F 5 MG Database Searching Hw1 6 M 8 MG Scoring Matrices Ch 3 and Ch 4 7 W 10 MG Pairwise Alignment 8 F 12
More informationLectures 18, 19: Sequence Assembly. Spring 2017 April 13, 18, 2017
Lectures 18, 19: Sequence Assembly Spring 2017 April 13, 18, 2017 1 Outline Introduction Sequence Assembly Problem Different Solutions: Overlap-Layout-Consensus Assembly Algorithms De Bruijn Graph Based
More informationAlignment and Assembly
Alignment and Assembly Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which
More informationGenetics and Genomics in Medicine Chapter 3. Questions & Answers
Genetics and Genomics in Medicine Chapter 3 Multiple Choice Questions Questions & Answers Question 3.1 Which of the following statements, if any, is false? a) Amplifying DNA means making many identical
More informationRestriction Enzymes (endonucleases)
In order to understand and eventually manipulate DNA (human or otherwise) an array of DNA technologies have been developed. Here are some of the tools: Restriction Enzymes (endonucleases) In order to manipulate
More informationChapter 7. DNA Microarrays
Bioinformatics III Structural Bioinformatics and Genome Analysis Chapter 7. DNA Microarrays 7.9 Next Generation Sequencing 454 Sequencing Solexa Illumina Solid TM System Sequencing Process of determining
More informationBioinformatics for Genomics
Bioinformatics for Genomics It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material. When I was young my Father
More informationGenome Sequencing-- Strategies
Genome Sequencing-- Strategies Bio 4342 Spring 04 What is a genome? A genome can be defined as the entire DNA content of each nucleated cell in an organism Each organism has one or more chromosomes that
More informationDe novo assembly of human genomes with massively parallel short read sequencing. Mikk Eelmets Journal Club
De novo assembly of human genomes with massively parallel short read sequencing Mikk Eelmets Journal Club 06.04.2010 Problem DNA sequencing technologies: Sanger sequencing (500-1000 bp) Next-generation
More informationBST227 Introduction to Statistical Genetics. Lecture 8: Variant calling from high-throughput sequencing data
BST227 Introduction to Statistical Genetics Lecture 8: Variant calling from high-throughput sequencing data 1 PC recap typical genome Differs from the reference genome at 4-5 million sites ~85% SNPs ~15%
More informationComputational Biology 2. Pawan Dhar BII
Computational Biology 2 Pawan Dhar BII Lecture 1 Introduction to terms, techniques and concepts in molecular biology Molecular biology - a primer Human body has 100 trillion cells each containing 3 billion
More informationConditional Random Fields, DNA Sequencing. Armin Pourshafeie. February 10, 2015
Conditional Random Fields, DNA Sequencing Armin Pourshafeie February 10, 2015 CRF Continued HMMs represent a distribution for an observed sequence x and a parse P(x, ). However, usually we are interested
More informationBENG 183 Trey Ideker. Genome Assembly and Physical Mapping
BENG 183 Trey Ideker Genome Assembly and Physical Mapping Reasons for sequencing Complete genome sequencing!!! Resequencing (Confirmatory) E.g., short regions containing single nucleotide polymorphisms
More information4. Analysing genes II Isolate mutants*
.. 4. Analysing s II Isolate mutants* Using the mutant to isolate the classify mutants by complementation analysis wild type study phenotype of mutants mutant 1 - use mutant to isolate sequence put individual
More informationChapter 8: Recombinant DNA. Ways this technology touches us. Overview. Genetic Engineering
Chapter 8 Recombinant DNA and Genetic Engineering Genetic manipulation Ways this technology touches us Criminal justice The Justice Project, started by law students to advocate for DNA testing of Death
More informationChapter 20 DNA Technology & Genomics. If we can, should we?
Chapter 20 DNA Technology & Genomics If we can, should we? Biotechnology Genetic manipulation of organisms or their components to make useful products Humans have been doing this for 1,000s of years plant
More informationIntroduction to Bioinformatics. Genome sequencing & assembly
Introduction to Bioinformatics Genome sequencing & assembly Genome sequencing & assembly p DNA sequencing How do we obtain DNA sequence information from organisms? p Genome assembly What is needed to put
More informationNEXT GENERATION SEQUENCING. Farhat Habib
NEXT GENERATION SEQUENCING HISTORY HISTORY Sanger Dominant for last ~30 years 1000bp longest read Based on primers so not good for repetitive or SNPs sites HISTORY Sanger Dominant for last ~30 years 1000bp
More informationIntroduction to Molecular Biology
Introduction to Molecular Biology Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 2-1- Important points to remember We will study: Problems from bioinformatics. Algorithms used to solve
More informationDNA Sequencing and Assembly
DNA Sequencing and Assembly CS 262 Lecture Notes, Winter 2016 February 2nd, 2016 Scribe: Mark Berger Abstract In this lecture, we survey a variety of different sequencing technologies, including their
More informationHuman Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased
Human Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased exponentially since the 1990s. In 2005, with the introduction
More informationHuman genome sequence February, 2001
Computational Molecular Biology Symposium March 12 th, 2003 Carnegie Mellon University Organizer: Dannie Durand Sponsored by the Department of Biological Sciences and the Howard Hughes Medical Institute
More informationGenome Sequence Assembly
Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:
More informationContact us for more information and a quotation
GenePool Information Sheet #1 Installed Sequencing Technologies in the GenePool The GenePool offers sequencing service on three platforms: Sanger (dideoxy) sequencing on ABI 3730 instruments Illumina SOLEXA
More informationIntroduction to metagenome assembly. Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014
Introduction to metagenome assembly Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014 Sequencing specs* Method Read length Accuracy Million reads Time Cost per M 454
More informationBIOLOGY - CLUTCH CH.20 - BIOTECHNOLOGY.
!! www.clutchprep.com CONCEPT: DNA CLONING DNA cloning is a technique that inserts a foreign gene into a living host to replicate the gene and produce gene products. Transformation the process by which
More informationBootcamp: Molecular Biology Techniques and Interpretation
Bootcamp: Molecular Biology Techniques and Interpretation Bi8 Winter 2016 Today s outline Detecting and quantifying nucleic acids and proteins: Basic nucleic acid properties Hybridization PCR and Designing
More informationRecombinant DNA recombinant DNA DNA cloning gene cloning
DNA Technology Recombinant DNA In recombinant DNA, DNA from two different sources, often two species, are combined into the same DNA molecule. DNA cloning permits production of multiple copies of a specific
More informationA Guide to Consed Michelle Itano, Carolyn Cain, Tien Chusak, Justin Richner, and SCR Elgin.
1 A Guide to Consed Michelle Itano, Carolyn Cain, Tien Chusak, Justin Richner, and SCR Elgin. Main Window Figure 1. The Main Window is the starting point when Consed is opened. From here, you can access
More informationChapter 20 Recombinant DNA Technology. Copyright 2009 Pearson Education, Inc.
Chapter 20 Recombinant DNA Technology Copyright 2009 Pearson Education, Inc. 20.1 Recombinant DNA Technology Began with Two Key Tools: Restriction Enzymes and DNA Cloning Vectors Recombinant DNA refers
More informationEach cell of a living organism contains chromosomes
COVER FEATURE Genome Sequence Assembly: Algorithms and Issues Algorithms that can assemble millions of small DNA fragments into gene sequences underlie the current revolution in biotechnology, helping
More informationFun with DNA polymerase
Fun with DNA polymerase Why would we want to be able to make copies of DNA? Can you think of a situation where you have only a small amount and would like more? Enzymatic DNA synthesis To use DNA polymerase
More informationChapter 6 - Molecular Genetic Techniques
Chapter 6 - Molecular Genetic Techniques Two objects of molecular & genetic technologies For analysis For generation Molecular genetic technologies! For analysis DNA gel electrophoresis Southern blotting
More informationMolecular Cell Biology - Problem Drill 11: Recombinant DNA
Molecular Cell Biology - Problem Drill 11: Recombinant DNA Question No. 1 of 10 1. Which of the following statements about the sources of DNA used for molecular cloning is correct? Question #1 (A) cdna
More informationPLNT2530 (2018) Unit 6b Sequence Libraries
PLNT2530 (2018) Unit 6b Sequence Libraries Molecular Biotechnology (Ch 4) Analysis of Genes and Genomes (Ch 5) Unless otherwise cited or referenced, all content of this presenataion is licensed under the
More informationMate-pair library data improves genome assembly
De Novo Sequencing on the Ion Torrent PGM APPLICATION NOTE Mate-pair library data improves genome assembly Highly accurate PGM data allows for de Novo Sequencing and Assembly For a draft assembly, generate
More information2 Gene Technologies in Our Lives
CHAPTER 15 2 Gene Technologies in Our Lives SECTION Gene Technologies and Human Applications KEY IDEAS As you read this section, keep these questions in mind: For what purposes are genes and proteins manipulated?
More informationReading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction
Lecture 8 Reading Lecture 8: 96-110 Lecture 9: 111-120 DNA Libraries Definition Types Construction 142 DNA Libraries A DNA library is a collection of clones of genomic fragments or cdnas from a certain
More informationThe Diploid Genome Sequence of an Individual Human
The Diploid Genome Sequence of an Individual Human Maido Remm Journal Club 12.02.2008 Outline Background (history, assembling strategies) Who was sequenced in previous projects Genome variations in J.
More informationLecture 10 : Whole genome sequencing and analysis. Introduction to Computational Biology Teresa Przytycka, PhD
Lecture 10 : Whole genome sequencing and analysis Introduction to Computational Biology Teresa Przytycka, PhD Sequencing DNA Goal obtain the string of bases that make a given DNA strand. Problem Typically
More informationNB536: Bioinformatics
NB536: Bioinformatics Instructor Prof. Jong Kyoung Kim Department of New Biology Office: E4-613 E-mail: jkkim@dgist.ac.kr Homepage: https://scg.dgist.ac.kr Course website https://scg.dgist.ac.kr/index.php/courses
More informationCISC 889 Bioinformatics (Spring 2004) Lecture 3
CISC 889 Bioinformatics (Spring 004) Lecture Genome Sequencing Li Liao Computer and Information Sciences University of Delaware Administrative Have you visited The NCBI website? Have you read Hunter s
More informationB. Incorrect! Ligation is also a necessary step for cloning.
Genetics - Problem Drill 15: The Techniques in Molecular Genetics No. 1 of 10 1. Which of the following is not part of the normal process of cloning recombinant DNA in bacteria? (A) Restriction endonuclease
More information7.1 Techniques for Producing and Analyzing DNA. SBI4U Ms. Ho-Lau
7.1 Techniques for Producing and Analyzing DNA SBI4U Ms. Ho-Lau What is Biotechnology? From Merriam-Webster: the manipulation of living organisms or their components to produce useful usually commercial
More informationMatthew Tinning Australian Genome Research Facility. July 2012
Next-Generation Sequencing: an overview of technologies and applications Matthew Tinning Australian Genome Research Facility July 2012 History of Sequencing Where have we been? 1869 Discovery of DNA 1909
More informationBiotechnology Chapter 20
Biotechnology Chapter 20 DNA Cloning DNA Cloning AKA Plasmid-based transformation or molecular cloning First off-let s sum up what happens. A plasmid is taken from a bacteria A gene is inserted into the
More informationFatchiyah
Fatchiyah Email: fatchiya@yahoo.co.id RNAs: mrna trna rrna RNAi DNAs: Protein: genome DNA cdna mikro-makro mono-poly single-multi Analysis: Identification human and animal disease Finger printing Sexing
More informationBiochemistry. Dr. Shariq Syed. Shariq AIKC/FinalYB/2014
Biochemistry Dr. Shariq Syed Shariq AIKC/FinalYB/2014 What is DNA Sequence?? Our Genome is made up of DNA Biological instructions are written in our DNA in chemical form The order (sequence) in which nucleotides
More informationSelected Techniques Part I
1 Selected Techniques Part I Gel Electrophoresis Can be both qualitative and quantitative Qualitative About what size is the fragment? How many fragments are present? Is there in insert or not? Quantitative
More informationLecture Four. Molecular Approaches I: Nucleic Acids
Lecture Four. Molecular Approaches I: Nucleic Acids I. Recombinant DNA and Gene Cloning Recombinant DNA is DNA that has been created artificially. DNA from two or more sources is incorporated into a single
More informationChapter 11: Applications of Biotechnology
Chapter 11: Applications of Biotechnology Lecture Outline Enger, E. D., Ross, F. C., & Bailey, D. B. (2012). Concepts in biology (14th ed.). New York: McGraw- Hill. 11-1 Why Biotechnology Works 11-2 Biotechnology
More informationDe novo genome assembly with next generation sequencing data!! "
De novo genome assembly with next generation sequencing data!! " Jianbin Wang" HMGP 7620 (CPBS 7620, and BMGN 7620)" Genomics lectures" 2/7/12" Outline" The need for de novo genome assembly! The nature
More informationOutline. The types of Illumina data Methods of assembly Repeats Selecting k-mer size Assembly Tools Assembly Diagnostics Assembly Polishing
Illumina Assembly 1 Outline The types of Illumina data Methods of assembly Repeats Selecting k-mer size Assembly Tools Assembly Diagnostics Assembly Polishing 2 Illumina Sequencing Paired end Illumina
More informationNext-generation sequencing technologies
Next-generation sequencing technologies NGS applications Illumina sequencing workflow Overview Sequencing by ligation Short-read NGS Sequencing by synthesis Illumina NGS Single-molecule approach Long-read
More informationChapter 20 Biotechnology
Chapter 20 Biotechnology Manipulation of DNA In 2007, the first entire human genome had been sequenced. The ability to sequence an organisms genomes were made possible by advances in biotechnology, (the
More informationTexas A&M University-Corpus Christi CHEM4402 Biochemistry II Laboratory Laboratory 4 - Polymerase Chain Reaction (PCR)
Texas A&M University-Corpus Christi CHEM4402 Biochemistry II Laboratory Laboratory 4 - Polymerase Chain Reaction (PCR) Progressing with the sequence of experiments, we are now ready to amplify the green
More informationReview of whole genome methods
Review of whole genome methods Suffix-tree based MUMmer, Mauve, multi-mauve Gene based Mercator, multiple orthology approaches Dot plot/clustering based MUMmer 2.0, Pipmaker, LASTZ 10/3/17 0 Rationale:
More informationGenetics Lecture 21 Recombinant DNA
Genetics Lecture 21 Recombinant DNA Recombinant DNA In 1971, a paper published by Kathleen Danna and Daniel Nathans marked the beginning of the recombinant DNA era. The paper described the isolation of
More informationNext Generation Sequencing. Dylan Young Biomedical Engineering
Next Generation Sequencing Dylan Young Biomedical Engineering What is DNA? Molecule composed of Adenine (A) Guanine (G) Cytosine (C) Thymine (T) Paired as either AT or CG Provides genetic instructions
More informationBi 8 Lecture 4. Ellen Rothenberg 14 January Reading: from Alberts Ch. 8
Bi 8 Lecture 4 DNA approaches: How we know what we know Ellen Rothenberg 14 January 2016 Reading: from Alberts Ch. 8 Central concept: DNA or RNA polymer length as an identifying feature RNA has intrinsically
More informationA shotgun introduction to sequence assembly (with Velvet) MCB Brem, Eisen and Pachter
A shotgun introduction to sequence assembly (with Velvet) MCB 247 - Brem, Eisen and Pachter Hot off the press January 27, 2009 06:00 AM Eastern Time llumina Launches Suite of Next-Generation Sequencing
More informationDNA Technology. Asilomar Singer, Zinder, Brenner, Berg
DNA Technology Asilomar 1973. Singer, Zinder, Brenner, Berg DNA Technology The following are some of the most important molecular methods we will be using in this course. They will be used, among other
More informationBiotechnology. Chapter 20. Biology Eighth Edition Neil Campbell and Jane Reece. PowerPoint Lecture Presentations for
Chapter 20 Biotechnology PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from Joan Sharp Copyright
More informationBiotechnology. Biotechnology is difficult to define but in general it s the use of biological systems to solve problems.
MITE 2 S Biology Biotechnology Summer 2004 Austin Che Biotechnology is difficult to define but in general it s the use of biological systems to solve problems. Recombinant DNA consists of DNA assembled
More informationGenome Projects. Part III. Assembly and sequencing of human genomes
Genome Projects Part III Assembly and sequencing of human genomes All current genome sequencing strategies are clone-based. 1. ordered clone sequencing e.g., C. elegans well suited for repetitive sequences
More informationComputational Biology I LSM5191
Computational Biology I LSM5191 Lecture 5 Notes: Genetic manipulation & Molecular Biology techniques Broad Overview of: Enzymatic tools in Molecular Biology Gel electrophoresis Restriction mapping DNA
More informationDe novo meta-assembly of ultra-deep sequencing data
De novo meta-assembly of ultra-deep sequencing data Hamid Mirebrahim 1, Timothy J. Close 2 and Stefano Lonardi 1 1 Department of Computer Science and Engineering 2 Department of Botany and Plant Sciences
More informationChapter 20: Biotechnology
Name Period The AP Biology exam has reached into this chapter for essay questions on a regular basis over the past 15 years. Student responses show that biotechnology is a difficult topic. This chapter
More information4/26/2015. Cut DNA either: Cut DNA either:
Ch.20 Enzymes that cut DNA at specific sequences (restriction sites) resulting in segments of DNA (restriction fragments) Typically 4-8 bp in length & often palindromic Isolated from bacteria (Hundreds
More informationMolecular Genetics Techniques. BIT 220 Chapter 20
Molecular Genetics Techniques BIT 220 Chapter 20 What is Cloning? Recombinant DNA technologies 1. Producing Recombinant DNA molecule Incorporate gene of interest into plasmid (cloning vector) 2. Recombinant
More informationGenome 373: High- Throughput DNA Sequencing. Doug Fowler
Genome 373: High- Throughput DNA Sequencing Doug Fowler Tasks give ML unity We learned about three tasks that are commonly encountered in ML Models/Algorithms Give ML Diversity Classification Regression
More informationBio nformatics. Lecture 2. Saad Mneimneh
Bio nformatics Lecture 2 RFLP: Restriction Fragment Length Polymorphism A restriction enzyme cuts the DNA molecules at every occurrence of a particular sequence, called restriction site. For example, HindII
More informationBENG 183 Trey Ideker (the details )
BENG 183 Trey Ideker (the details ) (1) Devils in the details: Sequencing topics to be covered in today s lecture DNA preparation prior to sequencing Amplification: vectors or cycle sequencing PAGE and
More informationGenome Reassembly From Fragments. 28 March 2013 OSU CSE 1
Genome Reassembly From Fragments 28 March 2013 OSU CSE 1 Genome A genome is the encoding of hereditary information for an organism in its DNA The mathematical model of a genome is a string of character,
More informationNext Generation Sequencing. Tobias Österlund
Next Generation Sequencing Tobias Österlund tobiaso@chalmers.se NGS part of the course Week 4 Friday 13/2 15.15-17.00 NGS lecture 1: Introduction to NGS, alignment, assembly Week 6 Thursday 26/2 08.00-09.45
More informationPurpose of sequence assembly
Sequence Assembly Purpose of sequence assembly Reconstruct long DNA/RNA sequences from short sequence reads Genome sequencing RNA sequencing for gene discovery But not for transcript quantification Variant
More informationLecture 2: High-Throughput Biology
Lecture 2: High-Throughput Biology COMP 465 Fall 2013 Study Chapter 3.8-3.11 8/27/2013 Comp 465 Fall 2013 1 Analyzing DNA Recall DNA is the essential information determining the function of living organisms
More informationMolecular Biology: DNA sequencing
Molecular Biology: DNA sequencing Author: Prof Marinda Oosthuizen Licensed under a Creative Commons Attribution license. SEQUENCING OF LARGE TEMPLATES As we have seen, we can obtain up to 800 nucleotides
More informationHigh-Throughput Assay Design. Microarrays. Applications. Overview. Algorithms Universal DNA Tag Array Design and Optimization
Algorithms for Universal DNA Tag Array Design and Optimization Watson- Crick C o m p l e m e n t a r i t y Four nucleotide types: A,C,T,G A s paired with T s (2 hydrogen bonds) C s paired with G s (3 hydrogen
More informationNextGen Sequencing Technologies Sequencing overview
Outline Conventional NextGen High-throughput sequencing (Next-Gen sequencing) technologies. Illumina sequencing in detail. Quality control. Sequence coverage. Multiplexing. FASTQ files. Shendure and Ji
More informationChapter 17. PCR the polymerase chain reaction and its many uses. Prepared by Woojoo Choi
Chapter 17. PCR the polymerase chain reaction and its many uses Prepared by Woojoo Choi Polymerase chain reaction 1) Polymerase chain reaction (PCR): artificial amplification of a DNA sequence by repeated
More informationGenetic Fingerprinting
Genetic Fingerprinting Introduction DA fingerprinting In the R & D sector: -involved mostly in helping to identify inherited disorders. In forensics: -identification of possible suspects involved in offences.
More informationCourse summary. Today. PCR Polymerase chain reaction. Obtaining molecular data. Sequencing. DNA sequencing. Genome Projects.
Goals Organization Labs Project Reading Course summary DNA sequencing. Genome Projects. Today New DNA sequencing technologies. Obtaining molecular data PCR Typically used in empirical molecular evolution
More informationCSCI 2570 Introduction to Nanocomputing
CSCI 2570 Introduction to Nanocomputing DNA Computing John E Savage DNA (Deoxyribonucleic Acid) DNA is double-stranded helix of nucleotides, nitrogen-containing molecules. It carries genetic information
More informationMultiple choice questions (numbers in brackets indicate the number of correct answers)
1 Multiple choice questions (numbers in brackets indicate the number of correct answers) February 1, 2013 1. Ribose is found in Nucleic acids Proteins Lipids RNA DNA (2) 2. Most RNA in cells is transfer
More informationBio-inspired Computing for Network Modelling
ISSN (online): 2289-7984 Vol. 6, No.. Pages -, 25 Bio-inspired Computing for Network Modelling N. Rajaee *,a, A. A. S. A Hussaini b, A. Zulkharnain c and S. M. W Masra d Faculty of Engineering, Universiti
More informationGenome Assembly Using de Bruijn Graphs. Biostatistics 666
Genome Assembly Using de Bruijn Graphs Biostatistics 666 Previously: Reference Based Analyses Individual short reads are aligned to reference Genotypes generated by examining reads overlapping each position
More informationHigh Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014
High Throughput Sequencing Technologies J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014 Sequencing Explosion www.genome.gov/sequencingcosts http://t.co/ka5cvghdqo Sequencing Explosion
More informationGENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.
!! www.clutchprep.com CONCEPT: OVERVIEW OF GENOMICS Genomics is the study of genomes in their entirety Bioinformatics is the analysis of the information content of genomes - Genes, regulatory sequences,
More informationGENETICS EXAM 3 FALL a) is a technique that allows you to separate nucleic acids (DNA or RNA) by size.
Student Name: All questions are worth 5 pts. each. GENETICS EXAM 3 FALL 2004 1. a) is a technique that allows you to separate nucleic acids (DNA or RNA) by size. b) Name one of the materials (of the two
More informationFactors affecting PCR
Lec. 11 Dr. Ahmed K. Ali Factors affecting PCR The sequences of the primers are critical to the success of the experiment, as are the precise temperatures used in the heating and cooling stages of the
More information