CSCI2950-C DNA Sequencing and Fragment Assembly

Size: px
Start display at page:

Download "CSCI2950-C DNA Sequencing and Fragment Assembly"

Transcription

1 CSCI2950-C DNA Sequencing and Fragment Assembly Lecture 2: Sept. 7, DNA sequencing How we obtain the sequence of nucleotides of a species 5 3 ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT 3 5 1

2 Outline DNA Sequencing Technology: Old and New Fragment Assembly Problem Overlap-Layout-Consensus Sequencing by Hybridization Eulerian and Hamiltonian Graphs Goal: DNA Sequencing Find the complete sequence of A, C, G, T s in DNA molecule. Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence: letters at a time (Sanger sequencing) letters (Next-gen sequencing) 2

3 How to Sequence DNA? Co-opt machinery of the cell. Every time a cell divides it copies its DNA sequence. DNA Replication One strand used as template to make a copy of other strand. $!! $! $ " # $! # " # "! $ " # $! " # $!! # " # " # " # "!! $! $ " " # " #! $ # " " # " # # $ $! " # " $! " # # " $! 3

4 DNA Replication DNA Replication Suppose we could: 1) Stop this reaction and measure the last nucleotide added. 2) Measure the length of DNA molecules. 4

5 DNA Sequencing 1. Start at fixed location (primer) 2. Grow DNA chain 3. Include mixture of normal nucleotides (A,C,T,G) and modified (fluorescent/ radioactive) nucleotides. 4. Modified nucleotides stop reaction at all possible points 5. Separate products with length, using gel electrophoresis Sanger sequencing (F. Sanger Nobel prize 1980) Technical Limitations 1. Need a lot of DNA 2. Reaction only works for bp 1. Biology 2. Computer Science Solutions 5

6 DNA Sequencing Cloning Many host cells DNA amplification Different types of vectors VECTOR Plasmid (bacteria) Size of insert 2,000-10,000 Can control the size Fosmid (bacteria) 40,000 BAC (Bacterial Artificial Chromosome) YAC (Yeast Artificial Chromosome) 70, ,000 > 300,000 Not used much recently 6

7 Disadvantages of Traditional (Sanger) Sequencing 1. Cloning process is laborious. Grow (bacterial) cells, each containing a single fragment. 2. Only one fragment sequenced at a time. Next-generation DNA sequencing 1. Massively parallel sequencing reactions. 2. In situ amplification of DNA. No cloning required. 7

8 1. PREPARE GENOMIC DNA SAMPLE 2. ATTACH DNA TO SURFACE 3. BRIDGE AMPLIFICATION Illumina Sequencing DNA Adapters Adapter Adapter DNA fragment Dense lawn of primers Randomly fragment genomic DNA and ligate adapters to both ends of the fragments. Bind single-stranded fragments randomly to the inside surface of the flow cell channels. Add unlabeled nucleotides and enzyme to initiate solid-phase bridge amplification. 4. FRAGMENTS BECOME DOUBLE-STRANDED 5. DENATURE THE DOUBLE- STRANDED MOLECULES 6. COMPLETE AMPLIFICATION Attached Free terminus terminus Attached terminus Attached Attached Clusters The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate. Denaturation leaves single-stranded templates anchored to the substrate. Several million dense clusters of double-stranded DNA are generated in each channel of the flow cell. Illumina Sequencing 7. DETERMINE FIRST BASE 8. IMAGE FIRST BASE 9. DETERMINE SECOND BASE Laser Laser The first sequencing cycle begins by adding four labeled reversible terminators, primers, and DNA polymerase. After laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified. The next cycle repeats the incorporation of four labeled reversible terminators, primers, and DNA polymerase. 10. I MAGE SECOND CHEMISTRY CYCLE 11. SEQUENCING OVER MUL- TIPLE CHEMISTRY CYCLES 12. A LIGN DATA GCTGA... After laser excitation, the image is captured as before, and the identity of the second base is recorded. The sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time. The data are aligned and compared to a reference, and sequencing differences are identified. 8

9 Goal: DNA Sequencing Find the complete sequence of A, C, G, T s in a DNA molecule. Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence: letters at a time (Sanger sequencing) letters (Next-gen sequencing) Issues 1. How to process data from sequencing machine? Base-calling ACTCGT 2. How to obtain sequence for longer DNA molecules? De novo assembly. 3. How to find differences from a reference genome: resequencing. 9

10 Sequence Trace Challenging to read answer 10

11 Challenging to read answer Challenging to read answer 11

12 Issues 1. How to process data from sequencing machine? Base-calling ACTCGT 2. How to obtain sequence for longer DNA molecules? De novo assembly. 3. How to find differences from a reference genome: resequencing. Sequencing Longer Regions 12

13 Fragment Assembly Fragment Assembly Computational Challenge: assemble individual short sequences (reads) into a single genomic sequence ( superstring ). Objective function? 13

14 Shortest Common Superstring Problem (SCS) Problem: Given a set of strings, find a shortest string that contains all of them Input: Strings s 1, s 2,., s n Output: A string s that Contains all strings s 1, s 2,., s n as substrings Has minimum length. Note: this formulation does not take into account errors in strings s i Shortest Common Superstring Problem: Example 14

15 Greedy Algorithm for SCS 1. Find pair of strings s i, s j with longest overlap. 2. Merge(s i, s j ) 3. Recurse How good is the greedy algorithm? Algorithm Analysis S = {s 1, s 2,., s n } Opt(S ) = length of SCS Claim: Greedy solution 4 Opt(S ). Conjecture: Greedy solution 2 Opt(S ). [Best known bound 2.5] Theorem: SCS is NP complete 15

16 SCS and Overlap Graph Build directed graph G = (V,E) V = {s 1, s 2,., s n } e = (s i, s j ) if prefix of s j matches suffix of s i w(s i, s j ) = length of overlap b/w s i, s j Goal: Find a maximum weight path visiting every VERTEX exactly once in the OVERLAP graph: Travelling Salesman (Hamiltonian path) problem Convert max to min by w -w SCS to TSP: An Example S = { ATC, CCA, CAG, TCC, AGT } SCS AGT CCA ATC ATCCAGT TCC CAG TSP 0 AGT 1 2 CAG ATC TCC CCA ATCCAGT 16

17 Problems 1. What if there is not a solution to SCS for real data? No superstring due to missing overlaps. 2. Is shortest string the correct objective function? Repeated sequences common in most genomes. Questions 1. How many reads to sequence? 2. How to assemble in presence of missing data and/or repeated sequences? 17

18 Definition of Coverage Length of genomic segment: G Number of reads: N Length of each read: L Definition: Coverage c = n L / G How much coverage is enough? Lander-Waterman model: Assuming uniform distribution of reads, C=10 results in 1 gapped region /1,000,000 nucleotides Lander-Waterman Statistics Given: N reads of length L from a genome of size G. Assume left end of reads uniformly distributed on genome. P(read R starts at ζ) = 1/G P(ζ covered by read R) = L/G ζ 18

19 Lander-Waterman Statistics Given: N reads of length L from a genome of size G P(ζ covered by read) = 1 (1 L/G) N 1 e -c, where c = N L / G is coverage P(ζ covered by read) 1 e -c Poisson approximation Rescale g = G/L, l = L/L. Left ends of reads are Poisson process with rate c = N L / G. Pr[ k reads in interval of length t] = e -ct (ct) k / k! c P(covered) Questions 1. How many reads to sequence? 2. How to assemble in presence of missing data and/or repeated sequences? 19

20 Triazzle: A Fun Example Repeats make it very difficult! Try it (even on your IPhone): Repeats and Fragment Assembly Repeats a major problem for fragment assembly >50% of human (mammalian) genomes are repeats. ~5% for most bacterial genomes. 20

21 Repeat Types in Human Low-Complexity DNA (e.g. ATATATATACATA ) Microsatellite repeats (a 1 a k ) N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG) Transposons SINE (Short Interspersed Nuclear Elements) e.g., ALU: ~300-long, 10 6 copies LINE (Long Interspersed Nuclear Elements) ~4000-long, 200,000 copies LTR retroposons (Long Terminal Repeats (~700 bp) at each end) cousins of HIV Gene Families genes duplicate & then diverge (paralogs) Recent duplications ~100,000-long, very similar copies Repeats and Fragment Assembly Repeats: A major problem for fragment assembly Repeat Repeat Repeat Green and yellow fragments are interchangeable when assembling repetitive DNA 21

22 Questions 1. How many reads to sequence? 2. How to assemble in presence of missing data and/or repeated sequences? Dominant paradigms 1. Overlap-Layout-Consensus 2. debruijn graphs Overlap-Layout-Consensus Assemblers: ARACHNE, PHRAP, CAP, TIGR, CELERA Overlap: find potentially overlapping reads Layout: Merge reads into contigs and contigs into supercontigs Consensus: derive the DNA sequence and correct read errors 22

23 Overlap Find best match between suffix of one read and the prefix of another r suffix(r) prefix(r ) r' Due to sequencing errors, need to use dynamic programming to find the optimal overlap alignment Given 2 DNA sequences v and w: v : w : ATGTTAT ATCGTAC m = 7 n = 7 Alignment : 2 * k matrix ( k > m, n ) letters of v letters of w A T -- G T T A T A T C G T -- A C 5 matches 1 insertions 1 deletions 1 mismatch 23

24 Edit Distance Levenshtein (1966) introduced edit distance between two strings as the minimum number of elementary operations (insertions, deletions, and substitutions) to transform one string into the other d(v,w) = MIN number of elementary operations to transform v w Computing Edit Distance Let v i = prefix of v of length i: v 1 v i w j = prefix of w of length j: w 1 w j Let a i,j = edit distance between v i and w j a i, j = min a i-1, j + 1 a i, j a i-1, j if v i w j Insert Delete Mismatch a i-1, j-1 if v i = w j Match Can replace +1 with different penalties/scores for different types of changes 24

25 Computing Alignment i-1,j -1 i-1,j 0 or 1 1 i,j -1 1 i,j a i, j = min a i-1, j + 1 a i, j Insert Delete a i-1, j if v i w j Mismatch a i-1, j-1 if v i = w j Match Every Path in the Grid Corresponds to an Alignment W A T C G V A T G V = A T - G T W= A T C G T 4 25

26 Edit Graph Weighted graph G with two distinct vertices, one labeled source one labeled sink Every alignment is path from source to sink. Alignment as a Path in the Edit Graph - Corresponding path - 26

27 Alignments in Edit Graph (cont d) and represent indels in v and w. represent matches or mismatches. Every path in the edit graph corresponds to an alignment: 27

28 Alignment Runtime O(nm) time to fill in the mxn dynamic programming matrix. Overlap Goal: find best match between the suffix of one read and the prefix of another Thus far: global alignment. How to fix? 28

29 Overlap Goal: find best match between the suffix of one read and the prefix of another Finding optimal overlap alignment for all pairs of N reads of length L: O(N 2 L 2 ) Improve efficiency: Apply a filtration method to filter out pairs of fragments that do not share a significantly long common substring Overlapping Reads Identify all k-mers in reads (k ~ 24) TACA TAGATTACACAGATTAC T GA TAGT TAGATTACACAGATTAC TAGA 29

30 Sources Genomes sequenced: Serafim Batzoglou CS262_2006/ (Sequencing slides) 30

10/20/2009 Comp 590/Comp Fall

10/20/2009 Comp 590/Comp Fall Lecture 14: DNA Sequencing Study Chapter 8.9 10/20/2009 Comp 590/Comp 790-90 Fall 2009 1 DNA Sequencing Shear DNA into millions of small fragments Read 500 700 nucleotides at a time from the small fragments

More information

Lecture 14: DNA Sequencing

Lecture 14: DNA Sequencing Lecture 14: DNA Sequencing Study Chapter 8.9 10/17/2013 COMP 465 Fall 2013 1 Shear DNA into millions of small fragments Read 500 700 nucleotides at a time from the small fragments (Sanger method) DNA Sequencing

More information

DNA sequencing. Course Info

DNA sequencing. Course Info DNA sequencing EECS 458 CWRU Fall 2004 Readings: Pevzner Ch1-4 Adams, Fields & Venter (ISBN:0127170103) Serafim Batzoglou s slides Course Info Instructor: Jing Li 509 Olin Bldg Phone: X0356 Email: jingli@eecs.cwru.edu

More information

The first generation DNA Sequencing

The first generation DNA Sequencing The first generation DNA Sequencing Slides 3 17 are modified from faperta.ugm.ac.id/newbie/download/pak_tar/.../instrument20072.ppt slides 18 43 are from Chengxiang Zhai at UIUC. The strand direction http://en.wikipedia.org/wiki/dna

More information

Outline. DNA Sequencing. Whole Genome Shotgun Sequencing. Sequencing Coverage. Whole Genome Shotgun Sequencing 3/28/15

Outline. DNA Sequencing. Whole Genome Shotgun Sequencing. Sequencing Coverage. Whole Genome Shotgun Sequencing 3/28/15 Outline Introduction Lectures 22, 23: Sequence Assembly Spring 2015 March 27, 30, 2015 Sequence Assembly Problem Different Solutions: Overlap-Layout-Consensus Assembly Algorithms De Bruijn Graph Based

More information

Sequence Assembly and Alignment. Jim Noonan Department of Genetics

Sequence Assembly and Alignment. Jim Noonan Department of Genetics Sequence Assembly and Alignment Jim Noonan Department of Genetics james.noonan@yale.edu www.yale.edu/noonanlab The assembly problem >>10 9 sequencing reads 36 bp - 1 kb 3 Gb Outline Basic concepts in genome

More information

We begin with a high-level overview of sequencing. There are three stages in this process.

We begin with a high-level overview of sequencing. There are three stages in this process. Lecture 11 Sequence Assembly February 10, 1998 Lecturer: Phil Green Notes: Kavita Garg 11.1. Introduction This is the first of two lectures by Phil Green on Sequence Assembly. Yeast and some of the bacterial

More information

1. A brief overview of sequencing biochemistry

1. A brief overview of sequencing biochemistry Supplementary reading materials on Genome sequencing (optional) The materials are from Mark Blaxter s lecture notes on Sequencing strategies and Primary Analysis 1. A brief overview of sequencing biochemistry

More information

CSE182-L16. LW statistics/assembly

CSE182-L16. LW statistics/assembly CSE182-L16 LW statistics/assembly Silly Quiz Who are these people, and what is the occasion? Genome Sequencing and Assembly Sequencing A break at T is shown here. Measuring the lengths using electrophoresis

More information

Biol 478/595 Intro to Bioinformatics

Biol 478/595 Intro to Bioinformatics Biol 478/595 Intro to Bioinformatics September M 1 Labor Day 4 W 3 MG Database Searching Ch. 6 5 F 5 MG Database Searching Hw1 6 M 8 MG Scoring Matrices Ch 3 and Ch 4 7 W 10 MG Pairwise Alignment 8 F 12

More information

Lectures 18, 19: Sequence Assembly. Spring 2017 April 13, 18, 2017

Lectures 18, 19: Sequence Assembly. Spring 2017 April 13, 18, 2017 Lectures 18, 19: Sequence Assembly Spring 2017 April 13, 18, 2017 1 Outline Introduction Sequence Assembly Problem Different Solutions: Overlap-Layout-Consensus Assembly Algorithms De Bruijn Graph Based

More information

Alignment and Assembly

Alignment and Assembly Alignment and Assembly Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which

More information

Genetics and Genomics in Medicine Chapter 3. Questions & Answers

Genetics and Genomics in Medicine Chapter 3. Questions & Answers Genetics and Genomics in Medicine Chapter 3 Multiple Choice Questions Questions & Answers Question 3.1 Which of the following statements, if any, is false? a) Amplifying DNA means making many identical

More information

Restriction Enzymes (endonucleases)

Restriction Enzymes (endonucleases) In order to understand and eventually manipulate DNA (human or otherwise) an array of DNA technologies have been developed. Here are some of the tools: Restriction Enzymes (endonucleases) In order to manipulate

More information

Chapter 7. DNA Microarrays

Chapter 7. DNA Microarrays Bioinformatics III Structural Bioinformatics and Genome Analysis Chapter 7. DNA Microarrays 7.9 Next Generation Sequencing 454 Sequencing Solexa Illumina Solid TM System Sequencing Process of determining

More information

Bioinformatics for Genomics

Bioinformatics for Genomics Bioinformatics for Genomics It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material. When I was young my Father

More information

Genome Sequencing-- Strategies

Genome Sequencing-- Strategies Genome Sequencing-- Strategies Bio 4342 Spring 04 What is a genome? A genome can be defined as the entire DNA content of each nucleated cell in an organism Each organism has one or more chromosomes that

More information

De novo assembly of human genomes with massively parallel short read sequencing. Mikk Eelmets Journal Club

De novo assembly of human genomes with massively parallel short read sequencing. Mikk Eelmets Journal Club De novo assembly of human genomes with massively parallel short read sequencing Mikk Eelmets Journal Club 06.04.2010 Problem DNA sequencing technologies: Sanger sequencing (500-1000 bp) Next-generation

More information

BST227 Introduction to Statistical Genetics. Lecture 8: Variant calling from high-throughput sequencing data

BST227 Introduction to Statistical Genetics. Lecture 8: Variant calling from high-throughput sequencing data BST227 Introduction to Statistical Genetics Lecture 8: Variant calling from high-throughput sequencing data 1 PC recap typical genome Differs from the reference genome at 4-5 million sites ~85% SNPs ~15%

More information

Computational Biology 2. Pawan Dhar BII

Computational Biology 2. Pawan Dhar BII Computational Biology 2 Pawan Dhar BII Lecture 1 Introduction to terms, techniques and concepts in molecular biology Molecular biology - a primer Human body has 100 trillion cells each containing 3 billion

More information

Conditional Random Fields, DNA Sequencing. Armin Pourshafeie. February 10, 2015

Conditional Random Fields, DNA Sequencing. Armin Pourshafeie. February 10, 2015 Conditional Random Fields, DNA Sequencing Armin Pourshafeie February 10, 2015 CRF Continued HMMs represent a distribution for an observed sequence x and a parse P(x, ). However, usually we are interested

More information

BENG 183 Trey Ideker. Genome Assembly and Physical Mapping

BENG 183 Trey Ideker. Genome Assembly and Physical Mapping BENG 183 Trey Ideker Genome Assembly and Physical Mapping Reasons for sequencing Complete genome sequencing!!! Resequencing (Confirmatory) E.g., short regions containing single nucleotide polymorphisms

More information

4. Analysing genes II Isolate mutants*

4. Analysing genes II Isolate mutants* .. 4. Analysing s II Isolate mutants* Using the mutant to isolate the classify mutants by complementation analysis wild type study phenotype of mutants mutant 1 - use mutant to isolate sequence put individual

More information

Chapter 8: Recombinant DNA. Ways this technology touches us. Overview. Genetic Engineering

Chapter 8: Recombinant DNA. Ways this technology touches us. Overview. Genetic Engineering Chapter 8 Recombinant DNA and Genetic Engineering Genetic manipulation Ways this technology touches us Criminal justice The Justice Project, started by law students to advocate for DNA testing of Death

More information

Chapter 20 DNA Technology & Genomics. If we can, should we?

Chapter 20 DNA Technology & Genomics. If we can, should we? Chapter 20 DNA Technology & Genomics If we can, should we? Biotechnology Genetic manipulation of organisms or their components to make useful products Humans have been doing this for 1,000s of years plant

More information

Introduction to Bioinformatics. Genome sequencing & assembly

Introduction to Bioinformatics. Genome sequencing & assembly Introduction to Bioinformatics Genome sequencing & assembly Genome sequencing & assembly p DNA sequencing How do we obtain DNA sequence information from organisms? p Genome assembly What is needed to put

More information

NEXT GENERATION SEQUENCING. Farhat Habib

NEXT GENERATION SEQUENCING. Farhat Habib NEXT GENERATION SEQUENCING HISTORY HISTORY Sanger Dominant for last ~30 years 1000bp longest read Based on primers so not good for repetitive or SNPs sites HISTORY Sanger Dominant for last ~30 years 1000bp

More information

Introduction to Molecular Biology

Introduction to Molecular Biology Introduction to Molecular Biology Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 2-1- Important points to remember We will study: Problems from bioinformatics. Algorithms used to solve

More information

DNA Sequencing and Assembly

DNA Sequencing and Assembly DNA Sequencing and Assembly CS 262 Lecture Notes, Winter 2016 February 2nd, 2016 Scribe: Mark Berger Abstract In this lecture, we survey a variety of different sequencing technologies, including their

More information

Human Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased

Human Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased Human Genome Sequencing Over the Decades The capacity to sequence all 3.2 billion bases of the human genome (at 30X coverage) has increased exponentially since the 1990s. In 2005, with the introduction

More information

Human genome sequence February, 2001

Human genome sequence February, 2001 Computational Molecular Biology Symposium March 12 th, 2003 Carnegie Mellon University Organizer: Dannie Durand Sponsored by the Department of Biological Sciences and the Howard Hughes Medical Institute

More information

Genome Sequence Assembly

Genome Sequence Assembly Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:

More information

Contact us for more information and a quotation

Contact us for more information and a quotation GenePool Information Sheet #1 Installed Sequencing Technologies in the GenePool The GenePool offers sequencing service on three platforms: Sanger (dideoxy) sequencing on ABI 3730 instruments Illumina SOLEXA

More information

Introduction to metagenome assembly. Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014

Introduction to metagenome assembly. Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014 Introduction to metagenome assembly Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014 Sequencing specs* Method Read length Accuracy Million reads Time Cost per M 454

More information

BIOLOGY - CLUTCH CH.20 - BIOTECHNOLOGY.

BIOLOGY - CLUTCH CH.20 - BIOTECHNOLOGY. !! www.clutchprep.com CONCEPT: DNA CLONING DNA cloning is a technique that inserts a foreign gene into a living host to replicate the gene and produce gene products. Transformation the process by which

More information

Bootcamp: Molecular Biology Techniques and Interpretation

Bootcamp: Molecular Biology Techniques and Interpretation Bootcamp: Molecular Biology Techniques and Interpretation Bi8 Winter 2016 Today s outline Detecting and quantifying nucleic acids and proteins: Basic nucleic acid properties Hybridization PCR and Designing

More information

Recombinant DNA recombinant DNA DNA cloning gene cloning

Recombinant DNA recombinant DNA DNA cloning gene cloning DNA Technology Recombinant DNA In recombinant DNA, DNA from two different sources, often two species, are combined into the same DNA molecule. DNA cloning permits production of multiple copies of a specific

More information

A Guide to Consed Michelle Itano, Carolyn Cain, Tien Chusak, Justin Richner, and SCR Elgin.

A Guide to Consed Michelle Itano, Carolyn Cain, Tien Chusak, Justin Richner, and SCR Elgin. 1 A Guide to Consed Michelle Itano, Carolyn Cain, Tien Chusak, Justin Richner, and SCR Elgin. Main Window Figure 1. The Main Window is the starting point when Consed is opened. From here, you can access

More information

Chapter 20 Recombinant DNA Technology. Copyright 2009 Pearson Education, Inc.

Chapter 20 Recombinant DNA Technology. Copyright 2009 Pearson Education, Inc. Chapter 20 Recombinant DNA Technology Copyright 2009 Pearson Education, Inc. 20.1 Recombinant DNA Technology Began with Two Key Tools: Restriction Enzymes and DNA Cloning Vectors Recombinant DNA refers

More information

Each cell of a living organism contains chromosomes

Each cell of a living organism contains chromosomes COVER FEATURE Genome Sequence Assembly: Algorithms and Issues Algorithms that can assemble millions of small DNA fragments into gene sequences underlie the current revolution in biotechnology, helping

More information

Fun with DNA polymerase

Fun with DNA polymerase Fun with DNA polymerase Why would we want to be able to make copies of DNA? Can you think of a situation where you have only a small amount and would like more? Enzymatic DNA synthesis To use DNA polymerase

More information

Chapter 6 - Molecular Genetic Techniques

Chapter 6 - Molecular Genetic Techniques Chapter 6 - Molecular Genetic Techniques Two objects of molecular & genetic technologies For analysis For generation Molecular genetic technologies! For analysis DNA gel electrophoresis Southern blotting

More information

Molecular Cell Biology - Problem Drill 11: Recombinant DNA

Molecular Cell Biology - Problem Drill 11: Recombinant DNA Molecular Cell Biology - Problem Drill 11: Recombinant DNA Question No. 1 of 10 1. Which of the following statements about the sources of DNA used for molecular cloning is correct? Question #1 (A) cdna

More information

PLNT2530 (2018) Unit 6b Sequence Libraries

PLNT2530 (2018) Unit 6b Sequence Libraries PLNT2530 (2018) Unit 6b Sequence Libraries Molecular Biotechnology (Ch 4) Analysis of Genes and Genomes (Ch 5) Unless otherwise cited or referenced, all content of this presenataion is licensed under the

More information

Mate-pair library data improves genome assembly

Mate-pair library data improves genome assembly De Novo Sequencing on the Ion Torrent PGM APPLICATION NOTE Mate-pair library data improves genome assembly Highly accurate PGM data allows for de Novo Sequencing and Assembly For a draft assembly, generate

More information

2 Gene Technologies in Our Lives

2 Gene Technologies in Our Lives CHAPTER 15 2 Gene Technologies in Our Lives SECTION Gene Technologies and Human Applications KEY IDEAS As you read this section, keep these questions in mind: For what purposes are genes and proteins manipulated?

More information

Reading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction

Reading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction Lecture 8 Reading Lecture 8: 96-110 Lecture 9: 111-120 DNA Libraries Definition Types Construction 142 DNA Libraries A DNA library is a collection of clones of genomic fragments or cdnas from a certain

More information

The Diploid Genome Sequence of an Individual Human

The Diploid Genome Sequence of an Individual Human The Diploid Genome Sequence of an Individual Human Maido Remm Journal Club 12.02.2008 Outline Background (history, assembling strategies) Who was sequenced in previous projects Genome variations in J.

More information

Lecture 10 : Whole genome sequencing and analysis. Introduction to Computational Biology Teresa Przytycka, PhD

Lecture 10 : Whole genome sequencing and analysis. Introduction to Computational Biology Teresa Przytycka, PhD Lecture 10 : Whole genome sequencing and analysis Introduction to Computational Biology Teresa Przytycka, PhD Sequencing DNA Goal obtain the string of bases that make a given DNA strand. Problem Typically

More information

NB536: Bioinformatics

NB536: Bioinformatics NB536: Bioinformatics Instructor Prof. Jong Kyoung Kim Department of New Biology Office: E4-613 E-mail: jkkim@dgist.ac.kr Homepage: https://scg.dgist.ac.kr Course website https://scg.dgist.ac.kr/index.php/courses

More information

CISC 889 Bioinformatics (Spring 2004) Lecture 3

CISC 889 Bioinformatics (Spring 2004) Lecture 3 CISC 889 Bioinformatics (Spring 004) Lecture Genome Sequencing Li Liao Computer and Information Sciences University of Delaware Administrative Have you visited The NCBI website? Have you read Hunter s

More information

B. Incorrect! Ligation is also a necessary step for cloning.

B. Incorrect! Ligation is also a necessary step for cloning. Genetics - Problem Drill 15: The Techniques in Molecular Genetics No. 1 of 10 1. Which of the following is not part of the normal process of cloning recombinant DNA in bacteria? (A) Restriction endonuclease

More information

7.1 Techniques for Producing and Analyzing DNA. SBI4U Ms. Ho-Lau

7.1 Techniques for Producing and Analyzing DNA. SBI4U Ms. Ho-Lau 7.1 Techniques for Producing and Analyzing DNA SBI4U Ms. Ho-Lau What is Biotechnology? From Merriam-Webster: the manipulation of living organisms or their components to produce useful usually commercial

More information

Matthew Tinning Australian Genome Research Facility. July 2012

Matthew Tinning Australian Genome Research Facility. July 2012 Next-Generation Sequencing: an overview of technologies and applications Matthew Tinning Australian Genome Research Facility July 2012 History of Sequencing Where have we been? 1869 Discovery of DNA 1909

More information

Biotechnology Chapter 20

Biotechnology Chapter 20 Biotechnology Chapter 20 DNA Cloning DNA Cloning AKA Plasmid-based transformation or molecular cloning First off-let s sum up what happens. A plasmid is taken from a bacteria A gene is inserted into the

More information

Fatchiyah

Fatchiyah Fatchiyah Email: fatchiya@yahoo.co.id RNAs: mrna trna rrna RNAi DNAs: Protein: genome DNA cdna mikro-makro mono-poly single-multi Analysis: Identification human and animal disease Finger printing Sexing

More information

Biochemistry. Dr. Shariq Syed. Shariq AIKC/FinalYB/2014

Biochemistry. Dr. Shariq Syed. Shariq AIKC/FinalYB/2014 Biochemistry Dr. Shariq Syed Shariq AIKC/FinalYB/2014 What is DNA Sequence?? Our Genome is made up of DNA Biological instructions are written in our DNA in chemical form The order (sequence) in which nucleotides

More information

Selected Techniques Part I

Selected Techniques Part I 1 Selected Techniques Part I Gel Electrophoresis Can be both qualitative and quantitative Qualitative About what size is the fragment? How many fragments are present? Is there in insert or not? Quantitative

More information

Lecture Four. Molecular Approaches I: Nucleic Acids

Lecture Four. Molecular Approaches I: Nucleic Acids Lecture Four. Molecular Approaches I: Nucleic Acids I. Recombinant DNA and Gene Cloning Recombinant DNA is DNA that has been created artificially. DNA from two or more sources is incorporated into a single

More information

Chapter 11: Applications of Biotechnology

Chapter 11: Applications of Biotechnology Chapter 11: Applications of Biotechnology Lecture Outline Enger, E. D., Ross, F. C., & Bailey, D. B. (2012). Concepts in biology (14th ed.). New York: McGraw- Hill. 11-1 Why Biotechnology Works 11-2 Biotechnology

More information

De novo genome assembly with next generation sequencing data!! "

De novo genome assembly with next generation sequencing data!! De novo genome assembly with next generation sequencing data!! " Jianbin Wang" HMGP 7620 (CPBS 7620, and BMGN 7620)" Genomics lectures" 2/7/12" Outline" The need for de novo genome assembly! The nature

More information

Outline. The types of Illumina data Methods of assembly Repeats Selecting k-mer size Assembly Tools Assembly Diagnostics Assembly Polishing

Outline. The types of Illumina data Methods of assembly Repeats Selecting k-mer size Assembly Tools Assembly Diagnostics Assembly Polishing Illumina Assembly 1 Outline The types of Illumina data Methods of assembly Repeats Selecting k-mer size Assembly Tools Assembly Diagnostics Assembly Polishing 2 Illumina Sequencing Paired end Illumina

More information

Next-generation sequencing technologies

Next-generation sequencing technologies Next-generation sequencing technologies NGS applications Illumina sequencing workflow Overview Sequencing by ligation Short-read NGS Sequencing by synthesis Illumina NGS Single-molecule approach Long-read

More information

Chapter 20 Biotechnology

Chapter 20 Biotechnology Chapter 20 Biotechnology Manipulation of DNA In 2007, the first entire human genome had been sequenced. The ability to sequence an organisms genomes were made possible by advances in biotechnology, (the

More information

Texas A&M University-Corpus Christi CHEM4402 Biochemistry II Laboratory Laboratory 4 - Polymerase Chain Reaction (PCR)

Texas A&M University-Corpus Christi CHEM4402 Biochemistry II Laboratory Laboratory 4 - Polymerase Chain Reaction (PCR) Texas A&M University-Corpus Christi CHEM4402 Biochemistry II Laboratory Laboratory 4 - Polymerase Chain Reaction (PCR) Progressing with the sequence of experiments, we are now ready to amplify the green

More information

Review of whole genome methods

Review of whole genome methods Review of whole genome methods Suffix-tree based MUMmer, Mauve, multi-mauve Gene based Mercator, multiple orthology approaches Dot plot/clustering based MUMmer 2.0, Pipmaker, LASTZ 10/3/17 0 Rationale:

More information

Genetics Lecture 21 Recombinant DNA

Genetics Lecture 21 Recombinant DNA Genetics Lecture 21 Recombinant DNA Recombinant DNA In 1971, a paper published by Kathleen Danna and Daniel Nathans marked the beginning of the recombinant DNA era. The paper described the isolation of

More information

Next Generation Sequencing. Dylan Young Biomedical Engineering

Next Generation Sequencing. Dylan Young Biomedical Engineering Next Generation Sequencing Dylan Young Biomedical Engineering What is DNA? Molecule composed of Adenine (A) Guanine (G) Cytosine (C) Thymine (T) Paired as either AT or CG Provides genetic instructions

More information

Bi 8 Lecture 4. Ellen Rothenberg 14 January Reading: from Alberts Ch. 8

Bi 8 Lecture 4. Ellen Rothenberg 14 January Reading: from Alberts Ch. 8 Bi 8 Lecture 4 DNA approaches: How we know what we know Ellen Rothenberg 14 January 2016 Reading: from Alberts Ch. 8 Central concept: DNA or RNA polymer length as an identifying feature RNA has intrinsically

More information

A shotgun introduction to sequence assembly (with Velvet) MCB Brem, Eisen and Pachter

A shotgun introduction to sequence assembly (with Velvet) MCB Brem, Eisen and Pachter A shotgun introduction to sequence assembly (with Velvet) MCB 247 - Brem, Eisen and Pachter Hot off the press January 27, 2009 06:00 AM Eastern Time llumina Launches Suite of Next-Generation Sequencing

More information

DNA Technology. Asilomar Singer, Zinder, Brenner, Berg

DNA Technology. Asilomar Singer, Zinder, Brenner, Berg DNA Technology Asilomar 1973. Singer, Zinder, Brenner, Berg DNA Technology The following are some of the most important molecular methods we will be using in this course. They will be used, among other

More information

Biotechnology. Chapter 20. Biology Eighth Edition Neil Campbell and Jane Reece. PowerPoint Lecture Presentations for

Biotechnology. Chapter 20. Biology Eighth Edition Neil Campbell and Jane Reece. PowerPoint Lecture Presentations for Chapter 20 Biotechnology PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from Joan Sharp Copyright

More information

Biotechnology. Biotechnology is difficult to define but in general it s the use of biological systems to solve problems.

Biotechnology. Biotechnology is difficult to define but in general it s the use of biological systems to solve problems. MITE 2 S Biology Biotechnology Summer 2004 Austin Che Biotechnology is difficult to define but in general it s the use of biological systems to solve problems. Recombinant DNA consists of DNA assembled

More information

Genome Projects. Part III. Assembly and sequencing of human genomes

Genome Projects. Part III. Assembly and sequencing of human genomes Genome Projects Part III Assembly and sequencing of human genomes All current genome sequencing strategies are clone-based. 1. ordered clone sequencing e.g., C. elegans well suited for repetitive sequences

More information

Computational Biology I LSM5191

Computational Biology I LSM5191 Computational Biology I LSM5191 Lecture 5 Notes: Genetic manipulation & Molecular Biology techniques Broad Overview of: Enzymatic tools in Molecular Biology Gel electrophoresis Restriction mapping DNA

More information

De novo meta-assembly of ultra-deep sequencing data

De novo meta-assembly of ultra-deep sequencing data De novo meta-assembly of ultra-deep sequencing data Hamid Mirebrahim 1, Timothy J. Close 2 and Stefano Lonardi 1 1 Department of Computer Science and Engineering 2 Department of Botany and Plant Sciences

More information

Chapter 20: Biotechnology

Chapter 20: Biotechnology Name Period The AP Biology exam has reached into this chapter for essay questions on a regular basis over the past 15 years. Student responses show that biotechnology is a difficult topic. This chapter

More information

4/26/2015. Cut DNA either: Cut DNA either:

4/26/2015. Cut DNA either: Cut DNA either: Ch.20 Enzymes that cut DNA at specific sequences (restriction sites) resulting in segments of DNA (restriction fragments) Typically 4-8 bp in length & often palindromic Isolated from bacteria (Hundreds

More information

Molecular Genetics Techniques. BIT 220 Chapter 20

Molecular Genetics Techniques. BIT 220 Chapter 20 Molecular Genetics Techniques BIT 220 Chapter 20 What is Cloning? Recombinant DNA technologies 1. Producing Recombinant DNA molecule Incorporate gene of interest into plasmid (cloning vector) 2. Recombinant

More information

Genome 373: High- Throughput DNA Sequencing. Doug Fowler

Genome 373: High- Throughput DNA Sequencing. Doug Fowler Genome 373: High- Throughput DNA Sequencing Doug Fowler Tasks give ML unity We learned about three tasks that are commonly encountered in ML Models/Algorithms Give ML Diversity Classification Regression

More information

Bio nformatics. Lecture 2. Saad Mneimneh

Bio nformatics. Lecture 2. Saad Mneimneh Bio nformatics Lecture 2 RFLP: Restriction Fragment Length Polymorphism A restriction enzyme cuts the DNA molecules at every occurrence of a particular sequence, called restriction site. For example, HindII

More information

BENG 183 Trey Ideker (the details )

BENG 183 Trey Ideker (the details ) BENG 183 Trey Ideker (the details ) (1) Devils in the details: Sequencing topics to be covered in today s lecture DNA preparation prior to sequencing Amplification: vectors or cycle sequencing PAGE and

More information

Genome Reassembly From Fragments. 28 March 2013 OSU CSE 1

Genome Reassembly From Fragments. 28 March 2013 OSU CSE 1 Genome Reassembly From Fragments 28 March 2013 OSU CSE 1 Genome A genome is the encoding of hereditary information for an organism in its DNA The mathematical model of a genome is a string of character,

More information

Next Generation Sequencing. Tobias Österlund

Next Generation Sequencing. Tobias Österlund Next Generation Sequencing Tobias Österlund tobiaso@chalmers.se NGS part of the course Week 4 Friday 13/2 15.15-17.00 NGS lecture 1: Introduction to NGS, alignment, assembly Week 6 Thursday 26/2 08.00-09.45

More information

Purpose of sequence assembly

Purpose of sequence assembly Sequence Assembly Purpose of sequence assembly Reconstruct long DNA/RNA sequences from short sequence reads Genome sequencing RNA sequencing for gene discovery But not for transcript quantification Variant

More information

Lecture 2: High-Throughput Biology

Lecture 2: High-Throughput Biology Lecture 2: High-Throughput Biology COMP 465 Fall 2013 Study Chapter 3.8-3.11 8/27/2013 Comp 465 Fall 2013 1 Analyzing DNA Recall DNA is the essential information determining the function of living organisms

More information

Molecular Biology: DNA sequencing

Molecular Biology: DNA sequencing Molecular Biology: DNA sequencing Author: Prof Marinda Oosthuizen Licensed under a Creative Commons Attribution license. SEQUENCING OF LARGE TEMPLATES As we have seen, we can obtain up to 800 nucleotides

More information

High-Throughput Assay Design. Microarrays. Applications. Overview. Algorithms Universal DNA Tag Array Design and Optimization

High-Throughput Assay Design. Microarrays. Applications. Overview. Algorithms Universal DNA Tag Array Design and Optimization Algorithms for Universal DNA Tag Array Design and Optimization Watson- Crick C o m p l e m e n t a r i t y Four nucleotide types: A,C,T,G A s paired with T s (2 hydrogen bonds) C s paired with G s (3 hydrogen

More information

NextGen Sequencing Technologies Sequencing overview

NextGen Sequencing Technologies Sequencing overview Outline Conventional NextGen High-throughput sequencing (Next-Gen sequencing) technologies. Illumina sequencing in detail. Quality control. Sequence coverage. Multiplexing. FASTQ files. Shendure and Ji

More information

Chapter 17. PCR the polymerase chain reaction and its many uses. Prepared by Woojoo Choi

Chapter 17. PCR the polymerase chain reaction and its many uses. Prepared by Woojoo Choi Chapter 17. PCR the polymerase chain reaction and its many uses Prepared by Woojoo Choi Polymerase chain reaction 1) Polymerase chain reaction (PCR): artificial amplification of a DNA sequence by repeated

More information

Genetic Fingerprinting

Genetic Fingerprinting Genetic Fingerprinting Introduction DA fingerprinting In the R & D sector: -involved mostly in helping to identify inherited disorders. In forensics: -identification of possible suspects involved in offences.

More information

Course summary. Today. PCR Polymerase chain reaction. Obtaining molecular data. Sequencing. DNA sequencing. Genome Projects.

Course summary. Today. PCR Polymerase chain reaction. Obtaining molecular data. Sequencing. DNA sequencing. Genome Projects. Goals Organization Labs Project Reading Course summary DNA sequencing. Genome Projects. Today New DNA sequencing technologies. Obtaining molecular data PCR Typically used in empirical molecular evolution

More information

CSCI 2570 Introduction to Nanocomputing

CSCI 2570 Introduction to Nanocomputing CSCI 2570 Introduction to Nanocomputing DNA Computing John E Savage DNA (Deoxyribonucleic Acid) DNA is double-stranded helix of nucleotides, nitrogen-containing molecules. It carries genetic information

More information

Multiple choice questions (numbers in brackets indicate the number of correct answers)

Multiple choice questions (numbers in brackets indicate the number of correct answers) 1 Multiple choice questions (numbers in brackets indicate the number of correct answers) February 1, 2013 1. Ribose is found in Nucleic acids Proteins Lipids RNA DNA (2) 2. Most RNA in cells is transfer

More information

Bio-inspired Computing for Network Modelling

Bio-inspired Computing for Network Modelling ISSN (online): 2289-7984 Vol. 6, No.. Pages -, 25 Bio-inspired Computing for Network Modelling N. Rajaee *,a, A. A. S. A Hussaini b, A. Zulkharnain c and S. M. W Masra d Faculty of Engineering, Universiti

More information

Genome Assembly Using de Bruijn Graphs. Biostatistics 666

Genome Assembly Using de Bruijn Graphs. Biostatistics 666 Genome Assembly Using de Bruijn Graphs Biostatistics 666 Previously: Reference Based Analyses Individual short reads are aligned to reference Genotypes generated by examining reads overlapping each position

More information

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014 High Throughput Sequencing Technologies J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014 Sequencing Explosion www.genome.gov/sequencingcosts http://t.co/ka5cvghdqo Sequencing Explosion

More information

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS. !! www.clutchprep.com CONCEPT: OVERVIEW OF GENOMICS Genomics is the study of genomes in their entirety Bioinformatics is the analysis of the information content of genomes - Genes, regulatory sequences,

More information

GENETICS EXAM 3 FALL a) is a technique that allows you to separate nucleic acids (DNA or RNA) by size.

GENETICS EXAM 3 FALL a) is a technique that allows you to separate nucleic acids (DNA or RNA) by size. Student Name: All questions are worth 5 pts. each. GENETICS EXAM 3 FALL 2004 1. a) is a technique that allows you to separate nucleic acids (DNA or RNA) by size. b) Name one of the materials (of the two

More information

Factors affecting PCR

Factors affecting PCR Lec. 11 Dr. Ahmed K. Ali Factors affecting PCR The sequences of the primers are critical to the success of the experiment, as are the precise temperatures used in the heating and cooling stages of the

More information