Genome Reassembly From Fragments. 28 March 2013 OSU CSE 1

Similar documents
Biology Evolution: Mutation I Science and Mathematics Education Research Group

Studying the Human Genome. Lesson Overview. Lesson Overview Studying the Human Genome

The structure, type and functions of a cell are all determined by chromosomes:

Introduction. CS482/682 Computational Techniques in Biological Sequence Analysis

Biochemistry. Dr. Shariq Syed. Shariq AIKC/FinalYB/2014

Genes and Gene Technology

Connect-A-Contig Paper version

What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases.

DNA, Genes and Chromosomes. Vocabulary

Genetic tests are available for hundreds of disorders. DNA testing can pinpoint the exact genetic basis of a disorder.

Introduction to Molecular Biology

Chapter 8. Microbial Genetics. Lectures prepared by Christine L. Case. Copyright 2010 Pearson Education, Inc.

CSCI2950-C DNA Sequencing and Fragment Assembly

Lesson Overview. Studying the Human Genome. Lesson Overview Studying the Human Genome

DNA. Using DNA to solve crimes

Unit 1: DNA and the Genome. Sub-Topic (1.3) Gene Expression

Introduction to Bioinformatics for Computer Scientists. Lecture 2

Genomes DNA Genes to Proteins. The human genome is a multi-volume instruction manual

MODULE 1: INTRODUCTION TO THE GENOME BROWSER: WHAT IS A GENE?

O C. 5 th C. 3 rd C. the national health museum

GENETICS 1 Classification, Heredity, DNA & RNA. Classification, Objectives At the end of this sub section you should be able to: Heredity, DNA and RNA

Microbial Genetics. Chapter 8

1. Molecular computation uses molecules to represent information and molecular processes to implement information processing.

DNA & THE GENETIC CODE DON T PANIC! THIS SECTION OF SLIDES IS AVAILABLE AT CLASS WEBSITE

DNA and genetic information

Introduction to Bioinformatics

Module 6 Microbial Genetics. Chapter 8

Lecture 2: Central Dogma of Molecular Biology & Intro to Programming

ادخ مانب DNA Sequencing یرهطم لضفلاوبا دیس

Recombinant DNA recombinant DNA DNA cloning gene cloning

Human Chromosomes Section 14.1

Lecture 2: Biology Basics Continued

Genetics and Biotechnology. Section 1. Applied Genetics

Introduction to Bioinformatics

A Brief Introduction to Bioinformatics

Conditional Random Fields, DNA Sequencing. Armin Pourshafeie. February 10, 2015

Chapter 9: DNA: The Molecule of Heredity

Appendix A DNA and PCR in detail DNA: A Detailed Look

Review of whole genome methods

DNA The Stuff of Life

What is Genetics? Genetics The study of how heredity information is passed from parents to offspring. The Modern Theory of Evolution =

MATH 5610, Computational Biology

What is DNA? It is the in which the

Basic concepts of molecular biology

Genetics: Analysis of Genes and Genomes, Ninth Edition. Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION

Genome Sequence Assembly

SOURCES AND RESOURCES:

ISO/IEC JTC 1/SC 29/WG 11 N15527 Warsaw, CH June Introduction

Page 1. C) DNA molecules, only D) both DNA and RNA molecules. C) nitrogenous bases D) amino acids. C) starch and glycogen D) fats and oils

2 Gene Technologies in Our Lives

Making sense of DNA For the genealogist

Nucleic Acids. One-letter abbreviation. 1.1* Base / Nucleotide DNA / RNA / Both. 1.2* Base / Nucleotide DNA / RNA / Both

13-2 Manipulating DNA Slide 1 of 32

4/3/2013. DNA Synthesis Replication of Bacterial DNA Replication of Bacterial DNA

M I C R O B I O L O G Y WITH DISEASES BY TAXONOMY, THIRD EDITION

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

NEXT GENERATION SEQUENCING. Farhat Habib

Chapter 7. DNA Microarrays

Chapter 1 The Genetics Revolution MULTIPLE-CHOICE QUESTIONS. Section 1.1 (The birth of genetics)

Semantic Information in Genetic Sequences

Bioinformatics: Sequence Analysis. COMP 571 Luay Nakhleh, Rice University

Chapter 4 DNA Structure & Gene Expression

10/20/2009 Comp 590/Comp Fall

Chapter 15 The Human Genome Project and Genomics. Chapter 15 Human Heredity by Michael Cummings 2006 Brooks/Cole-Thomson Learning

Genetics and Biotechnology 13.2 DNA Technology

BIOLOGY 111. CHAPTER 6: DNA: The Molecule of Life

Chapter 12. DNA TRANSCRIPTION and TRANSLATION

Course Information. Introduction to Algorithms in Computational Biology Lecture 1. Relations to Some Other Courses

Transcription and Translation. DANILO V. ROGAYAN JR. Faculty, Department of Natural Sciences

Lecture Three: Genes and Inheritance

THE COMPONENTS & STRUCTURE OF DNA

DNA AND PROTEIN SYSNTHESIS

Human genome sequence February, 2001

Lecture 2: Biology Basics Continued. Fall 2018 August 23, 2018

LECTURE 1 : GENETICS

CHAPTER 11 DNA NOTES PT. 4: PROTEIN SYNTHESIS TRANSCRIPTION & TRANSLATION

Introduction to 'Omics and Bioinformatics

The Molecular Basis of Inheritance

NUCLEIC ACIDS. DNA (Deoxyribonucleic Acid) and RNA (Ribonucleic Acid): information storage molecules made up of nucleotides.

Introduction to Algorithms in Computational Biology Lecture 1

WARM UP. 1. Take out your laptop and Chapter 12 Notes 2. Log in to Google Classroom 3. Wait for me to post the quick quiz!

RNA & PROTEIN SYNTHESIS

Protein Synthesis: From Gene RNA Protein Trait

KEY CONCEPTS AND PROCESS SKILLS. 1. Blood types can be used as evidence about identity and about family relationships.

Chapter 10. DNA: The Molecule of Heredity. Lectures by Gregory Ahearn. University of North Florida. Copyright 2009 Pearson Education, Inc.

Lander-Waterman Statistics for Shotgun Sequencing Math 283: Ewens & Grant 5.1 Math 186: Not in book

CSE/Beng/BIMM 182: Biological Data Analysis. Instructor: Vineet Bafna TA: Nitin Udpa

Wednesday, November 22, 17. Exons and Introns

GENETICS. Genetics developed from curiosity about inheritance.

If Dna Has The Instructions For Building Proteins Why Is Mrna Needed

CELL BIOLOGY: DNA. Generalized nucleotide structure: NUCLEOTIDES: Each nucleotide monomer is made up of three linked molecules:

Roadmap. The Cell. Introduction to Molecular Biology. DNA RNA Protein Central dogma Genetic code Gene structure Human Genome

12/8/09 Comp 590/Comp Fall

BISHOPAgEd.Weebly.com. Weeks: Dates: 1/18-1/29 Unit: RNA &Protein Synthesis. Monday Tuesday Wednesday Thursday Friday. FFA Meeting 6pm 27 E

VISHVESHWARAIAH TECHNOLOGICAL UNIVERSITY S.D.M COLLEGE OF ENGINEERING AND TECHNOLOGY. A seminar report on GENETIC ALGORITHMS.

Interest Grabber Notebook #1

1. An alteration of genetic information is shown below. 5. Part of a molecule found in cells is represented below.

Resources. How to Use This Presentation. Chapter 10. Objectives. Table of Contents. Griffith s Discovery of Transformation. Griffith s Experiments

Reading for lecture 2

How is genome sequencing done?

Transcription:

Genome Reassembly From Fragments 28 March 2013 OSU CSE 1

Genome A genome is the encoding of hereditary information for an organism in its DNA The mathematical model of a genome is a string of character, where each character is one of 'A', 'C', 'G', or 'T', which stand for the names of the four nucleotides that occur on a DNA backbone 28 March 2013 OSU CSE 2

Quoted from Wikipedia: An analogy to the human genome stored on DNA is that of instructions stored in a book: The book (genome) contains 23 chapters (chromosomes); Each chapter contains 48 to 250 million letters (A,C,G,T) without spaces; Hence, the book contains over 3.2 billion letters total; The book fits into a cell nucleus the size of a pinpoint; At least one copy of the book (all 23 chapters) is contained in most cells of our body. 28 March 2013 OSU CSE 3

Quoted from Wikipedia: This is what we care about for An analogy to the human genome stored on the next project... DNA is that of instructions stored in a book: The book (genome) contains 23 chapters (chromosomes); Each chapter contains 48 to 250 million letters (A,C,G,T) without spaces; Hence, the book contains over 3.2 billion letters total; The book fits into a cell nucleus the size of a pinpoint; At least one copy of the book (all 23 chapters) is contained in most cells of our body. 28 March 2013 OSU CSE 4

Genome Sequencing The Human Genome Project was designed to determine the entire sequence of human DNA and to map its mathematical model (genotype) to physical and functional manifestations in a person (phenotype) Sequencing is done piece-by-piece because it is effectively impossible to do anything directly with 3.2 billion nucleotides length = 3.2 billion 28 March 2013 OSU CSE 5

Genome Sequencing: Step 1 Use enzymes that can cut up many strands of the same DNA (each a string of length about 3.2 billion letters or bases ) into pieces at different locations, creating a soup of fragments each of much smaller length (on the order of 1000) 28 March 2013 OSU CSE 6

Genome Sequencing: Step 2 Use machines that can physically sequence each of these fragments to determine their mathematical models Example: "TCTAAGCCTA..." 28 March 2013 OSU CSE 7

Genome Sequencing: Step 2 Use machines that can physically sequence each of these fragments to determine their mathematical models Example: "AGTAGAACG..." 28 March 2013 OSU CSE 8

Genome Sequencing: Step 2 Use machines that can physically sequence each of these fragments to determine their mathematical models 28 March 2013 OSU CSE 9

Genome Sequencing: Step 3 Use computer algorithms to reassemble the original very long string model from the models of its fragments, by combining fragments based on their overlaps 28 March 2013 OSU CSE 10

Genome Sequencing: Step 3 Use computer algorithms to reassemble the original very long string model from the models of its fragments, by combining fragments based on their overlaps How would you do it at all, never mind doing it efficiently? 28 March 2013 OSU CSE 11

Greedy Reassembly: Step 1 A naïve (but still interesting) idea is to pick two fragments with the most overlap and to combine them into a longer fragment 28 March 2013 OSU CSE 12

Greedy Reassembly: Step 1 A naïve (but still interesting) idea is to pick two fragments with the most overlap and to combine them into a longer fragment 28 March 2013 OSU CSE 13

Greedy Reassembly: Step 1 A naïve (but still interesting) idea is to pick two fragments with the most overlap and to combine them into a longer fragment 28 March 2013 OSU CSE 14

Finding Overlaps Given two strings, what is the longest string that is a prefix of one and a suffix of the other? Example of one pair of strings: s1 = "AGTAGAACG" s2 = "CGAGGTAGT" 28 March 2013 OSU CSE 15

Finding Overlaps Given two strings, what is the longest string that is a prefix of one and a suffix of the other? Example of one pair of strings: s1 = "AGTAGAACG" s2 = "CGAGGTAGT" 28 March 2013 OSU CSE 16

Finding Overlaps Given two strings, what is the longest string that is a prefix of one and a suffix of the other? Example of one pair of strings: s1 = "AGTAGAACG" s2 = "CGAGGTAGT" 28 March 2013 OSU CSE 17

Finding Overlaps Given two strings, what is the longest string that is a prefix of one and a suffix of the other? Example of one pair of strings: s1 = "AGTAGAACG" s2 = "CGAGGTAGT" The longest string that is a prefix of one and a suffix of the other is "AGT". 28 March 2013 OSU CSE 18

Combine If these two strings have the most overlap of any pair in the soup, then we remove these two strings from the soup : "AGTAGAACG" "CGAGGTAGT" and replace them by this one: "CGAGGTAGTAGAACG" 28 March 2013 OSU CSE 19

Combine If these two strings have The the idea most is that overlap both the of any pair in the soup, shorter then strings we could remove have been fragments of this these two strings from the soup : "AGTAGAACG" "CGAGGTAGT" and replace them by this one: "CGAGGTAGTAGAACG" longer string. 28 March 2013 OSU CSE 20

Combine If these two strings have the most overlap of any pair in the soup, then we remove these two strings from the soup : "AGTAGAACG" "CGAGGTAGT" Notice that math model of the soup is a finite set of string of character, so in a Java program it can be of type Set<String>. and replace them by this one: "CGAGGTAGTAGAACG" 28 March 2013 OSU CSE 21

Greedy Reassembly: Step 2 Continue the process until there is only one fragment in the soup (declare success) 28 March 2013 OSU CSE 22

Greedy Reassembly: Step 2 Continue the process until there is only one fragment in the soup (declare success), or until no two fragments overlap at all (too bad) 28 March 2013 OSU CSE 23

Success? Even if there is only one fragment left, it might not be the original long string that was chopped up but it s a good guess! And after all, we are just guessing; critical information is lost when the long strand is chopped up into fragments, but we can reassemble it from fragments with high probability if enough copies of the original string are chopped up into fragments 28 March 2013 OSU CSE 24

Project The project is to do greedy reassembly, not for a genome of length 3.2 billion, but rather for a reasonably short piece of text (e.g., the Gettysburg Address), many copies of which have been chopped up into random fragments for you to reassemble 28 March 2013 OSU CSE 25

Resources Wikipedia: Genome http://en.wikipedia.org/wiki/genome Wikipedia: Human Genome Project http://en.wikipedia.org/wiki/human_genome_project Wikipedia: Whole Genome Sequencing http://en.wikipedia.org/wiki/genome_sequencing Wikipedia: Sequence Assembly http://en.wikipedia.org/wiki/sequence_assembly 28 March 2013 OSU CSE 26