Bioinformatics Support of Genome Sequencing Projects. Seminar in biology

Similar documents
Lecture Four. Molecular Approaches I: Nucleic Acids

BIOLOGY - CLUTCH CH.20 - BIOTECHNOLOGY.

Computational Biology I LSM5191

CHAPTER 20 DNA TECHNOLOGY AND GENOMICS. Section A: DNA Cloning

B. Incorrect! Ligation is also a necessary step for cloning.

Restriction Enzymes (endonucleases)

Multiple choice questions (numbers in brackets indicate the number of correct answers)

7.1 Techniques for Producing and Analyzing DNA. SBI4U Ms. Ho-Lau

Molecular Genetics Techniques. BIT 220 Chapter 20

Molecular Cell Biology - Problem Drill 11: Recombinant DNA

Molecular Genetics II - Genetic Engineering Course (Supplementary notes)

Chapter 8: Recombinant DNA. Ways this technology touches us. Overview. Genetic Engineering

Genetics and Genomics in Medicine Chapter 3. Questions & Answers

Amplified segment of DNA can be purified from bacteria in sufficient quantity and quality for :

Genome Sequence Assembly

Chapter 20 Recombinant DNA Technology. Copyright 2009 Pearson Education, Inc.

Studying the Human Genome. Lesson Overview. Lesson Overview Studying the Human Genome

Gene Cloning and DNA Analysis: An introduction

Proposed Models of DNA Replication. Conservative Model. Semi-Conservative Model. Dispersive model

Appendix A DNA and PCR in detail DNA: A Detailed Look

Recombinant DNA recombinant DNA DNA cloning gene cloning

_ DNA absorbs light at 260 wave length and it s a UV range so we cant see DNA, we can see DNA only by staining it.

Genetics and Biotechnology. Section 1. Applied Genetics

Computational Biology 2. Pawan Dhar BII

Overview: The DNA Toolbox

Chapter 10 Genetic Engineering: A Revolution in Molecular Biology

Introduction to Plant Genomics and Online Resources. Manish Raizada University of Guelph

M Keramatipour 2. M Keramatipour 1. M Keramatipour 4. M Keramatipour 3. M Keramatipour 5. M Keramatipour

The Biotechnology Toolbox

NB536: Bioinformatics

Genetic Engineering & Recombinant DNA

XXII DNA cloning and sequencing. Outline

Motivation From Protein to Gene

Molecular Biology: DNA sequencing

GENETICS EXAM 3 FALL a) is a technique that allows you to separate nucleic acids (DNA or RNA) by size.

Molecular Biology (2)

Chapter 15 Gene Technologies and Human Applications

Biotechnology. Biotechnology is difficult to define but in general it s the use of biological systems to solve problems.

1. A brief overview of sequencing biochemistry

DNA Technology. Asilomar Singer, Zinder, Brenner, Berg

Lecture 2: High-Throughput Biology

Covalently bonded sugar-phosphate backbone with relatively strong bonds keeps the nucleotides in the backbone connected in the correct sequence.

Bootcamp: Molecular Biology Techniques and Interpretation

Chapter 15 Recombinant DNA and Genetic Engineering. Restriction Enzymes Function as Nature s Pinking Shears

Recitation CHAPTER 9 DNA Technologies

A Lot of Cutting and Pasting Going on Here Recombinant DNA and Biotechnology

Reading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction

Recombinant DNA Technology

Chapter 6 - Molecular Genetic Techniques

Biotechnology. Chapter 20. Biology Eighth Edition Neil Campbell and Jane Reece. PowerPoint Lecture Presentations for

We begin with a high-level overview of sequencing. There are three stages in this process.

BIOTECHNOLOGY. Sticky & blunt ends. Restriction endonucleases. Gene cloning an overview. DNA isolation & restriction

Manipulation of Purified DNA

STRUCTURE AND DIAGNOSTIC APPLICATIONS OF DNA

Manipulating DNA. Nucleic acids are chemically different from other macromolecules such as proteins and carbohydrates.

2054, Chap. 14, page 1

Chapter 20 DNA Technology & Genomics. If we can, should we?

Molecular Cloning. Genomic DNA Library: Contains DNA fragments that represent an entire genome. cdna Library:

Genetics Lecture 21 Recombinant DNA

DNA and Biotechnology Form of DNA Form of DNA Form of DNA Form of DNA Replication of DNA Replication of DNA

Selected Techniques Part I

2 Gene Technologies in Our Lives

BIOTECHNOLOGY : PRINCIPLES AND PROCESSES

NCERT. 2. An enzyme catalysing the removal of nucleotides from the ends of DNA is: a. endonuclease b. exonuclease c. DNA ligase d.

Biotechnology Chapter 20

Introduction to Molecular Biology

Winter Quarter Midterm Exam

Moayyad Al-shafei. Mohammad Tarabeih. Dr Ma'mon Ahram. 1 P a g e

The Techniques of Molecular Biology: Forensic DNA Fingerprinting

DNA REPLICATION & BIOTECHNOLOGY Biology Study Review

Polymerase Chain Reaction PCR

Chapter 9 Genetic Engineering

Design. Construction. Characterization

Lecture 22: Molecular techniques DNA cloning and DNA libraries


2014 Pearson Education, Inc. CH 8: Recombinant DNA Technology

CHAPTER 9 DNA Technologies

Deoxyribonucleic Acid DNA

Introduction to BioMEMS & Medical Microdevices DNA Microarrays and Lab-on-a-Chip Methods

3 Designing Primers for Site-Directed Mutagenesis

Factors affecting PCR

Review of whole genome methods

Lecture 3 (FW) January 28, 2009 Cloning of DNA; PCR amplification Reading assignment: Cloning, ; ; 330 PCR, ; 329.

Bi 8 Lecture 4. Ellen Rothenberg 14 January Reading: from Alberts Ch. 8

CH 8: Recombinant DNA Technology

Biotechnology. Chapter 20. Biology Eighth Edition Neil Campbell and Jane Reece. PowerPoint Lecture Presentations for

Chapter 10 Analytical Biotechnology and the Human Genome

The structure, type and functions of a cell are all determined by chromosomes:

4. Analysing genes II Isolate mutants*

Thebiotutor.com A2 Biology OCR Unit F215: Control, genomes and environment Module 2.3 Genomes and gene technologies Answers

Chapter 7. DNA Microarrays

Lesson Overview. Studying the Human Genome. Lesson Overview Studying the Human Genome

Biotechnology: DNA Technology & Genomics

BIO 311C Spring Lecture 34 Friday 23 Apr.

Chapter 17. PCR the polymerase chain reaction and its many uses. Prepared by Woojoo Choi

DNA: The Primary Source of Heritable Information. Genetic information is transmitted from one generation to the next through DNA or RNA

Biotechnology DNA technology

Transcription:

Bioinformatics Support of Genome Sequencing Projects Seminar in biology

Introduction The Big Picture Biology reminder Enzyme for DNA manipulation DNA cloning DNA mapping Sequencing genomes Alignment of large-scale DNA toc

Introduction Genome sequencing is figuring out the order of DNA nucleotides. Sequencing the genome is an important step towards understanding it. The whole genome can't be sequenced all at once because available methods of DNA sequencing can only handle short stretches of DNA at a time.

The big picture Make multiples copies of an existing DNA strain. Cut the resulting copies in multiples short overlapping fragments. Sequences it using automatic sequencers. link together the sequenced fragments in the correct order to give the master sequences of the chromosomes that make up the genome. The shotgun approach

Biology reminder THE STEPS OF THE LADDER ARE MADE OF PAIRS OF NITROGEN BASES: ADENOSINE GUANOSINE CYTIDINE THYMIDINE = A = G = C = T

DNA Polymerase Enzymes for DNA manipulation DNA polymerase are enzymes that synthesize new polynucleotides complementary to an existing DNA or RNA template

Nuclease Nuclease degrade DNA molecules by breaking the phosphodiester bonds that link one nucleotide to the next

Ligase Ligase : which join DNA molecules together by synthesizing phosphodiester bonds between nucleotides at the ends of two different molecules, or at the two ends of a single molecule

End-modification enzymes Terminal deoxynucleotidyl transferase, obtained from calf thymus tissue, is one example of an end-modification enzyme. Allow the attachment of radioactive, fluorescent or other types of marker to DNA molecules.

DNA cloning Vector based on plasmid Vector based on virus Polymerase Chain Reaction aka PCR

Plasmid based Build a recombinant plasmid. Use of the natural processes of transformation, enhanced significantly by suspending the cells in calcium chloride before adding the DNA, and then briefly incubating the mixture at 42. Can handle DNA fragments up to 10 kb in size.

Plasmid based

Virus based With a λ phage up to 18 kb of new DNA can be cloned. A phage virus

The Polymerase Chain Reaction PCR is a test-tube reaction and does not involve the use of living cells => way faster. Need a primer to start the synthesis reaction. Reaction is done in the 5 -> 3 direction. With the primer we can choose the starting point of the synthesis.

The Polymerase Chain Reaction heat the mixture to 94 C so hydrogen bonds that hold together the two polynucleotides of the double helix are broken. The temperature is then reduced to 50 60 C to allow the primers to attach to their annealing positions. The temperature is raised to 72 C, the optimum for Taq polymerase. Fragments up to 40 kb can be amplified.

DNA mapping Genetic and Physical Maps Genetic mapping is based on the use of genetic techniques to construct maps showing the positions of genes and other sequence features on a genome. Physical mapping uses molecular biology techniques to examine DNA molecules directly in order to construct maps showing the positions of sequence features, including genes.

DNA mapping (A) The DNA molecule contains a tandemly repeated element made up of many copies of the sequence GATTA. (B) the DNA molecule contains two copies of a genome-wide repeat. When the sequences are examined, two fragments appear to overlap, but one fragment contains the left-hand part of one repeat and the other fragment has the right-hand part of the second repeat.

Genetic Mapping Genes were the first markers to be used. Single nucleotide polymorphisms (SNPs)

Genetic Mapping The first genetic maps, constructed in the early decades of the 20th century for organisms such as the fruit fly. By 1922 over 50 genes had been mapped onto the four fruit-fly chromosomes, but nine of these were for eye color. Problem : microbes, such as bacteria and yeast, have very few visual characteristics. The standard blood groups such as the ABO and immunological proteins such as the human leukocyte antigens (HLA system) were added. Problem again : in most eukaryotic genomes the genes are widely spaced out with large gaps between them.

Single nucleotide polymorphisms (SNPs) These are positions in a genome where some individuals have one nucleotide (e.g. a G) and others have a different nucleotide. In the human genome there are at least 1.42 million SNPs. A single nucleotide polymorphism (SNP). SNP detection is rapid and powerful because it is based on oligonucleotide hybridization analysis.

Oligonucleotide hybridization

Oligonucleotide hybridization The oligonucleotide probe has two end-labels. One of these is a fluorescent dye and the other is a quenching compound. The two ends of the oligonucleotide base-pair to one another, so the fluorescent signal is quenched. When the probe hybridizes to its target DNA, the ends of the molecule become separated, enabling the fluorescent dye to emit its signal.

Physical Mapping A map generated by genetic techniques is rarely sufficient for directing the sequencing phase of a genome project because genetic maps have limited accuracy and a low resolution. These two limitations of genetic mapping mean that for most eukaryotes a genetic map must be checked and supplemented by alternative mapping procedures before large-scale DNA sequencing begins. Let s introduce a physical mapping technique : Restriction mapping.

Restriction Mapping The objective is to map the EcoRI (E) and BamHI (B) sites in a linear DNA molecule of 4.9 kb. We do single and double restrictions on this DNA molecule. The unresolved issue being the position of one of the three BamHI sites By giving suboptimal condition for a single restriction experiment we can solve this issue.

The shotgun approach Does the shotgun approach works with big genome? 1995 : the sequence of the 1830 kb genome of the bacterium Haemophilus influenzae was published. Sequence assembly required 30 hours on a computer with 512 Mb of RAM, and resulted in 140 lengthy contiguous sequences.

MUMmer MUMmer: fast alignment of large-scale DNA http://www.tigr.org/software/mummer/ MUMmer is able to align rapidly sequences containing millions of nucleotides using suffix tree data structure. Human chromosome #1: 200 million base pairs MUMmer highlights the exact differences between two genomes. MUMmer locates SNP, repeats, tandem repeats, DNA insertion and MUM (maximum unique match) 1999 : standard for sequence alignement was dynamic programming or hashing technique (BLAST, FASTA) dynamic programming : space requirement O(n), time requirement O(n 2 ).

MUM Maximal Unique Sequences Sequences in genomes A and B that : occur exactly once in A and in B are not contained in any larger such sequence Genome A: tcgatctaagatcga ACCATCAcgact Genome B: gcattataagatcga ACCATCAtccag A B

Suffix trie Suffix trie for the string «BANANAS» BANANAS ANANAS NANAS ANAS NAS AS S A trie is a type of tree that has N possible branches from each node, where N is the number of characters in the alphabet. The reason you don't hear much about the use of suffix tries is the simple fact that constructing one requires O(N 2 ) time and space.

Suffix tree construction Building trees: O(N 2 ) algorithm Initialize One edge for the entire string S[1..N] For i = 2 to N Add suffix S[i..N] to suffix tree Find match point for string S[i..N] in current tree If in middle of edge, create new node w Add remainder of S[i..N] as edge label to suffix i leaf Running Time O(N-i) time to add suffix S[i..N]

Suffix tree From suffix trie to suffix tree for the string «BANANAS» using path compression. In the worst case, a suffix tree can be built with a maximum of 2N nodes, where N is the length of the input text O(N)

Suffix tree in O(N) Building Suffix Trees in O(N) time Weiner had first linear time algorithm in 1973 McCreight developed a more space efficient algorithm in 1976 Ukkonen developed a simpler to understand variant in 1995

Ukkonen's algorithm For a given string of text, T. Ukkonen's algorithm starts with an empty tree, then progressively adds each of the N prefixes of T to the suffix tree. For example, when creating the suffix tree for BANANAS, B is inserted into the tree, then BA, then BAN, and so on. When BANANAS is finally inserted, the tree is complete Building the Suffix Tree for BANANAS

Rather than storing strings, store a pair of indices (x,y) where x is beginning of string and y is the end. Further optimisation

Search for MUM s Given strings ababaabs and aabaat: s,8 a b t,6 a t,5 ba b s,7 a b t,4 a s,6 a baabs,2 s,5 a baabs,1 bs,4 t,3 aat,1 bs,3 t,2 Unique matches: internal nodes with two leaf nodes from different genomes. Bottom-up traversal of the tree List of UMs: aab,abaa,baa.

MUMmer history 1999 : MUMmer version 1 2001 : MUMmer version 2 2004 : MUMmer version 3 (open source) MUMmer 1 use 38 bytes per base-pair. Version 2 uses only 20. Version 3 slightly less. MUMmer 2 is three times faster than MUMmer 1 and use 60 % less memory. MUMmer 3 is written in C++. Suffix tree library are in C (Prof. Dr Stefan Kurtz). MUMmer 3 size is about 600 KBytes in tar.gz format. MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer.

References Two books Prof. Dr Stefan Kurtz publications about suffix tree at uni-hamburg.de MUMmer home page : http://www.tigr.org/software/mummer/ Google. Keywords : suffix tree

Thank you for your attention