Bioinformatics for Genomics
- Lewis Manning
- 6 years ago
Transcription
1 Bioinformatics for Genomics It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.
2 When I was young my Father used to tell me that the two most worthwhile pursuits in life were the pursuit of truth and of beauty and I believe that Alfred Nobel must have felt much the same when he gave these prizes for literature and the sciences.
3 Sometime in the future, I am a hundred percent certain scientists will sit down at a computer terminal, design what they want the organism to do, and build it.
4 The steps of the sequencing of the human genome:
- Watson and Crick propose the double helix model for DNA
- Sanger proposes the sequencing method with chain terminators
- Dulbecco foreshadows, in Science, the sequencing of the human genome
- Watson becomes director of the project at the NIH
- Craig Venter (then at NIH) publishes the first EST sequencing
- Watson leaves the direction of the genome project at NIH; enter Collins
- Venter leaves NIH and founds TIGR
- TIGR sequences the first bacterial genome with the shotgun method
- Venter founds Celera Genomics with PE and announces the genome for 2001
- The race for the human genome starts (Venter vs. Collins)
- 26 June: Bill Clinton announces the completion of the human genome sequencing (actually two drafts, published in early 2001)
6 What is a genome? The entirety of the inheritable information of an organism
7 Being able to sequence the genome allows us to 'read' all this information of an organism (and hopefully understand it)
9 OK, but how long is a genome?
- Human genome: 3.12 billion bases (3120 megabases, Mb)
- Drosophila: 175 Mb
- Small eukaryote (Saccharomyces cerevisiae): 12 Mb
- Big bacterial genome (Pseudomonas aeruginosa): 6.4 Mb
- Small bacterial genome (Haemophilus influenzae): 1.8 Mb
And how many genes does it contain?
- Human genome: 3120 Mb, … genes
- Pseudomonas aeruginosa: 6.4 Mb, 5570 genes
- Haemophilus influenzae: 1.8 Mb, 1727 genes
10 So how do we sequence an entire genome?
11 The strategy of the public consortium: hierarchical sequencing.
Cut a chromosome into pieces with restriction enzymes, clone each piece in a vector, then subclone into vectors carrying progressively smaller inserts:
- YACs (yeast artificial chromosomes), inserts of … Mb
- BACs (bacterial artificial chromosomes), inserts of … Mb
- Cosmids, inserts under 50 kb
- Plasmid vectors, a few kb
Finally, merge the fragments of the genome cloned in the different vectors.
12 Here comes Venter, with his rebellion against NIH. Just 1% of the human genome is translated into proteins: why not start by sequencing just the mRNA? Venter generates cDNA from RNA (reverse transcription), clones the cDNA, randomly sequences the clones, and produces ESTs (Expressed Sequence Tags). 1991: Venter publishes 337 ESTs corresponding to genes expressed in the brain. The publication in Science was preceded by a patent application for the sequenced ESTs. The patent is rejected, and he leaves NIH to found TIGR, financed by Healthcare Investment, with the goal of competing with the public consortium to sequence the human genome.
13 1993: a novel idea for the first genome. At the end of 1993, Hamilton Smith (Nobel Prize in 1978 for the description of the first restriction enzyme) joins the Scientific Committee of TIGR. Smith proposes the idea of shotgun sequencing. Chosen organism: Haemophilus influenzae.
14 Why shotgun? It is NOT A NOVEL TECHNOLOGY; IT IS A NOVEL EXPERIMENTAL METHOD. It was invented to work with Sanger technology, but the principle is still used with next-generation sequencing.
15 1. Random fragmentation of many copies of the genomic DNA (sonication, hydroshearing, ...)
16 2. Cloning of the genomic fragments in a plasmid vector
17 3. Sequencing of the inserts, using PCR primers located on the plasmid (of known sequence). Each sequence we obtain is called a sequencing READ.
18 So, if we are sequencing a 1 Mb genome with 1000-nt reads, we just need 1000 reads, right?
19 NO. The obtained sequences will cover the genome randomly, with different 'depth', or 'coverage'.
Genome coverage: the theoretical average depth of sequencing coverage is $LN/G$, where:
- L is the read length
- N is the number of reads
- G is the genome length
- LN is the total number of bases generated
20 EXAMPLE: a 1 Mb genome sequenced with 200,000 reads of 100 nt gives a coverage of $200{,}000 \times 100 / 1{,}000{,}000 = 20$ (20x).
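The coverage formula is simple arithmetic. Below is a minimal sketch with hypothetical function names of our own, not from any library. It computes LN/G and inverts it to estimate how many reads a target depth needs.

```python
import math


def mean_coverage(read_length: int, n_reads: int, genome_length: int) -> float:
    """Theoretical average depth LN/G: total sequenced bases divided by genome length."""
    return read_length * n_reads / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Invert LN/G to get the number of reads N for a target depth (rounded up)."""
    return math.ceil(target_coverage * genome_length / read_length)


# Slide example: 1 Mb genome, 100-nt reads, 200,000 reads
print(mean_coverage(100, 200_000, 1_000_000))  # 20.0
# The "1000 reads of 1000 nt" idea only reaches an average of 1x
print(reads_needed(1, 1000, 1_000_000))  # 1000
print(reads_needed(30, 100, 1_000_000))  # 300000
```

The inversion gives only the *average* depth. As the next slide explains, an average of 1x still leaves many bases unread.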
21 HOWEVER: a genome coverage of 1x does not mean that each base in my genome has been sequenced, because the sequencing is random. The probability that a base has been sequenced is
$P = 1 - e^{-m}$
where e is Euler's number (≈ 2.718) and m is the coverage. At 1x, for example, only about 63% of the bases are expected to be covered.
22 Genome coverage: a coverage of 5x leads to sequencing of 99.33% of the bases, IN THEORY. Possible problems?
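The formula $P = 1 - e^{-m}$ can be checked against a toy simulation. This is a sketch under an idealised assumption: reads are dropped at uniformly random positions on a single linear sequence, with no technical bias. The function names are hypothetical.

```python
import math
import random


def expected_fraction_covered(coverage: float) -> float:
    """P = 1 - e^(-m): expected fraction of bases sequenced at least once."""
    return 1.0 - math.exp(-coverage)


def simulated_fraction_covered(genome_length: int, read_length: int,
                               n_reads: int, seed: int = 0) -> float:
    """Drop reads at uniformly random positions and count bases hit at least once."""
    rng = random.Random(seed)
    covered = bytearray(genome_length)  # 0 = never sequenced, 1 = sequenced
    for _ in range(n_reads):
        start = rng.randint(0, genome_length - read_length)  # both ends inclusive
        covered[start:start + read_length] = b"\x01" * read_length
    return sum(covered) / genome_length


for m in (1, 5, 10):
    print(f"{m:>2}x: {expected_fraction_covered(m):.2%} of bases expected")

# 100 kb toy genome, 100-nt reads, 5,000 reads -> LN/G = 5x
print(f"simulated 5x: {simulated_fraction_covered(100_000, 100, 5_000):.2%}")
```

For m = 5 the formula reproduces the 99.33% quoted above. The simulated value should land close to it, but not exactly on it, because read placement is random. The uniform-placement assumption is precisely what real data violates, which is the point of the next slide.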
23 Genome coverage: sequencing is uneven. This can be due to technical biases or to difficult stretches of sequence. So, in the times of Sanger sequencing, 10x was considered good; now, with next-gen sequencing, you want to reach 30x, or better 50x.
24 4. Use bioinformatic techniques to ASSEMBLE the genome. Most bases will be covered on both strands; a few gaps will remain.
25 Example: I want to sequence the following genome: This lecture is very very interesting
26 I fragment (multiple copies, randomly), clone, and sequence: This l is lect ure is e is v ry ver teres ecture s very v ry inte esting. But I do not know the order of the fragments.
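The fragmentation step can be mimicked with a toy simulation. This sketch makes simplifying assumptions: fragment lengths are drawn uniformly around a chosen mean, there are no sequencing errors, and only one strand is used. The function name is hypothetical. Each copy breaks at different random points, so fragments from different copies overlap. Shuffling stands in for the loss of order after cloning.

```python
import random


def shotgun_fragments(text: str, n_copies: int, mean_len: int, seed: int = 0) -> list:
    """Toy shotgun: cut several copies of `text` at random points and shuffle the pieces."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_copies):
        pos = 0
        while pos < len(text):
            # random break points differ between copies, so pieces overlap across copies
            size = rng.randint(max(1, mean_len // 2), mean_len + mean_len // 2)
            fragments.append(text[pos:pos + size])
            pos += size
    rng.shuffle(fragments)  # after cloning and sequencing, the original order is lost
    return fragments


genome = "This lecture is very very interesting"
for fragment in shotgun_fragments(genome, n_copies=3, mean_len=7):
    print(repr(fragment))
```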
27 Thanks to the partial overlaps, I can reconstruct the sequence: This l is lect ectur ure is e is v s very v ry ver ry inte teres esting
28 Advantages of shotgun sequencing:
- No need for genomic mapping prior to sequencing
- High level of automation
- Speeds up the sequencing phase
- Reduces costs
29 Issues of shotgun sequencing:
- Requires highly purified genomic DNA
- Requires bioinformatic support
- Difficulty in the presence of complex genomic regions, such as repeated sequences in the genome
30 Let's try to sequence the sentence below: It's a fair bet that if it's fair tomorrow, then my fair wife and I will head to the Spring Fair, held in a fair sized park, in this fair city to win a prize, if everyone plays fair
31 We will have a number of sequences that read 'fair', and we would not know how to assemble them.
32 So repeats, and other issues, can lead to gaps in the assembly
33 Our assembly will not consist of one sequence per chromosome, but of multiple contigs. CONTIG: a contiguous sequence generated by the overlap of sequencing reads.
35 5. Finishing: this is the step in which we try to close the gaps. Molecular biology methods can help:
- Cloning in vectors that can receive long inserts
- PCR
- Inverse PCR
Bioinformatics can also help, with a number of approaches, but mainly with scaffolding.
36 PAIRED END READS: these are reads generated when both ends of a fragment are sequenced. We thus obtain 2 reads that come from 2 distinct but nearby regions of the genome, separated by a distance that depends on the fragment length.
37 PAIRED END READS: many technologies can generate paired end reads (Sanger, 454, Illumina). We can use the information from the pairs to order contigs into a scaffold.
38 SCAFFOLD: a sequence of a genome composed of contigs and gaps. Contigs are ordered based on the information from paired ends; the unknown bases are indicated with N.
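Here is a minimal sketch of how paired-end information turns into a scaffold with N runs. It assumes the contigs have already been ordered and oriented, and that a single average fragment (insert) size is known. Real scaffolders combine many pairs and model insert-size variation. Both function names are hypothetical.

```python
def estimate_gap(insert_size: int, read1_to_contig_end: int,
                 contig_start_to_read2_end: int) -> int:
    """Unknown bases between two contigs linked by one read pair.

    The pair spans insert_size bases in total. Subtract the part of that span
    inside the left contig (from read 1's start to the contig end) and the part
    inside the right contig (from the contig start to read 2's end).
    """
    return insert_size - read1_to_contig_end - contig_start_to_read2_end


def build_scaffold(contigs: list, gaps: list) -> str:
    """Join ordered, oriented contigs, writing each estimated gap as a run of N."""
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need one gap estimate between each pair of adjacent contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        parts.append("N" * max(gap, 1))  # at least one N so the junction stays visible
        parts.append(contig)
    return "".join(parts)


gap = estimate_gap(insert_size=500, read1_to_contig_end=200, contig_start_to_read2_end=150)
print(gap)  # 150
print(build_scaffold(["ACGTAC", "GGATCC"], [5]))  # ACGTACNNNNNGGATCC
```

Writing at least one N for zero or negative estimates is a choice made for this sketch, so the junction between two contigs is never hidden.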
39 SHOTGUN SEQUENCING PIPELINE
1. Fragmentation
2. Cloning
3. Random sequencing at high coverage
4. Bioinformatic assembly
5. Finishing
Final result: a complete assembly (sometimes)
40 So, does shotgun sequencing work? The sequencing of Haemophilus influenzae (1995) proved that it does. This was followed by:
- Automation of the preparation steps
- Improvement of Sanger sequencing technologies
- Evolution of powerful bioinformatic assembly algorithms
- Sequencing of D. melanogaster (2000)
- Sequencing of the human genome
"When we started to sequence the Drosophila, we had already halved the timing and sequencing of a genome assembly such as Haemophilus, which only three years ago represented a whole year of work. Today it would only take eight hours to sequence it and fifteen minutes to assemble it." Craig Venter, 5 November 2003
41 And the human genome? In 1998 the public project, which has already cost 1.9 billion dollars, is going slowly... In January 1998, Mike Hunkapiller (PE) shows Venter the PRISM 3700 (96 capillaries). Celera Genomics is founded (PE funding, directed by Venter): 300 PRISM 3700 machines, 100 robots, and computers, for 80 million dollars. Celera announces the sequencing of the human genome in two years, with a shotgun approach, at a cost of … million dollars. The race continues, with strong ethical implications. 26 June: Bill Clinton announces the completion of the human genome sequencing (actually two drafts, published in early 2001).
43 Shotgun sequencing and next-gen sequencing. Next-gen technologies perform:
1. random fragmentation
2. amplification of fragments
3. massive sequencing
It's a built-in shotgun. Depending on the technology used, I may have other issues:
- 454: errors due to homopolymers
- Illumina: assembly issues due to short reads
44 Shotgun sequencing and third-gen sequencing. Third-gen technologies are well suited to de novo genome assembly: they can generate good amounts of reads, and their long reads make assembly easier and can resolve repeats. ISSUES ARE:
- Cost, especially important for big eukaryotic genomes
- Errors, which can be addressed with a mixed approach. Example: Illumina for quality and depth, PacBio for length
45 And now the bioinformatics! The assembly starts from the output of the sequencing machines. This output can differ, but most of the time it will be a big FASTQ file. Maybe you know what a FASTA file is?
46 A FASTA file:
>PoolPRRSnew_1/1
CCCCGGGTCAAGGGCTGTTGTTTTATTGTTCACCTTCATTATGACAGTGCGAGGTGGTTT
ATACGGGATTTGCTGCAACAC
>PoolPRRSnew_2/1
CTTTCATGTGGTACGCAAGGGCGGCCAGGATCCGGTCACGATTGGGAACTAACTGCCG
CCCTGCTTCAATTTTGCAGCCGA
Each record starts with a >, followed by the name of the sequence; the following line(s) contain the nucleotides.
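Reading FASTA is mostly string handling. Below is a minimal parser, our own sketch rather than a library function; in practice libraries such as Biopython's `SeqIO` do this job. It also handles sequences wrapped over several lines, as in the two records above.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; sequences may be wrapped over several lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


fasta_example = """>PoolPRRSnew_1/1
CCCCGGGTCAAGGGCTGTTGTTTTATTGTTCACCTTCATTATGACAGTGCGAGGTGGTTT
ATACGGGATTTGCTGCAACAC
>PoolPRRSnew_2/1
CTTTCATGTGGTACGCAAGGGCGGCCAGGATCCGGTCACGATTGGGAACTAACTGCCG
CCCTGCTTCAATTTTGCAGCCGA
"""
for name, seq in read_fasta(io.StringIO(fasta_example)):
    print(name, len(seq))
```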
47 The FASTQ FORMAT Specific file format for DNA CCCCGGGTCAAGGGCTGTTGTTTTATTGTTCACCTTCATTATGACAGTGCGAGGTGGT + CTTTCATGTGGTACGCAAGGGCGGCCAGGATCCGGTCACGATTGGGAACTAACTGCC + ABBBBFFFFFBCFGGGGGGGGGGGGGGHHAHHHFGGGGHGEGHFHGHHGHHHHHHH It is a plain text file that contains all the information regarding the reads and their quality
48 THE FASTQ FORMAT: each sequencing read is described in 4 lines:
- FIRST ROW: @ followed by the unique name of the sequence
- SECOND ROW: nucleotide sequence
- THIRD ROW: +
- FOURTH ROW: the quality of each base, encoded in ASCII
CCCCGGGTCAAGGGCTGTTGTTTTATTGTTCACCTTCATTATGACAGTGCGAGGTGGTTTATACGG
+
1AAA?1D1>CDF1AAAEABFGGHHGDGFAGC2EEAG11FBF2GHFHHHBBEGCEFFE/AEEEDGGE?E?
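Here is a minimal sketch of reading FASTQ strictly four lines at a time. This is our own illustrative function, and the two-read example file is hypothetical rather than taken from the slides. Counting lines, instead of looking for lines that start with @, matters because @ is also a valid quality character. Checking that sequence and quality have the same length catches truncated records.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ in {header.strip()!r}")
        yield header[1:].rstrip("\r\n"), seq, qual


# Hypothetical two-read file; note that read_2's quality line starts with '@'
fastq_example = "@read_1\nACGTAC\n+\nIIII5+\n@read_2\nTTGCA\n+\n@@@@@\n"
for name, seq, qual in read_fastq(io.StringIO(fastq_example)):
    print(name, seq, qual)
```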
49 BASE QUALITY: every sequence can contain errors. Base quality: given a probability of error P, the quality is
$Q = -10 \log_{10} P$
So the error probability decreases on a logarithmic scale as quality increases: every 10 quality points correspond to a 10-fold lower error probability (Q20: P = 0.01; Q30: P = 0.001). Bases with a quality < 20 are usually discarded: QUALITY CHECK.
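Below is a sketch of turning the ASCII quality line into numbers and applying the Q20 threshold. It assumes the Phred+33 encoding, where '!' means Q0. The slides only say "ASCII", and older Illumina pipelines used an offset of 64 instead. Masking low-quality bases with N is just one simple way to "discard" them; trimming read ends is another. The function names are hypothetical.

```python
from typing import List

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ encoding ('!' = Q0)


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Decode the ASCII quality line: Q = ord(character) - offset."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert Q = -10 log10 P, giving P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mask_low_quality(seq: str, quality: str, min_q: int = 20,
                     offset: int = PHRED_OFFSET) -> str:
    """Replace bases whose quality is below min_q with 'N'."""
    return "".join(base if q >= min_q else "N"
                   for base, q in zip(seq, phred_scores(quality, offset)))


print(phred_scores("I5+!"))              # [40, 20, 10, 0]
print(error_probability(30))
print(mask_low_quality("ACGT", "I5+!"))  # ACNN
```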
50 THE ASSEMBLY: the ASSEMBLER is software that uses algorithms to combine the short sequences obtained during sequencing into longer fragments (contigs). The goal is to find the shortest string (the genome) that includes all of the input strings (the reads). As the number of input strings grows, the difficulty of the problem grows exponentially (this shortest common superstring problem is NP-complete), so we must use approximation algorithms.
51 ASSEMBLY ALGORITHMS: an example is the 'greedy' algorithm:
1. Pairwise comparison of the sequences
2. Choose the pair that aligns best and merge it
3. Repeat steps 1 and 2
Greedy algorithms have issues with repeats.
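The three greedy steps can be sketched in a few lines. This is a toy version under stated assumptions: only exact suffix/prefix overlaps count (real assemblers score alignments that tolerate errors), reads come from one strand, and the toy reads are hypothetical. Because it always takes the longest overlap available, it can join the wrong copies of a repeat, which is the weakness mentioned above.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list, min_overlap: int = 3) -> list:
    """Greedy assembly: repeatedly merge the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # drop reads fully contained in another read: they add no new sequence
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):          # 1. pairwise comparison
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:             # 2. keep the best-aligning pair
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # no overlaps left: what remains are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)                   # 3. repeat with the merged string


reads = ["GCCATG", "ACGTTA", "ATGGA", "TTAGCC"]  # toy reads, shuffled
print(greedy_assemble(reads))  # ['ACGTTAGCCATGGA']
```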
52 ASSEMBLY ALGORITHMS: other, more refined methods are based on graphs:
- Overlap-layout-consensus
- EULER assembler
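EULER (Pevzner and colleagues) is built on the de Bruijn graph idea. Reads are broken into k-mers, each distinct k-mer becomes an edge between its two (k-1)-mers, and the genome is read off as a path that uses every edge once (an Eulerian path). The sketch below is a simplified illustration of that idea, not the EULER program itself. It assumes error-free reads from one strand, k-mers that cover the whole genome, and no repeated (k-1)-mers. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def de_bruijn_graph(reads: list, k: int) -> dict:
    """Nodes are (k-1)-mers; every distinct k-mer is an edge prefix -> suffix."""
    kmers = sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: dict) -> list:
    """Hierholzer's algorithm: a walk that uses every edge exactly once."""
    out_deg = {u: len(vs) for u, vs in graph.items()}
    in_deg = Counter(v for vs in graph.values() for v in vs)
    nodes = sorted(set(out_deg) | set(in_deg))
    diffs = {n: out_deg.get(n, 0) - in_deg[n] for n in nodes}
    starts = [n for n, d in diffs.items() if d == 1]
    if any(abs(d) > 1 for d in diffs.values()) or len(starts) > 1:
        raise ValueError("unbalanced degrees: branching, e.g. caused by a repeat")
    start = starts[0] if starts else nodes[0]  # no +1 node means the path is a cycle
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    if len(path) != sum(out_deg.values()) + 1:
        raise ValueError("graph is disconnected: coverage gaps")
    return path[::-1]


def spell(path: list) -> str:
    """Glue consecutive (k-1)-mers: first node, then the last letter of each next node."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["GCCATG", "ACGTTA", "ATGGA", "TTAGCC"]
print(spell(eulerian_path(de_bruijn_graph(reads, k=4))))  # ACGTTAGCCATGGA
```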
53 ASSEMBLY ALGORITHMS: the difficulties an assembler has to face are:
- the amount of data (gigabytes of sequences)
- the presence of repeated sequences
- the presence of sequencing errors
Dozens of assemblers have been developed: how do we choose? See Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience 2013, 2:10, doi: / x-2-10
54
- Read the literature
- Consider your specific biological system
- Check whether the software handles your reads (or mixed reads)
- See what others have done
- Read the manuals
- Try different assemblers and evaluate your results
55 ASSEMBLY RESULTS: your assembler will output a set of contigs, usually in a FASTA file:
>contig19.1 size257_bb
gctcatttataacgttcctcaaagcctaaatgacaggttcaggtgggttgccttgctccc
gtcacctacccttcaaaatcaataatttcacccgctcctctatcaactgcaatcaatata
aaggtgtactcccacttccattctttggtcttttttttatatatgtgcacagctcctcaa
tctctaggacctcgatatccttaagtgactttgtgctgttgcagctcttatctaactttt
cttgcacatagccgccc
>contig20.1 size182_bb
ccaaaaagtataaacttttcaacgcctttttgaaacaatttccaagcttataaaacttga
taaggaaccggtcttccagccgcatcacgcactaaatcaccgttatcaaatttagctgct
acataattgccatgcccaatttttccgcccttatacattacaggtttcacttcttttgca
cc
How good is my assembly? What parameters should we check?
56 To evaluate the assembly we use multiple parameters:
- the number of contigs
- the genome size
- N50: the length of the shortest contig such that this contig, together with all the longer ones, covers at least 50% of the genome
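The N50 definition translates directly into code: sort the contigs from longest to shortest and add them up until half the target is reached. This sketch uses a hypothetical function name. Many tools measure N50 against the total assembly length and call the variant that uses the expected genome size NG50, so the sketch supports both. The contig lengths in the example are toy values.

```python
from typing import Iterable, Optional


def n50(contig_lengths: Iterable[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Shortest contig length L such that contigs >= L cover at least half the target.

    Target = total assembly length (N50) or, if genome_size is given, the expected
    genome size (often called NG50). Returns None if the contigs never reach half.
    """
    lengths = sorted(contig_lengths, reverse=True)
    target = (genome_size if genome_size is not None else sum(lengths)) / 2
    running = 0
    for length in lengths:
        running += length  # add contigs from the longest down
        if running >= target:
            return length
    return None


contigs = [80, 70, 50, 40, 30, 20, 10]  # toy lengths, total 300
print(n50(contigs))                     # 70  (80 + 70 = 150 >= 150)
print(n50(contigs, genome_size=400))    # 50  (80 + 70 + 50 = 200 >= 200)
```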
58 EXAMPLE OF ASSEMBLY RESULTS Sequencing of 16 genomes of Klebsiella pneumoniae
59 ASSEMBLY RESULTS:
- Genome size: GOOD, always as expected
- N50: not bad, always above … nt
- Number of contigs: acceptable, except maybe for KPCVA3
However, since we know our biological system, we know these genomes contain plasmids, which lower the assembly quality.
60 ASSEMBLY RESULTS: what is GOOD depends on what we aim for. If we are sequencing a novel genome, we should try to CLOSE it; if we want to compare similar isolates, we can accept the above results.
More information2017 Amplyus, all rights reserved
The Human Genome Project What it is: The initiative that sequenced the entire human genome The Human Genome Project (HGP) is widely recognized as a tremendous success of government initiative and international
More informationGene Identification in silico
Gene Identification in silico Nita Parekh, IIIT Hyderabad Presented at National Seminar on Bioinformatics and Functional Genomics, at Bioinformatics centre, Pondicherry University, Feb 15 17, 2006. Introduction
More informationDNA and genome sequencing. Matthew Hudson Dept of Crop Sciences University of Illinois
DNA and genome sequencing Matthew Hudson Dept of Crop Sciences University of Illinois Genome projects 2,424 ongoing genome projects 696 for eukaryotes 520 completed genomes 47 from eukaryotes Almost every
More informationHuman genome sequence
NGS: the basics Human genome sequence June 26th 2000: official announcement of the completion of the draft of the human genome sequence (truly finished in 2004) Francis Collins Craig Venter HGP: 3 billion
More informationCHAPTER 20 DNA TECHNOLOGY AND GENOMICS. Section A: DNA Cloning
Section A: DNA Cloning 1. DNA technology makes it possible to clone genes for basic research and commercial applications: an overview 2. Restriction enzymes are used to make recombinant DNA 3. Genes can
More informationBiotechnology: DNA Technology & Genomics
Chapter 20. Biotechnology: DNA Technology & Genomics 2003-2004 1 The BIG Questions! How can we use our knowledge of DNA to: " diagnose disease or defect? " cure disease or defect? " change/improve organisms?!
More informationChapter 5. Structural Genomics
Chapter 5. Structural Genomics Contents 5. Structural Genomics 5.1. DNA Sequencing Strategies 5.1.1. Map-based Strategies 5.1.2. Whole Genome Shotgun Sequencing 5.2. Genome Annotation 5.2.1. Using Bioinformatic
More informationChapter 10 Genetic Engineering: A Revolution in Molecular Biology
Chapter 10 Genetic Engineering: A Revolution in Molecular Biology Genetic Engineering Direct, deliberate modification of an organism s genome bioengineering Biotechnology use of an organism s biochemical
More informationNUCLEIC ACIDS. DNA (Deoxyribonucleic Acid) and RNA (Ribonucleic Acid): information storage molecules made up of nucleotides.
NUCLEIC ACIDS DNA (Deoxyribonucleic Acid) and RNA (Ribonucleic Acid): information storage molecules made up of nucleotides. Base Adenine Guanine Cytosine Uracil Thymine Abbreviation A G C U T DNA RNA 2
More informationChapter 6 - Molecular Genetic Techniques
Chapter 6 - Molecular Genetic Techniques Two objects of molecular & genetic technologies For analysis For generation Molecular genetic technologies! For analysis DNA gel electrophoresis Southern blotting
More information7.03, 2005, Lecture 20 EUKARYOTIC GENES AND GENOMES I
7.03, 2005, Lecture 20 EUKARYOTIC GENES AND GENOMES I For the last several lectures we have been looking at how one can manipulate prokaryotic genomes and how prokaryotic genes are regulated. In the next
More informationGene Cloning and DNA Analysis: An introduction
Gene Cloning and DNA Analysis: An introduction T. A. Brown. 6th edition 2010 Published by Blackwell Science Ltd & 140.128.147.174/yclclass/ =>2011 Part I The Basic Principles of Gene Cloning and DNA Analysis
More information7.1 Techniques for Producing and Analyzing DNA. SBI4U Ms. Ho-Lau
7.1 Techniques for Producing and Analyzing DNA SBI4U Ms. Ho-Lau What is Biotechnology? From Merriam-Webster: the manipulation of living organisms or their components to produce useful usually commercial
More informationBi 8 Lecture 4. Ellen Rothenberg 14 January Reading: from Alberts Ch. 8
Bi 8 Lecture 4 DNA approaches: How we know what we know Ellen Rothenberg 14 January 2016 Reading: from Alberts Ch. 8 Central concept: DNA or RNA polymer length as an identifying feature RNA has intrinsically
More informationNuts and bolts of phage genome sequencing. the 5 5 and 5 8 perspective. Allison Johnson & Anneke Padolina
Nuts and bolts of phage genome sequencing the 5 5 and 5 8 perspective Allison Johnson & Anneke Padolina Our role in DNA sequencing Rachel, do you want to get on the bus and take your sample to the sequencing
More informationTranscriptome Assembly, Functional Annotation (and a few other related thoughts)
Transcriptome Assembly, Functional Annotation (and a few other related thoughts) Monica Britton, Ph.D. Sr. Bioinformatics Analyst June 23, 2017 Differential Gene Expression Generalized Workflow File Types
More informationChapter 15 Gene Technologies and Human Applications
Chapter Outline Chapter 15 Gene Technologies and Human Applications Section 1: The Human Genome KEY IDEAS > Why is the Human Genome Project so important? > How do genomics and gene technologies affect
More informationDe novo genome assembly with next generation sequencing data!! "
De novo genome assembly with next generation sequencing data!! " Jianbin Wang" HMGP 7620 (CPBS 7620, and BMGN 7620)" Genomics lectures" 2/7/12" Outline" The need for de novo genome assembly! The nature
More informationGenetics and Genomics in Medicine Chapter 3. Questions & Answers
Genetics and Genomics in Medicine Chapter 3 Multiple Choice Questions Questions & Answers Question 3.1 Which of the following statements, if any, is false? a) Amplifying DNA means making many identical
More informationMODULE 5: TRANSLATION
MODULE 5: TRANSLATION Lesson Plan: CARINA ENDRES HOWELL, LEOCADIA PALIULIS Title Translation Objectives Determine the codons for specific amino acids and identify reading frames by looking at the Base
More informationBioinformatics Course AA 2017/2018 Tutorial 2
UNIVERSITÀ DEGLI STUDI DI PAVIA - FACOLTÀ DI SCIENZE MM.FF.NN. - LM MOLECULAR BIOLOGY AND GENETICS Bioinformatics Course AA 2017/2018 Tutorial 2 Anna Maria Floriano annamaria.floriano01@universitadipavia.it
More informationTexas A&M University-Corpus Christi CHEM4402 Biochemistry II Laboratory Laboratory 4 - Polymerase Chain Reaction (PCR)
Texas A&M University-Corpus Christi CHEM4402 Biochemistry II Laboratory Laboratory 4 - Polymerase Chain Reaction (PCR) Progressing with the sequence of experiments, we are now ready to amplify the green
More informationوراثة األحياء الدقيقة Microbial Genetics
وراثة األحياء الدقيقة Microbial Genetics د. تركي محمد الداود مكتب 2 ب 45 مقدمة Introduction Microscopic biology began in 1665. Robert Hooke Robert Hooke (1635-1703) discovered organisms are made up of
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Contents Cell biology Organisms and cells Building blocks of cells How genes encode proteins? Bioinformatics What is bioinformatics? Practical applications Tools and databases
More informationGenome Sequencing Technologies. Jutta Marzillier, Ph.D. Lehigh University Department of Biological Sciences Iacocca Hall
Genome Sequencing Technologies Jutta Marzillier, Ph.D. Lehigh University Department of Biological Sciences Iacocca Hall Sciences start with Observation Sciences start with Observation and flourish with
More informationSequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es
Sequence assembly Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing project Unknown sequence { experimental evidence result read 1 read 4 read 2 read 5 read 3 read 6 read 7 Computational requirements
More informationREVIEWS STRATEGIES FOR THE SYSTEMATIC SEQUENCING OF COMPLEX GENOMES. Eric D. Green
STRATEGIES FOR THE SYSTEMATIC SEQUENCING OF COMPLEX GENOMES Eric D. Green Recent spectacular advances in the technologies and strategies for DNA sequencing have profoundly accelerated the detailed analysis
More informationNext Generation Sequences & Chloroplast Assembly. 8 June, 2012 Jongsun Park
Next Generation Sequences & Chloroplast Assembly 8 June, 2012 Jongsun Park Table of Contents 1 History of Sequencing Technologies 2 Genome Assembly Processes With NGS Sequences 3 How to Assembly Chloroplast
More informationLecture 22: Molecular techniques DNA cloning and DNA libraries
Lecture 22: Molecular techniques DNA cloning and DNA libraries DNA cloning: general strategy -> to prepare large quantities of identical DNA Vector + DNA fragment Recombinant DNA (any piece of DNA derived
More informationLander-Waterman Statistics for Shotgun Sequencing Math 283: Ewens & Grant 5.1 Math 186: Not in book
Lander-Waterman Statistics for Shotgun Sequencing Math 283: Ewens & Grant 5.1 Math 186: Not in book Prof. Tesler Math 186 & 283 Winter 2019 Prof. Tesler 5.1 Shotgun Sequencing Math 186 & 283 / Winter 2019
More information