BENG 183 Trey Ideker. Genome Assembly and Physical Mapping

Similar documents
3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome

Genome Projects. Part III. Assembly and sequencing of human genomes

Molecular Biology: DNA sequencing

GENETICS EXAM 3 FALL a) is a technique that allows you to separate nucleic acids (DNA or RNA) by size.

Biotechnology Chapter 20

Introduction to Plant Genomics and Online Resources. Manish Raizada University of Guelph

Sept 2. Structure and Organization of Genomes. Today: Genetic and Physical Mapping. Sept 9. Forward and Reverse Genetics. Genetic and Physical Mapping

1. A brief overview of sequencing biochemistry

Genome Sequencing-- Strategies

Introduction to some aspects of molecular genetics

BSCI410-Liu/Spring 06 Exam #1 Feb. 23, 06

Lecture 12. Genomics. Mapping. Definition Species sequencing ESTs. Why? Types of mapping Markers p & Types

Chapter 5. Structural Genomics

Multiple choice questions (numbers in brackets indicate the number of correct answers)

Chapter 20 DNA Technology & Genomics. If we can, should we?

BENG 183 Trey Ideker. Genotyping. To be covered in one 1.5 hr lecture

Sequence Assembly and Alignment. Jim Noonan Department of Genetics

Chapter 20 Recombinant DNA Technology. Copyright 2009 Pearson Education, Inc.

Alignment and Assembly

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

Concepts: What are RFLPs and how do they act like genetic marker loci?

Bioinformatics for Genomics

B) You can conclude that A 1 is identical by descent. Notice that A2 had to come from the father (and therefore, A1 is maternal in both cases).

Course summary. Today. PCR Polymerase chain reaction. Obtaining molecular data. Sequencing. DNA sequencing. Genome Projects.

Map-Based Cloning of Qualitative Plant Genes

DNA sequencing. Course Info

Chapter 15 Gene Technologies and Human Applications

Molecular Genetics Techniques. BIT 220 Chapter 20

3I03 - Eukaryotic Genetics Repetitive DNA

CSE182-L16. LW statistics/assembly

CISC 889 Bioinformatics (Spring 2004) Lecture 3

Lecture 14: DNA Sequencing

Reading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction

The Diploid Genome Sequence of an Individual Human

Lecture Four. Molecular Approaches I: Nucleic Acids

Restriction Enzymes (endonucleases)

We begin with a high-level overview of sequencing. There are three stages in this process.

Genome Sequence Assembly

Restriction Site Mapping:

Chapter 20: Biotechnology

BIOLOGY - CLUTCH CH.20 - BIOTECHNOLOGY.

Pharmacogenetics: A SNPshot of the Future. Ani Khondkaryan Genomics, Bioinformatics, and Medicine Spring 2001

Chapter 6 - Molecular Genetic Techniques

BIOTECHNOLOGY. Biotechnology is the process by which living organisms are used to create new products THE ORGANISMS

Structural variation. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona

Genetics and Biotechnology. Section 1. Applied Genetics

BIO 202 Midterm Exam Winter 2007

10/20/2009 Comp 590/Comp Fall

REVIEWS STRATEGIES FOR THE SYSTEMATIC SEQUENCING OF COMPLEX GENOMES. Eric D. Green

GREG GIBSON SPENCER V. MUSE

2. Outline the levels of DNA packing in the eukaryotic nucleus below next to the diagram provided.

DESIGNER GENES - BIOTECHNOLOGY

Biotechnology. Biotechnology is difficult to define but in general it s the use of biological systems to solve problems.

User Manual. Catalog No.: DWSK-V101 (10 rxns), DWSK-V102 (25 rxns) For Research Use Only

CONSTRUCTION OF GENOMIC LIBRARY

Fatchiyah

2054, Chap. 14, page 1

Genomes summary. Bacterial genome sizes

AP Biology

7 Gene Isolation and Analysis of Multiple

Results WCP (Whole chromosome paint) FISH

Biotechnology. Chapter 20. Biology Eighth Edition Neil Campbell and Jane Reece. PowerPoint Lecture Presentations for

Course Syllabus for FISH/CMBL 7660 Fall 2008

BIOTECHNOLOGY. Sticky & blunt ends. Restriction endonucleases. Gene cloning an overview. DNA isolation & restriction

Chapter 8: Recombinant DNA. Ways this technology touches us. Overview. Genetic Engineering

Chapter 9 Genetic Engineering

Biol 478/595 Intro to Bioinformatics

Finishing Fosmid DMAC-27a of the Drosophila mojavensis third chromosome

Chapter 20 Biotechnology

Computational Biology I LSM5191

Molecular Cell Biology - Problem Drill 11: Recombinant DNA

Multiple choice questions (numbers in brackets indicate the number of correct answers)

CHAPTERS 16 & 17: DNA Technology

2014 Pearson Education, Inc. CH 8: Recombinant DNA Technology

Chapter 10 Genetic Engineering: A Revolution in Molecular Biology

DNA Sequencing and Assembly

Unit 8: Genomics Guided Reading Questions (150 pts total)

Genome Analysis. Bacterial genome projects

Research techniques in genetics. Medical genetics, 2017.

CH 8: Recombinant DNA Technology

Biotechnology and Genomics in Public Health. Sharon S. Krag, PhD Johns Hopkins University

Computational gene finding

The international effort to sequence the 17Gb wheat genome: Yes, Wheat can!

Molecular Biology (2)

7.03, 2005, Lecture 20 EUKARYOTIC GENES AND GENOMES I

Recombinant DNA Technology. The Role of Recombinant DNA Technology in Biotechnology. yeast. Biotechnology. Recombinant DNA technology.

Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide.

Genetics Lecture 21 Recombinant DNA

BSCI410-Liu/Spring 09/Feb 26 Exam #1 Your name:

Human genome sequence February, 2001

CHAPTER 20 DNA TECHNOLOGY AND GENOMICS. Section A: DNA Cloning

Bioinformatics Support of Genome Sequencing Projects. Seminar in biology

Bootcamp: Molecular Biology Techniques and Interpretation

Introduction to Molecular Biology

Genomic resources. for non-model systems

Learning Objectives :

Recombinant DNA Technology

Genetics and Genomics in Medicine Chapter 3. Questions & Answers

Bi 8 Lecture 4. Ellen Rothenberg 14 January Reading: from Alberts Ch. 8

Transcription:

BENG 183 Trey Ideker Genome Assembly and Physical Mapping

Reasons for sequencing Complete genome sequencing!!! Resequencing (Confirmatory) E.g., short regions containing single nucleotide polymorphisms (SNPs) or other mutations Gene sequencing Or associated upstream and downstream control regions (e.g., promoters, enhancers, intron splice sites) cdna sequencing and Expressed Sequence Tags (ESTs) What sequencing methods (e.g. Sanger, pyro, SBH) are best suited for each of these scenarios?

Complete genome sequencing: Why bother? ( For instance, why not just sequence expressed genes as cdnas? Expressed genes constitute <5% of the entire human genome! ) HOWEVER: Control elements not sequenced Many genes expressed at low levels Some genes difficult to recognize The remaining 95% of junk DNA may have some as yet unknown but important function

Sequencing DNA fragments > 1 kb A typical run produces a maximum of 600-800 bp of good DNA sequence. To sequence larger fragments: Nested deletions Primer walking Subcloning and physical mapping Shotgun cloning and assembly

Primer walking

Nested deletions (1) Exonuclease mediated (2) Transposon mediated

Strategies for Long-range and The problem: Genome Sequencing Sequencing reads are limited to 500 to 1000 bps. This is only partly handled by nested deletions and primer walking The shotgun solution: By oversampling, many reads can be assembled into a single target sequence. There are two competing strategies for this: 1) Clone-by-clone hierarchical approach 2) Whole-genome shotgun sequencing These two approaches sparked a huge debate

Clone-by-clone (hierarchical) shotgun approach Whole genomes are sequenced by first cloning large pieces into a set of overlapping cosmids, then ordering cosmids by physical mapping. Each ordered cosmid is sequenced by shotgun sequencing, i.e., random sub-cloning of fragments into vectors for sequencing. Sequences are pieced together for each cosmid using assembly software such as PHRAP (remember PHRED?). Remaining gaps in a cosmid or between cosmids are closed using primer walking or probing of Southern blots to identify new fragments

Comparison of two sequencing approaches HIERARCHICAL (1990) GENOME-WIDE SHOTGUN (1998)

Sizes of sequencing vectors Vector Whole chromosome YAC BAC Cosmid Plasmid M13 Size (approx.) 250 MB 1500 KB 150 KB 40 KB 5-10 KB 1 KB

Methods for physical mapping Mapping method Experimental resource Breakpoints Markers Fingerprinting Library of clones Endpoints of clones Restriction sites or STSs Hybridization mapping Library of clones Endpoints of clones Whole clones or STSs In situ hybridization (cytogenetic map) Chromosomes Cytological landmarks Optical mapping Chromosomes Restriction fragments DNA probes Restriction sites or STSs Radiation hybrid Human/rodent fusion cells Radiation-induced chromosome breaks STSs Genetic linkage (meiotic) Pedigrees Recombination sites DNA polymorphisms Table 4.2; Primrose and Twyman 3 rd Edition 2003

Fingerprinting with restriction enzymes FPC software (Fingerprinted contigs) From Primrose and Twyman 3 rd Edition 2003

Sequence Tagged Sites (STS s) An STS is a primer pair that amplifies a unique region of the genome (i.e., produces a single PCR band see below right). These can also be used for identifying overlapping cosmids, i.e. fingerprinting Figures 4.3 and 4.4; Primrose and Twyman 3 rd Edition 2003

Hybridization mapping From Primrose and Twyman 3 rd Edition 2003 Clones hybridizing to repeats are pre-screened and removed End-sequences of clones can be used as STS s

Cytogenetic mapping Uses FISH: Fluorescent In-Situ Hybridization Clones are fluorescently labeled and hybridized directly to metaphase plates Repeats and duplicated genes cause problems

Example pig cytogenetic map

Separating chromosomes with Fluorescence-Activated Cell Sorting (FACS) (Figs. 3.4 and 3.5 from Primrose and Twyman 3 rd Edition 2003)

Optical Mapping From Primrose and Twyman 3 rd Edition 2003

Radiation Hybrid (RH) Mapping (1) Human cells are X-ray irradiated to fragment chromosomes (2) These fragments are introduced into rodent cells (3) Human/rodent hybrids lose human chr. fragments randomly (4) Screen hybrids for markers A and B = prob. of breakage between A and B (i.e. separate fragments) P = probability that a DNA fragment is retained; Q = 1 P Rel. distance between A and B = log(1 ) Rays

Moving from hierarchical assembly to the whole-genome shotgun approach Little or no physical mapping required to hierarchically order clones Previously thought that cosmids were the upper size limit for shotgun sequencing This idea was destroyed when Fleischmann et al. (1995) determined the sequence of Haemophilus influenzae through a pure shotgun approach (no cosmid intermediary)

Comparison of two sequencing approaches HIERARCHICAL SHOTGUN (1990) GENOME-WIDE SHOTGUN (1998)

Sequence Assembly Algorithms Start from an initial sequence fragment (<1kb) chosen at random Chose the second fragment as having the best overlap with the first based on DNA sequence The overlap is based on strict match criteria specifying minimum length of match, max. length of unmatched segment, and the min. percentage of matching nucleotides A set of overlapping sequence reads is called a contig. Examples are PHRAP, Arachne, TIGRassembler

Even with all this sophistication, sequencing is still work Statistics for the genome sequence of Haemophilus influenzae: 1,830,137 bp of DNA in total Full shotgun approach producing DNA fragments 1.6-2.0 kb in length. 28,643 sequencing reactions 24,304 give useful & high-quality sequence Assembled directly into 140 contigs 8 technicians, 14 automated sequencers Total amount of time: 3 months

Genome sizes and coverage Organism S. cerevisiae (yeast) C. elegans (nematode worm) D. melanogaster (fruit fly) A. thaliana (flowering plant) Year MB sequenced % coverage (total) 1996 12 93 100 1998 97 99 100 2000 116 64 97 2000 115 92 100 Human chr. 22 1999 34 70 97 % coverage (euchrom.) Human genome (consortium) Human genome (Celera) 2001 2693 84 90 2001 2654 83 88-93 Reprinted from Table 5.2 of Primrose and Twyman