Genome Sequencing Technologies. Jutta Marzillier, Ph.D. Lehigh University Department of Biological Sciences Iacocca Hall

Similar documents
The Human Genome at Ten

Chapter 7. DNA Microarrays

Lecture 8: Sequencing and SNP. Sept 15, 2006

Matthew Tinning Australian Genome Research Facility. July 2012

Multiple choice questions (numbers in brackets indicate the number of correct answers)

DNA SEQUENCING BY SANGER METHOD

NPTEL VIDEO COURSE PROTEOMICS PROF. SANJEEVA SRIVASTAVA

Concepts and methods in sequencing and genome assembly

Course summary. Today. PCR Polymerase chain reaction. Obtaining molecular data. Sequencing. DNA sequencing. Genome Projects.

BIOLOGY - CLUTCH CH.20 - BIOTECHNOLOGY.

SEQUENCING TARU SINGH UCMS&GTBH

The project of mapping Human Genome. Why they want to make a map of the human genome?????

2/5/16. Honeypot Ants. DNA sequencing, Transcriptomics and Genomics. Gene sequence changes? And/or gene expression changes?

DNA and Biotechnology Form of DNA Form of DNA Form of DNA Form of DNA Replication of DNA Replication of DNA

Next Generation Sequencing. Dylan Young Biomedical Engineering

Molecular Cloning. Genomic DNA Library: Contains DNA fragments that represent an entire genome. cdna Library:

Next generation sequencing techniques" Toma Tebaldi Centre for Integrative Biology University of Trento

The study of the structure, function, and interaction of cellular proteins is called. A) bioinformatics B) haplotypics C) genomics D) proteomics

Functional Genomics Research Stream. Research Meetings: November 2 & 3, 2009 Next Generation Sequencing

Appendix A DNA and PCR in detail DNA: A Detailed Look

7.1 Techniques for Producing and Analyzing DNA. SBI4U Ms. Ho-Lau

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

Studying the Human Genome. Lesson Overview. Lesson Overview Studying the Human Genome

Pre-AP Biology DNA and Biotechnology Study Guide #1

DNA. Using DNA to solve crimes

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday September 15, 2014

Introduction to Bioinformatics. Lecture 20: Sequencing genomes

What is Bioinformatics?

Ch 10 Molecular Biology of the Gene

A Crash Course in NGS for GI Pathologists. Sandra O Toole

Overview of Next Generation Sequencing technologies. Céline Keime

DNA/RNA STUDY GUIDE. Match the following scientists with their accomplishments in discovering DNA using the statement in the box below.

1. A brief overview of sequencing biochemistry

Biochemistry. Dr. Shariq Syed. Shariq AIKC/FinalYB/2014

DNA Function: Information Transmission

DNA Structure DNA Nucleotide 3 Parts: 1. Phosphate Group 2. Sugar 3. Nitrogen Base

DNA is normally found in pairs, held together by hydrogen bonds between the bases

Human Genomics. 1 P a g e

Gene Expression Technology

Biology Celebration of Learning (100 points possible)

DNA & Genetics. Chapter Introduction DNA 6/12/2012. How are traits passed from parents to offspring?

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Tuesday December 16, 2014

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014

Biotechnology. Biotechnology is difficult to define but in general it s the use of biological systems to solve problems.

Introduction to Cellular Biology and Bioinformatics. Farzaneh Salari

MBioS 503: Section 1 Chromosome, Gene, Translation, & Transcription. Gene Organization. Genome. Objectives: Gene Organization

Molecular Genetics. The flow of genetic information from DNA. DNA Replication. Two kinds of nucleic acids in cells: DNA and RNA.

Unit 1: DNA and the Genome. Sub-Topic (1.3) Gene Expression

Genome Sequencing. I: Methods. MMG 835, SPRING 2016 Eukaryotic Molecular Genetics. George I. Mias

I. Gene Expression Figure 1: Central Dogma of Molecular Biology

The Polymerase Chain Reaction. Chapter 6: Background

Genetic Fingerprinting

Lesson 8. DNA: The Molecule of Heredity. Gene Expression and Regulation. Introduction to Life Processes - SCI 102 1

DNA: Structure and Function

Contact us for more information and a quotation

The Structure of Proteins The Structure of Proteins. How Proteins are Made: Genetic Transcription, Translation, and Regulation

Polymerase chain reaction

How do we know what the structure and function of DNA is? - Double helix, base pairs, sugar, and phosphate - Stores genetic information

CSC Assignment1SequencingReview- 1109_Su N_NEXT_GENERATION_SEQUENCING.docx By Anonymous. Similarity Index

Adv Biology: DNA and RNA Study Guide

Chapter 6 - Molecular Genetic Techniques

All This For Four Letters!?! DNA and Its Role in Heredity

(deoxyribonucleic acid)

Nuts and bolts of phage genome sequencing. the 5 5 and 5 8 perspective. Allison Johnson & Anneke Padolina

Lecture Four. Molecular Approaches I: Nucleic Acids

Complete draft sequence 2001

Honors Biology Reading Guide Chapter 10 v Fredrick Griffith Ø When he killed bacteria and then mixed the bacteria remains with living harmless

Videos. Lesson Overview. Fermentation

Biology. Biology. Slide 1 of 39. End Show. Copyright Pearson Prentice Hall

Biology. Biology. Slide 1 of 39. End Show. Copyright Pearson Prentice Hall

Molecular Cell Biology - Problem Drill 11: Recombinant DNA

Make the protein through the genetic dogma process.

Reading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction

Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

The Biotechnology Toolbox

DNA, RNA and protein synthesis

Lesson Overview. Studying the Human Genome. Lesson Overview Studying the Human Genome

DNA. translation. base pairing rules for DNA Replication. thymine. cytosine. amino acids. The building blocks of proteins are?

Sequencing the Human Genome

SAMPLE LITERATURE Please refer to included weblink for correct version.

High Throughput Sequencing Technologies. UCD Genome Center Bioinformatics Core Monday 15 June 2015

Review Quizzes Chapters 11-16

Computational gene finding

Bacterial Genetics. Stijn van der Veen

Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

DNA Sequencing by Ion Torrent. Marc Lavergne CHEM 4590

DNA sequencing. Course Info

Unit IX Problem 3 Genetics: Basic Concepts in Molecular Biology

Genetic Fingerprinting

Exam 2 Key - Spring 2008 A#: Please see us if you have any questions!

Course Overview. Objectives

translation The building blocks of proteins are? amino acids nitrogen containing bases like A, G, T, C, and U Complementary base pairing links

Resources. How to Use This Presentation. Chapter 10. Objectives. Table of Contents. Griffith s Discovery of Transformation. Griffith s Experiments

Chapter 10 Genetic Engineering: A Revolution in Molecular Biology

Next Generation Sequencing Lecture Saarbrücken, 19. March Sequencing Platforms

Multiple choice questions (numbers in brackets indicate the number of correct answers)

Genomes: What we know and what we don t know

Next Gen Sequencing. Expansion of sequencing technology. Contents

BIOTECH 101 UNDERSTANDING THE BASICS

Methods of Biomaterials Testing Lesson 3-5. Biochemical Methods - Molecular Biology -

Transcription:

Genome Sequencing Technologies Jutta Marzillier, Ph.D. Lehigh University Department of Biological Sciences Iacocca Hall

Sciences start with Observation

Sciences start with Observation and flourish with Introduction of Technology

Life Sciences Thrive Trough the Co-laboration between Biologists and Engineers Biologist Computer Engineer/ Bioinformatics BioMedical/ Chemical/ Mechanical Engineer

Components that lead to completion of the Human Genome Project Life Sciences: Discovery of DNA (1953) Invention of DNA Sequencing (1975) Polymerase Chain Reaction (PCR, 1985) Human Genome published (2003) Computer Sciences/ Bioinformatics: Computer languages (e.g. FORTRAN 1956) Floppy disc (1970), hard drives 1 st sequence database (EMBL) opens (1980) Genbank (1982) BLAST (Basic Local Alignment Search Tool) (1990) World Wide Web (1989) PC revolution (1 st PC by IBM in 1981) Engineering: Dye coupling to DNA (1986) Capillary electrophoresis (1995) Robots Automated Sequencer (1986) Microfluidics Nanotechnology

What is the Genome? The human body has about 100 trillion cells Each cell carries the same genetic information (genome) in its nucleus 2002 The Center for the Advancement of Genomics (TCAG). Why study genes? The study of genes is crucial to our understanding of the cell, development and disease. Depending on the cellular context only a subset of genes is expressed (transcriptome)

Main Biological Dogma DNA working library : coding, non-coding, regulatory elements RNA mrna Transcription Gene expression for particular cellular context Transcriptomics Genomics protein Translation and posttranslational modifications Expressed proteins in particular cellular context Proteomics

DNA structure and synthesis DNA: chemically linked chain of nucleotides which each consists of a phosphate, a sugar (ribose), and a nucleobase (Adenine, Guanine, Cytosine, Thymine)

DNA sequencing Principle of DNA synthesis Amplification of targeted DNA strand in presence of - primer that only hybridizes to one complementary strand- -(DNA) polymerase, (or Taq) - deoxynucleotides (datp,dttp,dctp,dgtp).

Frederick Sanger Two time Nobel laureate in chemistry 1958: identification of the amino acid sequence of insulin 1980: developed the chain termination method for DNA sequencing * 13 August 1918

Dideoxy Method of Sequencing (Sanger, 1975) DNA synthesis is carried out in the presence of limiting amounts of dideoxyribonucleoside triphosphates that results in chain termination Original method used radio-labeled primers or dideoxynucleotides

Sanger DNA sequencing using radiolabel

Fluorescently labeled ddntps used for sequencing reaction Use of thermo-stabile Taqpolymerase allowed automation. ddntps used for automated sequencing are labeled with different fluorescent dyes representing each of the 4 bases Enabled single lane electrophoresis

Automated DNA Sequencing- use of labeled dideoxynucleotides Random incorporation of ddntps results in termination of elongation reaction This reaction set-up will produce a set of DNAs of different lengths complementary to the template DNA Each fluorescent label corresponds to specific nucleotide.

Separation of Amplified Fragments When a fluorescence labeled, chemically modified dideoxynucleotide is integrated into the DNA strand, the reaction will be terminated, thus leaving a labeled fragment of a defined size. These fragments will then be separated by capillary electrophoresis according to their size.

Capillary Gel Electrophoresis Sequence ladder by radioactive sequencing compared to fluorescent peaks Capillary gel electrophoresis: Samples are excited by laser and emitted fluorescence read by CCD camera. Fluoresecent signals are converted into basecalls.

Automated DNA Sequencing ABI 310 one capillary DNA sequencer

ABI 310 Genetic Analyzer Capillary and electrode: Negatively charged DNA migrates towards anode DNA fragments labeled with different fluorescent dyes migrate according to their size past a laser

Sequencing Electropherogram Laser excites dyes, causing them to emit light at longer wavelengths Emitted light is collected by a CCD camera Software converts pattern of emissions into colored peaks Plot of colors detected in sequencing sample scanned from smallest to largest fragment www.contexo.info/dna_basics/dna_sequencing.htm

High-Throughput Whole Genome Sequencing Analysis of 384 sequencing reactions in parallel George Church, Scientific American, January 2006, pp47-54

How do you sequence a genome? Mapping versus Shotgun Mapping: break genome into smaller pieces, align the pieces to each other and sequence Shotgun: break genome into small pieces, sequence and re-align by finding overlapping sequences http://scienceblogs.com/digitalbio/2007/01/basics_how_do_you_sequence_a_g.php

The Human Genome Project Initial sequencing and analysis of the human genome Nature 409, 860-921 (15 February 2001) International Human Genome Sequencing Consortium The Sequence of the Human Genome Science, Vol 291, Issue 5507, 1304-1351, 16 February 2001 Venter et al. (Celera Genomics) Finishing the euchromatic sequence of the human genome Nature 431, 931-945 (21 October 2004) International Human Genome Sequencing Consortium

Development Sequencing Time Line 1953: Discovery of the DNA double helix by Watson & Crick

Development Sequencing Time Line 1953: Discovery of the DNA double helix by Watson & Crick 2005 2006

The Human Genome * Current genome (build 36) contains 2.85 billion nucleotides interrupted by only 341 gaps (2004) * Overall error rate is less than 1 error per 100,000 bp * Encodes for approx. 25,000 to 35,000 protein-coding genes * Genes make up about 1-2 % of the total DNA, the coding portions of a gene are called exons, which are interrupted by intervening sequences, called introns. * Genetic variation between individuals between 0.1-0.5%.

Who s sequence was used for the Human Genome Project? Different sources of DNA were used for original sequencing Celera: 5 individuals; HGSC: many Rationale: All humans share the same basic set of genes and genomic regulatory regions. The term genome is used as a reference to describe a composite genome The many small regions of DNA that vary among individuals are called polymorphisms: Mostly single nucleotide polymorphisms (SNPs) Insertions/ deletions (indels) Copy number variation Inversions

Single Nucleotide Polymorphisms (SNPs) The two forms of an SNP are called alleles. http://scienceblogs.com/digitalbio/2007/09/genetic_variation_i_what_is_a.php#more Most SNPs are without a physiological effect. They contribute to human variety. A smaller number correlates with an individuals susceptibility for certain diseases or response to drug treatments or disease.

How Does the Human Genome Compare to Others? # of protein-coding genes Human 25,000-35,000 C. elegans (roundworm) 19,000 D. melanogaster (fruitfly) 15,000 Haemophilus (bacterium) 1,738

Are we only twice as complex as the roundworm and the fruit fly? Many genes encode for more than one protein product (alternative splicing, RNA modifications, micrornas) Posttranslational modifications contribute to the complexity of the human proteome. Transcription factors and enhancers probably provide greater flexibility of gene expression Additional genes not found in invertebrates, such as genes encoding for antibodies, T- cell receptor, cytokines, etc. Alternative splicing of the alpha-tropomyosin gene

JGI Sequencing Facility (Joint Genome Institute, US Department of Energy) Assume 20 384-capillary sequencers running simultaneously approx. 700 bp per capillary run approx. 3 hours per run Approx. 40 Million bases per day in one facility

Hurdles to overcome Costs Low sensitivity, DNA needs to be amplified to be detected Electrophoresis is slow and has limited resolution Limited parallelization, semi high-throughput More powerful data analysis and bioinformatics tools for analysis of short fragments Change from bench-top robotics to microfluidic platforms

The Race for the $1,000 Genome Human Genome Project (2001, initial draft): $ 2-3 billion (includes development of technology) raw expenses estimated at $300 million Rhesus macaque (2006) $ 22 million Expected by end of 2006: $ 100,000 for full mammalian genome sequence Wanted:!!!!!! The $ 1,000 Genome!!!!!!!!! - low cost - high-throughput - high accuracy

Recent Technologies Pyrosequencing (454 Life Sciences, Roche) Emulsion PCR, Sequencing by synthesis SOLID System (ABI) Emulsion PCR, Sequencing by ligation Genome Analyzer (Illumina) Emulsion PCR, sequencing by synthesis using fluorescently labelled reversible chain terminators Single-molecule sequencing Nanopore Atomic Force Microscopy

Pyrosequencing 454 Life Sciences Addition of dntp releases a pyrophosphate (PPi) stochiometrically Sulfurase converts PPi to ATP ATP and Luciferase drive conversion of luciferin to oxyluciferin that generates visible light and can be measured with a CCD camera. Each light signal is proportional to the number of nucleotides incorporated. Apyrase digests unincorporated nucleotides Biotage, 2005 Real time sequencing Amplification of DNA by emulsion PCR Gel free, microchips with 800,000 holes 25 million bases in 4 hour run Shorter read length, approx. 100 bp, more complicated genome assembly Difficult to sequence repetitive stretches

Single molecule sequencing - Sequencing through nanopore Johan Lagerqvist www.ucsdnews.ucsd.edu Measure changes in electric current as DNA is passes through a nanopore surrounded by two pairs of tiny gold electrodes. Electrodes record electrical current perpendicular to the DNA strand As each DNA base is structurally and chemically different, each base creates its own electronic signature. Proposed model, not proven yet. Single molecule sequencing models using AFM also proposed.

Recent achievements May 2007: James Watson s genome deposited 454 technology, 2 month, $ 1-2 million Sept 2007: Craig Venter s genome deposited Sanger technology, 4 years, 70 million *Diploid genomes *Approx. 3.5% of Watson s genome could not be matched to the reference genome *Venter s genome had 4.1 million DNA variants comprising 12.3 Mb - 3.2 million SNPs - non-snp variants *These variants include a potentially increased risk of alcoholism, coronary heart disease, obesity, Alzheimer s disease, antisocial behavior and conduct disorder.

Aspects of the $ 1,000 Genome Triggers basic research: - how is activation of genes regulated? - understanding genetic links to cancer - facilitate genetic engineering - hunt for disease genes - diagnosis and treatment of diseases? - personalized patient treatment strategies? However: - How can patient privacy be protected? - Will insurance companies and employers use genetic information to screen out those at high risk for disease? - genetic engineering of bioterrorism agents