The Journey of DNA Sequencing
H. Sunny Sun

What is a genome?
The genome is the total genetic complement of a living organism. The human nuclear genome comprises approximately 3.2 x 10^9 nucleotides of DNA, divided into 24 different chromosomes: 22 autosomes and the two sex chromosomes, X and Y. The mitochondrial genome is a circular DNA molecule, multiple copies of which are located in the energy-generating organelles called mitochondria.

Genome size
C-value paradox: genomes contain an apparent excess of DNA compared with the amount that could be expected to code for proteins.

Haploid genome size of model organisms:

Organism                  Genome size      Estimated no. of genes
Human                     3 x 10^9 bp      35,…
Mouse                     3 x 10^9 bp      35,…
Drosophila (fruit fly)    1.2 x 10^8 bp    10,…
C. elegans (roundworm)    1 x 10^8 bp      13,…
Yeast                     1.2 x 10^7 bp    5,…
E. coli (bacterium)       4.6 x 10^6 bp    4,…

Chromosomes
The genetic material of the cell, complexed with protein and organized into a number of linear or circular structures. Chromosome appearance varies with the stage of the cell cycle and with cell type.
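Gene density (genes per Mb) is the estimated number of genes divided by the genome size in megabases. A minimal Python sketch; the function name `genes_per_mb` and the 12 Mb / 6,000-gene input are illustrative assumptions, not figures taken from the table:

```python
def genes_per_mb(genome_size_bp: float, gene_count: int) -> float:
    """Return gene density in genes per megabase (1 Mb = 1,000,000 bp)."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return gene_count / (genome_size_bp / 1_000_000)


# Hypothetical input: a 12 Mb genome carrying 6,000 genes.
density = genes_per_mb(1.2e7, 6000)
print(f"{density:.0f} genes/Mb")  # 500 genes/Mb
```

Small, compact genomes give densities in the hundreds of genes per Mb, while a 3 x 10^9 bp genome with tens of thousands of genes gives roughly ten, which is the C-value paradox expressed as a single ratio.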

The three distinct but interrelated concepts of chromosomes
- Morphological chromosome: microscopy revealed that the genetic chromosomes had a physical manifestation.
- Genetic chromosome: classical genetics gave us the picture of genes aligned like beads on a string.
- Molecular chromosome: molecular biology showed that the information in chromosomes lies in the nucleotide sequence of DNA.

The Morphological Chromosome

Chromosome packing (one DNA molecule per chromosome)
1. Nucleosome (10 nm): DNA plus the histone core proteins (two each of H2A, H2B, H3 and H4).
2. Solenoid (30 nm): nucleosomes plus histone H1.
3. Chromatin: solenoid plus a protein scaffold.

Nucleosomes
1. Nucleosomes are the fundamental unit of chromatin structure: eight histone molecules are clustered on 146 bp of DNA.
2. These clusters are separated from one another by 20 to 100 bp of spacer DNA.
3. The DNA is wrapped in a left-handed coil around a histone core of 3.2 nm radius, making 1.8 turns per nucleosome.
4. DNase nicking of nucleosomal DNA shows an average periodicity of 10.7 bp in the center and 10.0 bp near the ends.
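The nucleosome figures above support a back-of-the-envelope estimate: one nucleosome repeat is the 146 bp core plus 20 to 100 bp of spacer, so a DNA molecule of G bp carries roughly G / (146 + spacer) nucleosomes. A Python sketch; the function name is hypothetical, and the perfectly regular spacing is a simplifying assumption:

```python
CORE_BP = 146  # DNA wrapped around one histone octamer (value from the notes)


def nucleosome_count(dna_bp: float, linker_bp: int) -> int:
    """Approximate number of nucleosomes on a stretch of DNA.

    Assumes perfectly regular spacing: each repeat is the 146 bp core
    plus one linker (spacer); nucleosome-free regions are ignored.
    """
    repeat_bp = CORE_BP + linker_bp
    return int(dna_bp // repeat_bp)


# ~3.2 x 10^9 bp haploid genome at both ends of the 20-100 bp spacer range.
for linker in (20, 100):
    print(f"{linker} bp linker: ~{nucleosome_count(3.2e9, linker):,} nucleosomes")
```

Under these assumptions a haploid human genome carries roughly 13 to 19 million nucleosomes.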

[Figures: nucleosomes; a chromatin loop with a bound regulatory factor]

Chromatin
- Euchromatin: actively transcribed regions; stains less densely, is packed less tightly and has high gene content.
- Heterochromatin: contains many repeats and few functional genes; stains densely, is packed tightly and lies mostly flanking the centromere of each chromosome.

[Figure: chromatin]

Chromosome banding
Certain chemical treatments of mammalian chromosomes yield differentially stained regions:
- C-banding stains centromeres (constitutive heterochromatin).
- G-banding is obtained with Giemsa stain and yields a series of lightly and darkly stained bands.
- R-banding ("reverse" banding) yields roughly the reverse of the G-banding pattern: regions that stain darkly with Giemsa stain lightly, and vice versa.
- Q-banding is a fluorescent pattern obtained by staining with quinacrine; its bands are very similar to those seen in G-banding.

[Figures: chromosome C-banding; chromosome G-banding; karyotypes showing human male and human female G-bands]

The Molecular Chromosome
"Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid", James D. Watson and Francis Crick, Nature 171 (4356): 737-738 (25 April 1953). It was the first publication to describe the double-helix structure of DNA.

[Figures: double helix; DNA base pairing]
DNA base pairing: an A-T pair forms 2 hydrogen bonds; a G-C pair forms 3 hydrogen bonds.
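The pairing rules above (A with T via 2 hydrogen bonds, G with C via 3) are enough to derive the partner strand and to count the hydrogen bonds holding a duplex together. A minimal Python sketch; the function names are hypothetical, and only the four standard bases are handled:

```python
# Watson-Crick partners and the hydrogen bonds each pair forms.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement_strand(seq: str) -> str:
    """Return the complementary strand written 5'->3' (the reverse complement)."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def hydrogen_bonds(seq: str) -> int:
    """Total hydrogen bonds between this strand and its complement."""
    return sum(H_BONDS[base] for base in seq.upper())


seq = "ATGCGC"
print(complement_strand(seq))  # GCGCAT
print(hydrogen_bonds(seq))     # 2+2+3+3+3+3 = 16
```

Because G-C pairs contribute one more hydrogen bond than A-T pairs, GC-rich stretches are held together more strongly.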

[Figures: A-T/G-C base pairs hold the DNA helices together]

First-generation sequencing: Sanger sequencing
[Figures: the sequencing terminator; sequencing gels]
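Chain termination can be simulated to show why the gel reads out the sequence: in each ddNTP reaction, synthesis stops wherever that dideoxynucleotide is incorporated, and ordering all fragments from shortest to longest (as the gel does) spells out the new strand. A minimal idealized Python sketch; the function names are hypothetical, the template is assumed to be written 3'->5', and every possible termination is assumed to occur:

```python
from collections import defaultdict

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def termination_fragments(template_3to5: str) -> dict[str, list[int]]:
    """Lengths of chain-terminated fragments in each ddNTP reaction.

    The new strand is the complement of the template (written 3'->5').
    In the reaction containing ddX, synthesis can stop at every position
    where X is added; this idealization assumes every such stop occurs.
    """
    new_strand = "".join(PAIR[b] for b in template_3to5)
    lanes = defaultdict(list)
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return dict(lanes)


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence by ordering all fragments from shortest to longest."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = termination_fragments("TACGGT")
print(lanes)            # {'A': [1, 6], 'T': [2], 'G': [3], 'C': [4, 5]}
print(read_gel(lanes))  # ATGCCA
```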

[Figure: work flow of Sanger sequencing (Nat. Biotech. 26, 2008)]

Sanger sequencing:
- Long read length
- Labor-intensive
- Time-consuming

Second-generation sequencing technologies
[Figure: work flow of sequencing (Nat. Biotech. 26, 2008)]

Next-generation sequencing (NGS):
- Deep sequencing
- High-throughput sequencing

NGS platforms

Short reads:
1. Genome Analyzer IIx (GAIIx), HiSeq 2000, HiSeq 2500, MiSeq (Illumina)
2. SOLiD 5500xl System (Applied Biosystems)
3. HeliScope Single Molecule Sequencer (Helicos)

Long reads:
1. Genome Sequencer FLX System (454) (Roche)
2. PacBio RS (Pacific Biosciences)
3. Personal Genome Machine, Ion Proton (Ion Torrent)
4. MiSeq (Illumina)

Roche 454 pyrosequencing
[Figures: Roche 454/GS FLX; Roche 454/GS Junior]

Illumina/Solexa
[Figures: Solexa GA; Solexa HiSeq]
Clonal amplification by isothermal bridge PCR.

MiSeq
- Small-capacity system: paired-end (PE) 2 x 150 cycles in 27 hours.
- PE 2 x 250 bp coming soon: error rate below 1% for read 1 and about 1.2% for read 2.
- In preparation: PE 2 x 400 bp, with an error rate of about 2% for read 1 and about 4% for read 2.
- In preparation: longer insert sizes, up to 1.5 kb.

ABI SOLiD (Supported Oligonucleotide Ligation and Detection)
- Clonal amplification by emulsion PCR.
- Sequencing by ligation: the template is probed with fluorescently tagged dinucleotide probes.
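Because SOLiD's dinucleotide probes report one of four colors per pair of adjacent bases, its reads come out in "color space". In the commonly described two-base encoding, the color equals the XOR of 2-bit base codes (A=0, C=1, G=2, T=3), so a color read can be turned back into bases only from a known first base. A Python sketch; the function names are hypothetical and base-calling errors are ignored:

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"  # index -> base, the inverse of CODE


def encode_colors(seq: str) -> list[int]:
    """Two-base encoding: one color (0-3) per overlapping dinucleotide."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Recover the bases from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        # XOR is its own inverse: next base = previous base XOR color.
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("TAGGCA")
print(colors)                      # [3, 2, 0, 3, 1]
print(decode_colors("T", colors))  # TAGGCA
```

Decoding this way also shows a practical consequence: one miscalled color changes every base after it, which is why color-space reads are usually compared with a reference in color space rather than decoded first.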

[Figure: SOLiD 5500xl]

Ion chemistry
An H+ ion is released during each base incorporation. Beads carrying the template DNA and polymerase are positioned in tiny wells, each resting on a miniature pH meter.
[Figure: the Ion series]

Third-generation sequencing
[Figure: PacBio RS]
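The Ion chemistry slide rests on one idea: nucleotides are flowed over the wells one species at a time, and each incorporated base releases one H+, so the pH signal in a flow scales with the length of the homopolymer run that gets extended. A minimal idealized Python sketch; the flow order "TACG", the function names and the noise-free integer signals are assumptions for illustration, not the instrument's actual parameters:

```python
def flowgram(seq: str, flow_order: str = "TACG", n_flows: int = 16) -> list[int]:
    """Idealized Ion-style signal: number of bases (H+ releases) per flow.

    `seq` is the strand being synthesized. Each flow supplies one nucleotide
    species; the polymerase extends through the whole homopolymer run of that
    base, releasing one H+ per incorporated base.
    """
    signals, pos = [], 0
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append(run)
    return signals


def call_bases(signals: list[int], flow_order: str = "TACG") -> str:
    """Invert an error-free flowgram back into a sequence."""
    return "".join(flow_order[i % len(flow_order)] * n for i, n in enumerate(signals))


sig = flowgram("TTAGGGC", n_flows=8)
print(sig)              # [2, 1, 0, 3, 0, 0, 1, 0]
print(call_bases(sig))  # TTAGGGC
```

With real, noisy signals the hard part is telling a run of, say, 5 from 6 identical bases, since the signal must be rounded to an integer.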

Emerging technologies: DNA, RNA and protein analysis

Oxford Nanopore: a new view on sequencing
[Figure labels: exonuclease; cyclodextrin; electrically resistant lipid bilayer; hemolysin pore]
The hemolysin pore has an inner diameter of about 1 nm, roughly 100,000 times smaller than the diameter of a human hair.
[Figure: Oxford Nanopore nanopore array]
DNA sequencing error rate: 4%, predicted to fall to 0.1-2% by the end of the year.
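Per-base error rates such as the 4% and 0.1-2% figures above are often expressed as Phred quality scores, Q = -10 log10(p), where p is the probability that a base call is wrong. A Python sketch converting the rates quoted in these notes; the function names and the 10 kb read length are illustrative assumptions:

```python
import math


def phred(error_rate: float) -> float:
    """Phred quality score Q = -10 * log10(p) for a per-base error probability p."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


def expected_errors(read_length: int, error_rate: float) -> float:
    """Expected number of miscalled bases in a read, assuming a uniform error rate."""
    return read_length * error_rate


for p in (0.04, 0.02, 0.001):
    print(f"{p:.1%} error -> Q{phred(p):.1f}, "
          f"~{expected_errors(10_000, p):.0f} errors per 10 kb read")
```

A 4% error rate corresponds to roughly Q14, while 0.1% corresponds to Q30.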

Oxford Nanopore: new concepts

MinION
- 150 Mb per run
- Tested read length: 48 kb
- $900 per instrument
- 500 pores per device

GridION
- XXX Mb per run
- Tested read length: 48 kb
- $XXX per instrument
- Pores per device: soon 8,000 pores
- Cost per human genome: $

Oxford Nanopore applications
- DNA sequencing
- Protein detection
- Protein-DNA interaction
- Small molecule detection
- 96-well plates for 96 samples
- Controlled sequencing time

Advantages
Nanopores offer a label-free, electrical, single-molecule DNA sequencing method:
- No costly fluorescent labeling reagents
- No need for expensive optical hardware or sophisticated instrumentation to detect DNA, RNA and protein

Performance/limitations?
- First data were released in Feb; no updates since then
- No data available for evaluation
- High error rates (>5%)
- An early access program will start in the next few months

Applications of Next-generation Sequencing

[Figures: RNA-Seq protocol; RNA-Seq protocol (cont.); NGS data processing; RNA-Seq results]
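The slides do not show how RNA-Seq read counts were normalized during data processing. Two widely used choices are RPKM (reads per kilobase of transcript per million mapped reads) and TPM (transcripts per million). A minimal Python sketch; the gene names, lengths and counts are hypothetical:

```python
def rpkm(read_counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(read_counts.values())
    # count / (length in kb) / (total reads in millions) == count * 1e9 / (length * total)
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in read_counts.items()}


def tpm(read_counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: length-normalize first, then scale to sum to 1e6."""
    rate = {g: c / lengths_bp[g] for g, c in read_counts.items()}
    scale = sum(rate.values())
    return {g: r / scale * 1e6 for g, r in rate.items()}


# Hypothetical data: equal read counts, but geneB is four times longer.
counts = {"geneA": 500, "geneB": 500}
lengths = {"geneA": 1000, "geneB": 4000}
print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

Both measures correct for transcript length, so the longer gene, which collects the same number of reads, is reported as less abundant. TPM values additionally sum to one million in every sample, which makes them easier to compare across samples.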

[Figure: advantages of RNA-Seq]

DNA resequencing
[Figures: Exome-Seq protocol; exome capture]

[Figures: ChIP-Seq protocol; Methyl-Seq protocol; MeDIP]

What type of sequencing platform should I choose? It depends on:
- the application
- long vs. short reads
- sequencing depth
- turnaround time
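Depth is usually estimated with the Lander-Waterman model: for N reads of length L on a genome of size G, the mean coverage is C = N x L / G, and under random sampling a fraction of about e^(-C) of bases receives no reads at all. A Python sketch; the function names and the 30x / 150 bp example are assumptions for illustration:

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: float) -> float:
    """Lander-Waterman mean depth C = N * L / G."""
    return n_reads * read_length / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero reads, e^(-C), under random (Poisson) sampling."""
    return math.exp(-coverage)


def reads_needed(target_coverage: float, read_length: int, genome_size: float) -> int:
    """Number of reads of a given length needed to reach a target mean depth."""
    return math.ceil(target_coverage * genome_size / read_length)


G = 3.2e9  # haploid human genome size in bp
n = reads_needed(30, 150, G)
print(f"{n:,} reads of 150 bp for 30x")  # 640,000,000 reads of 150 bp for 30x
print(f"uncovered at 1x: {fraction_uncovered(1.0):.1%}")
```

At 30x with 150 bp reads this gives 640 million reads (320 million 2 x 150 bp pairs) for a 3.2 x 10^9 bp genome. Real runs need more because coverage is never perfectly uniform.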