Research school methods seminar Genomics and Transcriptomics Stephan Klee 19.11.2014
Genetics, Genomics: what are we talking about?
Genetics
- Study of genes
- Role of genes in inheritance
- Study of single genes and their effects/resulting disease
Genomics
- Study of all of a person's genes and the interplay of the genes
- Role of the interaction of genes with each other and the environment (non-genetic factors)
- Study of complex diseases such as heart and lung diseases, diabetes and cancer
- Offers new options for personalized medicine (influence of risk factors)
Although the two look from different angles, both need to be considered to fully understand the whole picture.
How similar are we to ...?
Humans are 99.5 to 99.9% similar to each other (not relatives!)
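To put these percentages into absolute numbers, the sketch below converts them into differing positions. The haploid genome size of roughly 3.2 billion base pairs is an assumption that is not stated on the slide, and the function name is chosen for illustration only.

```python
# Assumption (not from the slide): haploid human genome of ~3.2 billion bp.
GENOME_SIZE_BP = 3_200_000_000


def differing_positions(similarity_percent, genome_size=GENOME_SIZE_BP):
    """Number of positions expected to differ between two genomes."""
    return round(genome_size * (100 - similarity_percent) / 100)


for similarity in (99.9, 99.5):
    print(f"{similarity}% similar: {differing_positions(similarity):,} differing positions")
```

Under this assumption, two unrelated people would differ at roughly 3 to 16 million positions.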
Genomics - What we will deal with in this presentation
1. Methods of sequencing (from the beginnings to next-generation to next-next-generation sequencing)
2. Applications (what can we do with the sequencing tools we've seen?)
3. (Analyzing your data) ask your bioinformatician of choice :D
Genomics: the 3 generations of sequencing
First generation:
- Chain-termination sequencing (Sanger sequencing)
- Shotgun sequencing
Second generation (next-generation sequencing):
- Roche 454 Sequencing (GS Junior System / GS FLX+ System)
- Applied Biosystems SOLiD (5500 System / 5500xl System)
- Solexa/Illumina (HiSeq System / Genome Analyzer IIx / MiSeq)
- Pacific Biosciences (PacBio RS)
Third generation (next-next-generation sequencing):
- Oxford Nanopore Technologies (GridION System / MinION)
- Helicos (Genetic Analysis System)
Genomics methods of the first generation - chain-termination sequencing (Sanger sequencing) -
1) Denaturation of the dsDNA to ssDNA
2) Requires an initial primer
3) 4 separate reaction mixes (the only difference is the ddNTP: either A, C, G or T)
4) Dideoxynucleotides lead to early chain termination
5) Separation on a polyacrylamide gel (each reaction in a different lane)
- Fragments were originally radioactively labeled
- Feasible technique for read lengths of 100 to 1000 bp
- Advances: radioactive labels were exchanged for fluorescent ddNTPs (capillary-based sequencing)
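The chain-termination principle can be sketched in a few lines. This is a simplified toy model, not a description of real chemistry: it assumes that termination happens at every possible position in each reaction, it ignores strand orientation, and the function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_reactions(template):
    """Return, for each ddNTP reaction (lane), the lengths of terminated fragments."""
    # The newly synthesized strand is the complement of the template.
    synthesized = "".join(COMPLEMENT[base] for base in template)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        # In the mix containing dd<base>, synthesis can stop at this position.
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the gel from the shortest to the longest fragment."""
    lane_by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(lane_by_length[length] for length in sorted(lane_by_length))


lanes = sanger_reactions("TACGGT")
print(lanes)
print(read_gel(lanes))
```

Reading the four lanes together, shortest fragment first, yields the synthesized strand; the template sequence is its complement.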
Genomics methods of the first generation - shotgun sequencing -
- Relies on Sanger sequencing but is capable of sequencing whole genomes
- High-throughput sequencing technique that can collect a large amount of data at a fast rate
- Works by partially digesting a genome or a long strand of DNA into small overlapping fragments
- These small fragments are sequenced, and fragments that overlap are matched together
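Matching overlapping fragments can be illustrated with a greedy merge. This is a minimal sketch under strong assumptions (error-free reads, all from the same strand, no repeats, no read fully contained in another); real assemblers are far more elaborate, and the function names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGCGT", "GCGTAC", "TACGGA"]))
```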
Genomics methods of the second generation - Roche 454 sequencing -
- Oldest of the NGS technologies
- Current: `GS FLX Titanium` (since late 2008)
- The technology has been discontinued but still has a widespread user base and niche applications
- Fast sequencing (<6 h per run)
- Read length 300-1000 bp (modal length ~700 bp)
http://www.youtube.com/watch?v=bfnjxkhp8jc
Genomics methods of the second generation - Roche 454 sequencing -
- Fragmentation of DNA (600-800 bp) and adapter ligation (red + green)
- Deposition in microreactors together with a bead carrying adapter sequences
- Binding of the fragment to the bead
- Replication of fragments in the microreactor (polymerase etc. in solution)
- Replicas bind to free bead adapters
- Lysis of the microreactors and extraction of the fragment-covered beads
- Placement of the beads in the PicoTiterPlate
- Filling of the wells with bound reagents, especially the reagents responsible for creating the luminous signals (luciferase)
- Washing of the plate/wells with dNTPs, one at a time
- Recording of the light intensity triggered by pyrophosphate release
- Image interpretation: `Flow chart` (flowgram)
- Conversion to a textual representation of the sequence read per well
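The conversion from recorded light intensities to a textual read can be sketched as follows. This is an idealized, noise-free model: the intensity is assumed to be exactly proportional to the number of identical bases incorporated in one flow, and the flow order `TACG` and the function names are assumptions made for illustration.

```python
def simulate_flowgram(read, flow_order="TACG", cycles=2):
    """Return one (base, intensity) pair per nucleotide flow."""
    signals = []
    position = 0
    for _ in range(cycles):
        for base in flow_order:
            run_length = 0
            # All consecutive identical bases are incorporated in a single flow.
            while position < len(read) and read[position] == base:
                run_length += 1
                position += 1
            signals.append((base, float(run_length)))
    return signals


def flowgram_to_text(signals):
    """Round each intensity to a base count and write out the read."""
    return "".join(base * round(intensity) for base, intensity in signals)


signals = simulate_flowgram("TTACGGGA")
print(signals)
print(flowgram_to_text(signals))
```

Because a run of identical bases produces a single, stronger signal, small intensity errors on long runs translate directly into wrong run lengths in the text output.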
Genomics methods of the second generation - Roche 454 sequencing -
Advantages:
- Fast sequencing (<6 h per run)
- Read length 300-600 bp (modal length ~500 bp)
- Throughput: ~1 million reads, 400-600 Mbases per run (after quality filtering)
Areas of application:
- Whole-genome sequencing
- Targeted resequencing
- Sequencing-based transcriptome analysis
- Metagenomics
Disadvantages:
- Homopolymer (poly-NTP) errors are common (require specific error handling)
- Low throughput of 400-600 Mbases per run
- More expensive than competitors
Genomics methods of the second generation - Illumina -
Different platforms:
- HiSeq (1000/1500/2000/2500/X)
- MiSeq
- NextSeq
They differ in capabilities and throughput; the technology is the same.
Platform      Read length (paired end, PE)   Output        Run time
NextSeq       up to 150 bp                   120 Gbases    1.5 days
MiSeq         up to 300 bp                   15 Gbases     3 days
HiSeq 2000    up to 125 bp                   600 Gbases    11 days
HiSeq 2500    up to 125 bp                   1 Tbases      6 days
HiSeq X       up to 150 bp                   1.8 Tbases    3 days
Source: Mardis, Annu. Rev. Genomics Hum. Genet. 2008
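Since output and run time differ so much between the platforms, it can help to normalize to output per day. The sketch below uses only the figures listed above; the dictionary and function names are chosen for illustration.

```python
# (output in Gbases, run time in days), copied from the platform list above
PLATFORMS = {
    "NextSeq": (120, 1.5),
    "MiSeq": (15, 3),
    "HiSeq 2000": (600, 11),
    "HiSeq 2500": (1000, 6),
    "HiSeq X": (1800, 3),
}


def gbases_per_day(output_gbases, run_days):
    """Average output per day of run time."""
    return output_gbases / run_days


ranking = sorted(PLATFORMS, key=lambda name: gbases_per_day(*PLATFORMS[name]), reverse=True)
for name in ranking:
    print(f"{name:<11}{gbases_per_day(*PLATFORMS[name]):8.1f} Gbases/day")
```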
Genomics methods of the second generation - Illumina -
- Fragmentation of the sample + ligation of adapters (2 types) + size selection
- Binding of fragments to the flow cell surface + initial replication
  (figure labels: a) adapter I; b) adapter II; c) original fragment; d) unbound adapters on the surface)
- Bridge formation and polymerase activity using unlabeled dNTPs
- Final double-stranded bridge
  (figure labels: a) full, surface-bound adapters; b) incomplete adapters from aborted polymerase activity: no more space!)
- Denaturing of the double-stranded bridge
  (figure label: a) identical (+/- strand) surface-bound copies of the DNA fragment)
- Repetition of bridging, amplification and denaturation until a `forest of fragments` exists
- Removal of the adapter I-bound fragments
- Addition of ddNTP-like labeled bases + primer (adapter I): sequencing of base 1 (laser excitation, recording of the fluorescence activity)
- Removal of the sequence elongation terminator
- Addition of ddNTP-like labeled bases: sequencing of base 2
- Processing of all recorded images into textual format
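The cycle-by-cycle logic (one labeled base per cycle, one image per cycle, images converted to text at the end) can be sketched like this. It is a toy model with perfect chemistry and one base call per cluster; the names are hypothetical and no real image processing is involved.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(cluster_templates, cycles):
    """Return one 'image' per cycle; each image holds one base call per cluster."""
    images = []
    for cycle in range(cycles):
        # The terminator allows exactly one incorporation per cluster and cycle.
        image = [
            COMPLEMENT[template[cycle]] if cycle < len(template) else "N"
            for template in cluster_templates
        ]
        images.append(image)
    return images


def images_to_reads(images):
    """Transpose the per-cycle images into one textual read per cluster."""
    return ["".join(calls) for calls in zip(*images)]


images = sequence_by_synthesis(["ACGT", "TTGA"], cycles=4)
print(images_to_reads(images))
```

The read length equals the number of cycles, which is why the run time grows with the read length.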
Genomics methods of the second generation - Illumina -
Advantages:
- Low error rate
- Lowest cost per base
- Tons of data
Disadvantages:
- Must run at very large scale
- Short read length (50-150 bp)
- Runs take multiple days
- High startup costs
- De novo sequencing is difficult
Areas of application:
- DNA sequencing
- Gene regulation analysis
- Sequencing-based transcriptome analysis
- SNP and SV discovery
- Cytogenetic analysis
- ChIP-sequencing
- Small RNA discovery and analysis
Genomics methods of the second generation - Pacific Biosciences SMRT sequencing -
- Pacific Biosciences SMRT (Single Molecule Real Time)
- Real-time observation of a bound polymerase incorporating labeled dNTPs
- Special labeling: the fluorophore is attached to the terminal phosphate
- Incorporation by the DNA polymerase releases the label, leaving a natural DNA strand behind
- Generates up to 4 TB of raw data per 30 minutes (!!)
Single-molecule sequencing has been developed to circumvent the 2 main biases of PCR-dependent sequencing (like 454, Illumina):
1) PCR introduces an uncontrolled bias in template representation because its efficiency varies as a function of template properties
2) PCR introduces errors (generating false-positive SNPs)
Genomics methods of the second generation - Pacific Biosciences SMRT sequencing -
ZMW: zero-mode waveguide
Genomics methods of the second generation - Pacific Biosciences SMRT sequencing -
Advantages:
- Very fast
- Can deliver really long reads (mean read length is >5500 bp, longest reads can reach 30 kb)
- 1 run is not really expensive (~$400 per run)
Areas of application:
- De novo assembly
- Targeted sequencing
Disadvantages:
- Only ~30000-50000 reads per SMRT Cell
- Need many runs for higher coverage
- High startup costs
http://www.youtube.com/watch?v=_B_cUZ8hSYU
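The statement that many runs are needed for higher coverage follows from simple arithmetic. The sketch below uses the read count and mean read length given above; the genome sizes and target coverages in the usage lines are illustrative assumptions, not figures from the slides.

```python
def yield_per_cell(reads_per_cell, mean_read_length_bp):
    """Approximate number of sequenced bases per SMRT Cell."""
    return reads_per_cell * mean_read_length_bp


def cells_needed(genome_size_bp, target_coverage, bases_per_cell):
    """Number of SMRT Cells required to reach the target coverage."""
    required_bases = genome_size_bp * target_coverage
    return -(-required_bases // bases_per_cell)  # ceiling division on integers


low_yield = yield_per_cell(30_000, 5_500)  # lower bound from the slide
print(low_yield)
# Assumed examples: a 5 Mb bacterial genome at 50x, a 3.2 Gb genome at 10x.
print(cells_needed(5_000_000, 50, low_yield))
print(cells_needed(3_200_000_000, 10, low_yield))
```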
Third generation sequencing - Oxford Nanopore MinION & GridION -
- Sequencing without fluorescent labels and without fragmenting the DNA
- Ions and the entire DNA strand are driven through a small nanopore located in a synthetic polymer membrane using a voltage difference
- The nanopore is the only way for current to cross the membrane
- Only a small sample volume is required
- The inside of the nanopore is engineered for enhanced sensing
- For each triplet of nucleotides, a characteristic electrical signal caused by the ion flow is detected
- The current change can be measured directly
- The signal for each (overlapping!) triplet is recorded
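How a sequence can be recovered from signals of overlapping triplets is sketched below. The current levels are entirely hypothetical (one distinct, noise-free value per triplet in arbitrary units); real signals are noisy and need statistical base calling.

```python
from itertools import product

TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
# Hypothetical current level (arbitrary units) for each of the 64 triplets.
CURRENT_LEVEL = {triplet: 50.0 + i for i, triplet in enumerate(TRIPLETS)}
LEVEL_TO_TRIPLET = {level: triplet for triplet, level in CURRENT_LEVEL.items()}


def measure(sequence):
    """One current level per overlapping triplet passing through the pore."""
    return [CURRENT_LEVEL[sequence[i:i + 3]] for i in range(len(sequence) - 2)]


def basecall(levels):
    """Translate levels back to triplets and stitch them via their overlaps."""
    triplets = [LEVEL_TO_TRIPLET[level] for level in levels]
    if not triplets:
        return ""
    read = triplets[0]
    for previous, current in zip(triplets, triplets[1:]):
        if previous[1:] != current[:2]:
            raise ValueError("consecutive triplets do not overlap")
        read += current[2]  # each new triplet contributes one new base
    return read


levels = measure("GATTACA")
print(levels)
print(basecall(levels))
```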
Third generation sequencing - Oxford Nanopore MinION & GridION -
Advantages:
- Minimal sample preparation
- No requirement for polymerase or ligase
- Potential for very long read lengths
- It might well achieve the goal of $1000 per mammalian genome
- The instrument is inexpensive
Challenges/Disadvantages:
- Slowing down DNA translocation
- Improving the signal-to-noise ratio
- Potentially high error rate
Applications
DNA/RNA sequencing can be used for a variety of applications, including:
- Genome sequencing
  - De novo sequencing of genomes
  - Resequencing of genomes
- Detection of variants (SNPs) and mutations
- Exome sequencing
- Confirmation of clone constructs
- Detection of methylation events
- Gene expression studies (transcriptomics)
  - Whole transcriptome
  - RNA-seq / small RNA
- ChIP-Seq / PAR-CLIP
Applications - ChIP sequencing -
- ChIP-Seq is short for 'chromatin immunoprecipitation sequencing'
- Used to determine the influence of chromatin-associated proteins and transcription factors on the actual transcription
- DNA is usually wound around histones, forming 'chromatin'
- Unwinding is necessary for transcription (accessibility)
- Carried out/aided by transcription factors and associated proteins
- How do these work? Where do they bind?
=> ChIP-Seq tries to deliver the answers.
Applications - ChIP sequencing -
- 'Cross-linking' or 'binding' of proteins to DNA
- Lysis of the cells containing the cross-linked proteins
- Pulldown (magnetic beads)
- Wash off
- Undo the cross-linking
- Sequence the DNA (Illumina, SMRT)
- Result: rough area (region of 200-400 bp) where the protein bound to/interacted with the DNA
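Locating the rough binding region from the sequenced fragments amounts to finding where mapped reads pile up. The sketch below assumes that the reads have already been mapped to non-negative start positions; the read length, the depth threshold and the function names are illustrative assumptions, and real peak callers also model background and strand information.

```python
def coverage(read_starts, read_length, region_length):
    """Per-position read depth over a region of the reference."""
    depth = [0] * region_length
    for start in read_starts:
        for position in range(start, min(start + read_length, region_length)):
            depth[position] += 1
    return depth


def enriched_regions(depth, min_depth):
    """Return (start, end) intervals, end exclusive, with depth >= min_depth."""
    regions = []
    start = None
    for position, value in enumerate(depth):
        if value >= min_depth and start is None:
            start = position
        elif value < min_depth and start is not None:
            regions.append((start, position))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


depth = coverage([100, 110, 120, 130, 500], read_length=50, region_length=1000)
print(enriched_regions(depth, min_depth=3))
```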
Applications - gene expression studies transcriptomics -
Transcriptome
- Set of all mRNAs present in a certain cell, tissue, organ, ...
- The mRNA level results from the intensity of transcription and the mRNA stability
Transcriptomics
- Analysis of differences in the expression of gene populations under different conditions (treatment, development, disease)
- Also called expression profiling
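A common way to express such differences between two conditions is the log2 fold change. The sketch below is a minimal illustration; the pseudocount (used to avoid division by zero) and the example values are assumptions, not part of the slides.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of expression in the treated vs. the control condition."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical expression values (e.g. normalized read counts) for one gene.
print(log2_fold_change(treated=7, control=1))
print(log2_fold_change(treated=1, control=7))
```

Positive values indicate higher expression in the treated condition, negative values lower expression, and 0 indicates no change.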
Applications - gene expression studies transcriptomics -
Different types of RNA: mRNA (coding RNA), tRNA, rRNA, hnRNA, siRNA, miRNA
Coding RNA
- Approximately 10,000 to 15,000 genes are active in every single cell
- The abundance of the individual mRNAs differs:
  - low-abundance mRNAs: about 1 copy per cell
  - highly abundant mRNAs: more than 3,000 copies per cell
Applications - gene expression studies transcriptomics -
Why analyze the transcriptome?
- Allows analysis of expression changes in:
  - different cell types
  - different conditions of the environment/development
- Allows comparison of the healthy vs. the diseased state
- Identification of new genes
How to analyze the transcriptome?
Using different methods, depending on the source of the sample and the question to answer:
- Microarrays (Affymetrix, Illumina BeadChip technology)
- RNA-seq (454, Illumina)
- Real-time quantitative PCR
- Serial analysis of gene expression (SAGE)
- Massively parallel signature sequencing (MPSS)
Applications - gene expression studies transcriptomics -
Workflow of a DNA microarray
Result of a microarray: the heatmap
- 1 column = 1 sample
- 1 row = 1 gene
- Color key: indicates the relative expression
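The data structure behind such a heatmap is a matrix with genes as rows and samples as columns. The sketch below centers each gene on its mean across the samples and maps the result to a color; the example values, the threshold and the color names are assumptions chosen for illustration.

```python
def relative_expression(matrix):
    """Center each gene (row) on its mean across all samples (columns)."""
    return [[value - sum(row) / len(row) for value in row] for row in matrix]


def color_key(value, threshold=1.0):
    """Map a relative expression value to a heatmap color."""
    if value >= threshold:
        return "red"    # above the gene's average
    if value <= -threshold:
        return "blue"   # below the gene's average
    return "white"      # close to the gene's average


# Hypothetical log-scale expression: 2 genes (rows) x 4 samples (columns).
expression = [
    [2.0, 2.0, 6.0, 6.0],
    [5.0, 5.0, 5.0, 5.0],
]
for row in relative_expression(expression):
    print([color_key(value) for value in row])
```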
Applications - gene expression studies transcriptomics -
Advantages of microarrays
- Affordable
- Fast technique
- No specific equipment required
Disadvantages of microarrays
- Cross-hybridization possible
- Sequences of the targeted genes need to be known
- Material intensive (solutions, RNA)
- Variation among laboratories
Advantages of RNA-seq
- High resolution
- High throughput
- High coverage of mRNA compared to microarrays
- New, unidentified exons can be detected
- Can handle all RNA isoforms
Disadvantages of RNA-seq
- High costs
- Computational complexity
- Sample preparation can potentially induce bias (temperature-increased shredding of mRNA)
Thank you!