Overview of Next Generation Sequencing technologies. Céline Keime

1 Overview of Next Generation Sequencing technologies Céline Keime

2 Next Generation Sequencing
- Second generation sequencing
  - General principle
  - Sequencing by synthesis - Illumina
  - Sequencing by ligation - SOLiD
  - Comparison of 2nd generation technologies
- Benchtop high-throughput sequencing platforms
- Ultra high-throughput sequencers
- Third generation sequencing
- General information about Next Generation Sequencing

4 Conventional vs second generation sequencing
Conventional sequencing vs 2nd generation sequencing (Shendure et al., 2008)
→ Substantial decrease in:
- Cost per base
- Time needed to obtain sequences

5 Decrease of sequencing costs
Cost plot (log scale!), with the time point marked when the sequencing centers transitioned from Sanger-based to 2nd generation sequencing techniques

6 Increase of data volume
- Projects in Genome Online Database
- Bases in Sequence Read Archive

7 Second generation sequencing
- Three main technologies
  - HiSeq - Illumina (formerly Solexa): Fedurco et al., 2006
  - SOLiD DNA Sequencer - Applied Biosystems by Life Technologies (Thermo Fisher Scientific): Shendure et al., 2005
  - Genome Sequencer FLX+ - Roche (formerly 454): Margulies et al., 2005

9 1. Library preparation
- DNA fragmentation
- Adapter ligation
- Size selection
- PCR amplification

10 1b. (Optional) Capture to select only regions of interest

11 2. Fixation of DNA fragments on a solid support
Flow cell: surface coated with oligos

12 2. Fixation of DNA fragments on a solid support
Patterned flow cell: billions of ordered wells
→ More reads
→ Faster run time
Used in HiSeq 3000/4000/X sequencers

13 3. Amplification: bridge amplification
Steps shown in the figure:
- Hybridization
- Reverse strand synthesis
- Denaturation
- Bridge hybridization
- Synthesis
- Denaturation
- Bridge amplification repeated → cluster
- Cleavage of the reverse strand
Figure labels: adapter, oligo, flow cell surface, cluster

14 3. Amplification: material and result
Material: cBot. Result: clusters.

15 4. Sequencing by synthesis
First sequencing cycle: determination of the 1st base of each cluster
1. Addition of:
   - Sequencing primers
   - 4 nucleotides with reversible terminators and fluorescent labels
   - Polymerase
2. Laser excitation → emitted fluorescence captured → identification of the incorporated base
3. Cleavage of the protection groups and of the fluorescent labels

16 4. Sequencing by synthesis
As many sequencing cycles are run as the number of bases needed in the resulting read, e.g. cluster 1 = TAAGT → 1 read
Each lane (top and bottom surfaces) is divided into swaths composed of tiles → 1 image per tile per color, for each cycle
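The cycle-by-cycle readout can be illustrated with a minimal sketch. It assumes each cycle yields one intensity per fluorescent channel (A, C, G, T) for a cluster and simply picks the brightest channel. The function name `call_bases` and the intensity values are hypothetical. Real base callers also correct for channel cross-talk and phasing, which this sketch ignores.

```python
# Toy base caller: one intensity per channel (A, C, G, T) for each cycle.
CHANNELS = "ACGT"

def call_bases(cycles):
    """Return the read for one cluster, calling one base per sequencing cycle."""
    read = []
    for intensities in cycles:
        if len(intensities) != len(CHANNELS):
            raise ValueError("expected one intensity per channel (A, C, G, T)")
        # The incorporated base is taken to be the brightest channel.
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)

# Hypothetical intensities for "cluster 1" over five cycles (made-up values).
cluster_1 = [
    (0.1, 0.0, 0.2, 0.9),
    (0.8, 0.1, 0.1, 0.0),
    (0.9, 0.2, 0.0, 0.1),
    (0.1, 0.1, 0.7, 0.2),
    (0.0, 0.2, 0.1, 0.8),
]
print(call_bases(cluster_1))
```

Five cycles give a five-base read, which matches the slide's point that the number of cycles sets the read length.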

17 Alternatives to standard protocol
- Paired-end sequencing
  - Sequencing of read 1: previously described method
  - Then sequencing of read 2, for each molecule on the surface of the flow cell:
    1. Bridge formation and complementary strand synthesis
    2. Denaturation and cleavage of the original strand
  → Sequences the other end of the original molecule
  → This step is performed on the flow cell in the sequencer, so the position of each cluster is kept; this information makes it possible to link the two reads of a pair
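As a rough illustration of what a read pair looks like, the sketch below assumes read 1 is the start of the fragment and read 2 is the start of its reverse complement, since the second read comes from the complementary strand. The names `paired_end_reads` and `fragment` and the example sequence are hypothetical.

```python
# Translation table used to complement a DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment, read_length):
    """Simulate the two reads obtained from both ends of one fragment."""
    read_1 = fragment[:read_length]
    # Read 2 starts at the other end and is read on the complementary strand.
    read_2 = reverse_complement(fragment)[:read_length]
    return read_1, read_2

fragment = "TAAGTCCGATTGCA"  # made-up fragment
read_1, read_2 = paired_end_reads(fragment, 5)
print(read_1, read_2)
```

Because both reads come from the same cluster position, they can be reported together as one pair.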

18 Alternatives to standard protocol
- Mate-pair sequencing
  1. Fragmentation (2-5 kb)
  2. End labelling (biotin)
  3. Circularization
  4. Fragmentation ( bp)
  5. Selection of the biotinylated fragments
  6. Adapter ligation
  7. Cluster generation
  8. Sequencing of the 1st end
  9. Molecule reversal
  10. Sequencing of the 2nd end

19 Alternatives to standard protocol
- Multiplexing
  - Add a barcode (index) specific to each sample
  - Sequence several samples together
  - Single indexing
    - DNA or RNA: up to 24 indices
    - Small RNA: up to 48 indices
  - Dual indexing (2 barcodes)
    - Allows multiplexing of up to 96 samples
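Once samples have been sequenced together, reads are assigned back to their sample by index. The sketch below shows the idea under simple assumptions: single indexing, exact index matches only, and reads given as (index, sequence) pairs. The function `demultiplex`, the index sequences and the sample names are illustrative, not taken from the slides.

```python
from collections import defaultdict

def demultiplex(reads, sample_by_index):
    """Group (index_sequence, read_sequence) pairs by sample.

    Reads whose index is not in sample_by_index go to "undetermined".
    """
    by_sample = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = sample_by_index.get(index_seq, "undetermined")
        by_sample[sample].append(read_seq)
    return dict(by_sample)

# Hypothetical index-to-sample assignment and reads.
sample_by_index = {"ATCACG": "sample_A", "CGATGT": "sample_B"}
reads = [
    ("ATCACG", "TAAGT"),
    ("CGATGT", "GGCTA"),
    ("ATCACG", "CCGAT"),
    ("TTTTTT", "ACGTA"),
]
result = demultiplex(reads, sample_by_index)
print(result)
```

With dual indexing the lookup key would be the pair of index sequences rather than a single one.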

21 Sequencing by ligation
Dinucleotide encoding (figure example: AT)

22 Sequencing by ligation
- Ligation of a probe with a sequence-specific part + a fluorescent label
- Fluorescence detection
- Cleavage of the fluorescent label

23 Sequencing by ligation
- Same method repeated with primers n-1, n-2, n-3 and n-4
- Each base is interrogated twice
- The number of ligation cycles determines the length of the sequences
  - Here 7 cycles for 35 bp
- Knowing the base in 5' of the universal primer (base 0) allows the sequence of the whole read to be deconvoluted
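The deconvolution from a known base 0 can be sketched as follows. The slides do not give the color table, so this assumes the usual SOLiD two-base code, written here as the XOR of the 2-bit base codes A=0, C=1, G=2, T=3 (so AA, CC, GG, TT give color 0 and AT, TA, CG, GC give color 3). The function names are hypothetical.

```python
BASES = "ACGT"  # assumed 2-bit codes: A=0, C=1, G=2, T=3

def color_of(dinucleotide):
    """Color (0-3) of a dinucleotide under the assumed two-base code."""
    first, second = (BASES.index(base) for base in dinucleotide)
    return first ^ second

def encode(base_0, read):
    """Colors observed for a read that follows the known base 0."""
    seq = base_0 + read
    return [color_of(seq[i:i + 2]) for i in range(len(read))]

def decode(base_0, colors):
    """Recover the read from its colors, starting from the known base 0."""
    previous = BASES.index(base_0)
    read = []
    for color in colors:
        # Each color plus the previous base determines the next base.
        previous ^= color
        read.append(BASES[previous])
    return "".join(read)

colors = encode("T", "TAAGT")
print(colors, decode("T", colors))
```

Decoding is a chain: every base depends on the previous one, which is why base 0 must be known before any base of the read can be decoded.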

25 Comparison of second generation sequencing technologies
Technology: Illumina HiSeq | Roche 454 | Applied Biosystems SOLiD
Amplification: Bridge amplification | Emulsion PCR | Emulsion PCR or Wildfire isothermal amplification
Sequencing: By synthesis (reversible terminators) | By synthesis (pyrosequencing) | By ligation

26 Comparison of second generation sequencing technologies
Instrument: HiSeq4000 (Illumina) | GS FLX+ (Roche) | 5500xl W System (Applied Biosystems)
Number of reads per paired-end run: 10B | 1M | 2.4B
Maximum read length: 2x150 bp (paired-end) | 1 kb (mode: 700 bp) | 2x50 bp (paired-end)
Number of bases per paired-end run: 1500G | 0.7G | 320G
Maximum run time: 3.5 days | 1 day | 10 days
Main advantages: Very high throughput | Large read length | Accuracy
Main limitations: Relatively short read length (overcome with Moleculo) | Errors in homopolymers, relatively low throughput, high cost, support of platform ended in 2016 | Short read length

28 Benchtop high-throughput sequencing platforms
- Laser-printer sized
- Modest set-up and running costs
  - but cost per base higher than for the previously described sequencers
- Modest run time
- Instruments available: 454 GS Junior, Ion PGM, MiSeq, MiniSeq

29 454 GS Junior (Roche)
- Since 2010
- Similar approach to 454 GS FLX
- 1 fragment = 1 bead
- Emulsion PCR amplification in a microreactor (droplet from the emulsion)
  → Clonal amplification of a unique fragment
  → Millions of PCR reactions in parallel
- Beads + enzymes loaded onto a device
  → Only one bead per well
  → One bead = one read

30 454 GS Junior (Roche)
Pyrosequencing
- Nucleotides added one by one onto the plate
- Nucleotide incorporation → release of a pyrophosphate
- Light signal recorded by a camera
- Image analysis

31 Ion PGM (Life Technologies)
- Since 2011
- Emulsion PCR, beads deposited on a chip (one bead per well)
- Sequencing by synthesis using non-modified dNTPs
- dNTP incorporation is detected through the hydrogen ions released during the reaction

32 Ion PGM (Life Technologies)
- Nucleotide added to a DNA template
- Incorporation into DNA → hydrogen ion released → the charge of that ion changes the pH of the solution → detected by the ion sensor
- Incorporation of 2 bases → the voltage doubles
- No incorporation → no hydrogen ion released
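Because the signal scales with the number of bases incorporated in one flow, a read can be reconstructed from the series of per-flow signals. The sketch below assumes normalized signals where 1.0 means one incorporated base, and a repeating flow order `TACG` chosen only for illustration. The function name and signal values are hypothetical.

```python
def decode_flows(flow_order, signals):
    """Turn normalized per-flow signals into a base sequence.

    Flow i delivers nucleotide flow_order[i % len(flow_order)]; the signal is
    rounded to the nearest whole number of incorporated bases.
    """
    read = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, int(signal + 0.5))  # nearest non-negative integer
        read.append(nucleotide * count)
    return "".join(read)

# Made-up signals: T once, A twice, no C, G once, T once.
signals = [1.02, 1.9, 0.05, 0.97, 1.1]
print(decode_flows("TACG", signals))
```

Rounding becomes less reliable as the signal grows, which is one way to see why long homopolymers are hard to call with flow-based chemistries.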

33 MiSeq / MiniSeq (Illumina)
- MiSeq since 2011, MiniSeq since 2016
- Similar approach to HiSeq
  - Bridge amplification
  - Sequencing by synthesis
- But smaller flow cell, faster image acquisition and faster microfluidics → reduced run time

34 Medium output sequencing platforms
- Ion Proton (Life Technologies)
  - on_spec_sheet_fhr.pdf
- NextSeq 500 (Illumina)
- NextSeq 550 = Microarray scanning + NextSeq 500

36 Ultra high-throughput sequencers
- Illumina HiSeq X
  - 1.8 Tb per run / 6 billion single-end reads per run
  - Read length up to 2x150 bp
  - Run time: less than 3 days
  - Available only as HiSeq X Ten or Five / for whole-genome sequencing
- Illumina HiSeq X Ten
  - 10 HiSeq X
  - Instrument cost: $10 million
  - $1,000 for a 30X human genome
- Illumina HiSeq X Five
  - 5 HiSeq X
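The relation between run output and the 30X figure is simple arithmetic: mean coverage is the number of sequenced bases divided by the genome size. The sketch below assumes a human genome size of about 3.1 Gb, which is not stated in the slides, and ignores duplicates, low-quality reads and unmapped reads, so the result is an upper bound rather than a vendor specification.

```python
HUMAN_GENOME_SIZE = 3_100_000_000  # assumption: ~3.1 Gb, not from the slides
RUN_OUTPUT_BASES = 1_800_000_000_000  # 1.8 Tb per run, as quoted above

def mean_coverage(total_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size

def genomes_per_run(run_output_bases, genome_size, target_coverage):
    """Whole genomes that fit in one run at the target mean coverage."""
    return run_output_bases // (genome_size * target_coverage)

print(mean_coverage(RUN_OUTPUT_BASES, HUMAN_GENOME_SIZE))
print(genomes_per_run(RUN_OUTPUT_BASES, HUMAN_GENOME_SIZE, 30))
```

In practice fewer genomes fit in a run, because not every sequenced base ends up as usable coverage.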

37 Summary of Illumina sequencers
Figure: instruments ordered by increasing system price and output, and decreasing price per Gb
MiniSeq: output 7.5 Gb | number of single reads per run 25M | maximum read length 2x150

38 New Illumina sequencer since 2017: NovaSeq 6000
- Up to 2 Tb per run with 2 flow cells (6.6 billion reads per run)
- Different flow cells will be available for different outputs
- Output planned for the S4 flow cell: 6 Tb per run with 2 flow cells (up to 20 billion reads)
- Read length up to 2x150 bp
- Run time up to 2 days
* Not released

40 Comparison between different generations of sequencing technologies
- 1st generation
  - Sequencing of pre-amplified molecules one by one
- 2nd generation
  - Clonal amplification and sequencing of millions of molecules at the same time
  - PCR, RT needed → bias
- 3rd generation
  - Main improvements
    - No amplification
    - Direct RNA sequencing
    - Long reads
  - Current drawbacks
    - High error rate
    - Small number of reads compared to 2nd generation technologies
  - Nanopore sequencing, Pacific Biosciences

41 Nanopore sequencing
- Nanopore = nano-scale hole
→ When DNA passes through the nanopore, it creates a characteristic change in the magnitude of the current through the nanopore
→ By measuring this change, the nucleotide can be identified (4 nucleotides = 4 different signatures)
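The "4 nucleotides = 4 signatures" idea can be pictured as a nearest-level lookup. This is a deliberately simplified sketch: the current levels in `LEVELS` are invented, arbitrary-unit values, and the function name is hypothetical. In real nanopore data the current depends on several neighbouring bases at once and must first be segmented into events, so actual base callers are far more involved.

```python
# Hypothetical mean current level per nucleotide (arbitrary units, made up).
LEVELS = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}

def call_from_current(measurements):
    """Assign each current measurement to the nucleotide with the closest level."""
    return "".join(
        min(LEVELS, key=lambda base: abs(LEVELS[base] - value))
        for value in measurements
    )

# One made-up measurement per nucleotide passing through the pore.
print(call_from_current([79.2, 51.3, 48.8, 69.5, 81.0]))
```

Measurements that fall between two levels are where errors come from in this toy model.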

42 Nanopore sequencing
- Branton et al., Nature Biotechnology 2008;26(10):
- Nanopore sequencing
  - Sequences unique single-stranded molecules through nanopores
  - Both strands of the DNA can be read
  - Many nanopore experiments are conducted at the same time on an array
  - Array loaded on MinION or PromethION instruments

43 Nanopore sequencing
Instrument: MinION | PromethION
Flow cells: 1 | 48
Channels: ,000
Run time: Up to 48h | Up to 48h
Number of reads at 10kb at standard speed: Up to 4.4M | Up to 1,250M

44 Nanopore sequencing
- Advantages
  - Very low amount of starting material
  - DNA or RNA molecules are read directly
  - Long reads (> 5kb, depending on fragment length, longest reported ~ 1Mb)
  - Fast (1bp/ns)
  - No fluorescence
- Major difficulties
  - Finding the best nanopore characteristics to allow correct detection of nucleotides
  - Controlling the speed and position of DNA in the nanopore so that each nucleotide can be detected
  - High error rate

45 Pacific Biosciences
- Single Molecule Real Time Sequencing (SMRT)
  - Direct measurement of nucleotide incorporation by DNA polymerase
  - Sequencing occurs on SMRT Cells
  - Each SMRT Cell contains thousands of Zero-Mode Waveguides (ZMWs) in which polymerase is immobilized
  - ZMWs provide a window for watching DNA polymerase as it performs sequencing by synthesis

46 Pacific Biosciences
- Eid et al. Science 2009;323(5910):
- Instruments: PacBio RS II, Sequel
  - Sequel: smaller, half the cost and almost 7 times as powerful as the PacBio RS II

47 Pacific Biosciences
- Sequel output
Complete specifications on

48 Pacific Biosciences
- Advantages
  - No amplification
  - Very large read length in very short run time
- Main limitations
  - Very high error rate (10-15%)
  - Expensive → limited throughput

50 Next Generation Sequencing applications
- During this training we will focus on RNA-seq and ChIP-seq
- But there are many other NGS applications
  - Annotated bibliography of *seq:
  - Review of applications using Illumina technology: products/research_reviews/sequencing-methods-review.pdf

51 Next Generation Sequencing questions and answers
- SEQanswers