High throughput DNA Sequencing. An Equal Opportunity University!


1 High throughput DNA Sequencing

2 First Generation DNA sequencing
Utilizes chain-terminator technologies (adaptation of Sanger sequencing)
Adapts fluorescence chemistry, high-resolution chromatography, automation
Throughput if everything is running 24/7: with 24 runs per day per capillary, 384 capillaries, and 500 bp per run, the daily output may be as high as about 4.6 million bp per day -> 1.7 billion bp per year (cost: $2 per run, almost $7 million for 1.7 billion bp of sequence)
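The throughput arithmetic on this slide can be checked with a short sketch. The constants are the slide's figures; the function name and the assumption of uninterrupted 24/7 operation for 365 days are illustrative only.

```python
# Figures taken from the slide; continuous 24/7 operation is assumed.
RUNS_PER_DAY_PER_CAPILLARY = 24
CAPILLARIES = 384
BP_PER_RUN = 500
COST_PER_RUN_USD = 2


def sanger_output(days):
    """Return (runs, base pairs, cost in USD) for the given number of days."""
    runs = RUNS_PER_DAY_PER_CAPILLARY * CAPILLARIES * days
    return runs, runs * BP_PER_RUN, runs * COST_PER_RUN_USD


if __name__ == "__main__":
    _, bp_day, _ = sanger_output(1)
    runs_year, bp_year, cost_year = sanger_output(365)
    print(f"per day:  {bp_day:,} bp")
    print(f"per year: {bp_year:,} bp from {runs_year:,} runs, ${cost_year:,}")
```

One day gives 4,608,000 bp, and a year gives roughly 1.68 billion bp at a cost of roughly $6.7 million, consistent with the rounded numbers on the slide.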

3 Sanger sequencing: the basis for most sequencing technologies
Synthesize DNA in a test tube, using tagged precursors (radioactive, fluorescently labelled) and mixtures of normal and terminating precursors
Separate terminated chains (gels, columns)

4 Some other numbers:
Size of the human genome: 3 billion bp (2 yrs, $14 million)
Size of the Arabidopsis genome: 120 million bp (2 months, $1 million)
Size of the yeast genome: 12 million bp (1 week, $50,000)
Size of the E. coli genome: 4.6 million bp (3 days, $17,000)
Size of a transcriptome of 30,000 mRNAs: 100 million bp (6 weeks, $820,000)
These estimates assume 1-fold coverage. A gold-standard genome requires at least 10-fold coverage.
-> higher capacity and lower costs are needed!
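To see what 10-fold coverage implies, the sketch below scales the slide's 1-fold cost estimates and counts the 500 bp reads needed. It assumes cost grows linearly with coverage, which the slide does not state; the dictionary and function names are hypothetical.

```python
import math

# 1-fold estimates copied from the slide: genome size (bp) and cost (USD).
ONE_FOLD_ESTIMATES = {
    "human": (3_000_000_000, 14_000_000),
    "Arabidopsis": (120_000_000, 1_000_000),
    "yeast": (12_000_000, 50_000),
    "E. coli": (4_600_000, 17_000),
}


def reads_needed(genome_bp, coverage, read_length_bp=500):
    """Number of reads whose combined length covers the genome `coverage` times."""
    return math.ceil(genome_bp * coverage / read_length_bp)


def cost_at_coverage(one_fold_cost_usd, coverage):
    """Assumption: cost scales linearly with coverage."""
    return one_fold_cost_usd * coverage


if __name__ == "__main__":
    for name, (size_bp, cost_usd) in ONE_FOLD_ESTIMATES.items():
        reads = reads_needed(size_bp, 10)
        cost = cost_at_coverage(cost_usd, 10)
        print(f"{name}: {reads:,} reads, about ${cost:,} at 10-fold coverage")
```

Under these assumptions a 10-fold human genome needs 60 million Sanger reads and about $140 million, which is the motivation for higher capacity and lower cost.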

5 Next Generation DNA sequencing
Approaches: high-throughput DNA fragment isolation and preparation, sequencing, detection, and analysis
Instrumentation: nanofabrication, using emulsion/bead or chip-based architectures, coupled with high-sensitivity detection
Chemistry: high-sensitivity detection of nucleotide incorporation (or, for SOLiD, oligonucleotide ligation); typically fluorescence-based (-> laser excitation and detection) but other technologies exist (pyrosequencing, Ion Torrent)

6 Next Generation DNA sequencing
Unifying themes: preparation of large numbers (millions to 100s of millions) of individual DNA molecules in ways that permit independent sequencing
May involve clonal propagation (by PCR) or single-molecule techniques
For clonal techniques, DNA samples must be modified to possess adapters to permit amplification and sequencing
For most technologies, relatively short (50-300 bp) DNA fragments must be generated

7 a Roche/454, Life/APG, Polonator: emulsion PCR
One DNA molecule per bead. Clonal amplification to thousands of copies occurs in microreactors in an emulsion.
PCR amplification (primer, template, dNTPs and polymerase) -> break emulsion -> template dissociation -> millions of beads, chemically crosslinked to a glass slide
b Illumina/Solexa: solid-phase amplification
One DNA molecule per cluster.
Sample preparation, DNA (5 µg) -> template, dNTPs and polymerase -> bridge amplification -> cluster growth -> millions of molecular clusters
c Helicos BioSciences: one-pass sequencing
Single molecule: primer immobilized. Billions of primed, single-molecule templates.
d Helicos BioSciences: two-pass sequencing
Single molecule: template immobilized. Billions of primed, single-molecule templates.
e Pacific Biosciences, Life/Visigen, LI-COR Biosciences
Single molecule: polymerase immobilized. Thousands of primed, single-molecule templates.

8 a Illumina/Solexa: reversible terminators
Incorporate all four nucleotides, each labelled with a different dye
Wash, four-colour imaging
Cleave dye and terminating groups, wash
Repeat cycles
b Example read-out for two templates (top and bottom)

9

10 c Roche/454: pyrosequencing
1-2 million template beads loaded into PTP wells
Flow of a single dNTP type across PTP wells
Light-generating cascade: polymerase incorporates the dNTP and releases PPi; sulphurylase converts APS and PPi to ATP; luciferase uses ATP and luciferin to produce light and oxyluciferin
d Flowgram: signal height per flow, read against a scale from 1-mer to 7-mer
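The link between single-dNTP flows and a flowgram can be sketched as follows. This is an idealized, noise-free illustration: the flow order TACG and the example read are assumptions, and real signals are light intensities that must be rounded to homopolymer lengths.

```python
def ideal_flowgram(read, flow_order="TACG"):
    """Return a list of (nucleotide, homopolymer_length) pairs, one per flow.

    Each flow offers a single dNTP type; the signal is proportional to how
    many identical bases are incorporated in a row (0 if the next base differs).
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")

    flows = []
    position = 0
    flow_index = 0
    while position < len(read):
        nucleotide = flow_order[flow_index % len(flow_order)]
        run_length = 0
        while position < len(read) and read[position] == nucleotide:
            run_length += 1
            position += 1
        flows.append((nucleotide, run_length))
        flow_index += 1
    return flows


if __name__ == "__main__":
    # Hypothetical read used only for illustration.
    for nucleotide, height in ideal_flowgram("TTAGGGC"):
        print(nucleotide, "#" * height)
```

A run of identical bases is read in one flow, which is why homopolymer length has to be inferred from signal height rather than counted cycle by cycle.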

11 Ion Torrent sequencing measures the proton released during DNA synthesis

12

13 PacBio
An immobilized DNA polymerase copies a captured DNA chain
Each incorporated nucleotide is detected as the chain is being elongated
A single-molecule technique that does not require amplification

14
                        Ion Torrent               454 Sequencing  Illumina               PacBio
Sequencing chemistry    Semiconductor sequencing  Pyrosequencing  Sequence-by-synthesis  Immobilized DNA polymerase
Amplification approach  Emulsion PCR              Emulsion PCR    Bridge amplification   Single molecule
bp per run              1.9 Gb                    650 Mb          500 Gb                 90 Mb
Reads/run               5 million                 1 million       2 billion              30,000
Time per run            7 hours                   20 hours        6 days                 2 hours
Read length             400 bp                    650 bp          2x250 bp               3000 bp
Cost per run            $875                      $6200           $15,000                $900
Cost per Gb             $460                      $9500           $30                    $1100
Cost per instrument     $50,000                   $450,000        $690,000               $695,000
gen- fieldguide- 2014/
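As a rough cross-check of the comparison, cost per Gb can be recomputed as cost per run divided by yield per run. The dictionary layout and function name below are illustrative; the numbers are the ones listed for each platform.

```python
# Yield per run (in Gb) and cost per run (USD) as listed for each platform.
PLATFORMS = {
    "Ion Torrent": {"yield_gb": 1.9, "cost_per_run": 875},
    "454": {"yield_gb": 0.65, "cost_per_run": 6200},
    "Illumina": {"yield_gb": 500, "cost_per_run": 15000},
    "PacBio": {"yield_gb": 0.09, "cost_per_run": 900},
}


def cost_per_gb(platform):
    """Cost per run divided by the sequence yield of one run, in USD per Gb."""
    entry = PLATFORMS[platform]
    return entry["cost_per_run"] / entry["yield_gb"]


if __name__ == "__main__":
    for name in PLATFORMS:
        print(f"{name}: about ${cost_per_gb(name):,.0f} per Gb")
```

Computed this way, Ion Torrent, 454 and Illumina land close to the listed $460, $9500 and $30. PacBio comes out near $10,000 rather than the listed $1100, so that entry presumably rests on different assumptions than a simple per-run division.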

15 How RNA-seq data is generated
Isolate transcript RNA -> reverse transcription -> fragment cDNA -> size selection -> Illumina sequencing of each end
*based on Illumina approach
**strand-specific RNA-seq protocols exist for both Illumina and SOLiD
Slide compliments of Andrew McPherson
RNA-Seq bioinformatics.ca

16 Numerous possible analysis strategies
There is no one correct way to analyze RNA-seq data (though there are some incorrect ways)
Two major branches:
Direct alignment of reads (spliced or unspliced) to genome or transcriptome
Assembly of reads followed by alignment*
*Assembly is the only option when working with a creature with no genome sequence; alignment of contigs may be to ESTs, cDNAs, etc.
Image from Haas & Zody, 2010

17 Software Summary
BWA, BWA*, Novoalign, MOSAIK, Bowtie, TopHat, SpliceMap, RNA-Mate, QPALMA, ABySS, Velvet, Scripture, Cufflinks, ALEXA-Seq, Trans-ABySS (uses BLAT)
*Modified exon-junction aware BWA, soon to be added to BWA release
Image from Haas & Zody, 2010

18 Measuring gene expression using RNA-Seq
The measured parameter: numbers of individual sequence tags that map to a given gene
Assume the relative fraction of RNA-Seq tags (out of the entire collection of sequences) is a reasonable representation of the abundance of the mRNA
Possible complications:
- inconsistent removal of rRNA (this will change the overall representation of mRNA-derived tags)
- amplification of rare cDNA products (this is an artifact that creates the impression of higher expression levels)
- gene-to-gene (and sequence-to-sequence) variability in RT and PCR efficiencies
Analysis: determine relative tag number ("tags per million"), normalize for mRNA length (tags per million per kb of sequence), compare different samples, estimate expression ratios
Optimal situation: analyze multiple (>3) biological replicates for each treatment, treat as one would for microarray studies (t-test, ANOVA, correct for tag distribution across gene)
Results: gene-by-gene list of relative expression changes
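The normalization steps named under Analysis can be sketched in a few lines. The function names, gene names, and counts are hypothetical, and "other" stands in for every remaining tag in the sample; a real analysis would add replicates and statistical testing.

```python
def tags_per_million(tag_counts):
    """Scale raw tag counts so that each sample sums to one million tags."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("sample contains no mapped tags")
    return {gene: count * 1_000_000 / total for gene, count in tag_counts.items()}


def tags_per_million_per_kb(tag_counts, mrna_lengths_bp):
    """Additionally divide by mRNA length in kb, so long mRNAs are not favoured."""
    per_million = tags_per_million(tag_counts)
    return {
        gene: per_million[gene] / (mrna_lengths_bp[gene] / 1000)
        for gene in tag_counts
        if gene in mrna_lengths_bp
    }


def expression_ratio(sample_a, sample_b, gene):
    """Ratio of normalized expression for one gene between two samples."""
    return sample_a[gene] / sample_b[gene]


if __name__ == "__main__":
    # Hypothetical counts and lengths, for illustration only.
    lengths = {"geneA": 2000, "geneB": 1000}
    control = {"geneA": 100, "geneB": 400, "other": 999_500}
    treated = {"geneA": 200, "geneB": 400, "other": 999_400}
    norm_control = tags_per_million_per_kb(control, lengths)
    norm_treated = tags_per_million_per_kb(treated, lengths)
    print(expression_ratio(norm_treated, norm_control, "geneA"))
```

Because both samples are scaled to the same total, a change in rRNA contamination between samples would shift every ratio, which is the first complication listed on the slide.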

19 Mapping results
Each tic is a separate sequence tag
Ideally, the tag distribution across a gene will be uniform
Tag numbers are estimates of expression levels
Tag distributions identify splicing events (and variants)

20 Lower expression; not expressed; higher expression

21 Determination of [mRNA]: summary
RNA blotting
  Advantages: captures information about entire RNA
  Disadvantages: lower sensitivity, low throughput
RNase protection
  Advantages: greater sensitivity, ease of multiplexing
  Disadvantages: lose information about the full mRNAs, low throughput
qRT/PCR
  Advantages: high sensitivity, greater scale-up capability
  Disadvantages: lose information about the full mRNAs
Microarrays
  Advantages: genome-scale, near-nucleotide resolution is possible
  Disadvantages: cost, availability of platforms, requires detailed genome information
NGS
  Advantages: genome-scale, nucleotide-level resolution, may not require finished genome
  Disadvantages: cost, computationally intensive