Library construction (overviews and challenges)

1 Computational Biology and Genomics Workshop, April 18-22, 2016, Colorado State University Todos Santos Center. Library construction (overviews and challenges). Aines Castro Prieto

2 Content: short overview of NGS technologies; NGS general workflow; focus on library construction (fundamentals); library construction in the most common NGS applications (Illumina).

3 Next generation sequencing platforms: Roche 454 (GS Junior, GS FLX+); Illumina (Solexa) HiSeq, Genome Analyzer, MiSeq; Applied Biosystems SOLiD (5500, 5500xl); Life Technologies Ion Torrent (PGM, Ion Proton); Helicos Genetic Analysis System; Pacific Biosciences PacBio RS; Oxford Nanopore Technologies GridION, MinION. Technology types: amplified single-molecule sequencing, single-molecule sequencing, single-molecule real-time sequencing. Generations: second generation sequencing, next generation sequencing, third generation sequencing, next-next generation sequencing.

4 Applications (Kim K. M. et al., Int J Syst Evol Microbiol)

5 Next generation sequencing (advantages): micro- and nanotechnologies that reduce library construction components and the cost of chemical reagents; avoids time-consuming screening methods (e.g., cloning); reads with high coverage (or depth); increased speed and depth of analysis at reduced cost; highly multiplexed (simultaneous sequencing and analysis of many samples).

6 Depth of coverage. [Figure: example sequence read illustrating base-call errors at 99% accuracy] Problem: at 99% accuracy, roughly 1 base in 100 is miscalled, so nucleotide errors add up quickly given the high throughput of NGS. Solution: sequence each nucleotide multiple times.
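To make "sequence each nucleotide multiple times" concrete, here is a minimal sketch that computes the probability that a simple majority vote over n reads miscalls a base. It assumes a 1% per-read error rate (the 99% accuracy above), independent errors, and the pessimistic simplification that every error points to the same wrong base. Real variant callers use quality-aware models rather than this vote.

```python
from math import comb


def majority_error_probability(depth: int, error_rate: float = 0.01) -> float:
    """Probability that at least half of `depth` reads carry the same wrong base.

    Assumes independent errors and, pessimistically, that every error is the
    same wrong base; ties (exactly half wrong) are counted as failures.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    threshold = (depth + 1) // 2  # smallest number of wrong reads that is >= half
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


for depth in (1, 3, 5, 10, 30):
    print(depth, f"{majority_error_probability(depth):.2e}")
```

The printed values fall steeply with depth, which is the intuition behind requiring multiple reads per position.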

7 Coverage: the number of times a nucleotide is sequenced. The required depth of coverage depends on the application. $\text{Coverage per genome} = \frac{\text{Total output generated}}{\text{Total size of the sample sequenced}}$. Exercise: what coverage is obtained when sequencing a human genome on an Illumina HiSeq 2500 (output: 1000 billion bp)?
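The coverage formula translates directly into code. This sketch works the exercise, assuming a human genome size of about 3.2 billion bp (haploid); that size is an assumption, not a value from the slides. Usable coverage in practice is lower once duplicates and low-quality reads are removed, or when the run is shared by several samples.

```python
def coverage(total_output_bp: float, genome_size_bp: float) -> float:
    """Mean depth of coverage = total output generated / size of the sample sequenced."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_output_bp / genome_size_bp


# HiSeq 2500 output is taken from the slide; the human genome size is an
# assumption (~3.2 billion bp, haploid).
HISEQ_2500_OUTPUT_BP = 1000e9
HUMAN_GENOME_BP = 3.2e9

print(f"{coverage(HISEQ_2500_OUTPUT_BP, HUMAN_GENOME_BP):.1f}x")  # 312.5x
```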

8 Evolution of NGS platforms (Van Dijk et al., Trends in Genetics)

9 (Second generation sequencing technologies): general workflow. Library construction → library clonal amplification (emulsion PCR: 454 Roche, SOLiD, Ion Torrent; bridge PCR: Illumina) → sequencing chemistry (454 Roche: pyrosequencing; SOLiD: sequencing-by-ligation; Ion Torrent: sequencing-by-synthesis with ion semiconductor detection; Illumina: sequencing-by-synthesis with reversible termination) → data analyses.

10 Second Generation Sequencing Technologies CORE STEPS

11 Common attributes of NGS platforms: 1. Library construction (by amplification or ligation) with adapters. 2. Clonal amplification of each library fragment on a solid surface with covalently attached adapters that hybridize to the library adapters (image). 3. Direct, step-by-step detection of the nucleotide base incorporated during the sequencing reaction by each amplified library fragment set (image). 4. Hundreds of thousands to hundreds of millions of reactions detected per instrument run = massively parallel sequencing. 5. A digital read type that enables quantitative comparisons. 6. A sequencing mechanism that reads both ends of each sequenced fragment (paired-end reads). 7. Short reads compared to capillary sequencers (noise). Mardis E. 2014

12 Cyclic array sequencing: the number of cycles determines the read length; the emission wavelength plus the signal intensity determines the base call; the signal of one specific nucleotide is generated from a group of amplified copies of the same fragment.
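The base-call rule on this slide (one signal per cycle, colour plus intensity decides the base) can be sketched as "pick the brightest channel each cycle". This is a simplified model that assumes four fluorescence channels in A, C, G, T order; the intensities are hypothetical, and real base callers also correct for phasing and channel cross-talk and assign quality scores.

```python
from typing import Sequence

BASES = "ACGT"  # assumed channel order: one fluorescence channel per base


def call_bases(cycle_intensities: Sequence[Sequence[float]]) -> str:
    """Call one base per cycle by picking the brightest of four channels.

    `cycle_intensities` holds one [A, C, G, T] intensity vector per cycle for a
    single cluster, so the read length equals the number of cycles.
    """
    read = []
    for intensities in cycle_intensities:
        if len(intensities) != 4:
            raise ValueError("expected four channel intensities per cycle")
        brightest = max(range(4), key=lambda i: intensities[i])
        read.append(BASES[brightest])
    return "".join(read)


# Hypothetical intensities for one cluster over 5 cycles.
cluster = [
    [0.9, 0.1, 0.05, 0.1],
    [0.1, 0.8, 0.2, 0.1],
    [0.05, 0.1, 0.1, 0.95],
    [0.2, 0.1, 0.7, 0.1],
    [0.85, 0.2, 0.1, 0.1],
]
print(call_bases(cluster))  # ACTGA
```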

13 50 million clusters; 1 cluster = about a thousand copies of the original fragment. Sequencing-by-synthesis (SBS) with reversible termination.

14 NGS platforms. Similarities: DNA templates are spatially segregated; DNA is sequenced by synthesis; massively parallel sequencing; multiplexing; need for robust QC pipelines. Differences: specific protocols for library construction; chemistry (different approaches to generate a signal); throughput (bp per run); read length (bp); error rate. (Mardis, Annu. Rev. Genomics Hum. Genet.)

15 Sources of bias in the NGS workflow: initial PCR amplifications (polymerase errors); clonal amplification (emulsion or bridge); sequencing reaction and signal processing (homopolymers, indels, dephasing). Need: strict approaches for data validation and allele calling to distinguish true variants from artefacts.

16 PCR-related problems in NGS: preferential amplification ("jackpotting") of certain fragments during library construction, producing duplicate reads that must be de-duplicated; low input DNA favors jackpotting because of its low complexity; false-positive artefacts introduced by polymerase substitution errors (early cycles: the error appears as a true variant; later cycles: the error is drowned out by correctly copied fragments in the cluster); cluster formation (bridge amplification): bias in amplifying high and low G+C fragments, leading to reduced coverage at these loci.
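De-duplication is usually done after alignment: reads whose alignments start at the same position on the same strand are treated as PCR copies of one original fragment. The sketch below shows that idea on a hypothetical minimal read record (not a real file format). Production tools such as Picard MarkDuplicates or samtools markdup also consider mate positions and base qualities, which this sketch omits.

```python
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal alignment record, for illustration only.
    name: str
    chrom: str
    start: int   # 5' alignment position
    strand: str  # "+" or "-"


def mark_duplicates(reads: list[Read]) -> tuple[list[Read], list[Read]]:
    """Split reads into (unique, duplicates) keyed on (chrom, start, strand).

    The first read seen for each key is kept as the representative; later reads
    with the same key are flagged as duplicates.
    """
    seen: set[tuple[str, int, str]] = set()
    unique, duplicates = [], []
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key in seen:
            duplicates.append(read)
        else:
            seen.add(key)
            unique.append(read)
    return unique, duplicates


reads = [
    Read("r1", "chr1", 1000, "+"),
    Read("r2", "chr1", 1000, "+"),  # same start and strand as r1: duplicate
    Read("r3", "chr1", 1000, "-"),  # opposite strand: kept
    Read("r4", "chr2", 500, "+"),
]
unique, dups = mark_duplicates(reads)
print(len(unique), len(dups), f"duplication rate = {len(dups) / len(reads):.0%}")
```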

17 What is a DNA library? A collection of DNA fragments. A genomic DNA library contains DNA fragments representing the entire genome of an organism; a cDNA library contains only complementary DNA synthesized from the mRNA molecules in a cell (the coding regions of the genome).

18 What is library construction? Converting the original source material (DNA or RNA) into a form compatible with the NGS system: a collection of similarly sized DNA/RNA fragments with known (platform-specific) adapter sequences added to the 5' and 3' ends, introducing as little quantitative bias and losing as little material as possible.

19 Main goal: to construct a library that guarantees high molecular recovery of the original fragments (high complexity and low clonal bias from PCR or other amplification). Library complexity is the number of unique fragments present in a given library. It is affected by the amount of starting material, the DNA lost during cleanups and size selection, and the duplication introduced by PCR.
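Complexity cannot be counted directly, but it can be estimated from how many duplicates appear after sequencing. A common approach, similar to the library-size estimate reported by Picard's duplication metrics, assumes every molecule is equally likely to be sampled and solves C / X = 1 - exp(-N / X) for the library size X, where N is the number of reads (or read pairs) and C the number of unique ones. The sketch below solves that equation by bisection; the read counts in the example are hypothetical.

```python
import math


def estimate_library_size(total_reads: int, unique_reads: int) -> float:
    """Estimate the number of unique molecules X in a library.

    Solves C / X = 1 - exp(-N / X) for X by bisection, with N = total reads and
    C = unique reads after de-duplication. Assumes uniform sampling of molecules.
    """
    n, c = total_reads, unique_reads
    if not 0 < c < n:
        raise ValueError("need 0 < unique_reads < total_reads")

    def f(x: float) -> float:
        # Positive below the root, negative above it.
        return c / x - 1 + math.exp(-n / x)

    lo, hi = float(c), 100.0 * c  # f(lo) > 0; widen hi until f(hi) < 0
    while f(hi) >= 0:
        hi *= 10
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical run: 1,000,000 reads, 632,121 unique after de-duplication.
print(f"{estimate_library_size(1_000_000, 632_121):,.0f}")  # roughly one million
```

Fewer unique reads for the same total (more duplicates) yields a smaller estimated library, i.e., lower complexity.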

20 Why is library construction, a key element of NGS, so important? It dictates the success of all downstream processes, from sample preparation to sequencing. Understanding library preparation will help you get the highest-quality sequencing data.

21 Basic principles of NGS library preparation: 1. Fragmentation of source material (gDNA, cDNA or immunoprecipitated chromatin). 2. Addition of adapter sequences. 3. Size selection (and enrichment). 4. Final library quantification and quality control (QC).

22 Typical library construc.on workflow

23 Starting material input. Quantity methods: NanoDrop, PicoGreen, Qubit, qPCR. Quality methods: visualization on an agarose gel, Agilent Bioanalyzer.

24 DNA (1-5 μg); RNA: 28S:18S rRNA ratio of 2:1; purity ratios 260/280 and 260/230 (>1.7).

25 Fragment libraries: break the starting material into smaller pieces (<800 bp). Methods: physical (acoustic shearing, sonication and hydrodynamic shear); enzymatic (DNase I, restriction endonucleases or non-specific nucleases, transposase); chemical (heat and divalent metal cations). The choice of method is not a major source of bias, but it significantly affects the recovery of the desired fragments and thus the amount of starting material required. (Workflow: Fragmentation → Adapter addition → Size selection → Quantitation and QC)

26 Fragmented DNA (nebulization) on an electrophoretic gel.

27 Amplicon libraries: alternatively, if the sequence of a specific target is known, PCR amplification of those targets is used to produce DNA amplicons of the desired size range.

28 Polish ends. End repair: each fragment is made blunt-ended (overhang-free) and 5'-phosphorylated (enzymatic reaction using T4 DNA polymerase, Klenow). Adenylate 3' ends: prevents fragments from ligating to one another (chimeras) during the adapter ligation reaction (Klenow exo-).

29 [Image from Broad boot camp]

30 Addition of adapters: all systems require each fragment to have distinct upstream and downstream adapters. Adapters are short, known sequences containing the annealing site for the sequencing primer(s). Adapter sequences are ligated to the 5' and 3' ends of each fragment or amplicon. Later in the workflow, additional sequences are added to the adapter-ligated fragments by PCR with tailed primers, including platform-specific sequences for clonal amplification (P5, P7) and index/barcode/tag sequences for multiplex sequencing (optional).

31 Architecture of a standard NGS library [Image from Broad boot camp]

32 How to choose a library type?

33 Multiplexing: sequencing multiple samples in a single unit (saves resources) by barcoding or indexing the samples. Different libraries, each with a unique barcode, are pooled and sequenced together; many barcodes can be used in a single experiment (e.g., 96 samples in one Illumina lane).
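After a pooled run, reads are assigned back to samples by their index read ("demultiplexing"). This sketch matches each index read to a sample barcode allowing up to one mismatch, and leaves reads undetermined when nothing matches or two samples match equally well. The barcodes are hypothetical, and the one-mismatch tolerance is an assumption; it is only safe when barcodes differ from each other at three or more positions.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, sample_barcodes: dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode uniquely best matches, else None."""
    hits = [(hamming(index_read, bc), sample) for sample, bc in sample_barcodes.items()]
    hits = [(d, s) for d, s in hits if d <= max_mismatches]
    if not hits:
        return None  # undetermined: no barcode close enough
    best = min(d for d, _ in hits)
    best_samples = [s for d, s in hits if d == best]
    return best_samples[0] if len(best_samples) == 1 else None  # ambiguous -> None


# Hypothetical 8-bp barcodes for three pooled libraries.
BARCODES = {"sample_A": "ACGTACGT", "sample_B": "TGCATGCA", "sample_C": "GATCGATC"}

for index in ("ACGTACGT", "ACGTACGA", "NNNNNNNN"):
    print(index, assign_sample(index, BARCODES))
```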

35 Size selection: select the library fragment sizes (400 bp is ideal for optimal cluster density and sequencer optics). Further enrich for fragments of the desired size and remove excess adapters, adapter dimers and other artefacts (i.e., purify the ligation products). Methods: gel electrophoresis and magnetic bead-based selection.

36 Size selection: gel electrophoresis (band excision) and magnetic separation (magnetic beads). Example of a cut library targeting a fragment size of 490 bp.
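A gel cut or bead cleanup keeps fragments inside a size window and discards the rest, including short adapter dimers. The sketch models that as a simple filter around a target size; the ±10% window and the fragment lengths are hypothetical, and the 490 bp target mirrors the gel example on this slide. Real cuts have soft edges rather than sharp cutoffs.

```python
def size_select(lengths: list[int], target_bp: int, tolerance: float = 0.1) -> list[int]:
    """Keep fragments within target_bp * (1 +/- tolerance), mimicking a gel cut."""
    low, high = target_bp * (1 - tolerance), target_bp * (1 + tolerance)
    return [n for n in lengths if low <= n <= high]


# Hypothetical fragment lengths (bp); the shortest ones stand in for adapter dimers.
lengths = [120, 130, 300, 450, 480, 490, 500, 530, 560, 800]
kept = size_select(lengths, target_bp=490)
print(kept, f"retained {len(kept) / len(lengths):.0%}")
```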

37 PCR enrichment: to select for fragments containing adapters at both ends and to generate sufficient quantities for sequencing; to add sequences to the adapter-ligated fragments, including platform-specific sequences for clonal amplification (P5, P7) and index/barcode/tag sequences for multiplex sequencing (optional).
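How many enrichment cycles are needed follows from the standard exponential PCR model, N = N0 (1 + E)^n, where E is the per-cycle efficiency (1.0 means perfect doubling). This sketch computes the yield and the minimum number of cycles to reach a target copy number. The efficiency values and copy numbers are assumptions for illustration; since extra cycles increase duplicates and jackpotting, the usual aim is the fewest cycles that give enough material.

```python
import math


def pcr_yield(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` of PCR: N = N0 * (1 + E)^n, with E in [0, 1]."""
    if not 0 <= efficiency <= 1:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float,
                  efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles whose yield reaches target_copies."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if target_copies <= initial_copies:
        return 0
    n = math.ceil(math.log(target_copies / initial_copies) / math.log(1 + efficiency))
    # Guard against floating-point rounding pushing n one cycle too high.
    if n > 0 and pcr_yield(initial_copies, n - 1, efficiency) >= target_copies:
        n -= 1
    return n


print(pcr_yield(1000, 10))                              # 1024000.0
print(cycles_needed(1000, 1_000_000))                   # 10
print(cycles_needed(1000, 1_000_000, efficiency=0.9))   # 11
```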

38 Quantitative and qualitative validation: to verify that there is a sufficient amount of good-quality DNA for sequencing.

39 What is a good-quality library? One that contains a diverse set of DNA fragments (high complexity). What is a good quantity of library? It depends on the library protocol. Too much DNA? Low-quality data. Too little DNA? Reduced coverage.

40 Quantitative validation: samples must be normalized to the same concentration. qPCR: used when the amount of DNA is insufficient; more sensitive than Qubit; time-consuming. Fluorometric method (Qubit): provides faster results; less sensitive than qPCR.
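Normalizing libraries usually means converting a mass concentration (e.g., Qubit ng/µl) into molarity using the mean fragment size, then diluting each library to a common concentration. This sketch uses the standard approximation of 660 g/mol per double-stranded base pair and C1·V1 = C2·V2 for the dilution. The sample values, the 2 nM target and the 20 µl final volume are hypothetical; appropriate loading concentrations depend on the instrument and kit.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one double-stranded base pair


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert dsDNA concentration from ng/ul to nM given the mean fragment size."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def dilution_volume(stock_nM: float, target_nM: float, final_ul: float) -> float:
    """Volume of stock (ul) needed for final_ul at target_nM (C1*V1 = C2*V2)."""
    if target_nM > stock_nM:
        raise ValueError("stock is more dilute than the target")
    return target_nM * final_ul / stock_nM


# Hypothetical libraries: (name, Qubit ng/ul, mean fragment size in bp).
samples = [("lib1", 2.0, 500), ("lib2", 5.0, 450)]
for name, conc, size in samples:
    stock = ng_per_ul_to_nM(conc, size)
    v = dilution_volume(stock, target_nM=2.0, final_ul=20.0)
    print(f"{name}: {stock:.2f} nM -> {v:.2f} ul library + {20.0 - v:.2f} ul buffer")
```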

41 Qualitative validation: Agilent Bioanalyzer (or a gel), used to check the size distribution; it reads gel chips containing DNA samples in wells.

42 Take-home message: know where bias occurs! Pay attention to experimental design, so that sources of bias are eliminated where possible and otherwise have a minimal impact on the final analyses.

43 Insert size vs. library fragment size [Image from Broad boot camp]
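The library fragment size measured on a Bioanalyzer includes the adapters, while the insert is only the sample DNA between them. Comparing the insert with the read length shows whether paired-end mates overlap or read through into the adapter. In this sketch the combined adapter length (120 bp) is a hypothetical value, since it depends on the kit, and the 490 bp fragment reuses the gel example from the size-selection slide.

```python
def insert_size(fragment_bp: int, adapter_total_bp: int) -> int:
    """Insert length = library fragment length minus both adapters."""
    insert = fragment_bp - adapter_total_bp
    if insert <= 0:
        raise ValueError("fragment is not longer than its adapters (adapter dimer?)")
    return insert


def describe_pairs(insert_bp: int, read_length: int) -> str:
    """Classify paired-end reads of `read_length` on an insert of `insert_bp`."""
    if read_length > insert_bp:
        return "adapter read-through: reads extend past the insert into the adapter"
    if 2 * read_length > insert_bp:
        return "mates overlap"
    return "mates do not overlap"


ins = insert_size(490, adapter_total_bp=120)  # hypothetical adapter length
print(ins)                        # 370
print(describe_pairs(ins, 150))   # mates do not overlap
print(describe_pairs(ins, 250))   # mates overlap
```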