Outline General NGS background and terms 11/14/2016 CONFLICT OF INTEREST. HLA region targeted enrichment. NGS library preparation methodologies


Eric T. Weimer, PhD, D(ABMLI)
Assistant Professor, Pathology & Laboratory Medicine, UNC School of Medicine
Director, Molecular Immunology
Associate Director, Clinical Flow Cytometry, HLA, and Immunology Laboratories at UNC Health Care

CONFLICT OF INTEREST
I have financial relationship(s) with: Advisory Board, Illumina
AND
My presentation does include discussion of off-label use. Commercial reagents for HLA typing by NGS are labeled for research use only; however, I will be discussing the clinical application of these reagents.

Outline
General NGS background and terms
HLA region targeted enrichment
NGS library preparation methodologies
NGS chemistries
NGS instrumentation and costs
NGS data analysis

HLA Background
Most polymorphic genes in the human genome
Inherited as haplotypes
Co-dominantly expressed

Why use NGS for HLA?
Sanger sequencing identified >10,000 alleles
  » < 10% fully sequenced
  » < 10% common
  » > 40% have only been reported once
High-throughput unambiguous typing
Most NGS errors are substitutions

Role of HLA Matching in BMT
70% of patients won't have a donor in their family
Spieser et al. Blood, Vol 87, No : pp

Next-generation Sequencing Terms
Massively parallel sequencing: a technique in which many sequencing reactions occur and are detected simultaneously
Library generation: the process of creating DNA fragments with adapter sequences on both ends
Adapters: short oligonucleotides that are attached to the DNA to be sequenced and provide a means to capture the sequence on the platform
Barcodes/Indices: molecular tagging of samples with unique sequence-based codes (~3+ bp)
Coverage: percentage of bases called at a predetermined depth within HLA
Depth: number of individual sequence reads that align to a particular nucleotide
Paired-end read mapping: independent reads that are derived from the same library fragment

Why use NGS for HLA?
Sanger sequencing identified >14,000 alleles
  » < 10% fully sequenced
  » < 10% common
  » > 40% have only been reported once
High-throughput unambiguous typing
  » Up to 384 patients within a single NGS run
Sequencing clinically relevant areas
  » UTRs and intron-exon boundaries
As of October 6, 2016, 18 ASHI-accredited labs for NGS

Basic Principles of NGS
Generate PCR amplicons
Fragment DNA
Prep fragments for sequencing (library prep)
Immobilize on solid substrate
Perform in situ clonal amplification
Detect sequential incorporation of nucleotides (chemiluminescence, fluorescence, pH change)
Applies to Ion Torrent and Illumina platforms
Voelkerding, Dames, Durtschi. J Mol Diagn Sep;12(5):
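The definitions of depth and coverage above can be illustrated with a minimal Python sketch. It is not taken from the presentation: the function names, the 0-based half-open read intervals, and the toy data are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def per_base_depth(region_length: int, reads: List[Tuple[int, int]]) -> List[int]:
    """Depth: number of reads aligned over each position of a region.

    Each read is a hypothetical (start, end) interval, 0-based with an
    exclusive end, describing where the read aligned on the region.
    """
    depth = [0] * region_length
    for start, end in reads:
        # Clip the read to the region so out-of-range intervals are ignored.
        for pos in range(max(start, 0), min(end, region_length)):
            depth[pos] += 1
    return depth


def coverage_at_depth(depth: List[int], min_depth: int) -> float:
    """Coverage: percentage of positions whose depth is >= min_depth."""
    if not depth:
        return 0.0
    covered = sum(1 for d in depth if d >= min_depth)
    return 100.0 * covered / len(depth)


# Toy example (made-up numbers): four reads over a 10-base region.
toy_reads = [(0, 6), (2, 8), (4, 10), (5, 10)]
toy_depth = per_base_depth(10, toy_reads)
print("depth per base:", toy_depth)
print("coverage at >=2x:", coverage_at_depth(toy_depth, 2), "%")
print("coverage at >=3x:", coverage_at_depth(toy_depth, 3), "%")
```

In practice the intervals would come from an aligner's output rather than a hand-written list, and the depth threshold would be whatever the laboratory has validated.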

High Quality DNA input = High Quality data out
Varying DNA quantity requirements (40-400 ng total DNA)
Quality of DNA is independent of quantity
  » High quality DNA increases the likelihood of a successful sequencing run and higher quality data
Quality is most often determined by DNA fragmentation, ssDNA, and protein contamination
NGS requires quantifying dsDNA
  » UV absorption (NanoDrop)
  » Intercalating dyes (Qubit, SYBR Green, QuantiFluor)
  » 5′ hydrolysis probes (TaqMan) with real-time quantitative PCR (qPCR)
  » Droplet digital emulsion PCR (ddPCR)

NanoDrop vs Qubit
Significant DNA quantitation differences between NanoDrop and Qubit
NanoDrop (UV absorption) is less specific for dsDNA compared to Qubit
Simbolo et al. PLoS One. 2013; 8(6): e62692

DNA Fragment Size
Heavily fragmented specimens can't be amplified by long-range PCR
  » DPB1 PCR product can be up to 10,000 bp
  » Most often from buccal swabs
TapeStation or Bioanalyzer DNA size

Target Enrichment: HLA Region
Commercial reagents utilize long-range PCR
Multiplex PCR combining different HLA gene combinations
Hybrid capture probes
  » Pull out specific regions of interest
  » Doesn't require amplification

Generic NGS Library Preparation
Image Courtesy of Agilent Technologies

Nextera based library preparation: Tagmentation
Simultaneous DNA fragmentation and adapter ligation
  » Less hands-on time
  » Very sensitive to amount of DNA and incubation time
Head et al. Biotechniques Feb 1;56(2):61-4

Sequencing by synthesis (SBS): Cyclic Reversible Termination

Sequencing by synthesis (SBS): Single-nucleotide addition

Single Molecule Sequencing

(2016)
Summary of Illumina NGS Platforms

Summary of ThermoFisher NGS Platforms
Ion S5 costs: $65,000 (Heger, M. Thermo Fisher launches new systems to focus on plug and play targeted sequencing. GenomeWeb [online], 1 Sep 2015)

Summary of single-molecule NGS Platforms
PacBio RSII: $700,000 (from manufacturer)
PacBio Sequel: $350,000 (from manufacturer)
Oxford Nanopore MinION: $1,000 (Manufacturer's website)
Oxford Nanopore PromethION: Still in beta testing

NGS Platform Error Rates
[Table: Method; Read length (bp); Single pass error rate (%); Time per run. Rows: Ion Torrent, Illumina MiSeq, PacBio RS II: P6-C4, Oxford Nanopore MinION]
Adapted from Rhoads and Au. Genomics Proteomics Bioinformatics Oct; 13(5):

Benefits of Paired-end Sequencing
Sequencing both ends of a single DNA molecule
[Figure labels: 300 bp, 800 bp]
Allows for better alignment over repetitive regions (homopolymer)
Reads don't necessarily overlap

NGS Quality Scores
Quality scores measure the probability that a base is called incorrectly
Each base in a read is assigned a quality score by a phred-like algorithm
UNC NGS average: 95.6% ± 0.96 Q30

The Relationship Between Quality Score and Base Call Accuracy
Quality Score        Probability of Incorrect Base Call    Inferred Base Call Accuracy
10 (Q10)             1 in 10                               90%
20 (Q20) (Sanger)    1 in 100                              99%
30 (Q30) (NGS)       1 in 1,000                            99.9%
40 (Q40)             1 in 10,000                           99.99%
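The relationship between quality score and base call accuracy follows the standard Phred definition, Q = -10 · log10(P), where P is the probability of an incorrect base call. The Python sketch below works through that arithmetic. The function names are illustrative, and the decoding step assumes the Phred+33 ASCII encoding commonly used in FASTQ files, which the slides do not specify.

```python
import math
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Probability of an incorrect base call for a Phred score Q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for an error probability P (0 < P <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def decode_phred33(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def percent_q30(qualities: List[int]) -> float:
    """Percentage of bases with a quality score of at least 30."""
    if not qualities:
        return 0.0
    return 100.0 * sum(1 for q in qualities if q >= 30) / len(qualities)


# Reproduce the rows of the quality score table.
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: 1 in {round(1 / p):,} errors, accuracy {100 * (1 - p):.2f}%")

# Hypothetical three-base quality string: 'I' = Q40, '?' = Q30, '5' = Q20.
example_quals = decode_phred33("I?5")
print(example_quals, f"{percent_q30(example_quals):.1f}% of bases >= Q30")
```

A run-level figure such as the percentage of bases at Q30 is this same calculation applied across every base in every read.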

Sample Experimental Data
[Figure: Q30, Total Read (millions), Q Score]

NGS Data Analysis: Reference based alignment
TGATGATATGGTCGTCACTGTCATGTTGGGGGCATAGGATATCCA
                 CTGTCATGTTGGGGGCATA
TGATGATATGGTCGTCACTGTCATGTTGG
                       TGTTGGGGGCATAGGATATCCA
               CACTGTCATGTTGGGGGCATA
       ATGGTCGTCACTGTCATG
             GTCACTGTCATGTTGGGGGCATAGGATATCCA
TGATGATATGGTCGTCACTG
     ATATGGTCGTCACTGTCA
             GTCAGTGTCATGTCGGGGG
                             GGCCATAGGATATCCA
                        ATTGGGGGCATAGGATATCCA
TGATGATATGGTCGTCACTGTCA
Depth of coverage (figure labels): 7, 4

de novo Assembly
Non-reference-assisted data assembly
Can be challenging with homopolymer stretches
Accuracy of homopolymer base calls decreases with increasing homopolymer length
Chaisson, Wilson, Eichler. Nat Rev Genet Nov;16(11):
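Because base call accuracy falls as homopolymer length grows, it can help to know where homopolymer stretches sit in a reference before interpreting calls there. The Python sketch below is one illustrative way to locate them, run here on the reference sequence from the alignment slide. The function name and the minimum-length threshold are assumptions, not part of the presentation.

```python
from itertools import groupby
from typing import List, Tuple


def homopolymer_runs(seq: str, min_length: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, base, length) for each run of identical bases.

    start is a 0-based position; only runs of at least min_length
    bases are reported. The threshold is an illustrative choice.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((pos, base, length))
        pos += length
    return runs


# Reference sequence shown on the reference-based alignment slide.
reference = "TGATGATATGGTCGTCACTGTCATGTTGGGGGCATAGGATATCCA"

for start, base, length in homopolymer_runs(reference, min_length=3):
    print(f"{base} x {length} starting at position {start}")
```

On this reference the only run of three or more identical bases is the five-G stretch, which is also where two of the slide's reads disagree with the reference.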

Summary
NGS allows for high-throughput sequencing in a cost-efficient manner
NGS success is dependent on the quality of input DNA
NGS generates massive amounts of high-quality data, and bioinformatics plays a vital role in interpretation
NGS is a technology with several intricate steps, and understanding each step is key to overall success