NEXT GENERATION SEQUENCING: Whole Gene Sequencing. Ingrid Faé. Educational Session 3: Next generation sequencing. Stockholm, Friday, June 27th, 2014. Department for Blood Group Serology and Transfusion Medicine
Second generation sequencing (Ion Torrent); Third generation sequencing (PacBio); Quality Assurance
Preliminary question: Are mutations in exons 2, 3 and 4 the only actionable mutations in the entire HLA gene? What impact do mutations in the other exons and in the introns of the HLA gene have on protein folding and the subsequent presentation of antigens? The ultimate solution for preventing ambiguities in genotyping is to sequence the entire HLA gene.
Ion Torrent PGM: Chemistry and detection; Whole gene approach; Workflow; Advantages/disadvantages
Chemistry
Detection
Super high resolution for single molecule-sequence-based typing of classical HLA loci at the 8-digit level using next generation sequencers. T. Shiina, S. Suzuki, Y. Ozaki, H. Taira, E. Kikkawa, A. Shigenari, A. Oka, T. Umemura, S. Joshita, O. Takahashi, Y. Hayashi, M. Paumen, Y. Katsuyama, S. Mitsunaga, M. Ota, J. K. Kulski & H. Inoko. Tissue Antigens, 2012, 80, 305-316
Workflow gDNA Amplicon Library RNA Library Fragmentation Size-dependent Fragmentation (long amplicons) End repair (small amplicons) Prepare WT or miRNA End repair (for physical shearing) Adapter ligation & nick repair Adapter ligation & nick repair Hybr./Adapter ligation Adapter ligation & nick repair Size selection Size selection Reverse Transcription Size selection Amplification (if needed) Amplification (if needed) Size selection Amplification (if needed) Qualify & quantify Qualify & quantify Amplification Qualify & quantify Qualify & quantify
Fragmentation: Enzymatic fragmentation -> blunt ends; Physical fragmentation -> end repair. http://www.tebubio.com/userfiles/image/035/035%20i_endit_dnafragments_1.jpg
Adapter Ligation & Nick Repair
E-Gel
Emulsion PCR
Starting at the beginning: E. coli library
Our first in-house library
HLA typing on 314 chip
HLA typing on 316 chip
Analysis Software Solutions: HLA TypeStream Analysis Software (Life Technologies); NGSengine (GenDx); Omixon; Conexio
NGSengine
Omixon
HLA TypeStream Analysis Software
Coverage
Match List
Flagged Positions
Advantages: Whole gene sequencing possible; Clonal sequences; Automation; Chip size (low to high throughput); Costs
Disadvantages: IMGT/HLA database currently incomplete; Single urgent samples; Emulsion PCR; Length of reads; GC-rich regions; Coverage; Phase; Amplification bias. Remedy -> third generation sequencing?
3rd Generation Sequencing: The reaction of single molecules is measured; less starting material; no PCR -> no PCR bias (uneven amplification of different alleles); genome of single cells; released signal, real-time detection (proton or fluorophore); HeliScope Sequencer; PacBio (SMRT) DNA Sequencing
Advantages: Long reads; Unambiguous de novo phasing of long-range sequencing reads; Reduced sample manipulation
Disadvantages: High-priced equipment; Errors, while frequent, occur at random locations and in random base composition; Amplicons in one run need to be of similar length
Quality Assurance: PCR primer design; Loss of alleles; Quantification of DNA/PCR product; Multiplex PCR monitoring; Creation of artefacts should be prevented
Validation. Analytical sensitivity: the minimum detectable concentration of the analyte. Specificity: freedom from interference by any element or compound other than the analyte. Precision: a measure of random errors, which may be expressed as repeatability or reproducibility. Repeatability: the closeness of agreement between mutually independent test results obtained with the same method on identical test material in the same laboratory by the same operator using the same equipment within short intervals of time. Reproducibility
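For a categorical result such as an allele call, one simple way to put a number on repeatability is the fraction of replicate runs that agree with the most frequent call. The sketch below assumes this definition, which is not taken from the slides; the function name `repeatability` and the sample calls are illustrative only.

```python
from collections import Counter
from typing import Sequence


def repeatability(replicate_calls: Sequence[str]) -> float:
    """Fraction of replicate calls that agree with the most frequent call.

    Assumption: all replicates were produced with the same method, sample,
    laboratory, operator and equipment within a short time interval.
    """
    if not replicate_calls:
        raise ValueError("at least one replicate call is required")
    counts = Counter(replicate_calls)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(replicate_calls)


# Made-up replicate calls for one locus of one sample (illustration only).
calls = ["A*02:01:01:01"] * 4 + ["A*02:01:01:03"]
print(repeatability(calls))  # 0.8
```

The same function could be applied to results from different laboratories or operators to approximate reproducibility, again under the assumption that simple concordance is an acceptable measure.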
Quality check. Total Bases: number of filtered and trimmed base pairs reported in the output BAM file. Key Signal: percentage of Live ISPs with a key signal that is identical to the library key signal. Filtered: Low quality: low or unrecognizable signal.
Quality check. AQ20: the percentage of reads that have a predicted quality score of Q20 or better. An AQ20 score corresponds to a predicted Phred-like quality of 20 or better, or one error in 100 bp. AQ17: the AQ17 Read Lengths graph is a histogram of read lengths, in bp, that have a Phred-like score of 17 or better, or one error in 50 bp.
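The "one error in 100 bp" and "one error in 50 bp" figures follow from the standard Phred relation P = 10^(-Q/10). A minimal sketch of that conversion follows; the function names are chosen here for illustration and are not part of any vendor software.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-like quality score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def bases_per_error(q: float) -> float:
    """Expected number of bases per single error at quality score Q."""
    return 1 / phred_to_error_probability(q)


for q in (17, 20):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}, "
          f"about 1 error in {bases_per_error(q):.0f} bp")
```

Q17 gives an error probability of about 0.02 (roughly 1 in 50), and Q20 gives 0.01 (1 in 100), matching the values quoted on the slide.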
Acceptance of Data. Criteria for acceptance of data must be specified. Examples: read length; minimal allele ratio; coverage
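The three acceptance criteria can be written down as explicit, testable thresholds. The sketch below is one possible way to do so. All threshold values are hypothetical placeholders that each laboratory would need to set and validate itself, and "allele ratio" is assumed here to mean the minor allele's share of the reads at a heterozygous locus.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AcceptanceCriteria:
    # Hypothetical placeholder thresholds, not values from the slides.
    min_read_length: int = 150      # mean read length in bp
    min_allele_ratio: float = 0.20  # minor allele's fraction of reads
    min_coverage: int = 30          # lowest per-position read depth


def minor_allele_ratio(reads_allele_1: int, reads_allele_2: int) -> float:
    """Share of reads assigned to the less represented allele."""
    total = reads_allele_1 + reads_allele_2
    return min(reads_allele_1, reads_allele_2) / total if total else 0.0


def check_locus(mean_read_length: float,
                reads_allele_1: int,
                reads_allele_2: int,
                min_depth: int,
                criteria: Optional[AcceptanceCriteria] = None) -> List[str]:
    """Return the names of all criteria the locus fails (empty list = accepted)."""
    criteria = criteria or AcceptanceCriteria()
    failures = []
    if mean_read_length < criteria.min_read_length:
        failures.append("read length")
    if minor_allele_ratio(reads_allele_1, reads_allele_2) < criteria.min_allele_ratio:
        failures.append("allele ratio")
    if min_depth < criteria.min_coverage:
        failures.append("coverage")
    return failures


print(check_locus(210, 480, 520, 45))  # []
print(check_locus(120, 950, 50, 12))   # ['read length', 'allele ratio', 'coverage']
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easier to document why a run was rejected.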
Read Length
Minimal Allele Ratio
Coverage
Contamination: Negative controls; Extended contamination control; Barcode change
External Quality Controls: Dedicated for NGS; Whole gene sequencing; Amplicon sequencing technique; Format of results: alleles, FASTQ files, raw data of the device
Summary. 2nd Generation Sequencing -> advantages: whole gene sequencing; high throughput; costs. 3rd Generation Sequencing: long reads; phase keeping. Quality Assurance