Quality assurance in NGS (diagnostics)


1 Quality assurance in NGS (diagnostics) Chris Mattocks National Genetics Reference Laboratory (Wessex)

2 Research Diagnostics

3 Quality assurance Any systematic process of checking whether a product or service is meeting specified requirements. This process considers design, development, production, and service. Are results accurate and complete? Are they delivered in a timely manner?

4 NGS applications NGS is amenable to a wide range of applications including: gene panels, exome analysis, genome analysis, tumour profiling / deep sequencing, non-invasive prenatal diagnosis, genome architecture / structural analysis, methylation analysis, RNA sequencing, expression analysis, etc.

5 Phases of QA Customer request / clinical need; test design; development; validation; process quality control; data analysis; interpretation; EQA.

6 Design Choose appropriate targets: relevant, interpretable, reportable, actionable. Choose appropriate technology and methodology: accuracy; capacity; existing TAT; batching; technical limitations of platforms; cost.

7 Development Ensure selectivity: all targets covered; only targets covered. Exclude interferences: consider SNPs in probes / primers. Optimise technical parameters: minimise variation in coverage; maximise yield; use best reagents (Taq, MIDs etc.) and processes. Develop the process to provide: suitable environments and environmental separation; suitable equipment; staff training.

8 Validation Requirement of ISO 15189. Sensitivity of the new test relative to that currently offered, e.g. BRCA1/BRCA2 by HRM, DHPLC and/or Sanger sequencing. A qualitative test requires at least 300 control samples to show a sensitivity of 99% (95% CI). Validation can be at the technical level as opposed to disease / gene specific.
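
The figure of roughly 300 control samples can be checked with a short calculation. The sketch below assumes a design in which every known-positive control is detected (zero false negatives) and asks how many such controls are needed before the one-sided lower confidence bound on sensitivity reaches the target. The function names are illustrative and not taken from the slides; the "rule of three" is a standard approximation to the same bound.

```python
import math


def min_concordant_samples(target_sensitivity: float, confidence: float = 0.95) -> int:
    """Smallest n for which n/n detected positives supports the target sensitivity.

    Assumption: zero false negatives are observed. If the true sensitivity were
    only `target_sensitivity`, the chance of detecting all n positives would be
    target_sensitivity ** n; we require that chance to be at most 1 - confidence.
    """
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log(target_sensitivity))


def rule_of_three_samples(target_sensitivity: float) -> float:
    """Rule-of-three approximation (95% confidence): n is about 3 / (1 - sensitivity)."""
    return 3.0 / (1.0 - target_sensitivity)


if __name__ == "__main__":
    print("exact bound:", min_concordant_samples(0.99, 0.95))
    print("rule of three:", round(rule_of_three_samples(0.99)))
```

Both estimates land close to the 300 samples quoted on the slide; any observed false negative would push the required number higher.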

9 Process QC Sample processing: DNA sample, library prep, targeting, amplification, mixing, sequencing. Data analysis: base calling, read mapping, re-align / refine Q scores, identify variants, filter variants.

10 Sample processing Workflow: DNA sample, library prep, targeting, amplification, mixing, sequencing. QC checks along the workflow: quantification (NanoDrop, Qubit); purity (A260/A280, A260/A230); identity check (SNP panel, rtPCR?); fragmentation profile (Bioanalyzer); process controls, e.g. A-tailing, adapter ligation; targeting controls; quantification (NanoDrop, Qubit, qPCR); fragmentation profile (Bioanalyzer); tag counting.
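
The absorbance-based checks on this slide reduce to simple ratios. The sketch below is illustrative only: the function names are hypothetical, and the acceptance ranges (about 1.8 to 2.0 for A260/A280 and 2.0 to 2.2 for A260/A230) are commonly quoted values for DNA, not thresholds given in the slides. Each laboratory would set its own limits during validation.

```python
def dsdna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration from A260.

    Uses the standard conversion of about 50 ng/uL per A260 unit for
    double-stranded DNA at a 1 cm path length.
    """
    return a260 * 50.0 * dilution_factor


def check_purity(a260: float, a280: float, a230: float,
                 range_260_280=(1.8, 2.0), range_260_230=(2.0, 2.2)) -> dict:
    """Compute purity ratios and flag them against ASSUMED acceptance ranges."""
    r280 = a260 / a280
    r230 = a260 / a230
    return {
        "a260_a280": r280,
        "a260_a230": r230,
        "a260_a280_ok": range_260_280[0] <= r280 <= range_260_280[1],
        "a260_a230_ok": range_260_230[0] <= r230 <= range_260_230[1],
    }


if __name__ == "__main__":
    # Hypothetical absorbance readings for one sample.
    print(dsdna_concentration_ng_per_ul(1.0))
    print(check_purity(a260=1.0, a280=0.54, a230=0.47))
```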

11 Data analysis Proprietary vs open source: proprietary packages are much simpler to set up and operate and require much less bioinformatic expertise. BUT NGS is an extremely complex process, so analysis cannot be expected to be simple. Identifying quality issues that may impact on biological conclusions requires an in-depth understanding of the biological problem and the technology, and considerable data mining.

12 Base calling Generally performed in real time. Standard quality metric = Phred score.
Phred  Accuracy  Probability of error
20     99%       1:100
30     99.9%     1:1000
40     99.99%    1:10000
50     99.999%   1:100000
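
A Phred score Q is defined from the estimated error probability P as Q = -10 log10(P). The following minimal sketch converts in both directions; the function names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score implied by an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 30, 40, 50):
        p = phred_to_error_prob(q)
        print(f"Q{q}: accuracy {100 * (1 - p):.3f}%, error 1:{round(1 / p)}")
```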

13 Base calling Commonly modelled biases for Illumina sequencing: phasing noise, signal decay, mixed cluster, boundary effect, cross talk, fluorophore accumulation. Brief Bioinform Sep;12(5):

14 Base calling Accuracy of Phred scores assigned by basecallers: Brief Bioinform Sep;12(5):

15 Mapping Alignment algorithms should: map reads with real variation; cope with sequencing errors; assign an accurate mapping quality score (dependent on the base calling quality score). Re-calibration of initial base call quality scoring (SOAP, GATK).

16 Overall quality metrics Generally at run level (e.g. Picard metrics)
TOTAL_READS: The total number of reads including all PF and non-PF reads. When CATEGORY equals PAIR this value will be 2x the number of clusters.
PF_READS: The number of PF reads, where PF is defined as passing Illumina's filter.
PCT_PF_READS: The percentage of reads that are PF (PF_READS / TOTAL_READS).
PF_READS_ALIGNED: The number of PF reads that were aligned to the reference sequence. This includes reads that aligned with low quality (i.e. their alignments are ambiguous).
PCT_PF_READS_ALIGNED: The percentage of PF reads that aligned to the reference sequence (PF_READS_ALIGNED / PF_READS).
PF_ALIGNED_BASES: The total number of aligned bases, in all mapped PF reads, that are aligned to the reference sequence.
PF_HQ_ALIGNED_READS: The number of PF reads that were aligned to the reference sequence with a mapping quality of Q20 or higher.
PF_HQ_ALIGNED_BASES: The number of bases aligned to the reference sequence in reads that were mapped at high quality.
PF_HQ_ALIGNED_Q20_BASES: The subset of PF_HQ_ALIGNED_BASES where the base call quality was Q20 or higher.
etc.
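
The two derived metrics in this list are plain ratios of the read counts. The sketch below recomputes them from a dictionary of counts using the formulas quoted above; the function name and the example counts are hypothetical, and the values are returned as fractions (multiply by 100 for a percentage).

```python
def derived_alignment_metrics(counts: dict) -> dict:
    """Recompute PCT_PF_READS and PCT_PF_READS_ALIGNED from raw read counts.

    Expects the keys TOTAL_READS, PF_READS and PF_READS_ALIGNED.
    Ratios are returned as fractions in the range 0 to 1.
    """
    total = counts["TOTAL_READS"]
    pf = counts["PF_READS"]
    aligned = counts["PF_READS_ALIGNED"]
    return {
        # Guard against division by zero for empty runs.
        "PCT_PF_READS": pf / total if total else 0.0,
        "PCT_PF_READS_ALIGNED": aligned / pf if pf else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counts for a single run.
    example = {"TOTAL_READS": 1000, "PF_READS": 900, "PF_READS_ALIGNED": 810}
    print(derived_alignment_metrics(example))
```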

17 Overall quality metrics BMC Genomics Dec 2;11 Suppl 4:S7. NGSQC: cross-platform quality analysis pipeline for deep sequencing data. Metrics based on position (tile, panel etc.): colour, base distribution, quality score. Paired end sequencing: quality score, mapped.

18 Variant calling Simple threshold. Complex probabilistic models based on: base quality score, mapping quality score, internal run data, external data.
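
As an illustration of the simple-threshold approach, the sketch below classifies a site from its reference and variant read counts. All cut-offs (minimum depth, heterozygous and homozygous allele fractions) are hypothetical defaults chosen for the example, not values from the slides, and a probabilistic caller would additionally weight each read by its base and mapping quality.

```python
def call_by_threshold(ref_count: int, alt_count: int,
                      min_depth: int = 20,
                      het_min_fraction: float = 0.25,
                      hom_min_fraction: float = 0.75) -> str:
    """Classify a site using fixed allele-fraction thresholds (assumed values)."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return "no-call"  # too few reads to make any call
    alt_fraction = alt_count / depth
    if alt_fraction >= hom_min_fraction:
        return "hom-alt"
    if alt_fraction > het_min_fraction:
        return "het"
    return "ref"


if __name__ == "__main__":
    for ref, alt in [(10, 10), (20, 0), (0, 20), (3, 3)]:
        print(ref, alt, call_by_threshold(ref, alt))
```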

19 Required read depth Assuming no allelic bias, the theoretical read depth required to detect heterozygous variation with a given accuracy can be calculated (binomial distribution). The limit of detection is bounded by the per-base accuracy of sequencing (~99%). NB: calculations are based on the variant being present in >25% of reads.
Quality  Het call accuracy  Probability of error  Depth required
Q20      99%                1:100
(The remaining rows and the depth values are not recoverable from this transcription.)
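
The binomial calculation described on this slide can be sketched as follows. The assumptions are: each read samples either allele with probability 0.5 (no allelic bias), a heterozygote is called only when the variant is seen in strictly more than 25% of reads, and sequencing errors are ignored. The function names are illustrative, and the depths this produces are not guaranteed to match the values in the original slide, which may have used a different decision rule.

```python
from math import comb


def het_miss_probability(depth: int, min_fraction: float = 0.25) -> float:
    """Probability that a true heterozygote shows the variant in <= min_fraction of reads.

    The variant read count follows Binomial(depth, 0.5) under no allelic bias.
    """
    k_max = int(min_fraction * depth)  # largest variant count that fails the threshold
    return sum(comb(depth, k) for k in range(k_max + 1)) / 2 ** depth


def min_depth_for_het(max_error: float, min_fraction: float = 0.25,
                      depth_limit: int = 1000) -> int:
    """Smallest depth at which the miss probability is at most max_error."""
    for depth in range(1, depth_limit + 1):
        if het_miss_probability(depth, min_fraction) <= max_error:
            return depth
    raise ValueError("no depth up to depth_limit meets the requested error rate")


if __name__ == "__main__":
    for q in (20, 30, 40, 50):
        error = 10 ** (-q / 10)
        print(f"Q{q}: depth >= {min_depth_for_het(error)}")
```

Because the threshold count changes in steps as depth increases, the miss probability does not fall smoothly with depth; the function reports the first depth that meets the target.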

20 Molecular IDs

21 Molecular IDs [Figure: A/G heterozygote, % reads called A against number of reads sampled.] Nucleic Acids Research, 2011, Vol. 39, No. 12 e81

22 Confirmation Confirmation of positive results: Sanger sequencing; rtPCR assay; orthogonal NGS. Potentially problematic for large gene panels. Could be dropped on accumulation of data?

23 Conclusions QA begins at the design stage. Determine QA requirements during development: sample processing, data analysis. Validate at a technology level. Do not blindly trust software analysis tools: understand the biological and technical issues. Establishing QA is a considerable undertaking: consider longevity of utility and added benefit. Hone QA to sufficiency.