Deep Sequencing QC: A Component Study of the FDA-led Sequencing Quality Control Project Phase 2 (SEQC2)


1 Deep Sequencing QC: A Component Study of the FDA-led Sequencing Quality Control Project Phase 2 (SEQC2)
Joshua Xu, Ph.D.
Division of Bioinformatics and Biostatistics, NCTR/FDA
zhihua.xu@fda.hhs.gov
World CDx, Boston, October 18-19

2 Disclaimer
This presentation reflects the views of the author and should not be construed to represent FDA's views or policies.

3 MAQC/SEQC Consortium Projects: An Overview
An FDA-led, community-wide consortium effort to assess the technical performance and application of emerging technologies (e.g., genomics).
Accomplishments:
- Started in 2005 and completed 3 projects by 2014
- Evaluated 3 genomics technologies: microarrays (MAQC1 and MAQC2), GWAS (MAQC2), and RNA-seq (MAQC3/SEQC1)
- Produced 28 peer-reviewed articles, of which 11 were published in Nat Biotechnol (2 are among the top 10 most cited papers in the past 20 years)
- Supported the FDA development of the guidance document
The new MAQC4 projects: SEQC2 for whole-genome sequencing (WGS) and targeted gene sequencing (TGS)

4 SEQC2: Topics and Working Groups
- WG1: Somatic mutation
- WG2: Deep targeted sequencing
- WG3: Germline variants
- WG4: Difficult genes
- WG5: Personal genome
- WG7: Epigenetics
- Targeted RNA-seq (WG#2)

5 Accurate and reliable diagnostic tests are a foundation of medicine.
Deep Sequencing:
- Tumor early detection (cell/cell-free)
- Monitoring mutation accumulation
- Tumor evolution
Oncology drug resistance:
- There are different mechanisms of drug resistance
- One of them is through mutations in drug targets or related genes (e.g., EGFR, KRAS, MEK1)
- Intrinsic or developed during treatment?
- It is critical to detect these drug-resistant mutants prior to treatment, even if they are within a subclone
Close is not enough!

6 Issues and Study Objectives
Issues:
- FDA-approved PCR tests have a sensitivity of ~5%
- NGS clinical labs: 3-10%
- Publications have claimed that the sensitivity of deep NGS tests could reach 1% and even 0.02%
Comprehensive QC is the only way to translate deep sequencing from lab development to clinical application!
- Reproducibility, accuracy, sensitivity
- Quality metrics and recommendations

7 Overcome Sequencing Errors
Sequencing errors:
- Per-base random error from PCR and sequencing is in the range of 0.1% - 1%, but is not uniform
UMI (Unique Molecular Identifier/Indexing):
- Index individual template molecules before PCR amplification and deep sequencing
- Build a consensus among reads sharing the same index to remove PCR or sequencing errors
- Adds another layer of complexity (and source of variation)
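The consensus step described on this slide can be sketched as follows. This is a minimal illustration, not the method of any panel in the study: it assumes reads are already grouped by an exact-match UMI, aligned to the same start, and of equal length. The function name and the `min_family_size` and `min_agreement` thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=2, min_agreement=0.6):
    """Collapse reads that share a UMI into one consensus sequence each.

    reads: iterable of (umi, sequence) pairs; sequences within a UMI
    family are assumed to be the same length and already aligned.
    Thresholds are illustrative assumptions, not study parameters.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        # Families with too few reads cannot out-vote an error; skip them.
        if len(seqs) < min_family_size:
            continue
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Keep the majority base only if enough reads agree on it.
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus


reads = [
    ("AACGTT", "ACGTAC"),
    ("AACGTT", "ACGTAC"),
    ("AACGTT", "ACTTAC"),  # one read carries a PCR/sequencing error
    ("GGCATA", "ACGAAC"),
    ("GGCATA", "ACGAAC"),
    ("TTTAAA", "ACGTAC"),  # singleton family, dropped
]
result = umi_consensus(reads)
print(result)
```

In the first family the single discordant base is out-voted two to one, so the error is removed. Real pipelines also have to handle errors inside the UMI itself, which is one reason UMIs add a source of variation.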

8 Study Design with 3 Phases
- Benchmarking Samples: to enable QC assessments, transparent performance evaluation, and quality metrics development
- Comprehensive QC: intra-site repeatability; cross-site, cross-panel reproducibility; accuracy and limit of detection; impact of depth and bioinformatics
- Quality Metrics: develop quality metrics; correlation with performance metrics; recommendations
Germline reference materials: GIAB, College of American Pathologists, trios
No clinically relevant reference materials for cancer

9 Phase 1: Benchmarking Samples
Design goals for the set of 6 samples:
- Cover most cancer-related genes, with a representative subset of mutations for each gene
- Span clinically investigated levels of mutant allele fraction, through dilution
Sample A: pool of DNA from 10 diverse cancer cell lines
- Cell lines: easy to get large amounts of pure DNA
- Equal amounts of DNA, to target a mutation fraction of 5% (diploid)
- A KRAS companion diagnostic device was approved for this sensitivity
- Pooling increases the number of mutations: focusing on SNVs and small indels
- ~3 somatic mutations and ~3 germline variants
- Sequencing individual cell lines to get the ground truth
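The 5% target for Sample A follows from simple arithmetic: a heterozygous mutation private to one of 10 equally pooled diploid cell lines sits on 1 of 20 allele copies. The sketch below works that out. It assumes equal genome copies per cell line (the slide only states equal amounts of DNA) and a diploid background in the other lines. The function and parameter names are hypothetical.

```python
def pooled_maf(alt_copies, locus_copies, n_lines=10, background_copies=2):
    """Expected mutant allele fraction for a mutation private to one cell line.

    alt_copies: mutant copies of the locus in the carrying cell line.
    locus_copies: total copies of the locus in the carrying cell line.
    background_copies: copies of the locus in each other line (assumed diploid).
    Assumes every line contributes the same number of genomes to the pool.
    """
    total = locus_copies + background_copies * (n_lines - 1)
    return alt_copies / total


het = pooled_maf(alt_copies=1, locus_copies=2)  # heterozygous, diploid locus
hom = pooled_maf(alt_copies=2, locus_copies=2)  # homozygous, diploid locus
print(f"heterozygous: {het:.1%}, homozygous: {hom:.1%}")
```

Copy-number changes in the carrying cell line, or a mutation shared by several lines, move the expected fraction away from 5%.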

10 Collaboration on Sample Design
Sample A: 10 Agilent UHRR cell lines: liver, liposarcoma, brain, skin, breast, testis, cervix, T-lymphocyte, B-lymphocyte, macrophages
Sample B shall be a normal sample from a single individual, to minimize mutation background and maximize homogeneity.
Sample B: Agilent Male DNA Control (Product #: )
Dilution (Sample A into Sample B):
Sample   Dilution   Mutant Allele Fraction Range
A        -          2.5% - 15%
C        1:2        1.25% - 7.5%
D        1:5        0.5% - 3%
E        1:25       0.1% - 0.6%
F        1:125      0.02% - %
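The ranges in the dilution series are consistent with dividing the Sample A range by the dilution factor. The sketch below reproduces that arithmetic. It assumes "1:N" means an N-fold dilution of Sample A into the mutation-free Sample B. The upper bound for Sample F is not legible in the transcribed slide, so the value this code prints for F is a calculation under that assumption, not a figure from the slide.

```python
# Sample A range in percent, as given on the slide.
SAMPLE_A_RANGE = (2.5, 15.0)

# Dilution factors from the slide; A is undiluted.
DILUTIONS = {"A": 1, "C": 2, "D": 5, "E": 25, "F": 125}


def diluted_range(base_range, factor):
    """Scale a (low, high) MAF range by an N-fold dilution into normal DNA."""
    return tuple(round(value / factor, 4) for value in base_range)


for sample, factor in DILUTIONS.items():
    low, high = diluted_range(SAMPLE_A_RANGE, factor)
    print(f"Sample {sample} (1:{factor}): {low:g}% - {high:g}%")
```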

11 Cell Line Datasets
Data types: WGS, WES, TGS, aCGH, ddPCR
Objective: to get a list of mutations (true positives) and a list of true negatives in Sample A for about 1000 cancer-related genes
- WGS1: 10X Genomics library prep; ~70x average coverage
- WES1: Roche NimbleGen exome panel; two libraries per sample; ~60 million PE150 reads per library
- WES2: IDT exome panel; two libraries per sample; ~60 million PE150 reads per library
- WES3: Agilent exome panel; Q2 Solutions; two libraries per sample; ~150 million PE100 reads per library
- WES4: Thermo Fisher AmpliSeq exome panel; Ion Torrent sequencing; one library per sample; >200x mean coverage per library; hg19 BAM and VCF files are uploaded
- TGS1: IDT xGen Pan-Cancer comprehensive panel of 127 genes; 10 ng DNA input; one library per sample; 6-nt molecular barcodes; over 20 million PE100 reads per sample; includes deep sequencing data for Sample A
- TGS2: Agilent ClearSeq Comprehensive Panel of 151 genes; 100 ng DNA input; no library replicate; about 0.8 million PE100 reads per sample; more sequencing under discussion to bring the read count up to 2 million per sample; includes deep sequencing data (~3000x) for Sample A
- TGS3: Thermo Fisher's Oncomine AmpliSeq Comprehensive Cancer panel of more than 133 genes; more information needed on DNA input; no library replicate for cell line samples; about 200x coverage; library replicates and deep sequencing are done for Sample A; hg19 BAM and VCF files are uploaded
- aCGH: Agilent GenetiSure Cancer Research CGH+SNP Microarray; (planned) experiments finished; data being analyzed
- ddPCR: Bio-Rad; in contact
Abbreviations: WGS: Whole Genome Sequencing; WES: Whole Exome Sequencing; TGS: Targeted Gene Sequencing; aCGH: Array Comparative Genomic Hybridization; ddPCR: Droplet Digital PCR

12 Phase 2 Study Plan Template
Study factors: Lab (3), Panel (?), Sample & Input Amount Combination (6), Library Preparation (4), Depth, Bioinformatics, Sequencing Run
- Agreement upon the protocol for each panel across labs
- Samples for pan-cancer panels:
  - 3 input amounts for A: abundant, medium, and low
  - B, C, AcroMetrix controls: abundant input
  - 6th sample: AcroMetrix synthetic controls (Microgenics) at 5% MAF in Sample B

13 Pan-cancer Panels
Sample columns: A_100, A_30, A_10, B_100, C_100, AC5_100
- AGL1: Agilent ClearSeq Comprehensive Cancer Panel: X X X X X X
- IDT1: IDT xGen Pan-Cancer Panel: X X X X X X
- IGT1: iGeneTech AIOnco-seq: X X X X X X
- ILM1: Illumina TruSight Tumor 170: X X X X X X
- QGN1: Qiagen Comprehensive cancer panel: X X B_30 C_30 AC5_30
- ROC1: Roche SeqCap EZ Choice PHC Panel: X X X X X X
- TFS1: Thermo Fisher Oncomine Comprehensive Assay v3: X X B_30 AC5_30

14 Phase 2 Samples for Liquid Biopsy Panels
Six samples:
- Bf, Df, Ef: 25 ng
- 4th sample: Ef with 10 ng
- 5th sample (Ep): 25 ng of Sample Ef after extraction from plasma (2.5 ml)
- 6th sample: Ef with 50 ng, or AcroMetrix controls spiked into Sample B at 0.1%
Centralized sample preparation at UTSW:
- Enzymatic fragmentation -> better ligation efficiency
- Gel-based size selection (160 bp - 180 bp) to mimic cfDNA
- 1 ng/ul to mimic the concentration after DNA extraction from plasma
- 40 ng/ml in plasma (for the 5th sample)
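At these input amounts the number of mutant molecules in the tube is small, which bounds what any panel can detect. The sketch below estimates it. It assumes about 3.3 pg per haploid human genome, a commonly used approximation that does not come from the slides, and treats molecule sampling as Poisson. The function names are hypothetical.

```python
import math

# Approximate mass of one haploid human genome in picograms (assumption).
HAPLOID_GENOME_PG = 3.3


def genome_copies(input_ng):
    """Approximate number of haploid genome copies in input_ng of DNA."""
    return input_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_mutant_copies(input_ng, maf):
    """Expected mutant molecules at a locus for a given allele fraction."""
    return genome_copies(input_ng) * maf


def prob_at_least_one(input_ng, maf):
    """Poisson probability that at least one mutant molecule is present."""
    return 1.0 - math.exp(-expected_mutant_copies(input_ng, maf))


for input_ng in (10, 25, 50):
    n = expected_mutant_copies(input_ng, 0.001)  # 0.1% MAF
    p = prob_at_least_one(input_ng, 0.001)
    print(f"{input_ng} ng at 0.1% MAF: ~{n:.1f} mutant copies, P(>=1) = {p:.3f}")
```

Under these assumptions, 25 ng at 0.1% MAF carries roughly 7.6 mutant copies before any loss in library preparation, so input amount matters as much as sequencing depth at the low end of the dilution series.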

15 Liquid Biopsy Panels
Sample columns: Bf_25, Df_25, Ef_25, Ef_10, Ep_25, Ef_50 (or AC01_50)
Panels (each marked X against the samples):
- IDT2: IDT xGen Non-Small Cell Lung Cancer
- ROC2: Roche AVENIO ctDNA Expanded Kit
- TFS2: Thermo Fisher Oncomine Lung cfDNA Assay
Thermo Fisher is interested in testing additional samples: AC01_50 when Ef_50 is included, and Ff_100

16 Accugenomics Spike-ins for 32 Targets
- Plasmid sequences with dinucleotide mutations every 50-60 bp, to easily distinguish the spike-ins from native sequences
- Always wild type at the targeted location
- Spike-ins are about 400 bp to 1000 bp long, with additional flanking plasmid sequences
- Spiked in at 1:1 equivalent genome copies before sample distribution
- Multiple pilot studies were run with multiple panels for both gDNA and ctDNA testing samples
- For ctDNA testing samples (Samples B-F), fragmented spike-ins will be added to the ctDNA samples to mimic real-world application
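Because the spike-ins differ from the native sequence at known dinucleotide positions, a read can be assigned to one or the other by checking the bases it carries at those positions. The sketch below illustrates the idea only. It assumes ungapped reads with a known start offset and uses toy sequences. It is not the Accugenomics or SEQC2 procedure, and the function names are hypothetical.

```python
def marker_positions(native, spike):
    """Positions where the spike-in reference differs from the native one."""
    return [i for i, (a, b) in enumerate(zip(native, spike)) if a != b]


def classify_read(read, start, native, spike):
    """Label a read by the alleles it carries at the marker positions.

    read: read sequence, assumed aligned without gaps at offset `start`.
    Returns "spike-in", "native", or "ambiguous" (no markers covered or a tie).
    """
    spike_hits = native_hits = 0
    for pos in marker_positions(native, spike):
        i = pos - start
        if 0 <= i < len(read):
            if read[i] == spike[pos]:
                spike_hits += 1
            elif read[i] == native[pos]:
                native_hits += 1
    if spike_hits > native_hits:
        return "spike-in"
    if native_hits > spike_hits:
        return "native"
    return "ambiguous"


# Toy references: the spike-in carries a dinucleotide change at positions 8-9.
native = "ACGTACGTACGTACGTACGT"
spike = native[:8] + "TG" + native[10:]

labels = [
    classify_read(spike[5:15], 5, native, spike),
    classify_read(native[5:15], 5, native, spike),
    classify_read(native[12:20], 12, native, spike),  # covers no marker
]
print(labels)
```

With markers every 50-60 bp, short fragments can miss every marker and end up ambiguous, which matters most for the fragmented ctDNA samples.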

17 Study Planning
Panel provider (panels): test sites; sequencing run
- Agilent (AGL1): Q2 Solutions, Cornell, SciLife; HiSeq at the test sites
- IDT (IDT1 & IDT2): AstraZeneca, ARUP, UNC; NovaSeq at Illumina
- iGeneTech, China (IGT1): GenePlus, GeneIS, GeneSmile; HiSeq at the test sites
- Illumina (ILM1): Research Dx, Greenwood Genetic Center, Garvan Institute; NovaSeq at Illumina
- Qiagen (QGN1): Mayo, Dana-Farber, Mol Genetics Lab (Germany); NovaSeq at Illumina
- Roche Sequencing (ROC1 & ROC2): INGEMM, London Health Sci Centre (Canada), Baylor, Garvan Institute, Research Dx, Elim Biopharm, Royal Marsden Hospital; Pan-Cancer: NovaSeq at Illumina; Liq_Bio: NextSeq at the test sites
- Thermo Fisher (TFS1 & TFS2): OmniSeq & Mount Sinai, CGI; Ion S5 at the test sites

18 Timeline
- Phase 1 (09/ /2017): refine and finalize study design; finalize reference samples design
- Phase 2a (11/ /2017): pilot studies (by the end of Apr 2017)
- Phase 2b (05/ /2017): full study; planning (May 2017 - Sept 2017); execution (Oct 2017 - Dec 2017)
- Phase 3 (01/ /2018): clinical application, data analysis, QC metrics, recommendations, manuscripts
Phase 2b schedule:
- July-Sept 2017: test sites, sample preparation pilots and SOP, prepare and update MTAs
- Oct 2017: panel-specific SOPs, sign MTAs, sample and reagents distribution
- Oct-Dec 2017: experiment execution

19 Phase 2 Data Analysis Plan
Baseline analysis:
- Pre-analysis: separate reads of Accugenomics spike-ins from testing samples
- Run the panel vendor's recommended pipeline (fixed before analysis)
- Evaluate cross-lab and intra-lab reproducibility
- Evaluate accuracy and detection sensitivity by comparing with the ground truth
Exploratory analysis:
- Use Accugenomics spike-ins to infer MAF confidence intervals
- Assess performance through AcroMetrix controls
- Down-sample reads to evaluate the effect of sequencing depth
- For ctDNA, trim reads to evaluate the benefit of longer sequencing
- Explore other bioinformatics pipelines and compare their performance
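Two of the exploratory steps, down-sampling and MAF confidence intervals, can be illustrated at the level of a single site. The sketch below draws reads without replacement to simulate lower depth and computes a Wilson score interval for the observed allele fraction. This is a generic binomial interval, not the spike-in-based inference the plan refers to (the slides do not describe that method). The function names and the example counts are hypothetical.

```python
import math
import random


def wilson_interval(alt_reads, depth, z=1.96):
    """Wilson score interval for an allele fraction (z=1.96 for ~95%)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    p = alt_reads / depth
    denom = 1.0 + z * z / depth
    center = (p + z * z / (2 * depth)) / denom
    half = z * math.sqrt(p * (1 - p) / depth + z * z / (4 * depth * depth)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def downsample(alt_reads, depth, target_depth, rng):
    """Draw target_depth reads without replacement; return the alt count."""
    reads = [1] * alt_reads + [0] * (depth - alt_reads)
    return sum(rng.sample(reads, target_depth))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
alt, depth = 25, 5000   # hypothetical site observed at 0.5% MAF
for target in (5000, 2000, 500):
    k = downsample(alt, depth, target, rng)
    low, high = wilson_interval(k, target)
    print(f"depth {target}: {k} alt reads, MAF interval {low:.3%} - {high:.3%}")
```

The interval widens as depth drops, which is one way to express how sequencing depth limits the allele fractions a panel can resolve.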

20 Phase 3: Quality Metrics
- Pre-sequencing quality checks: DNA samples, library preparation, library yield, library loading, etc.
- Quality metrics for data analysis: variant/mutation types and sequence context; coverage; bioinformatics
- Orthogonal methods, e.g., droplet digital PCR (ddPCR)

21 SEQC2/WG#2: Deep Sequencing QC
- Dilutions: 1:2, 1:5, 1:25, 1:125
- >180 members
- Panel vendors: Agilent, IDT, iGeneTech, Illumina, Qiagen, Roche, Thermo Fisher
- Spike-ins: Accugenomics, AcroMetrix, Sequin
- FFPE: U of Toledo, Agilent, Horizon

22 Interested in Joining Our Study?
Please contact Joshua Xu.
Reference Samples, Technical Reproducibility, Transparent Assessment

23 Acknowledgements
SEQC2 Consortium, particularly WG#2 participants
Special thanks to:
- Dr. Weida Tong (NCTR/FDA)
- Dr. Leming Shi (Fudan University)
- Dr. Don Johann (UAMS)
- Dr. Wendell Jones (Q2 Solutions)
- Dr. Quan-Zhen Li (UTSW)
- Dr. Jim Willey (Univ of Toledo)