Quality Control of High Throughput EST & SAGE Sequence Production

1 European Spotfire Users Conference, Versailles, April 25-26, 2002
Quality Control of High Throughput EST & SAGE Sequence Production
Marcel de Leeuw, IT director

2 Summary
I. Application field
II. Why quality control?
III. EST example
IV. SAGE example
V. Decision making

3 I. Application field
Genomic resources for gene expression analysis: Expressed Sequence Tags (ESTs).
Direct gene expression analysis: Serial Analysis of Gene Expression (SAGE); Discriminative Analysis of Clone Signature (DACS).

4 I. Application field
[Figure: production workflow. Customer side: expressed DNA is cloned as an insert between the T7 and SP6 sites of a plasmid vector; hosts containing both host DNA and plasmid DNA are grown on medium with antibiotic, and resistant colonies are collected in 384-well plates. GENOME express, wet lab: DNA extraction / amplified DNA, sequencing reaction, sequencing (96- and 384-well formats). Dry lab: trimming, masking, QC. Then analysis & decision making.]

5 I. Application field
[Figure: read layout T7 | 5' vector | EST insert | 3' vector | SP6, with the end of the insert shown on both strands:
T7  AACGTCTACCAAAAAAAAAAAAAAAA
SP6 TTGCAGATGGTTTTTTTTTTTTTTTT
Caption: mis-oriented insert featuring a poly-T.]
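A mis-oriented insert can be spotted by its content. If the insert was cloned backwards, the 5' read runs into the poly-T (the complement of the poly-A tail) shortly after the vector. The sketch below assumes the read has already been vector-trimmed. The `min_run` and `window` thresholds, and the function names, are hypothetical and chosen only for illustration.

```python
import re


def longest_run(seq, base, window=None):
    """Return (start, length) of the longest run of `base` in `seq`.

    If `window` is given, only runs starting in the first `window` bases count.
    Returns (-1, 0) when no run is found.
    """
    best = (-1, 0)
    for m in re.finditer(re.escape(base) + "+", seq.upper()):
        if window is not None and m.start() >= window:
            break  # runs are found left to right, so later ones are all outside
        if m.end() - m.start() > best[1]:
            best = (m.start(), m.end() - m.start())
    return best


def looks_misoriented(read, min_run=12, window=60):
    """Hypothetical rule: a long poly-T near the start of a vector-trimmed
    5' read suggests the insert was cloned in reverse orientation."""
    _, t_len = longest_run(read, "T", window)
    return t_len >= min_run
```

A poly-T far downstream is ignored on purpose. Only a run close to the vector junction indicates orientation.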

6 Summary
I. Application field
II. Why quality control and how?
III. EST example
IV. SAGE example
V. Decision making

7 II. Why quality control and how?
Keep the production process going: spot equipment degradation before it has an impact; reject bad reagent lots.
Make the business model possible:
decision making: test plates before high-throughput (HT) sequencing;
outsourcing: both customer and subcontractor need to survey quality closely;
volume: small efficiency improvements yield

8 II. Why quality control and how?
Process measurements: dosage, signal characteristics.
Basecaller output: confidence in each base call.
Knowledge about DNA contents: vector parts in the case of cloned DNA; regularly spaced tags.
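Basecaller confidence is usually reported as a phred quality value Q. Q relates to the probability P that a base is miscalled by Q = -10 log10(P). For example, Q20 means a 1% chance of error and Q30 means 0.1%. The sketch below shows the conversion in both directions. It also computes a read's expected number of errors as the sum of per-base error probabilities. The function names are illustrative.

```python
import math


def phred_to_error(q):
    """Phred quality Q to base-call error probability: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Error probability P to phred quality: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def expected_errors(quals):
    """Expected number of miscalled bases in a read (sum of error probabilities)."""
    return sum(phred_to_error(q) for q in quals)
```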

9 II. Why quality control and how?
Strong correlations of content with process QC: GC- or AT-rich DNA needs sequencing reaction tuning; hairpins and other secondary structures; stretches of A's in gene transcripts (ESTs); contamination from neighboring clones or previous plates.
Data mining issues: volume, complexity, variation of libraries.
Need for an Advanced & Interactive Data Mining Application.
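One way to relate content to process QC is to flag GC- or AT-rich regions, since these need reaction tuning. The sketch below does this with a sliding-window GC fraction. The window size and the `low`/`high` bounds are hypothetical, not values taken from the slides.

```python
def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def extreme_gc_windows(seq, window=50, low=0.3, high=0.7):
    """Start positions of windows whose GC fraction lies outside [low, high].

    Thresholds are illustrative; reads with many flagged windows are
    candidates for a tuned sequencing reaction.
    """
    flagged = []
    for start in range(max(len(seq) - window + 1, 0)):
        gc = gc_fraction(seq[start:start + window])
        if gc < low or gc > high:
            flagged.append(start)
    return flagged
```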

10 Summary
I. Application field
II. Why quality control and how?
III. EST sequencing
IV. SAGE sequencing
V. Decision making

11 IV. EST sequencing
The dashboard provides an overview for detecting the most common QC problems.
[Figure: dashboard panels: source plate bias, 5' vector masking coherence, read length histogram, machine bias, global pie. Details-on-demand (HTML) allows inspection of individual reads.]

12 IV. EST sequencing
Main pane: in-depth view of the QC accept/reject decision.
[Figure: QC-rejected and QC-passed sequences plotted against tlen (trimmed length); marker tilt and size indicate sequencing difficulty. Labelled clusters: double clones, empty inserts, long poly-T, long poly-A, bad-quality sequence.]
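The rejection categories in the main pane can be expressed as simple rules over per-read metrics. The sketch below is one such rule set. The `Read` fields, all thresholds, and the order in which the rules are checked are hypothetical. Double clones are left out, because detecting them needs signal-level information that this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Read:
    clone_id: str
    tlen: int          # trimmed length after quality trimming and vector masking (bp)
    pct_low_q: float   # % of low-quality bases inside the trimmed region
    max_poly_a: int    # longest poly-A run (bp)
    max_poly_t: int    # longest poly-T run (bp)


# Hypothetical thresholds, for illustration only.
MIN_TLEN = 100
MAX_PCT_LOW_Q = 20.0
MAX_POLY_RUN = 30


def qc_call(r: Read) -> str:
    """Return 'passed' or the first matching rejection reason."""
    if r.tlen < MIN_TLEN:
        return "rejected: empty insert / short read"
    if r.max_poly_t >= MAX_POLY_RUN:
        return "rejected: long poly-T"
    if r.max_poly_a >= MAX_POLY_RUN:
        return "rejected: long poly-A"
    if r.pct_low_q > MAX_PCT_LOW_Q:
        return "rejected: bad quality"
    return "passed"
```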

13 IV. EST sequencing
QC inspection example: bias in the 384-well view.
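Spotting bias in a 384-well view requires mapping each clone to its plate position. The sketch below assumes the clone IDs on the following slide (for example `rnt.007a01`) encode library, plate number, row letter (a-p) and column (01-24); that naming scheme is an inference, not documented. It also assumes the common interleaved layout for stamping four 96-well plates into one 384-well plate. Per-row and per-column means of any QC metric then expose edge or quadrant effects.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed naming scheme: <library>.<plate><row a-p><column 01-24>
CLONE_RE = re.compile(r"^(?P<lib>\w+)\.(?P<plate>\d+)(?P<row>[a-p])(?P<col>\d{2})$")


def parse_clone(clone_id):
    """'rnt.007a01' -> ('rnt', 7, 0, 1): library, plate, row index 0-15, column 1-24."""
    m = CLONE_RE.match(clone_id.lower())
    if m is None:
        raise ValueError(f"unrecognized clone id: {clone_id}")
    col = int(m["col"])
    if not 1 <= col <= 24:
        raise ValueError(f"column out of range: {clone_id}")
    return m["lib"], int(m["plate"]), ord(m["row"]) - ord("a"), col


def quadrant(row, col):
    """Source 96-well quadrant (1-4), assuming A1 -> 1, A2 -> 2, B1 -> 3, B2 -> 4."""
    return 1 + (row % 2) * 2 + (col - 1) % 2


def row_col_means(values):
    """values: {clone_id: metric}. Returns (mean by row letter, mean by column)."""
    by_row, by_col = defaultdict(list), defaultdict(list)
    for cid, v in values.items():
        _, _, row, col = parse_clone(cid)
        by_row["abcdefghijklmnop"[row]].append(v)
        by_col[col].append(v)
    return ({k: mean(v) for k, v in by_row.items()},
            {k: mean(v) for k, v in by_col.items()})
```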

14 IV. EST sequencing
QC inspection example (contd.)
[Figure: dbc (quality ratio before/past the 5' vector end) vs. clone ID, for clones rnt.007a01 to rnt.012e22, shown per plate quadrant (quad-1, quad-3, ...), with the vector end marked.]
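The slide defines dbc only as a quality ratio before and past the 5' vector end. The sketch below therefore assumes one concrete reading: the mean phred quality in a fixed window just before the vector end, divided by the mean in the window just after it. The window size is hypothetical. Values far from 1 indicate that quality changes sharply at the vector/insert junction.

```python
from statistics import mean


def dbc(quals, vector_end, window=20):
    """Assumed dbc: mean quality of the `window` bases before `vector_end`
    divided by the mean quality of the `window` bases from `vector_end` on.

    Returns None when either window is empty or the 'after' mean is zero.
    """
    before = quals[max(0, vector_end - window):vector_end]
    after = quals[vector_end:vector_end + window]
    if not before or not after or mean(after) == 0:
        return None
    return mean(before) / mean(after)
```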

15 IV. EST sequencing
QC example: the trimming plot reveals plate bias, despite a strong machine influence.
[Figure: % low-Q bases vs. length of trimmed sequence (tlen), one panel per plate (plate-7, plate-8, ..., plate-10, plate-11); the number of HQ reads varies strongly between plates.]
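The slides do not say which trimming rule was used. As an assumption, the sketch below uses a modified Mott rule, one common choice. Each base scores `cutoff - P_error`, and the trimmed read is the contiguous segment with the highest total score, found here with a maximum-subarray scan. The `cutoff` and `q_min` defaults are illustrative. The second function computes the "% low Q bases" axis of the plot for the trimmed segment.

```python
def mott_trim(quals, cutoff=0.05):
    """Return (start, end) of the maximal-scoring segment, end exclusive.

    Each base scores cutoff - 10 ** (-Q / 10); returns (0, 0) if no segment
    has a positive score.
    """
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            run_sum, run_start = 0.0, i + 1  # restart after a losing stretch
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


def pct_low_q(quals, start, end, q_min=20):
    """Percentage of bases with quality below q_min inside quals[start:end]."""
    segment = quals[start:end]
    return 100.0 * sum(q < q_min for q in segment) / len(segment) if segment else 0.0
```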

16 Summary
I. Company & application field
II. Why quality control and how?
III. Approach
IV. EST sequencing
V. SAGE sequencing
VI. Decision making

17 V. SAGE sequencing
Principle of operation: base call, quality trimming, vector masking, enzyme site matching, di-tag extraction.
Example read:
>lsw6.001h06f (m13.f1) 225 (46, 270) 3.7%N
ngxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxcatgcc
AGGCTGCGTCCCTCCGGCGGAAGCCGCGGTCCATGGGCTCTGAATTAGGA
GGGGCTGGGCCTTAAAACATGGGCCGCGTTCGCACCAACTAGCTAGGATT
AAACATGGGCTGGGGGCCAGGGCTGTGCCGCCCACCCGGCCATGCCCCCG
TGAAGCTGGGCGGAGGAGAATGTTTTCATGTATTACTTTTGTAGTTGGCC
CGAGCTCAGgggcatgggcagaaAAgtgcTgaCCtcattctAAAGTGATG
ACCTTGAACTTTACTGTGGTAGGGCGTCTTCATCTCGCCTtgtcaaGAAA
[Figure: SAGE concatemer layout: 5' vector, <CATG> enzyme site, di-tag rank 1, <CATG>, ..., di-tag rank N, <CATG>, 3' vector.]
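After quality trimming and vector masking, di-tag extraction amounts to cutting the insert at the enzyme sites. The sketch uses CATG as shown on the slide; this is the NlaIII anchoring-enzyme site in the classic SAGE protocol. Each stretch between consecutive sites becomes one di-tag, ranked from the 5' vector onwards. The function name and the tuple format are hypothetical.

```python
import re

SITE = "CATG"  # enzyme site shown on the slide


def extract_ditags(insert):
    """Return [(rank, start, ditag)] for the sequence between consecutive sites.

    `insert` is the read after quality trimming and vector masking. Rank 1 is
    the di-tag nearest the 5' vector; `start` is a 0-based offset into `insert`.
    Bases before the first site or after the last site are not reported.
    """
    seq = insert.upper()
    sites = [m.start() for m in re.finditer(SITE, seq)]
    ditags = []
    for rank, (a, b) in enumerate(zip(sites, sites[1:]), start=1):
        start = a + len(SITE)
        ditags.append((rank, start, seq[start:b]))
    return ditags
```

Since CATG cannot overlap itself, the non-overlapping matches from `re.finditer` give every site.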

18 V. SAGE sequencing
Recognition of the enzyme restriction site provides an indication of true base-calling errors.
[Figure: di-tag rank in sequence vs. di-tag start position in sequence (bp).]
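Enzyme sites should recur at roughly regular spacing along a concatemer. A gap that is too long for a single di-tag therefore hints that a site was miscalled. The sketch below is a hypothetical check built on that idea, not the method behind the slide. Inside such gaps, it reports 4-mers that differ from CATG at exactly one position, which are candidate base-calling errors. `max_ditag` is an illustrative threshold.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def suspect_site_errors(insert, max_ditag=30, site="CATG"):
    """Positions of one-mismatch site variants inside over-long gaps between
    exact sites (hypothetical check; thresholds are illustrative)."""
    seq = insert.upper()
    k = len(site)
    exact = [i for i in range(len(seq) - k + 1) if seq[i:i + k] == site]
    suspects = []
    for a, b in zip(exact, exact[1:]):
        if b - (a + k) > max_ditag:  # gap too long to be one di-tag
            for i in range(a + k, b - k + 1):
                if hamming(seq[i:i + k], site) == 1:
                    suspects.append(i)
    return suspects
```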

19 V. SAGE sequencing
Other SAGE-specific QC criteria.
[Figure: average phred quality vs. di-tag rank; distribution of di-tag length; di-tag average quality.]
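The two criteria plotted here can be computed from extracted di-tags plus the per-base phred values of each read. The sketch assumes di-tags come as `(rank, start, seq)` tuples with 0-based starts, a hypothetical format. It returns the average quality per rank across reads and a histogram of di-tag lengths.

```python
from collections import Counter, defaultdict
from statistics import mean


def ditag_qc(reads):
    """reads: iterable of (ditags, quals) pairs, ditags as [(rank, start, seq)].

    Returns ({rank: average phred quality across reads}, Counter of di-tag lengths).
    Each di-tag contributes its own mean quality; empty di-tags are skipped.
    """
    per_rank = defaultdict(list)
    lengths = Counter()
    for ditags, quals in reads:
        for rank, start, seq in ditags:
            if not seq:
                continue
            per_rank[rank].append(mean(quals[start:start + len(seq)]))
            lengths[len(seq)] += 1
    return {r: mean(v) for r, v in sorted(per_rank.items())}, lengths
```

A drop in average quality with rank is expected, because later di-tags lie further into the read.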

20 Summary
I. Company & application field
II. Why quality control and how?
III. Approach
IV. EST sequencing
V. SAGE sequencing
VI. Decision making

21 VI. Decision making
Decision processes: library quality, library contents, process quality.
[Figure: decision flowchart between the customer and GENOME express. Library side: generate expression library, create library lot or test plate, then check library quality (insert size, poly-A size, redundancy); if insufficient, run diagnostics and adapt conditions; if OK, normalize and/or subtract (EST). Customer: analyze library contents, demand for clones of interest. GENOME express: 3' sequencing (EST), extract sequence, post-process, then check process quality (success rate, plate bias, machine bias); if insufficient, run diagnostics and reprocess selected plates or clones; if OK, deliver.]
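The process-quality branch of the flowchart can be approximated as a rule over pass/fail outcomes, grouped by plate and by machine. The criteria are success rate, plate bias and machine bias, with reprocessing when quality is insufficient. In the sketch below, the input format and both thresholds (`min_rate`, `max_gap`) are hypothetical. It also does not separate plate effects from machine effects; slide 15 shows these can be entangled.

```python
from collections import defaultdict


def flag_for_reprocessing(reads, min_rate=0.8, max_gap=0.15):
    """reads: iterable of (plate, machine, passed) tuples.

    Hypothetical rule: flag a plate whose pass rate is below `min_rate`, and a
    machine whose pass rate trails the overall rate by more than `max_gap`.
    Returns (sorted flagged plates, sorted flagged machines).
    """
    plate = defaultdict(lambda: [0, 0])    # plate -> [passed, total]
    machine = defaultdict(lambda: [0, 0])  # machine -> [passed, total]
    for p, m, ok in reads:
        for table, key in ((plate, p), (machine, m)):
            table[key][0] += int(ok)
            table[key][1] += 1
    total = sum(t for _, t in plate.values())
    overall = sum(s for s, _ in plate.values()) / total if total else 0.0
    bad_plates = sorted(k for k, (s, t) in plate.items() if s / t < min_rate)
    bad_machines = sorted(k for k, (s, t) in machine.items() if overall - s / t > max_gap)
    return bad_plates, bad_machines
```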