RIPTIDE HIGH THROUGHPUT RAPID LIBRARY PREP (HT-RLP)

Application Note

Introduction: Innovations in DNA sequencing during the 21st century have revolutionized our ability to obtain nucleotide information at an ever-decreasing cost. Illumina and BGI/Complete Genomics predict that a human genome will cost as little as $100 within the next ten years, representing a 10-fold increase in sequence output per sequencer run. The technology continues to increase the number of reads per sequencer run and therefore the number of samples that can be multiplexed. However, labs are unable to keep pace with sequencing capacity, and sample preparation, specifically library preparation, is now the bottleneck. The average list price of library prep today is $30-$50 (USD) per sample. This price typically does not include ancillary consumables such as those for DNA fragmentation and purification.

The steps involved in the most widely used library prep protocols are: 1. Fragmentation, 2. End-repair, 3. A-tailing, 4. Adapter ligation, 5. Amplification and 6. Size selection. These protocols typically involve several clean-up steps, and every step is performed on individual samples. An alternative approach, transposase-mediated tagmentation, combines fragmentation and adapter addition in a single step, followed by transposon removal, PCR amplification and size selection.

Key Benefits:
High throughput: Each kit produces up to 960 samples per day
Simple workflow: 5 pipette steps per sample (no fragmentation, repair or ligation)
High quality data: Tunable to diverse GC content
Low cost: Contact igenomx for introductory pricing (info@igenomx.com)

Table 1: Throughput.
Technology | Daily throughput (manual)* | Daily throughput (automated) | Steps per sample
Fragment/ligate | 16-32 | 96 | 54
Transposase | 96 | 384 | 26
igenomx | 960 | 960 | <5
* One technician.

Figure 1: Workflow of Riptide high throughput rapid library prep (HT-RLP)
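The short Python sketch below puts the "Steps per sample" column of Table 1 in perspective by multiplying it out for one 96-well plate. It uses only the numbers in Table 1. Because the igenomx entry is listed as "<5", the sketch uses 5, so the igenomx total is an upper bound. The names `STEPS_PER_SAMPLE` and `steps_for_plate` are illustrative.

```python
# Steps per sample, taken from Table 1. The igenomx value is listed as "<5";
# 5 is used here, so the igenomx total is an upper bound.
STEPS_PER_SAMPLE = {
    "Fragment/ligate": 54,
    "Transposase": 26,
    "igenomx": 5,
}


def steps_for_plate(steps_per_sample: int, samples: int = 96) -> int:
    """Total number of steps needed to process `samples` samples individually."""
    return steps_per_sample * samples


if __name__ == "__main__":
    for technology, steps in STEPS_PER_SAMPLE.items():
        print(f"{technology}: {steps_for_plate(steps)} steps per 96-sample plate")
```

For one plate this gives 5,184 steps for fragment/ligate, 2,496 for transposase and at most 480 for igenomx. That is more than a 10-fold reduction relative to the fragment/ligate workflow.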

A number of groups have adapted the Nextera technology from Illumina for high throughput sequencing of plasmid and bacterial libraries. This approach requires expensive liquid handling equipment and precise quantification of template:enzyme ratios. It also results in the loss of chromosome or molecule ends prior to sequencing. These teams have managed to reduce reagent cost through volume reduction, but the savings have come at the expense of sequence coverage and uniformity. This limits the practical gain in sample throughput for multiplexed sequencer runs, because samples must be oversequenced to achieve complete coverage and reliable variant identification. igenomx High Throughput Rapid Library Prep (HT-RLP) matches the throughput and cost potential offered by the latest high throughput DNA sequencers.

Study Design:

Experimental Design: 96 technical replicates of E. coli strain MG6550 were used as the template. Each sample used 50 ng of isolated gDNA in a 10 µL reaction, one sample per well of a 96-well microtiter plate (Table 2).

Workflow: igenomx master mix, including primers, NTPs, buffers and polymerase, is added to each well. DNA is denatured and igenomx random primers are hybridized to the template DNA. The primer construct is 5'-Illumina P5 adapter - 8 bp sample barcode - N12-3'. A 30-minute primer extension reaction is performed (no cycling). Extension products are terminated by including a small ratio of biotinylated ddNTPs to native NTPs. ddNTP termination controls fragment length, reduces the potential for chimera formation or intramolecular interactions, and minimizes bias during random priming. Aliquots from each well are then pooled without normalization. The product pool is subjected to a bead cleanup to remove small products and unincorporated biotin-ddNTPs. Products from the 96-sample library pool are then captured on streptavidin-coated beads and washed. They are then converted into a dual-adapter library through a second, strand-displacing primer extension reaction.
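A simple idealized model shows why a small ddNTP fraction controls fragment length. The model rests on two assumptions that are not stated in the note. First, the polymerase incorporates a ddNTP and the matching native nucleotide with equal efficiency. Second, the template is much longer than the extension product. Under these assumptions each incorporated base is a terminator with probability p = ddNTP / (ddNTP + native nucleotide), and the extension length follows a geometric distribution with mean 1/p. The note does not give the ratio actually used. The 1:499 ratio below is purely illustrative, and because real polymerases discriminate against ddNTPs, measured lengths will differ from this model. The function names are hypothetical.

```python
import random
import statistics


def termination_probability(dd_to_d_ratio: float) -> float:
    """Chance that any single incorporated base is a chain-terminating ddNTP.

    Assumes ddNTPs and native nucleotides are incorporated with equal
    efficiency, so the probability is simply the ddNTP fraction of the pool.
    """
    return dd_to_d_ratio / (1.0 + dd_to_d_ratio)


def expected_extension_length(p: float) -> float:
    """Mean of a geometric distribution: bases added up to and including the terminator."""
    return 1.0 / p


def simulate_extension_lengths(p: float, n: int, seed: int = 0) -> list[int]:
    """Monte Carlo check of the geometric model for n independent primers."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n):
        length = 1
        while rng.random() >= p:  # current base was not a terminator, so extend again
            length += 1
        lengths.append(length)
    return lengths


if __name__ == "__main__":
    p = termination_probability(1 / 499)  # illustrative 1:499 ddNTP:native ratio
    print(f"p = {p:.4f}, expected length = {expected_extension_length(p):.0f} bases")
    sample = simulate_extension_lengths(p, n=2000)
    print(f"simulated mean = {statistics.fmean(sample):.0f} bases")
```

In this model, halving the ddNTP fraction roughly doubles the mean product length. The terminating ddNTP is biotinylated, so every terminated product also carries the biotin used for the later streptavidin capture.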
The primer bound closest to the bead displaces upstream products. All displaced products and excess reactants are washed away. PCR is then used to amplify the material and incorporate full-length Illumina adapters. Final products from the 96-sample pool are size selected and are ready for flow cell loading and sequencing.

The full protocol takes approximately 3 hours, with less than 20 minutes of hands-on time for all 96 samples. Multiple 96-sample plates can be processed in series or in parallel, for up to 960 samples. The workflow has no chemical or physical fragmentation, end repair, A-tailing or ligation steps. Individual samples do not need normalization prior to pooling, and there are no time-dependent reactions. An optional index barcode can be used to pool multiple 96-sample plates on a single flow cell.

2 x 300 bp paired-end sequencing was performed on the Illumina MiSeq using v3 chemistry. The igenomx BaseSpace QC application was used for analysis. The library construct is 5'-P5 adapter - sample barcode - insert - P7 adapter-3' (Figure 2).

Table 2: Design of representative data.
Organism | E. coli MG6550
Genome size | 4.6 Mb (haploid)
%GC | 50%
DNA input/replicate | 50 ng
# of replicates | 96
Sequencer/chemistry | Illumina MiSeq/2x300

Figure 2: Schematic of library construct.
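Every sample in the 96-well pool carries its own 8 bp barcode, so reads must be assigned back to wells after sequencing. The note states that the igenomx BaseSpace QC application was used for analysis but does not describe its demultiplexing logic. The Python sketch below is therefore a generic stand-in, and it makes three assumptions:

- The barcode occupies the first 8 bases of read 1.
- The barcode is immediately followed by the 12 bases contributed by the N12 random primer.
- One mismatch is tolerated.

The barcode sequences, well names and the `plate_index` argument are hypothetical. `plate_index` stands in for the optional index barcode used to pool multiple plates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

BARCODE_LEN = 8   # 8 bp sample barcode (from the primer construct)
PRIMER_LEN = 12   # N12 random-primer bases; assumed to follow the barcode in read 1


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, barcodes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode matches the start of the read, or None.

    A read is left unassigned if it is too short, or if it matches zero or
    more than one barcode within `max_mismatches`.
    """
    if len(read) < BARCODE_LEN + PRIMER_LEN:
        return None
    observed = read[:BARCODE_LEN]
    hits = [sample for bc, sample in barcodes.items()
            if hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str],
                plate_index: Optional[str] = None) -> Dict[str, List[str]]:
    """Group reads by sample, trimming the barcode and random-primer bases."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        sample = assign_sample(read, barcodes)
        if sample is None:
            bins["undetermined"].append(read)
            continue
        key = sample if plate_index is None else f"{plate_index}:{sample}"
        bins[key].append(read[BARCODE_LEN + PRIMER_LEN:])
    return dict(bins)


if __name__ == "__main__":
    barcodes = {"ACGTACGT": "A01", "TTGCAAGC": "A02"}  # hypothetical wells
    reads = [
        "ACGTACGA" + "GATTACAGATTA" + "CCCCGGGG",  # 1 mismatch -> A01
        "TTGCAAGC" + "AAAAAAAAAAAA" + "TTTT",      # exact match -> A02
        "GGGGGGGG" + "CCCCCCCCCCCC" + "AAAA",      # no match -> undetermined
    ]
    print(demultiplex(reads, barcodes, plate_index="P1"))
```

Tolerating one mismatch is only unambiguous if every pair of barcodes differs at three or more positions. The uniqueness check in `assign_sample` leaves a read unassigned rather than guessing when two barcodes are equally close.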

Results: More than 66% of bases sequenced were at a quality of Q30 or greater. On average, each sample received 483,756 reads, with 90% of samples having read counts within a 4-fold range. This was achieved without normalization of individual sample yields (Figure 3). The mean depth of coverage per sample was 19x, with a range of 8x to 47x. Median depth was 16x across all 96 samples, with an average standard deviation of coverage of 12x. Reference coverage averaged 99.48% for all 96 samples, with a range of 98.65% to 99.74%. Summary statistics are given in Table 3. Coverage uniformity is high: each sample had more than 90% of bases covered within 1 standard deviation of the mean (Figure 4).

The prep contains reagents for both low and mid/high GC organisms, as well as an option for samples with unknown GC content. Optimized GC coverage of multiple organisms is shown below. All samples sequenced to a mean depth of approximately 20x result in greater than 99% genome coverage. Data on sequence coverage (Table 4) and GC performance (Figures 5-11) can be found in the following appendix.

Table 3: Mapping and coverage summary statistics.
Metric | Average across 96 samples | Range (low to high for all 96)
% mapped | 99.18% | 98.98%-99.29%
Mean coverage | 19x | 8x-47x
Median coverage | 16x | 3x-44x
Std. dev. coverage | 12x | 4x-21x
% at 1x | 99.48% | 98.65%-99.74%
% at 5x | 95.09% | 74.93%-99.21%
% at 10x | 77.73% | 27.38%-98.69%

Figure 4: Coverage histogram. Random downsampling to 100,000 reads; all 96 samples are plotted.

Discussion: High throughput multiplexed sequencing is now matched by high throughput multiplexed library prep. Features include low sample input, high throughput and an automation-ready workflow at dramatically reduced cost. There is no DNA fragmentation, end repair, A-tailing or ligation, and none of the clean-up steps associated with each of these enzymatic processes. The technology allows 96 samples to be processed in three hours and over 1,000 samples per day without the need for expensive capital equipment. The cost, throughput and performance may justify converting many PCR and Sanger sequencing tests to high throughput NGS technology.

To learn more, or to purchase Riptide, visit igenomx.com/products
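Table 4 reports AT and GC values for each sample, and the Results describe GC performance across organisms, but the note does not define those columns. One widely used convention is the Illumina-style AT and GC dropout metric described in the Picard CollectGcBiasMetrics documentation. Reference windows and reads are binned by GC percentage. For each bin, the percentage of reads is subtracted from the percentage of reference windows. Positive differences are then summed over GC 0-50 for AT dropout and over GC 50-100 for GC dropout.

The Python sketch below assumes that definition. It is not necessarily how the igenomx BaseSpace QC application computed Table 4. Counting the 50% bin in both sums follows the ranges as written in the Picard description; check the Picard source if exact agreement matters. The function names are hypothetical.

```python
from typing import List, Tuple


def gc_percent(seq: str) -> int:
    """GC content of a sequence window, rounded to a whole-percent bin (0-100)."""
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq)
    return round(100 * gc / len(seq))


def dropout(ref_windows: List[int], reads: List[int]) -> Tuple[float, float]:
    """AT and GC dropout from per-GC-bin counts.

    ref_windows[g]: number of reference windows whose GC content is g percent.
    reads[g]: number of reads assigned to windows with GC content g percent.
    Both lists have 101 entries (GC = 0..100).
    """
    total_ref = sum(ref_windows)
    total_reads = sum(reads)
    at_dropout = gc_dropout = 0.0
    for g in range(101):
        # Positive when this GC bin is under-represented in the reads.
        diff = 100 * ref_windows[g] / total_ref - 100 * reads[g] / total_reads
        if diff > 0:
            if g <= 50:
                at_dropout += diff
            if g >= 50:
                gc_dropout += diff
    return at_dropout, gc_dropout


if __name__ == "__main__":
    ref = [0] * 101
    obs = [0] * 101
    ref[30], ref[70] = 50, 50  # half the reference at 30% GC, half at 70% GC
    obs[30], obs[70] = 20, 80  # reads over-represent the GC-rich half
    print(dropout(ref, obs))   # -> (30.0, 0.0)
```

Under this definition, a value of 0 means no GC bin on that side of 50% was under-represented in the reads. Larger values mean more of the reference in that GC range was under-sampled.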

Appendix

Table 4: Coverage for all samples.
Sample/Barcode | % Duplication | Mean | Median | Std Dev | % 1x | % 5x | % 10x | AT | GC
1 | 1.34 | 28 | 25 | 14 | 99.66 | 98.99 | 96.73 | 0 | 3.39
2 | 2.12 | 28 | 26 | 14 | 99.68 | 99 | 96.94 | 0.04 | 2.39
3 | 1.6 | 33 | 31 | 16 | 99.74 | 99.21 | 98.34 | 0 | 2.89
4 | 1.20 | 13 | 12 | 9 | 99.5 | 93.69 | 64.74 | 0 | 4.57
5 | 1.21 | 18 | 15 | 11 | 99.54 | 97.19 | 81.42 | 0.05 | 4.66
6 | 1.73 | 18 | 16 | 12 | 99.53 | 96.88 | 81.03 | 0.01 | 5.69
7 | 0.72 | 9 | 8 | 5 | 99.15 | 81.13 | 34.29 | 0.23 | 1.01
8 | 1.22 | 12 | 10 | 8 | 99.37 | 91.38 | 56.92 | 0 | 3.62
9 | 1.75 | 30 | 26 | 17 | 99.65 | 98.94 | 96.45 | 0 | 4.75
11 | 0.98 | 13 | 11 | 8 | 99.39 | 93.2 | 63.52 | 0.01 | 3.64
12 | 1.21 | 14 | 12 | 8 | 99.48 | 94.02 | 67.21 | 0 | 2.88
13 | 1.60 | 16 | 13 | 11 | 99.23 | 93.55 | 71.94 | 0 | 5.57
14 | 1.68 | 20 | 17 | 12 | 99.64 | 97.7 | 86.66 | 0 | 4.6
15 | 1.71 | 26 | 22 | 17 | 99.58 | 98.28 | 92.15 | 0 | 5.46
16 | 0.91 | 18 | 16 | 9 | 99.65 | 98.21 | 86.79 | 0 | 2.98
17 | 2.19 | 18 | 15 | 12 | 99.44 | 95.91 | 78.1 | 0 | 4.93
18 | 1.40 | 18 | 15 | 13 | 99.52 | 96.04 | 78.39 | 0.04 | 7.82
19 | 2.91 | 47 | 44 | 21 | 99.68 | 99.2 | 98.69 | 0.07 | 1.68
20 | 2.01 | 27 | 25 | 4 | 99.68 | 98.99 | 96.18 | 0 | 3.11
21 | 1.51 | 15 | 14 | 9 | 99.52 | 96.33 | 75.76 | 0.01 | 2.79
22 | 1.26 | 22 | 18 | 15 | 99.59 | 97.88 | 87.44 | 0 | 6
23 | 1.13 | 13 | 11 | 8 | 99.45 | 93.09 | 61.97 | 0 | 3.74
24 | 1.92 | 32 | 27 | 20 | 99.68 | 98.93 | 96.3 | 0 | 5.7
25 | 1.15 | 16 | 14 | 10 | 99.59 | 96.99 | 80 | 0 | 4.15
26 | 1.88 | 22 | 8 | 16 | 99.48 | 96.84 | 83.72 | 0.05 | 6.83
27 | 1.27 | 11 | 9 | 7 | 99.24 | 87.73 | 48.31 | 0.06 | 2.62
28 | 1.88 | 26 | 23 | 15 | 99.59 | 98.7 | 94.26 | 0.01 | 3.9
29 | 1.65 | 26 | 23 | 15 | 99.59 | 98.68 | 94.58 | 0 | 4.64
30 | 1.04 | 24 | 21 | 13 | 99.64 | 98.65 | 93.56 | 0 | 4.72
31 | 1.36 | 16 | 14 | 11 | 99.5 | 95.47 | 73.93 | 0.03 | 6.61
32 | 1.79 | 25 | 21 | 17 | 99.56 | 98.46 | 91.73 | 0 | 5.3
33 | 0.96 | 16 | 4 | 9 | 99.55 | 97.07 | 78.76 | 0 | 2.98
34 | 1.27 | 16 | 3 | 13 | 99.34 | 92.79 | 67.83 | 0.23 | 8.13
35 | 0.87 | 11 | 9 | 9 | 99.02 | 82.73 | 44.51 | 0.1 | 7.66
36 | 2.08 | 22 | 19 | 13 | 99.62 | 98.46 | 91.86 | 0.06 | 7.8
37 | 1.25 | 15 | 12 | 13 | 99.46 | 91.68 | 64.09 | 0.04 | 7.46
38 | 1.48 | 26 | 3 | 14 | 99.56 | 98.75 | 95.12 | 0 | 3.54
39 | 1.43 | 31 | 28 | 16 | 99.68 | 99.09 | 97.53 | 0 | 3.8
40 | 1.45 | 19 | 16 | 13 | 99.55 | 96.54 | 80.62 | 0 | 7.1
41 | 1.11 | 15 | 13 | 11 | 99.49 | 94.23 | 69.16 | 0.04 | 6.63
42 | 1.01 | 26 | 21 | 19 | 99.51 | 97.67 | 88.57 | 0 | 4.3
43 | 1.52 | 18 | 15 | 12 | 99.57 | 96.44 | 78.53 | 0.11 | 5.45
44 | 1.18 | 12 | 10 | 8 | 99.34 | 90.75 | 56.77 | 0.01 | 5.66
45 | 0.91 | 8 | 7 | 5 | 98.82 | 74.93 | 27.38 | 1.31 | 0.34
46 | 1.15 | 16 | 13 | 11 | 99.44 | 93.93 | 69.41 | 0.2 | 6.73
47 | 0.90 | 13 | 11 | 9 | 99.46 | 93.3 | 63.18 | 0.01 | 5.74
48 | 1.31 | 19 | 17 | 11 | 99.55 | 98.12 | 88.07 | 0.01 | 2.83
49 | 1.12 | 11 | 9 | 8 | 99.32 | 87.79 | 49.99 | 0 | 4.87
50 | 1.34 | 20 | 17 | 13 | 99.6 | 97.57 | 85.06 | 0 | 5.68
51 | 0.99 | 14 | 12 | 9 | 99.39 | 94.15 | 67.16 | 0 | 5.14
52 | 1.34 | 17 | 14 | 11 | 99.56 | 95.91 | 76.64 | 0 | 5.8
53 | 1.06 | 8 | 7 | 5 | 98.65 | 76.3 | 30.08 | 0.08 | 2.27
54 | 2.20 | 29 | 25 | 18 | 99.63 | 98.78 | 94.64 | 0 | 5.23
55 | 1.57 | 28 | 25 | 16 | 99.65 | 98.78 | 95.45 | 0 | 5.47
56 | 1.99 | 25 | 21 | 17 | 99.61 | 98.27 | 90.5 | 0.01 | 5.21
57 | 1.67 | 23 | 20 | 13 | 99.53 | 98.28 | 91.89 | 0 | 4.33
58 | 1.82 | 24 | 20 | 17 | 99.61 | 98.15 | 89.25 | 0 | 6.63
59 | 1.72 | 22 | 19 | 13 | 99.52 | 97.97 | 89.5 | 0 | 5.33
60 | 1.77 | 21 | 18 | 13 | 99.55 | 97.76 | 86.63 | 0 | 5.1
61 | 1.72 | 23 | 20 | 13 | 99.59 | 98.49 | 92.53 | 0 | 4.42
62 | 1.50 | 31 | 27 | 17 | 99.61 | 98.87 | 96.84 | 0 | 5.09
63 | 1.81 | 15 | 13 | 10 | 99.4 | 94.52 | 70.81 | 0 | 4.36
64 | 2.02 | 29 | 5 | 19 | 99.59 | 98.7 | 94.76 | 0 | 5.45
65 | 1.32 | 27 | 25 | 14 | 99.62 | 98.91 | 96.22 | 0 | 4.06
66 | 1.36 | 19 | 16 | 14 | 99.48 | 95.9 | 78.71 | 0 | 7.33
67 | 1.68 | 23 | 19 | 16 | 99.48 | 97.32 | 86.32 | 0 | 7.43
68 | 1.50 | 15 | 12 | 12 | 99.3 | 92.5 | 66.79 | 0.16 | 7.83
69 | 1.11 | 12 | 10 | 10 | 99.15 | 87.02 | 53.37 | 0 | 7.87
70 | 1.07 | 12 | 10 | 8 | 99.2 | 88.3 | 52.21 | 0 | 5.87
71 | 0.70 | 9 | 8 | 7 | 98.93 | 80.54 | 38.31 | 0 | 5.99
72 | 1.32 | 14 | 12 | 10 | 99.44 | 94.11 | 67.63 | 0 | 4.39
73 | 1.72 | 16 | 14 | 11 | 99.39 | 95 | 73.79 | 0.01 | 5.34
74 | 1.68 | 21 | 19 | 12 | 99.61 | 98.34 | 90.01 | 0 | 4.53
75 | 1.46 | 16 | 14 | 10 | 99.26 | 94.82 | 75.44 | 0.02 | 3.61
76 | 0.95 | 16 | 14 | 8 | 99.5 | 97.03 | 80.57 | 0.39 | 1.24
77 | 0.88 | 12 | 10 | 7 | 99.31 | 90.96 | 57.12 | 1.28 | 0.54
78 | 1.31 | 13 | 11 | 9 | 99.43 | 92.86 | 62.77 | 0.04 | 5.46
79 | 2.21 | 26 | 22 | 16 | 99.67 | 98.59 | 93.2 | 0 | 4.65
80 | 1.04 | 11 | 9 | 7 | 99.21 | 88.24 | 49.69 | 1.6 | 0.53
81 | 1.95 | 27 | 24 | 16 | 99.57 | 98.6 | 94.39 | 0 | 4.2
82 | 1.19 | 21 | 19 | 11 | 99.55 | 98.48 | 92.13 | 0.01 | 2.66
83 | 1.84 | 26 | 25 | 12 | 99.63 | 98.86 | 96.4 | 0.05 | 1.97
84 | 1.40 | 12 | 10 | 9 | 99.29 | 87.87 | 52.54 | 0.07 | 7.1
85 | 1.28 | 18 | 15 | 11 | 99.51 | 97.04 | 80.8 | 0.02 | 6.34
86 | 0.95 | 17 | 15 | 10 | 99.56 | 97.52 | 81.68 | 0 | 2.86
87 | 1.59 | 16 | 14 | 9 | 99.59 | 96.51 | 75.93 | 0 | 4.76
88 | 1.21 | 10 | 9 | 7 | 99.19 | 84.4 | 42.92 | 0.03 | 4.09
89 | 1.76 | 21 | 19 | 12 | 99.6 | 98.35 | 90.3 | 0.02 | 3.3
90 | 0.84 | 16 | 14 | 10 | 99.53 | 96.99 | 78.94 | 0 | 4.09
91 | 1.75 | 19 | 16 | 12 | 99.61 | 97.58 | 83.57 | 0 | 5.77
92 | 0.83 | 17 | 15 | 9 | 99.55 | 97.38 | 82.35 | 0 | 3.3
93 | 1.02 | 18 | 17 | 9 | 99.62 | 98.23 | 88.1 | 0.01 | 2.95
94 | 2.21 | 34 | 31 | 16 | 99.68 | 99.05 | 97.93 | 0.02 | 1.85
95 | 2.27 | 20 | 18 | 13 | 99.48 | 97.36 | 85.69 | 0 | 3.47
96 | 1.76 | 15 | 12 | 10 | 99.39 | 93.5 | 67.81 | 0 | 5.22

Figure 5: Clostridium difficile (29.1% GC).
Figure 6: Staphylococcus aureus (32.7% GC).
Figure 7: Helicobacter pylori (39% GC).
Figure 8: Escherichia coli (50.8% GC).
Figure 9: Klebsiella pneumoniae (57.6% GC).
Figure 10: Mycobacterium tuberculosis (65.6% GC).

igenomx.com
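All of the per-sample coverage columns of Table 4 can be computed from a per-base depth array. These are mean, median, standard deviation and % of bases at 1x, 5x and 10x. The same array gives the uniformity measure used in the Results, the share of bases within 1 standard deviation of the mean.

The note does not state the exact definitions used by the igenomx BaseSpace QC application. The Python sketch below therefore assumes that "% at Nx" means the percentage of reference positions with depth of at least N, and it uses the population standard deviation. The `downsample` helper mirrors the random downsampling to 100,000 reads mentioned for Figure 4; the fixed seed is an addition for reproducibility. All names are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def downsample(reads: Sequence[T], n: int, seed: int = 0) -> List[T]:
    """Random subset of at most n reads, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(reads), min(n, len(reads)))


def coverage_metrics(depth: Sequence[int]) -> Dict[str, float]:
    """Summary statistics for one sample from per-base read depth."""
    n = len(depth)
    mean = statistics.fmean(depth)
    sd = statistics.pstdev(depth)  # population SD across reference positions

    def pct_at_least(threshold: int) -> float:
        return 100 * sum(d >= threshold for d in depth) / n

    return {
        "mean": mean,
        "median": statistics.median(depth),
        "sd": sd,
        "pct_1x": pct_at_least(1),
        "pct_5x": pct_at_least(5),
        "pct_10x": pct_at_least(10),
        # Uniformity: share of positions whose depth lies within mean +/- 1 SD.
        "pct_within_1sd": 100 * sum(abs(d - mean) <= sd for d in depth) / n,
    }


if __name__ == "__main__":
    depth = [0, 5, 10, 10, 15, 20]  # toy depth profile, not data from this note
    for name, value in coverage_metrics(depth).items():
        print(f"{name}: {value:.2f}")
```

With real data, `depth` would hold one value per reference position, about 4.6 million entries for the E. coli genome in Table 2. A streaming or NumPy-based version would be preferable at that scale.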