Optimizing an NGS Assay for HBV Drug Resistance on the Illumina MiSeq

1 Optimizing an NGS Assay for HBV Drug Resistance on the Illumina MiSeq
Authors: G Ritchie (1,2), M Payne (1,2), L Merrick (1), T Bush (1), C Lowe (1,2)
1. Division of Microbiology and Virology, Providence Health Care, Vancouver
2. Department of Pathology and Laboratory Medicine, University of British Columbia

2 Introduction
Hepatitis B virus (HBV) is a major cause of chronic hepatitis and hepatocellular carcinoma worldwide. Very effective antiviral drugs have been developed. However, mutations may arise that result in antiviral drug resistance (AVDR), and detecting AVDR is important so that the patient can be switched to a more effective drug. We previously developed an NGS test on the Roche 454, and this assay has now been transitioned to the Illumina MiSeq. Our test consists of amplicon sequencing of multiple samples on each sequencing run.

3 Mutations in HBV that confer resistance to antiviral drugs
Figure sources: European Association for the Study of the Liver (EASL); Mohanty SR et al., Nat Clin Pract Gastroenterol Hepatol 3:
AVDR mutations can be detected by sequencing a 400 bp amplicon from the polymerase gene, which contains all the codons considered important for AVDR according to the EASL guidelines.
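Once reads are aligned, calling a resistance codon comes down to translating the codon at a known position and comparing it with the reference amino acid. The sketch below assumes reads are already aligned so that the codon's nucleotide offset is known; the function name `call_codon`, the toy read, and the offset are hypothetical, and the substitution at M204 is used only as illustrative notation.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def call_codon(read_seq: str, codon_start: int, ref_aa: str, codon_number: int) -> str:
    """Translate the codon at a 0-based nucleotide offset and name any amino-acid change.

    Hypothetical helper: assumes the read is aligned so codon_start is known.
    """
    codon = read_seq[codon_start:codon_start + 3].upper()
    aa = CODON_TABLE.get(codon, "X")  # X marks an ambiguous or incomplete codon
    if aa == ref_aa:
        return f"{ref_aa}{codon_number} (wild type)"
    return f"{ref_aa}{codon_number}{aa}"


# Toy read (not real HBV sequence): the codon at offset 3 is GTG (valine).
print(call_codon("AAAGTGCCC", 3, "M", 204))
```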

4 Illumina Sequencing
[Figure: HBV amplicon flanked by adaptor sequences, bound to the flow cell]
The figure shows the amplicon DNA to be sequenced. Adaptor sequences are added to this DNA by including them in the PCR primers. The sequencing flow cell is coated with a lawn of probes that bind the P5 and P7 adaptor sequences of the DNA. These probes then act as primers to amplify each bound molecule into a cluster of about 1000 identical DNA molecules. The clusters are sequenced by synthesis using fluorescently labeled dNTPs: each sequencing cycle adds one more labeled dNTP, and an image is taken to identify the dNTP added to each cluster.
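To make the imaging step concrete, here is a toy model of base calling for a single cluster: each cycle yields one intensity per fluorescent channel, and the base is called from the brightest channel. This is a simplified sketch under that assumption, not Illumina's actual base-calling software, which also corrects for effects such as channel cross-talk and phasing; the function name and intensity values are hypothetical.

```python
from typing import Sequence

CHANNELS = "ACGT"  # simplified model: one fluorescence channel per dNTP


def call_bases(cycle_intensities: Sequence[Sequence[float]]) -> str:
    """Call one base per cycle for a single cluster by picking the brightest channel."""
    calls = []
    for intensities in cycle_intensities:
        if len(intensities) != 4:
            raise ValueError("expected one intensity per channel (A, C, G, T)")
        brightest = max(range(4), key=lambda i: intensities[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)


# Three imaging cycles for one hypothetical cluster.
print(call_bases([[0.9, 0.1, 0.0, 0.1], [0.1, 0.2, 0.8, 0.1], [0.0, 0.1, 0.1, 0.7]]))
```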

5 Amplicon Sequencing of Multiple Samples
Amplicon sequencing can be difficult on the MiSeq because reads look very similar from sample to sample. This makes it hard to distinguish one cluster from another, which results in errors in base calling and sample identification. The critical factors to consider in designing the assay are cluster density, diversity, and indexing of samples.

6 Cluster Density
The first factor is cluster density. Too few clusters may give higher-quality reads but wastes much of the run's potential sequence output, and the point of NGS is to generate a lot of data. Too many clusters lowers quality: base-calling errors increase and, with multiple samples, reads may be assigned to the wrong sample. The trick is to find a happy medium, and the optimum may differ depending on the type of experiment being done.

7 Diversity
[Figure: a diverse/balanced library, with A, C, G and T present at similar percentages in each cycle, versus a low-diversity amplicon library, with mostly a single base per cycle]
The circles represent the clusters on the flow cell being sequenced. In the diverse library, each sequencing cycle reads a similar number of all four bases. In a low-diversity library, such as our amplicon library, most clusters read the same base in each cycle. Diversity is most important in the first few cycles of the first read, because the software calibrates the run and determines the location of each cluster during these cycles. If two clusters overlap and have the same sequence, the software may treat them as a single cluster, which can mix up samples and cause high base-calling error rates.
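Diversity can be checked directly from the reads by tabulating the base composition at each cycle. A minimal sketch, assuming reads are plain strings in the same orientation; the function names are hypothetical, and the 0.8 threshold for flagging a cycle as low-diversity is an arbitrary illustration, not an Illumina specification.

```python
from collections import Counter


def per_cycle_composition(reads: list[str]) -> list[dict[str, float]]:
    """Fraction of A, C, G and T observed at each cycle (read position)."""
    n_cycles = min(len(r) for r in reads)
    composition = []
    for cycle in range(n_cycles):
        counts = Counter(r[cycle] for r in reads)
        total = sum(counts[b] for b in "ACGT") or 1  # avoid dividing by zero on all-N cycles
        composition.append({b: counts[b] / total for b in "ACGT"})
    return composition


def low_diversity_cycles(reads: list[str], max_fraction: float = 0.8) -> list[int]:
    """Cycles (1-based) in which a single base exceeds max_fraction of the reads."""
    return [
        i + 1
        for i, comp in enumerate(per_cycle_composition(reads))
        if max(comp.values()) > max_fraction
    ]


amplicon_like = ["ACGTAC"] * 9 + ["ACGTTC"]  # nearly identical reads
balanced = ["ACGT", "CGTA", "GTAC", "TACG"]  # every base in every cycle
print(low_diversity_cycles(amplicon_like), low_diversity_cycles(balanced))
```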

8 PhiX Addition to Increase Diversity
PhiX is a phage DNA that has been randomly cut to produce a very diverse, well-balanced library. Adding it to the amplicons can fix some of the diversity problem. PhiX is needed for added diversity, for setup and calibration of the run (including error-rate calculations), and for cluster identification. Illumina recommended 10% PhiX for amplicon sequencing.

9 HBV Drug Resistance Sequencing on the MiSeq
Polymerase gene (820 bp target); nested PCR (400 bp). J Clin Microbiol 2016, 54(1)
Our PCR design was modified from our successful 454 NGS design, which we published in JCM. It is a nested PCR: the first PCR amplifies the approximately 800 bp target region of the pol gene with conventional primers, and then several cycles of nested PCR amplify a 400 bp product and add the adaptor sequences required by the MiSeq. The primers also contain the indexes that identify the sample each read comes from.

10 HBV Amplicon
[Figure: final HBV amplicon with adaptor sequences and the HBV-specific primer region]
This is our final product that goes on the sequencer, with all the adaptor sequences attached. Each cluster yields four sequencing reads. The first read sequences the insert in the forward direction, through the HBV-specific part of the primer and continuing 250 bp into the insert. The second and third reads are the 8 bp indexes that identify the sample. The final read sequences the insert in the reverse direction.

11 Dual Indexing
Sample      Index pair
Sample 1    For1 + Rev1
Sample 2    For1 + Rev2
Sample 3    For1 + Rev3
Sample 4    For2 + Rev1
Sample 5    For2 + Rev2
Sample 6    For2 + Rev3
Sample 7    For3 + Rev1
Sample 8    For3 + Rev2
Sample 9    For3 + Rev3
We used dual indexing to identify the samples: each sample is identified from the index reads by a unique combination of forward (For) and reverse (Rev) indexes. The same forward index is used for multiple samples, and likewise for the reverse indexes, but each For/Rev combination is unique to one sample.
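The design in the table is a Cartesian product of forward and reverse indexes, so n forward and m reverse primers can label n × m samples. A minimal sketch that generates the same assignments as the table; the function name is hypothetical.

```python
from itertools import product


def combinatorial_dual_indexes(n_forward: int, n_reverse: int) -> dict[tuple[str, str], str]:
    """Map each (forward, reverse) index pair to a sample; each index is reused across samples."""
    pairs = product(
        (f"For{i}" for i in range(1, n_forward + 1)),
        (f"Rev{j}" for j in range(1, n_reverse + 1)),
    )
    return {pair: f"Sample {k}" for k, pair in enumerate(pairs, start=1)}


sheet = combinatorial_dual_indexes(3, 3)
for pair, sample in sheet.items():
    print(sample, "=", " + ".join(pair))
```

With 5 forward and 5 reverse primers the same function labels 25 samples from only 10 primers, which is where the cost advantage of this design comes from.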

12 First Design
Conditions recommended by Illumina for amplicon sequencing: dual indexes, 10% PhiX, and 900K clusters/mm².
Positive control: plasmid HBV sequence (no sequence variation). Each run contained 20 patient samples with various AVDR mutations.
We first used the Illumina-recommended conditions for amplicon sequencing.

13 Results
The following errors were seen:
1. Reads were assigned to the negative control even though it was amplification-negative and was not added to the run. This can be attributed entirely to sample misassignment, where the indexes are misread.
2. Error rates in the plasmid control were very high, also due to misassignment and base-calling errors.
The initial results were therefore rather poor, with a lot of sample misassignment and base-calling errors.

14 High Error Rates in the Plasmid Control Using Illumina-Recommended Conditions
[Table: error rate (%) in the plasmid control at codons L180, M204 and N236]
This table shows some of the errors we saw in the plasmid control, looking at just three codons in the sequence. These are codons we must get right, because they are involved in resistance. The error rates were well over 1%. Since our aim is to call mutations down to the 1% level, error rates over 1% are clearly unacceptable. By comparison, our previous assay on the Roche 454 had error rates below 0.1% for these codons.
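Because the plasmid control has no sequence variation, any read whose codon differs from the plasmid reference counts as an error, whether it comes from base calling or from a misassigned read of another sample. A minimal sketch of that calculation, assuming codon calls have already been extracted from aligned reads; the function name and the counts in the example are invented for illustration and are not data from this study.

```python
def codon_error_rate(codon_calls: list[str], reference_codon: str) -> float:
    """Percentage of reads whose codon differs from the plasmid reference codon."""
    if not codon_calls:
        raise ValueError("no reads cover this codon")
    ref = reference_codon.upper()
    errors = sum(1 for codon in codon_calls if codon.upper() != ref)
    return 100.0 * errors / len(codon_calls)


# Hypothetical counts: 995 reads match the reference ATG, 5 do not.
calls = ["ATG"] * 995 + ["GTG"] * 5
print(codon_error_rate(calls, "ATG"))
```

In this example 5 discordant reads out of 1,000 give 0.5%, already half of a 1% reporting threshold, which is why error rates above 1% rule out calling mutations at that level.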

15 Problems Are Soluble
"Problems are soluble." (Steven Pinker, Enlightenment Now)
Cluster density, diversity, indexing: we tried to solve these problems by modifying the three critical factors.

16 Reduce Cluster Density
Density reduced from 900K clusters/mm² to 400K clusters/mm².
A density of 900K might be fine for certain applications but may be too high for amplicon sequencing. The simplest solution is to reduce the cluster density, so we reduced it from 900K to 400K; even at 400K we would still have plenty of reads per sample. A lower density should reduce the number of overlapping clusters and improve the results.

17 HBV Amplicon Sequencing with Barcodes
[Figure: HBV amplicon with barcodes (BC) between the sequencing primer sites and the HBV-specific primer regions]
Next, we added barcodes to the primers, between the sequencing primer site and the HBV target-specific site. As a result, the first 4 bases read in each insert read are unique to each sample. If these barcode bases disagree with the sample identified by the indexes, the data from that cluster can be discarded from the analysis.
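The discard rule amounts to a concordance check between two independent sample identifiers: the index reads and the first 4 bases of the insert read. A minimal sketch, assuming reads have already been demultiplexed by index into (sample, sequence) pairs; the function name, sample names, and barcodes are hypothetical.

```python
def filter_by_barcode(
    reads: list[tuple[str, str]],
    expected_barcode: dict[str, str],
    barcode_length: int = 4,
) -> dict[str, list[str]]:
    """Keep a read only if its first bases match the barcode of the sample its indexes point to.

    reads: (sample assigned from the index reads, insert read sequence) pairs.
    expected_barcode: sample -> expected first `barcode_length` bases of its insert read.
    """
    kept: dict[str, list[str]] = {sample: [] for sample in expected_barcode}
    for sample, seq in reads:
        barcode = expected_barcode.get(sample)
        if barcode is not None and seq[:barcode_length] == barcode:
            kept[sample].append(seq)
        # otherwise the index and barcode disagree, so the cluster is discarded
    return kept


expected = {"S1": "ACGT", "S2": "TTAG"}
reads = [("S1", "ACGTAAAA"), ("S1", "TTAGAAAA"), ("S2", "TTAGCCCC")]
print(filter_by_barcode(reads, expected))
```

Here the second read was assigned to S1 by its indexes but carries S2's barcode, so it is dropped instead of being counted toward the wrong sample.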

18 Adding Barcodes of 0-4 bp
Barcodes result from 0-4 bp added at the beginning of the insert read, producing phased (staggered) reads and increased diversity.
[Figure: HBV-specific primer with the added barcode bases at the start of the insert]
BMC Microbiol. 2015; 15: 125.
The additional wrinkle is that the added inserts are not all 4 bp; they range from 0 to 4 bp. If we add a 4 bp insert (shown in red), the barcode is those 4 bases. If there is no insert, the barcode is the first 4 bases of the HBV read. For intermediate lengths, the barcode is the added bases followed by the first HBV bases. This produces phased, or staggered, reads of the amplicon and increases diversity. A similar approach was published in 2015 for 16S amplicon sequencing.
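The barcode rule can be written as "take the first 4 bases of the added spacer followed by the HBV sequence", which also shows how intermediate spacer lengths mix added and HBV bases. A minimal sketch; the function name, the HBV start sequence, and the spacer sequences are placeholders, not the published primer sequences.

```python
def read_start(spacer: str, hbv_start: str, barcode_length: int = 4) -> str:
    """First bases of the insert read: any added spacer bases, then the HBV sequence."""
    if len(spacer) > 4:
        raise ValueError("spacers in this design are 0-4 bp")
    return (spacer + hbv_start)[:barcode_length]


hbv = "TTCCAAACGTAG"  # placeholder HBV start, not the real sequence
spacers = {"Sample 11": "", "Sample 12": "AG", "Sample 13": "CGAT"}  # hypothetical
barcodes = {sample: read_start(s, hbv) for sample, s in spacers.items()}
print(barcodes)
```

For the assay to work, the resulting 4-base barcodes must be distinct across all samples on the run, which is easy to check with the same function before ordering primers.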

19 Effect of Adding Barcodes to Reads
Without the barcodes, all the insert reads are read in the same phase (reading frame), so they look very similar and have little diversity. Adding a variable number of bases to the beginning of the reads staggers them and increases diversity, especially at the beginning of the read, which is important for identifying clusters. In the first read, say from sample 11, no bases were added, so the first 4 bases of the HBV sequence identify the sample. In the second read, 4 bases were added, and these 4 bases identify the sample. In this way the first 4 bases are unique to each sample, and diversity is added throughout the read.
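To see the effect on diversity, compare the share of the most common base at each early cycle with and without staggering. A minimal sketch with made-up sequences: `hbv` is a placeholder rather than the actual HBV amplicon, the spacers are hypothetical, and the function name is illustrative.

```python
from collections import Counter


def max_base_fraction_per_cycle(reads: list[str], n_cycles: int) -> list[float]:
    """Fraction of reads carrying the most common base at each of the first n_cycles."""
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(r[cycle] for r in reads)
        fractions.append(max(counts.values()) / len(reads))
    return fractions


hbv = "TTCCAAACGTAGCA"  # placeholder amplicon start
spacers = ["", "A", "GC", "CTA", "GATC"]  # hypothetical 0-4 bp additions
unstaggered = [hbv] * len(spacers)
staggered = [s + hbv for s in spacers]

print(max_base_fraction_per_cycle(unstaggered, 5))
print(max_base_fraction_per_cycle(staggered, 5))
```

In this toy example every early cycle is 100% one base without staggering, whereas with five different offsets no base exceeds 60% of reads in the first five cycles.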

20 Unique Dual Indexing
Sample      Dual index     Unique dual index
Sample 1    For1 + Rev1    For1 + Rev1
Sample 2    For1 + Rev2    For2 + Rev2
Sample 3    For1 + Rev3    For3 + Rev3
Sample 4    For2 + Rev1    For4 + Rev4
Sample 5    For2 + Rev2    For5 + Rev5
Sample 6    For2 + Rev3    For6 + Rev6
Sample 7    For3 + Rev1    For7 + Rev7
Sample 8    For3 + Rev2    For8 + Rev8
Sample 9    For3 + Rev3    For9 + Rev9
The third change was to use unique dual indexing. Originally we used unique combinations of forward and reverse indexes to identify samples from the index reads, but if one index is misread, reads can be assigned to the wrong sample. We now use both unique forward indexes and unique reverse indexes. A single misread index then produces a forward/reverse pair that belongs to no sample, so this misassignment risk is minimized.
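The difference between the two designs shows up when one index read is wrong. A minimal sketch comparing how a single misread is handled under each sample sheet; the sheets follow the sample assignments listed above, and the function names are hypothetical.

```python
from itertools import product
from typing import Optional


def combinatorial_sheet(n: int) -> dict[tuple[str, str], str]:
    """Every forward index paired with every reverse index (n * n samples from 2n primers)."""
    pairs = product(range(1, n + 1), repeat=2)
    return {(f"For{f}", f"Rev{r}"): f"Sample {k}" for k, (f, r) in enumerate(pairs, start=1)}


def unique_dual_sheet(n: int) -> dict[tuple[str, str], str]:
    """Each sample gets its own forward and its own reverse index (n samples from 2n primers)."""
    return {(f"For{k}", f"Rev{k}"): f"Sample {k}" for k in range(1, n + 1)}


def assign(index_pair: tuple[str, str], sheet: dict[tuple[str, str], str]) -> Optional[str]:
    """Return the sample for an index pair, or None if the pair is not on the sheet (discard)."""
    return sheet.get(index_pair)


# A read from Sample 1 (For1 + Rev1) whose reverse index is misread as Rev2.
misread = ("For1", "Rev2")
print(assign(misread, combinatorial_sheet(3)))  # silently lands in another sample
print(assign(misread, unique_dual_sheet(9)))    # unrecognized pair, read discarded
```

Under unique dual indexing, a misassignment would require both index reads to be misread into the same other sample's pair, which is much less likely than a single misread.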

21 Increase PhiX Concentration in Library
Illumina recommended 10%; we increased it to 33%. PhiX is needed for added diversity, calibration of the run, and cluster identification.
Lastly, we increased the diversity of the library by adding more PhiX to the run. Because PhiX adds diversity, aids in calibrating the run, and helps with cluster identification, we decided that adding more would be beneficial. In fact, previous versions of the MiSeq software recommended going as high as 50% PhiX for amplicon sequencing.
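Lowering the density and raising the PhiX share both cut the number of clusters left for patient samples. A minimal sketch of the combined effect relative to the original design, under the simplifying assumption (not stated in the talk) that cluster yield scales linearly with cluster density; the function name and its defaults are hypothetical.

```python
def relative_sample_yield(
    density_k: float,
    phix_fraction: float,
    ref_density_k: float = 900.0,
    ref_phix: float = 0.10,
) -> float:
    """Sample (non-PhiX) clusters relative to a reference condition.

    Assumes total clusters scale linearly with cluster density (K clusters/mm^2).
    """
    if not 0.0 <= phix_fraction < 1.0:
        raise ValueError("PhiX fraction must be in [0, 1)")
    return (density_k * (1.0 - phix_fraction)) / (ref_density_k * (1.0 - ref_phix))


# Optimized (400K, 33% PhiX) versus original (900K, 10% PhiX).
print(round(relative_sample_yield(400, 0.33), 3))
```

With the values on these slides the sketch gives about 0.33, i.e. roughly one third of the original design's sample clusters, which is the trade-off weighed in the discussion.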

22 Error Rates in the Plasmid Control After Assay Optimization
[Table: error rate (%) at codons L180, M204 and N236 for the non-optimized assay and for the optimized, 4 bp-barcode-filtered assay]
How did these changes affect the results? The table compares error rates before and after optimization, and they have been dramatically reduced. Some of the improvement may come from reducing sample misassignment and some from improved read quality. The error rates are now well below 0.2%.

23 Discussion
After assay optimization, error rates were below 0.2%, so low-level mutations representing as little as 1% of the viral load could be reliably detected. The optimization has two drawbacks:
1. Lower cluster density and higher PhiX reduce the number of usable sample reads from a run. However, we still average enough clusters per sample that a mutation present at 1% of the viral load is represented by about 100 reads.
2. Unique dual index primers add expense ($80 per primer). For example, 25 samples need only 10 primers with dual indexing but 50 primers with unique dual indexing. However, the primers can be reused for multiple runs, so amortized over the life of the instrument the added cost per sample is small.
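The "about 100 reads" figure is simple arithmetic (depth × mutation fraction), and a binomial model shows how reliably such a mutation would be seen. A minimal sketch, assuming a hypothetical depth of 10,000 reads at the codon (the depth implied by 100 reads at 1%) and treating reads as independent draws; the function names and the 50-read threshold are illustrative, not the assay's actual calling rule.

```python
from math import exp, lgamma, log, log1p


def binomial_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coeff + i * log(p) + (n - i) * log1p(-p))


def prob_at_least(k: int, depth: int, fraction: float) -> float:
    """Chance that at least k of `depth` reads carry a variant present at `fraction`."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    below = sum(binomial_pmf(i, depth, fraction) for i in range(min(k, depth + 1)))
    return 1.0 - below


depth = 10_000                      # hypothetical reads covering the codon
mutant_reads = depth * 0.01         # mutation at 1% of the viral load: about 100 reads
background_reads = depth * 0.002    # at a 0.2% error rate: about 20 error reads
print(mutant_reads, background_reads, prob_at_least(50, depth, 0.01))
```

Under these assumptions a 1% mutation contributes roughly five times as many reads as the worst-case background, and the chance of seeing fewer than 50 mutant reads is negligible.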

24 Conclusion: Problems Are Soluble
NGS amplicon sequencing can be an excellent assay for viral drug resistance testing, but care must be taken to optimize the assay on the MiSeq. After optimization our assay performed very well. I didn't discuss the validation, but I can assure you that the validation data looked very good as well. NGS is ideal for viral drug resistance testing because its sensitivity and specificity of detection are far better than those of Sanger sequencing or INNO-LiPA.