Simultaneous multiplexed amplicon sequencing and transcriptome profiling in single cells

SUPPLEMENTARY INFORMATION
Brief Communication
In the format provided by the authors and unedited.

Mridusmita Saikia 1,2,3, Philip Burnham 1,3, Sara H. Keshavjee 1, Michael F. Z. Wang 1, Michael Heyang 1, Pablo Moral-Lopez 2, Meleana M. Hinchman 2, Charles G. Danko 2, John S. L. Parker 2 and Iwijn De Vlaminck 1 *

1 Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA. 2 Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA. 3 These authors contributed equally: Mridusmita Saikia, Philip Burnham. * vlaminck@cornell.edu

Nature Methods

Supplementary Figure 1. Protocol for converting Drop-seq primer beads to DART-seq primer beads. Double-stranded toehold probes with a poly-(dA) ssDNA overhang are annealed to a subset of oligos on the surface of Drop-seq primer beads (left). The toehold is ligated to the bead with T4 DNA ligase. The complementary toehold strand is removed after ligation via heating. DART-seq beads (right) then contain custom primers as well as oligo(dT) tails from the original Drop-seq bead.

Supplementary Figure 2. A fluorescence hybridization assay characterizes the efficiency and tunability of the DART-seq ligation reaction. a) Assay design to test the efficiency of custom primer ligation to Drop-seq primer beads: (1) DART-seq beads are created with the addition of custom primers at various concentrations, (2) oligos complementary to the custom primer, and labeled with Cy5, are hybridized to the DART-seq beads, and (3) the fluorescence signal of 3,000 beads is measured with a Qubit 3.0 fluorometer. b) Fluorescence signal as a function of the quantity of Cy5 oligos.

Supplementary Figure 3. DART-seq is reproducible between biological replicates. Per-base sequence coverage of viral genome segments relative to the number of host transcripts detected, measured by DART-seq and Drop-seq, is shown. Dotted lines indicate the position of the custom primer target sites for DART-seq design 1, and bolded boxes represent the addition of custom primers from DART-seq design 2 (see main text). No viral sequences were detected in a noninfected cell line.

Supplementary Figure 4. DART-seq does not significantly alter the detection of poly(A)-tailed mRNA. (a-b) Number of UMIs (a) and unique genes (b) detected as a function of cell rank for DART-seq and Drop-seq for the same sample (CD19+ B cells). (c-d) Violin plots of the number of UMIs (c) and genes detected (d) in DART-seq and Drop-seq assays for the same sample (CD19+ B cells). To make these comparisons, we sampled both datasets to the same number of raw sequences ( ). The violin plots show the probability density distributions of the respective variables.
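The depth matching mentioned in the caption can be illustrated with a short sketch. This is not the authors' pipeline: the function names `downsample_reads` and `match_depth` are hypothetical, reads are represented as a plain Python list, and uniform sampling without replacement is an assumption about how the subsampling might be done.

```python
import random


def downsample_reads(reads, n, seed=0):
    """Draw n reads uniformly at random, without replacement.

    `reads` is a list of raw sequence records (any objects).
    The fixed seed only makes this illustration reproducible.
    """
    if n > len(reads):
        raise ValueError("cannot sample more reads than are available")
    rng = random.Random(seed)
    return rng.sample(reads, n)


def match_depth(reads_a, reads_b, n=None, seed=0):
    """Subsample two datasets to the same number of raw reads.

    If n is not given, the size of the smaller dataset is used
    (an assumption made for this sketch).
    """
    if n is None:
        n = min(len(reads_a), len(reads_b))
    return downsample_reads(reads_a, n, seed), downsample_reads(reads_b, n, seed)
```

Called as `match_depth(dart_reads, drop_reads)`, this returns two read lists of equal length, so that UMI and gene counts can be compared at equal sequencing depth.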

Supplementary Figure 5. DART-seq outperforms Drop-seq in the detection of heavy- and light-chain sequences for B cells within human PBMCs. Sigmoidal plots indicate the percentage of B cells for which heavy- and/or light-chain transcripts were detected as a function of the UMI count per cell. Cells were binned by the number of UMIs detected (bin width 200 UMIs, 0–2,400 UMIs per cell, bins with fewer than 20 cells omitted, 26–2,396 cells per bin). Distributions were fit with a sigmoid curve (described in Methods).
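The binning described in this caption can be sketched as follows. The bin width (200 UMIs), the upper limit (2,400 UMIs) and the minimum bin size (20 cells) come from the caption; the function name `percent_detected_by_bin`, the input layout, and the treatment of the upper limit as exclusive are assumptions. The sigmoid fit itself is described in Methods and is not reproduced here.

```python
from collections import defaultdict


def percent_detected_by_bin(cells, bin_width=200, max_umis=2400, min_cells=20):
    """Percentage of cells with a detected transcript, per UMI-count bin.

    `cells` is an iterable of (umi_count, detected) pairs, where
    `detected` is True if heavy- and/or light-chain transcripts were
    found in that cell. Returns {bin_start: percentage}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for umi_count, detected in cells:
        # Assumption: the 0-2,400 range is treated as half-open here.
        if not 0 <= umi_count < max_umis:
            continue
        start = (umi_count // bin_width) * bin_width
        totals[start] += 1
        hits[start] += bool(detected)
    # Bins with fewer than `min_cells` cells are omitted, as in the caption.
    return {
        start: 100.0 * hits[start] / totals[start]
        for start in sorted(totals)
        if totals[start] >= min_cells
    }
```

The returned percentages, keyed by the lower edge of each bin, are the points to which a sigmoid curve would then be fit.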

Supplementary Figure 6. DART-seq captures a diverse variable region in immune repertoires of B cells. Comparison of variable isoforms detected with DART-seq (164 cells, PBMC dataset) within Ig heavy and Ig light variable regions. Each column represents a separate, detected variable subtype, normalized by the total number of variable regions detected with respect to light or heavy chains.
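The per-chain normalization in this caption amounts to dividing each subtype count by the total number of variable regions detected for that chain. A minimal sketch, assuming subtype calls are available as (chain, subtype) pairs; the function name `subtype_fractions` and the input format are hypothetical, not taken from the paper.

```python
from collections import Counter


def subtype_fractions(calls):
    """Normalize variable-subtype counts separately for each chain.

    `calls` is an iterable of (chain, subtype) pairs, for example
    ("heavy", "IGHV3-23"). Returns {chain: {subtype: fraction}},
    where the fractions for each chain sum to 1.
    """
    counts = {}
    for chain, subtype in calls:
        counts.setdefault(chain, Counter())[subtype] += 1

    fractions = {}
    for chain, counter in counts.items():
        total = sum(counter.values())  # all variable regions for this chain
        fractions[chain] = {s: c / total for s, c in counter.items()}
    return fractions
```

Each inner dictionary corresponds to one set of columns in the figure, with heavy and light chains normalized independently.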

Supplementary Figure 7. VH and VL expression in single B cells is correlated. The fraction of variable heavy chain (VH) transcripts versus the fraction of variable light chain (VL) transcripts detected in single cells is depicted (PBMC dataset, B cells for which the complete CDR3L and CDR3H region was detected, n = 120). The blue line represents the best fit from linear regression (shaded area represents the 95% confidence interval). Pearson correlation (P << ).
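For readers who want to reproduce this kind of comparison, an ordinary least-squares line and the Pearson correlation coefficient can be computed as below. This is a generic sketch, not the authors' analysis code: the function name `linear_fit_and_pearson` is hypothetical, and the confidence band and P value shown in the figure are not computed here.

```python
import math


def linear_fit_and_pearson(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson r.

    `x` and `y` are equal-length sequences of numbers, for example the
    per-cell VH and VL transcript fractions.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

With one pair of fractions per cell, `slope` and `intercept` define the fitted line and `r` is the correlation coefficient.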