Supplementary Information Supplementary Figures

Figure S1. The number of reads mapped to the heavy and light chain models for 76 human plasmablasts (AW-AW dataset) using bowtie, for models reconstructed from the (A) (B) (C) IMGT_mapped and (D) Recombinome_mapped methods. Also shown are the ratios of the number of reads mapped for the top two heavy and light chain models for each method. The dashed line indicates a two-fold ratio. The median is shown as a red line.
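
The top-two ratio described in this caption can be illustrated with a minimal sketch. It assumes that per-model read counts for one cell are already available as a dictionary; the function names, the model names, and the treatment of a missing or zero-read second model are illustrative assumptions, not part of the published pipeline.

```python
from typing import Dict, Optional


def top_two_ratio(read_counts: Dict[str, int]) -> Optional[float]:
    """Reads mapped to the top-ranked model divided by reads mapped to the
    second-ranked model. Returns None when the ratio is undefined (fewer
    than two models, or a second model with zero reads)."""
    ranked = sorted(read_counts.values(), reverse=True)
    if len(ranked) < 2 or ranked[1] == 0:
        return None
    return ranked[0] / ranked[1]


def is_dominant(read_counts: Dict[str, int], fold: float = 2.0) -> bool:
    """True when the top model has at least `fold` times the reads of the
    runner-up. The two-fold default mirrors the dashed line in the figure."""
    ratio = top_two_ratio(read_counts)
    if ratio is None:
        # Assumption: a lone model with reads counts as dominant.
        return bool(read_counts) and max(read_counts.values()) > 0
    return ratio >= fold


# Hypothetical counts for one cell's heavy chain models.
counts = {"model_1": 12000, "model_2": 300, "model_3": 45}
print(top_two_ratio(counts), is_dominant(counts))
```

Ranking by mapped reads and comparing only the first two entries keeps the check independent of how many candidate models the assembler produced.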

[Figure panels A-D. Axes: rank of heavy chain model and rank of light chain model (n=76); ratio of number of reads mapped between the top two models for the Ig chains. Panel labels surviving in the source: IMGT_mapped, Recombinome_mapped.]

[Figure panels A-C. (A) % productive chains reconstructed, by reconstruction method. (B), (C) % nucleotide identity, number of mismatches, % sequence coverage and number of gaps, by reconstruction method. Methods on the axes: Unfiltered, IG_mapped+Unmapped, IG_mapped, Recombinome_mapped, IMGT_mapped.]

Figure S2. Recovery and accuracy of BALDR reconstruction. (A) The recovery rate of productive Ig chains by bioinformatic reconstruction, shown as a percentage of all analyzed cells (76 plasmablasts), for heavy and light chains (AW-AW dataset). (B) Comparison of reconstructed transcripts with the PCR nucleotide sequence by blastn (the top high-scoring segment pair was selected). The number of reconstructed chains used for nucleotide alignment for each method is indicated at the top. Sequence coverage was calculated as the alignment length divided by the length of the PCR sequence, expressed as a percentage. (C) blastn results for the alignment of accurately reconstructed chains to the PCR sequences. Black horizontal lines in (B) and (C) indicate a conglomerate of data points at % or.
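
The coverage calculation in panel (B) can be sketched as follows. The sketch assumes blastn was run with the standard 12-column tabular output (`-outfmt 6`), which the source does not state, and it picks the top high-scoring segment pair by bit score. The function and type names are hypothetical. The `gapopen` column counts gap openings, which may not be the same quantity as the "number of gaps" plotted in the figure.

```python
from typing import Iterable, NamedTuple


class HspSummary(NamedTuple):
    pct_identity: float
    mismatches: int
    gap_openings: int
    pct_coverage: float


def summarize_top_hsp(tabular_lines: Iterable[str], pcr_length: int) -> HspSummary:
    """Summarize the highest-scoring HSP from BLAST tabular output.

    Assumed column order (BLAST+ outfmt 6 default): qseqid sseqid pident
    length mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    if pcr_length <= 0:
        raise ValueError("pcr_length must be positive")
    best_score = None
    best = None
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        bitscore = float(fields[11])
        if best_score is None or bitscore > best_score:
            best_score = bitscore
            alignment_length = int(fields[3])
            best = HspSummary(
                pct_identity=float(fields[2]),
                mismatches=int(fields[4]),
                gap_openings=int(fields[5]),
                # Coverage as in the caption: alignment length relative
                # to the PCR sequence length, as a percentage.
                pct_coverage=100.0 * alignment_length / pcr_length,
            )
    if best is None:
        raise ValueError("no HSPs found")
    return best


# Hypothetical hits of one reconstructed chain against its PCR sequence.
hits = [
    "contig1\tpcr1\t99.50\t400\t2\t0\t1\t400\t1\t400\t0.0\t730",
    "contig1\tpcr1\t90.00\t50\t5\t0\t1\t50\t1\t50\t1e-10\t60",
]
print(summarize_top_hsp(hits, pcr_length=400))
```

Because the BLAST alignment length includes gap columns, a coverage value computed this way can slightly exceed 100% for gapped alignments.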

[Figure: elapsed time for Trinity assembly (seconds), by reconstruction method.]

Figure S3. Elapsed time in seconds for Trinity assembly of 76 plasmablasts (AW-AW dataset). The assemblies were run by executing 8 jobs simultaneously, each with 8 threads and GB RAM, on Amazon Web Services EC m.6xlarge instances (Intel Xeon E-676 v, 6 cores and 6GB RAM). The median is shown as a red line. The time reported by the bash time command is shown.

[Figure panels A-B, each comparing no normalization with normalization. (A) % accuracy of VDJ & CDR assignment compared to PCR, by reconstruction method. (B) % accuracy of VJ & CDR assignment compared to PCR, by reconstruction method.]

Figure S4. Effect of in silico read normalization on Ig reconstruction accuracy in human plasmablasts (AW-AW dataset). Trinity assembly was carried out with and without normalization. Accuracy of Ig reconstruction of the (A) heavy and (B) light chains was determined by comparison to the heavy and light chain sequences obtained from nested RT-PCR and Sanger sequencing.

[Figure panels A-C. (A) Rank of heavy chain model. (B) Rank of light chain model. (C) Ratio of number of reads mapped between the top two models for the Ig chains.]

Figure S5. A single dominant transcript model for heavy and light chains is also obtained in conventional human CD9+ B cells. The number of reads mapped to the reconstructed models using bowtie for (A) heavy and (B) light chains, and (C) the ratio of the top two models for heavy and light chains. The dashed line indicates a two-fold ratio. The median is shown as a red line.

[Figure panels A-C. (A) Recovery: productive chains reconstructed (%) in plasmablasts, GC B cells and memory B cells. (B) Accuracy: % accuracy of VDJ/VJ & CDR assignment compared to PCR in plasmablasts and GC B cells. (C) Computational time: elapsed time for Trinity assembly (seconds) in plasmablasts, GC B cells and memory B cells.]

Figure S6. Recovery and accuracy of Ig transcript reconstruction in rhesus macaques. (A) The percentage of productive chains reconstructed using the four different methods. (B) Concordance of V(D)J gene annotation and CDR nucleotide sequence of Ig transcripts obtained from, +Unmapped and methods with nested RT-PCR sequences for plasmablasts and GC B cells. (C) Elapsed time in seconds for Trinity assembly. The assemblies were run by executing jobs simultaneously, each with 8 threads and GB RAM, on a local Dell PowerEdge R6 Server (Intel Xeon E-6 v, 6 cores/ threads, 96 GB RAM). The time reported by the bash time command is shown.

Supplementary Tables

Supplementary Table S1. Datasets used in the study.

Columns: Dataset; Species; Cell type; Number of cells; SE/PE; Read length; Average depth (million); Sequencer; Nested RT-PCR (Heavy, Light, Both heavy & light from the same cell)

AW* Human s 86 PE .86 (.6-.9)
AW-AW Human s 76 SE .6 (.-.)
VH** Human CD9+ 6 PE 76 .7 (.-.9)
BL6.* Rhesus Germinal Center SE .7 (.8-.)
BL6.* Rhesus Memory SE .8 (.-.)
BL8 Rhesus s PE .9 (.-.8)

Sequencer: Illumina HiSeq; Illumina HiSeq; Illumina MiSeq; Illumina HiSeq; Illumina HiSeq; Illumina HiSeq

Nested RT-PCR: 96 # # 6 8 NA NA NA 7 8

* Datasets sequenced twice to get greater depth. The average depth reported is after combining the two runs.
** VH dataset single cells were sequenced by a modified SMARTer protocol while the 6 single cells were sequenced by the conventional SMARTer protocol (see Methods).
# The nested RT-PCR sequences were obtained from the amplified SMART-Seq v library.

Supplementary Table S2. Accuracy of Ig reconstruction for AW-AW human plasmablast dataset.

Columns: Method; Productive chains: Heavy (n=76), Light (n=76); Concordance of V(D)J annotation & CDR sequence with nested RT-PCR sequences: Heavy (n=), Light (n=), Both heavy & light from the same cell (n=96)

(99.%) 9 (98.9%) +Unmapped (99.%) 9 (99.%) 9 (97.9%) Recombinome_mapped (96.%) 9 (99.%) 9 (9.8%) IMGT_mapped (97.7%) 76 (9.8%) (89.6%) (99.%) 9 (99.%) (9.7%) 8 (88.%)

Supplementary Table S3. Accuracy of Ig reconstruction for AW-AW human plasmablast dataset considering clonotypes.

Columns: Method; Concordance of V(D)J annotation & CDR sequence with nested RT-PCR sequences $: Heavy (n=67), Light (n=8), Both heavy & light from the same cell (n=8), each split into Yes, No and Yes/No*

(98.8%) (98.%) Unmapped (98.%) (98.8%) (96.6%) Recombinome_mapped 6 79 (9.%) (98.8%) (9.%) IMGT_mapped 6 79 (9%) 8 (86.6%) (98.8%) 6 79 (98.8%) (9.%) 9 (8.%) 6

$ The cells are collapsed into clonal families and the accuracy is determined for each family.
* Cases where not all members of a clonal family had the same V(D)J and CDR as the corresponding PCR sequences.
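
The per-family scoring described in the table footnotes can be sketched as below. The sketch assumes each cell already carries a clonal family identifier and a flag for concordance with its PCR sequence. The function name, the input layout, and the reading of "Yes/No" as a family with mixed results are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def classify_families(cells: Iterable[Tuple[str, bool]]) -> Dict[str, str]:
    """Collapse cells into clonal families and label each family.

    `cells` yields (family_id, concordant) pairs, where `concordant` says
    whether the reconstructed chain matched the PCR sequence in V(D)J
    annotation and CDR. Labels: "Yes" if every member matched, "No" if
    none matched, "Yes/No" if the family is mixed.
    """
    by_family = defaultdict(list)
    for family_id, concordant in cells:
        by_family[family_id].append(concordant)

    labels = {}
    for family_id, flags in by_family.items():
        if all(flags):
            labels[family_id] = "Yes"
        elif not any(flags):
            labels[family_id] = "No"
        else:
            labels[family_id] = "Yes/No"
    return labels


# Hypothetical cells: two concordant members, one discordant singleton,
# and one mixed family.
cells = [("fam1", True), ("fam1", True), ("fam2", False),
         ("fam3", True), ("fam3", False)]
labels = classify_families(cells)
print(labels)
print(Counter(labels.values()))
```

Counting families rather than cells keeps a large clonal expansion from dominating the accuracy figure.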

Supplementary Table S4. Accuracy of Ig reconstruction for CD9+Lin- B cells (VH dataset).

Columns: Method; Productive chains: Heavy (n=6), Light (n=6); Concordance of V(D)J annotation & CDR sequence with nested RT-PCR sequences: Heavy (n=), Light (n=), Both heavy & light from the same cell (n=6)

6 +Unmapped 6 Recombinome_mapped (97.%) IMGT_mapped (9.%) (9.%) (96.8%) (96.%) (96.8%) (96.) 9 (9.%) (9.) 8 (9.%) 8 (9.%) (9.) (9.)

Supplementary Table S5. Accuracy of Ig reconstruction for the BALDR pipeline (+Unmapped) compared to the BASIC method.

Columns: Dataset; Cell type; Read length; Concordance of V(D)J annotation & CDR sequence with nested RT-PCR sequences: Heavy chain (BASIC, BALDR), Light chain (BASIC, BALDR), Both heavy & light from the same cell (BASIC, BALDR)

AW- AW AW- AW CD7hiCD8hi CD7hiCD8hi VH CD9 + Lin- PE 76 AW AW AW AW AW AW CD7hiCD8hi CD7hiCD8hi CD7hiCD8hi CD7hiCD8hi CD7hiCD8hi CD7hiCD8hi SE 6/ / (%) (99.%) 7/ / SE (9%) (98.%) 9/ / (9.%) (96.8%) SE / / (97%) SE 7 9/ / (8.%) SE 8/ / (8.%) PE 9/ / (8.% ) PE 7 6/ / PE (76.%) / (7.%) / 8/ / 76/ (.%) (98.6%) 6/ 9/ (97.%) (99.%) / / / / / 9/ / (9.%) 9/ / (9.%) 9/ 9/ (9.%) (9.%) / / /96 (7.%) 87/96 (9.6%) /6 (9.) / (9.7%) / (87.%) 8/ (7%) 9/ (79.%) 7/ (7.8%) 7/ (7.8%) 9/96 (97.9%) 9/96 (96.8%) /6 (96.%) / / / / / /

The comparison between BALDR and BASIC was carried out for the AW_AW (76 cells; PCR available: and ), VH (6 cells; PCR available: and ) and AW (86 cells; PCR available: and ) datasets. The number of productive reconstructed Ig chains having the same V(D)J & CDR assignment as the corresponding PCR sequence is indicated out of the total number of available PCR sequences.

Supplementary Table S6. Accuracy of Ig reconstruction for AW human plasmablast datasets.

Columns: Chain; Method; Paired end; Single end

7 7 Heavy (n=) Light (n=) Both heavy & light from the same cell (n=) +Unmapped Recombinome_mapped (9.%) IMGT_mapped (97.%) +Unmapped Recombinome_mapped IMGT_mapped 9 (9.%) +Unmapped Recombinome_mapped (8.%) IMGT_mapped (9.7%) (9.%) (97.%) (97.%) 9 (9.%) (9.7%) (9.7%) (7.6%) (9.%) 8 (.9%) (78%) (9.7%) (.8%) (.8%) (97.%) (97.%) (6.8%) 9 (9.%) (9.7%) (9.7%) (6.%) (88.%) (97.%) (.7%) 8 (9.7%) 7 (9.%) (87.%) (9.7%) (.%) (.9%) (.%) (%) (8.%) 8 (9.7%) (.7%) (.8%) 7 (9.%) (%)

Supplementary Table S7. Recovery and Accuracy of Ig reconstruction for rhesus macaque single cell datasets.

Columns: Cell type; Method; Number of productive chains reconstructed: Heavy, Light, Both heavy and light; Concordance of V(D)J annotation & CDR sequence with nested RT-PCR sequences: Heavy, Light, Both heavy and light

(n=)
/ / / / 7/7 8/8
Filter-Non-IG / / / / 7/7 8/8
+Unmapped / / / / 7/7 8/8
/ / / / (9.7%) 7/7 8/8

GC B cells (n=)
8/ (8.8%) 8/ (8.8%) / (7.7%) / 9/ (9%) 7/8 (87.%)
Filter-Non-IG 7/ (8.8%) 8/ (8.8%) / (7.7%) / 9/ (9%) 7/8 (87.%)
+Unmapped 8/ (8.8%) 7/ (8.8%) / (7.7%) / 9/ (9%) 7/8 (87.%)
/ (6.%) / (7.7) / (.) 9/ (.9%) 8/ (8%) /8 (7.%)

Memory B cells (n=)
7/ (8.8%) / 7/ (8.8%) NA NA NA
Filter-Non-IG 7/ (8.8%) / (97%) 7/ (8.8%) NA NA NA
+Unmapped 7/ (8.8%) / (9.9%) 6/ (78.8%) NA NA NA
/ (6.6%) 9/ (87.9%) / (6.6%) NA NA NA