
Supplementary Materials for

Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes

Jason D. Buenrostro 1,2,4, Carlos L. Araya 1,4, Lauren M. Chircus 1,3, Curtis J. Layton 1, Howard Y. Chang 2, Michael P. Snyder 1, William J. Greenleaf 1,*

1 Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
2 Program in Epithelial Biology and the Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
3 Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
4 These authors contributed equally to this work
* Correspondence to: wjg@stanford.edu

Supplemental Discussion:

We hypothesize that the changes in association rate for MS2 binding might come from three possible sources: 1) electrostatic changes to the binding surface that reduce the probability of a productive collision, 2) destabilization of the RNA hairpin stem, causing fraying and fewer productive collisions, or 3) destabilization of the hairpin coupled with formation of, or competition with, alternate secondary structure, reducing the number of productive collisions. In principle, these alternate structures might be long-lived [1,2] (with lifetimes longer than 1/k_off), thus effectively sequestering a fraction of the RNA population and shifting the apparent K_d. This third possibility would also reduce the predicted maximum fluorescence intensity (F_max) of the cluster under saturating conditions, because the sequestered RNA population cannot bind. We do see substantial variation in the fit F_max, and we observe a correlation of this F_max with predicted association rates (see Supplementary Tables 2 and 3), suggesting that 35% of the average variance in association rates is potentially due to this third explanation. These observations suggest that the data provided here may also be a rich resource for modeling RNA hairpin and alternate structure formation, an area of inquiry beyond the focus of this work.

To promote effective use of the method, we highlight some potential limitations. For instance, some RNA-binding proteins may also have affinity for single- and double-stranded DNA or for RNA polymerase; such proteins should be avoided, or proper controls should be performed. The Illumina sequencing instrument has a green (532 nm) and a red (660 nm) laser, and we found that changing optical filters was straightforward. Tagging RNA-binding proteins with SNAP substrates increased assay flexibility and allowed us to chase with unlabeled MS2 during the dissociation experiment using the same protein preparation. In addition, we anticipate that long RNAs can be transcribed and assayed on the flow cell. In previous studies we have seen DNA clusters up to ~1.6 kb in length [3]. The adapters for RNA synthesis that we used are 180 bases in length (Supplementary Fig. 1), suggesting that we might expect to generate up to ~1.4 kb of RNA. However, clustering and RNA synthesis efficiencies may need to be optimized for longer RNAs.

Supplementary Figure 1: Library construction schematic. Library construction occurred in three major steps. First, we converted the degenerate oligo into a double-stranded molecule by annealing and extending. Second, we bottlenecked the library to approximately 9 × 10^5 unique molecules. Third, we PCR-amplified the degenerate library and added the 3′ sequencing adaptor. The final library contained all sequences required for DNA sequencing and RNA synthesis.

Supplementary Figure 2: Barcode representation and error. Bottlenecked libraries were diluted to approximately 8 × 10^5 molecules prior to PCR amplification. Bottlenecking yielded a median of 15 clusters per unique barcode.
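As a rough illustration of how a clusters-per-barcode summary like the one in this caption could be computed, the sketch below counts clusters per barcode and takes the median. The barcode strings and the function name are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter
from statistics import median

def median_clusters_per_barcode(cluster_barcodes):
    """Return the median number of clusters observed per unique barcode."""
    counts = Counter(cluster_barcodes)
    return median(counts.values())

# Toy input (made-up values): one barcode string per sequenced cluster.
toy_barcodes = ["AAC"] * 3 + ["GTT"] * 5 + ["CCA"] * 4
print(median_clusters_per_barcode(toy_barcodes))
```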

Supplementary Figure 3: Visualizing RNA generation on sequenced DNA clusters. a, After sequencing, the flow cell contained residual fluorescence from fluorescent nucleotides incorporated during sequencing. b, To remove residual fluorescence, we denatured and washed away the labeled DNA strand created during sequencing using 0.1 N NaOH. c, After denaturation, clusters could still be resolved; to further remove residual fluorescence, we treated the flow cell with cleavage mix (supplemental methods), leaving nearly undetectable residual fluorescence. d, Following preparation of the flow cell for RNA generation, we annealed an Alexa647-labeled DNA oligo to the stall sequence of the single-stranded DNA (ssDNA). This produced a reference point reflecting the total amount of DNA in each cluster. e, Second, we denatured the annealed fluorescent oligo and synthesized double-stranded DNA (dsDNA). We also hybridized an unlabeled DNA oligo to the stall sequence to block hybridization of the Alexa647-labeled oligo to ssDNA. f, Third, we transcribed RNA and annealed the same Alexa647-labeled oligo onto the newly synthesized RNA. The efficiency of RNA generation was approximated as the fluorescence at each cluster following RNA transcription relative to the fluorescence of the probe annealed to ssDNA (~30-40%).
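The efficiency estimate described in panel f is a per-cluster ratio of two probe signals. A minimal sketch of that ratio follows, assuming per-cluster fluorescence values are already available as two aligned lists. The function name and the toy intensities are hypothetical.

```python
from statistics import median

def rna_generation_efficiency(f_rna, f_ssdna):
    """Per-cluster ratio: probe signal on RNA / probe signal on ssDNA."""
    if len(f_rna) != len(f_ssdna):
        raise ValueError("inputs must list the same clusters in the same order")
    # Skip clusters with no ssDNA reference signal to avoid dividing by zero.
    return [r / d for r, d in zip(f_rna, f_ssdna) if d > 0]

# Toy per-cluster intensities (arbitrary units, made up for illustration).
toy_ssdna = [1000.0, 800.0, 1200.0]
toy_rna = [350.0, 280.0, 420.0]
efficiencies = rna_generation_efficiency(toy_rna, toy_ssdna)
print(median(efficiencies))
```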

Supplementary Figure 4: Data analysis workflow. a, Sequencing cluster centers were derived from the fastq files from the sequencing run. X/Y and tile positions were extracted from the fastq header lines. Data were cross-correlated with the observed images to define a global offset. Images were then cleaned to mask any saturated pixels. Images were broken into smaller subregions (24x24 pixels) and the fluorescence was fitted to a sum of overlapping 2D Gaussians. This process was repeated for all 120 tiles of the GAIIx sequencing lane and across the 26 image series (3,120 images). b, Binding images were normalized for RNA content using the all-RNA image (Alexa647 oligo hybridized to the stall sequence). Data were aggregated across the image series by cluster ID, and the fluorescence values for each cluster across concentrations were fit to a binding curve. The fit binding energies were grouped by hairpin sequence, and median binding energies for each sequence were reported.
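The global-offset step in panel a can be sketched as a brute-force cross-correlation between a sparse map of cluster centers and the image: for each candidate integer shift, sum the image intensity at the shifted centers and keep the shift with the highest score. This is only an illustration under simplifying assumptions (integer shifts, a small search window, a tiny toy image). The function name is hypothetical and the authors' implementation may differ.

```python
def best_offset(image, centers, max_shift=3):
    """Return the integer (dx, dy) that maximizes summed intensity at shifted centers."""
    height, width = len(image), len(image[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for x, y in centers:
                xs, ys = x + dx, y + dy
                # Centers shifted outside the image contribute nothing.
                if 0 <= xs < width and 0 <= ys < height:
                    score += image[ys][xs]
            if score > best_score:
                best, best_score = (dx, dy), score
    return best

# Toy data: cluster centers from sequencing, and an image whose bright
# pixels sit at those centers displaced by (+2, -1).
centers = [(2, 3), (5, 6), (7, 4)]
image = [[0.0] * 10 for _ in range(10)]
for x, y in centers:
    image[y - 1][x + 2] = 1.0

print(best_offset(image, centers))
```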

Supplementary Figure 5: Correlating sequencing data and fitting 2D Gaussians to acquired images. We found that a simple cross-correlation was sufficient to map x/y positions from the sequencing data to both a, the all-RNA image and b, the MS2 binding images (cluster centers shown in green). Shown are unaligned images and cluster centers (left), the cross-correlation value (middle), and the resulting mapped cluster centers (right). The plotted cluster centers were adjusted using the least-squares image fit. Fitting the images to 2D Gaussians generated the following distributions for the relevant parameters from a representative tile: c, the fit amplitude and d, the fit standard deviation. Integrating these values generated e, the distribution of the integrated fluorescence.
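For a symmetric 2D Gaussian with amplitude A and standard deviation σ, the integrated intensity is 2πAσ². The sketch below computes that value and checks it against a brute-force pixel sum. It assumes a single σ per cluster, as the caption's "fit standard deviation" suggests. The function names and example values are hypothetical.

```python
import math

def integrated_gaussian(amplitude, sigma):
    """Analytic volume under a symmetric 2D Gaussian: 2 * pi * A * sigma^2."""
    return 2.0 * math.pi * amplitude * sigma ** 2

def numeric_volume(amplitude, sigma, half_width=20):
    """Pixel sum of the same Gaussian on an integer grid (sanity check)."""
    total = 0.0
    for y in range(-half_width, half_width + 1):
        for x in range(-half_width, half_width + 1):
            total += amplitude * math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
    return total

# Made-up fit parameters: amplitude in arbitrary units, sigma in pixels.
analytic = integrated_gaussian(100.0, 1.5)
numeric = numeric_volume(100.0, 1.5)
print(analytic, numeric)
```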

Supplementary Figure 6: Representative dissociation fits. a, The -5C consensus variant had a half-life of 8.39 minutes. b, c, Two single mutants with half-lives of 4.24 and 1.91 minutes, respectively.
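If dissociation is modeled as a single exponential decay (an assumption here, since the caption reports only half-lives), each half-life converts to a dissociation rate through k_off = ln(2) / t_half. A short sketch with a hypothetical function name:

```python
import math

def koff_from_half_life(t_half_min):
    """Dissociation rate (per minute) for single-exponential decay."""
    return math.log(2) / t_half_min

# Half-lives from the caption, in minutes.
half_lives = {"-5C consensus": 8.39, "single mutant (b)": 4.24, "single mutant (c)": 1.91}
for label, t_half in half_lives.items():
    print(label, koff_from_half_life(t_half))
```

Under this model the consensus half-life corresponds to a k_off of roughly 0.08 per minute.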

Supplementary Figure 7: Distribution of mean square error (MSE) across all variants. The MSE for each hairpin variant was calculated using the median of each single-cluster fit. We found that most variants fit well to a binding curve (median MSE was 8.7 × 10^-6); however, a subset of variants fit poorly (MSE > 0.025). We excluded this set (N=335) from further analysis.
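The fit-then-filter logic can be sketched as follows. This is an illustration only: it assumes a simple one-site binding curve F = F_max · c / (c + K_d), uses a grid search over K_d with a closed-form least-squares F_max, and applies the 0.025 MSE cutoff from the caption. The concentrations, function names, and fitting procedure are hypothetical and are not the authors' code.

```python
def fit_binding_curve(conc, signal, log10_kd_range=(-2.0, 4.0), steps=601):
    """Grid-search K_d; for each K_d solve F_max by linear least squares.

    Returns (kd, fmax, mse) for the best grid point.
    """
    lo, hi = log10_kd_range
    best = None
    for i in range(steps):
        kd = 10 ** (lo + (hi - lo) * i / (steps - 1))
        occupancy = [c / (c + kd) for c in conc]
        fmax = (sum(s * g for s, g in zip(signal, occupancy))
                / sum(g * g for g in occupancy))
        mse = sum((s - fmax * g) ** 2 for s, g in zip(signal, occupancy)) / len(conc)
        if best is None or mse < best[2]:
            best = (kd, fmax, mse)
    return best

def passes_mse_filter(mse, cutoff=0.025):
    """Keep fits at or below the cutoff (variants with MSE > 0.025 were excluded)."""
    return mse <= cutoff

# Toy noiseless data generated with K_d = 10 and F_max = 1 (arbitrary units).
conc = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0]
signal = [c / (c + 10.0) for c in conc]
kd, fmax, mse = fit_binding_curve(conc, signal)
print(kd, fmax, mse, passes_mse_filter(mse))
```

A grid search is used here only to keep the sketch dependency-free; a nonlinear least-squares routine would be the more usual choice.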

Supplementary Figure 8: Comparison of measured, bulk in vitro binding energies to measurements from the RNA array. a, Filter binding assay workflow. b, c, With this filter binding assay, we measured the binding energy of 5 RNA variants, including the consensus sequence (-5C), with an error of less than 1 kcal/mol. These measurements correlated well with on-chip (RNA array) measurements (R=0.92). The slope of the best-fit line is 0.76, with a 95% confidence interval of (0.18, 1.35). In addition, the -5U,-10U variant did not show binding in vitro or on chip, consistent with the literature [4]. The binding energies measured on the RNA array also correlate well with those reported in the literature (R=0.94), and the slope of the best-fit line is 1.08, with a 95% confidence interval of (0.81, 1.34). K_d values measured on the RNA array were shifted ~0.7 kcal/mol with respect to reported values, and a similar shift was observed for the K_d values measured via filter binding assay using the same protein prep. We speculate that these relatively small discrepancies between the affinity measurements in the literature and those from the RNA array may be due to differences in the fraction of active protein or to subtle differences in binding buffers.
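To give a sense of scale for the ~0.7 kcal/mol shift, a free-energy difference converts to a fold change in K_d through exp(ΔΔG / RT). The sketch below assumes a temperature of 298.15 K, which is not stated in this caption. The function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_fold_change(ddg_kcal, temp_k=298.15):
    """Fold change in K_d implied by a free-energy shift (assumed temperature)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))

def kcal_to_kbt(ddg_kcal, temp_k=298.15):
    """Express a kcal/mol energy in units of k_BT at the assumed temperature."""
    return ddg_kcal / (R_KCAL * temp_k)

print(kd_fold_change(0.7), kcal_to_kbt(0.7))
```

At this assumed temperature, a 0.7 kcal/mol shift corresponds to roughly a threefold difference in K_d, or a little over 1 k_BT.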

Supplementary Figure 9: Measurement error. Using the consensus sequence (-5C), we calculated error as a function of cluster number. Error was determined by subsampling to n clusters and calculating the range of the bootstrapped confidence interval on the median. Calculated error on the median binding energy decayed as 0.67/sqrt(n). Using the consensus as a standard (shown here), we found that 15 clusters were sufficient to yield an error of 0.17 k_BT for the consensus. We conducted a similar analysis for all variants measured (see Table 2 for estimated errors).
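The scaling in this caption can be checked directly: 0.67/sqrt(15) is about 0.17. The sketch below evaluates that formula and also shows one plausible way to obtain a bootstrapped confidence-interval width on the median after subsampling to n clusters. The 95% level, the resampling details, the function names, and the synthetic energies are assumptions for illustration.

```python
import math
import random
from statistics import median

def predicted_error(n, scale=0.67):
    """Error model from the caption: scale / sqrt(n), in units of k_BT."""
    return scale / math.sqrt(n)

def bootstrap_median_ci_width(values, n, n_boot=1000, seed=0):
    """Width of an (assumed) 95% bootstrap interval on the median of n draws."""
    rng = random.Random(seed)
    medians = sorted(
        median(rng.choice(values) for _ in range(n)) for _ in range(n_boot)
    )
    lower = medians[int(0.025 * n_boot)]
    upper = medians[int(0.975 * n_boot) - 1]
    return upper - lower

# Synthetic per-cluster binding energies (made up; unit-variance noise).
rng = random.Random(1)
toy_energies = [rng.gauss(0.0, 1.0) for _ in range(500)]

print(predicted_error(15))
width_small = bootstrap_median_ci_width(toy_energies, 5)
width_large = bootstrap_median_ci_width(toy_energies, 60)
print(width_small, width_large)
```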

Supplementary Figure 10: Analysis of cluster size and observed dissociation kinetics. To ensure that cluster size or RNA density did not affect dissociation kinetics, we grouped clusters containing the consensus variant (-5C) into 10 equal-sized bins by decreasing fluorescence intensity (brightness), a metric of cluster size. We found that cluster size did not have a substantial impact on calculated dissociation rates. The reported value for all clusters of the consensus sequence is marked in red.
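The binning described here can be sketched as a rank-and-split operation: sort clusters from brightest to dimmest, cut the ranked list into ten near-equal groups, and summarize the dissociation rate within each group. The data layout (brightness, k_off pairs), the function name, and the toy values below are hypothetical.

```python
from statistics import median

def bin_by_brightness(clusters, n_bins=10):
    """Split (brightness, k_off) pairs into n_bins groups, brightest first."""
    ranked = sorted(clusters, key=lambda c: c[0], reverse=True)
    size, extra = divmod(len(ranked), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        # The first `extra` bins take one additional cluster each.
        stop = start + size + (1 if b < extra else 0)
        bins.append(ranked[start:stop])
        start = stop
    return bins

# Toy clusters: increasing brightness, identical k_off (per minute).
toy_clusters = [(10.0 * i, 0.08) for i in range(1, 26)]
bins = bin_by_brightness(toy_clusters)
median_koff_per_bin = [median(k for _, k in group) for group in bins]
print(median_koff_per_bin)
```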

Supplementary Figure 11: The effect of single mutants on binding affinity. ΔΔG is calculated for all single mutants relative to the binding energy of the consensus sequence. Single mutants A, C, G and U are shown as green, blue, yellow and red bars, respectively. Values displayed at k_BT are less than or equal to k_BT ΔΔG.

Supplementary Figure 12: Inferring RNA structure from epistasis signatures and modeling mutation effects. a, Matrix of mean epistasis scores for i,j position pairs. Position pairs with the reciprocal, highest mean epistasis scores 1 s.d. from zero highlight base-paired positions (orange) in the MS2 hairpin. b, We selected position-informative sets of variants (N=121) at single-stranded and base-paired positions, and generated a binary matrix with annotation terms describing the presence (1) or absence (0) of six specific primary and structure defects in each variant. For each position, we regressed the specific contribution (ω) of each defect to describe the ΔΔG of binding among position-informative variants. Predicted (model) and experimental differential binding energies are shown. Variants and parameters exclusive to base-paired positions are highlighted (orange).
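This caption does not define the epistasis score. One common convention, used here purely as an assumption, is the deviation of a double mutant from the additive expectation of its two single mutants: ε_ij = ΔΔG_ij − (ΔΔG_i + ΔΔG_j). The sketch below applies that definition to made-up energies. The function name and the values are hypothetical.

```python
def pairwise_epistasis(ddg_double, ddg_single_i, ddg_single_j):
    """Deviation of the double mutant from additivity (assumed definition)."""
    return ddg_double - (ddg_single_i + ddg_single_j)

# Made-up energies in k_BT: two single mutants that each disrupt a base pair,
# and a double mutant that restores pairing (compensatory change).
epsilon = pairwise_epistasis(ddg_double=0.5, ddg_single_i=2.0, ddg_single_j=2.5)
print(epsilon)
```

Under this convention, compensatory mutations at base-paired positions give strongly negative scores, which is the kind of signature a matrix of pairwise scores could reveal.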

Supplementary Figure 13: Biased mutational trajectories are position dependent. Bias between mutational paths containing A:G vs. C:U and G:U vs. A:C intermediates was measured using the probability ratio between the evolutionary trajectories. Strong enrichment of G:U intermediates was seen at the -12/+1, -11/-1 and -8/-3 positions, and strong depletion of A:G trajectories was observed at the -12/+1 position.

Supplemental References

1. Solomatin, S. V., Greenfeld, M., Chu, S. & Herschlag, D. Multiple native states reveal persistent ruggedness of an RNA folding landscape. Nature 463, (2010).
2. Herschlag, D. RNA chaperones and the RNA folding problem. 270, (1995).
3. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Meth 10, (2013).
4. LeCuyer, K. A., Behlen, L. S. & Uhlenbeck, O. C. Mutants of the Bacteriophage MS2 Coat Protein That Alter Its Cooperative Binding to RNA. Biochemistry 34, (1995).