Suppl. Table 1: Testing SCRE sequence information without the structural constraint
Columns: SCRE | t test on difference of coefficients | SCRE t value | sequence-only t value
Rows: Dm SCREs (six) and Smg SCREs (two)
Because N > 1000, t values may be interpreted as z scores of a normal distribution. t values were calculated using multivariate regression on the best-correlating data for each SCRE.
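As a rough illustration of the statistics behind this table, the sketch below computes a t statistic for the difference between two regression coefficients and, because N > 1000, converts it to a two-sided P value with the normal approximation. The source does not state how the difference test was carried out; treating the two coefficient estimates as independent (so their standard errors add in quadrature) is an assumption, and the function names are hypothetical.

```python
import math


def coef_difference_t(b1, se1, b2, se2):
    """t statistic for H0: b1 == b2, ASSUMING the two estimates are independent."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


def two_sided_p_from_z(z):
    """Two-sided P value under N(0, 1); a close approximation to t when N > 1000."""
    return math.erfc(abs(z) / math.sqrt(2))


# Example: coefficients 0.5 and 0.2, each with standard error 0.1
t = coef_difference_t(0.5, 0.1, 0.2, 0.1)
print(round(t, 3), round(two_sided_p_from_z(t), 4))  # 2.121 0.0339
```

With large N the t and standard normal distributions are nearly identical, which is why the table's t values can be read directly as z scores.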

Supplementary Figure 1: mRNA levels vs. sequence score plots, with a comparison of regression and two-sample t values. For each SCRE, residuals were calculated from its best-correlating microarray experiment such that the effects of all other terms were subtracted from the data, leaving only the effect of the named SCRE plus unexplained error. The regression line (green) and the regression t value (blue; alternate y axis) are plotted. Then, using each observed sequence score as a threshold to divide the genes into two groups, a pooled two-sample t test was performed on the expression data (red line; alternate y axis). The two-sample t values surpassed the regression t value only for Dm1, and then only slightly. This shows that regression recovers correspondences between regulatory sequence content and microarray values without the need to choose a sequence score threshold, and so avoids the loss of statistical power that correcting for multiple hypotheses (one per threshold) would entail.
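The comparison in this figure can be sketched in plain Python: an ordinary least-squares slope t value for residual expression vs. sequence score, and a pooled two-sample t statistic at every observed score used as a threshold. This is an illustrative sketch, not the authors' code; it assumes the inputs are already the residuals described above, and it only tests thresholds that leave at least two genes in each group so that both sample variances exist. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def regression_t(x, y):
    """t value of the slope b in the least-squares fit y = a + b*x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual variance with n - 2 degrees of freedom
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b / se_b


def pooled_t(g1, g2):
    """Pooled (equal-variance) two-sample t statistic for mean(g1) - mean(g2)."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def threshold_scan(scores, expression):
    """Pooled t at every observed score used as a threshold (score >= thr vs. < thr)."""
    results = []
    for thr in sorted(set(scores)):
        high = [e for s, e in zip(scores, expression) if s >= thr]
        low = [e for s, e in zip(scores, expression) if s < thr]
        if len(high) >= 2 and len(low) >= 2:  # each group needs a sample variance
            results.append((thr, pooled_t(high, low)))
    return results
```

Scanning every threshold yields one test per threshold, which is why a threshold-based approach must pay for multiple-hypothesis correction while the single regression t value does not.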

Suppl. Table 2: Summary of significant annotations (P value ≤ 10⁻²) for SCRE-containing target mRNAs
Columns: Gene Ontology | Phenotype | In Situ
catalytic activity Vts1 4, Vts1 5 plasma membrane transmembrane transporter activity oxidoreductase activity carbohydrate metabolic process plasma membrane stage 1-3 maternal Smg 4, Smg 5 development: imaginal disc, nervous system ion binding sexual reproduction cell cycle reproductive system maternal effect eye-antennal disc wing stage 4-6 no staining stage 4-6 rapidly degraded stage 4-6 late extended germ band embryo any stage nervous system adenyl nucleotide binding localization gastrula embryo extended germ band embryo any stage female reproductive system Dm1 transport developmental process dorsal closure embryo organelle organization and biogenesis Dm2 ribonucleotide binding developmental process post-translational protein modification egg imaginal precursor any stage female reproductive system any stage central nervous system localization development endomembrane system Dm3 organelle membrane adenyl nucleotide binding membrane adenyl nucleotide binding Dm4 localization transmembrane transporter activity cell communication eye morphogenesis any stage nervous system neuron differentiation reproductive system any stage epithelium Dm5 wing disc development oogenesis maternal effect tracheal system organ development eye-antennal disc appendage neuron differentiation any stage germ layer sexual reproduction eye any stage neurogenic region Dm6 eye development wing disc development reproductive system tracheal system any stage ectoderm organ morphogenesis appendage maternal effect

Supplementary Figure 2: StructRED pseudocode

P value = 0
While P value of last matrix added to the model < stopping P value:
    For all experiments:
        For all stem-loop oligonucleotide motifs in the search space:
            For all genes:
                Count stem-loop motif in mRNA sequence.
            Update the motif count vs. expression Pearson correlation.
    Select the stem-loop motif that had the strongest correlation in any experiment.
    For those experiments where the best motif had a correlation within 1 std. error of the best experiment:
        For each position in the best motif with a nucleotide defined:
            For motifs one nucleotide different from the best motif only at the current nucleotide position:
                If the correlation is in the same direction as the best motif:
                    Update the geometric mean of the relative affinity of that nucleotide at that position.
    Output the calculated stem-loop position-specific affinity matrix.
    For all genes:
        Calculate mRNA sequence scores for the stem-loop matrix.
    For all experiments:
        Perform multivariate regression after adding the new matrix to the linear model.
    Output the multivariate fit of the full all-stem-loop-matrix model (coefficients and t values).
    P value = P value for the new stem-loop matrix in the multivariate fit for the best experiment.
    Repeat.

Supplementary Figure 3: mRNA levels vs. mean stem-loop folding energies. For all mRNAs, each sequence window that contained the CNGG(N) consensus of Vts1p or Smaug flanked by three hybridizing base pairs (including G-U pairs) was scored for its folding free energy (ViennaRNA package; ref. 1) or for its hydrogen bond count over a 10 bp stem centered at the same position. The per-mRNA means of these free energies and hydrogen bond counts were then plotted against microarray-measured Vts1p binding (GEO accession GSE3741) or Smaug-dependent mRNA regulation (wt vs. smg, 4-6 hr, GEO accession GSE8910). For both stem-loop folding metrics and for both proteins, the mean stem-loop folding energies were significant predictors of Vts1p/Smaug activity (P value < 10⁻⁹). The R² values of the fits are low (< 0.023), but the significant correlation suggests that incorporating a continuous (rather than the current binary) term in the StructRED model could improve its ability to explain mRNA levels. Although implementing such a parameter would be non-trivial, this preliminary analysis suggests it may be fruitful.

1. I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125:
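A minimal sketch of the hydrogen-bond metric follows. It assumes a five-nucleotide loop written as `CNGGN` (the exact loop length implied by CNGG(N) is an assumption), requires the three base pairs closing the loop to be Watson-Crick or G-U pairs, and reads "a 10 bp stem centered at the same position" as the ten potential pairs extending outward from the loop, with non-pairing positions contributing zero bonds. The bond counts (3 for G-C, 2 for A-U and G-U) are standard. The free-energy metric, which the source computed with ViennaRNA, is not reproduced here, and all function names are hypothetical.

```python
from statistics import mean

# Hydrogen bonds per base pair: G-C = 3, A-U = 2, G-U wobble = 2
HBONDS = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2,
          ("G", "U"): 2, ("U", "G"): 2}


def loop_matches(window, consensus):
    """True if window matches consensus, where 'N' matches any base."""
    return all(c == "N" or c == b for c, b in zip(consensus, window))


def stem_loop_hbonds(seq, consensus="CNGGN", min_pairs=3, max_pairs=10):
    """H-bond totals for every consensus loop closed by >= min_pairs base pairs.

    Hypothetical reading of the caption: the first min_pairs positions outside
    the loop must pair, and up to max_pairs positions are summed, with
    non-pairing positions contributing 0 bonds.
    """
    seq = seq.upper().replace("T", "U")
    k_loop = len(consensus)
    totals = []
    for start in range(len(seq) - k_loop + 1):
        if not loop_matches(seq[start:start + k_loop], consensus):
            continue
        bonds = []
        for k in range(max_pairs):
            left, right = start - 1 - k, start + k_loop + k
            if left < 0 or right >= len(seq):
                break  # stem runs off either end of the mRNA
            bonds.append(HBONDS.get((seq[left], seq[right]), 0))
        if len(bonds) >= min_pairs and all(b > 0 for b in bonds[:min_pairs]):
            totals.append(sum(bonds))
    return totals


def mean_hbonds(seq, **kwargs):
    """Per-mRNA mean H-bond count, or None if no qualifying stem-loop is found."""
    totals = stem_loop_hbonds(seq, **kwargs)
    return mean(totals) if totals else None
```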

Supplementary Figure 4: Genome-wide relative density of Drosophila SCREs in different regions of mRNAs. For each Drosophila SCRE, sequence scores were calculated separately in the UTRs and the CDS of every mRNA sequence. Each score was then divided by the length of its sequence region, and genome-wide averages were calculated. Shown here is an SCRE-by-SCRE comparison of these region-specific score densities. For Dm3 through Smg 5, the regions of highest score density are the same as the regions that are most predictive of the respective functional genomics measurements (Figure 5). Dm1 and Dm2 did not follow this trend.
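The density calculation can be sketched as follows: each region's score is divided by that region's length, and the per-mRNA densities are averaged genome-wide for each region. The region names, the dictionary layout, the choice to skip empty regions, and the toy scoring function in the example are hypothetical; the SCRE sequence score itself is not reproduced here. Averaging per-mRNA densities (rather than dividing summed scores by summed lengths) is one reading of the caption.

```python
from statistics import mean


def region_densities(mrnas, score_fn):
    """Genome-wide mean score density (score / region length) for each mRNA region.

    mrnas: list of dicts mapping a region name (e.g. "5UTR", "CDS", "3UTR")
    to that region's sequence; score_fn: sequence -> SCRE score (hypothetical).
    """
    per_region = {}
    for regions in mrnas:
        for name, seq in regions.items():
            if not seq:
                continue  # skip absent or empty regions (assumption)
            per_region.setdefault(name, []).append(score_fn(seq) / len(seq))
    return {name: mean(vals) for name, vals in per_region.items()}


# Toy usage with a hypothetical score: non-overlapping "GG" counts
example = [{"5UTR": "AGGA", "CDS": "AUGGCC", "3UTR": ""},
           {"5UTR": "AAAA", "CDS": "GGGG"}]
print(region_densities(example, lambda s: s.count("GG")))
```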