Microarray Experiment Design


Samples used, extract preparation and labelling

AML blasts were isolated from bone marrow by centrifugation on a Ficoll-Hypaque gradient. Total RNA was extracted using TRIzol Reagent (Gibco), followed by clean-up on RNeasy mini/midi columns (Qiagen). RNA quality was assessed using the Agilent 2100 Bioanalyzer system prior to labelling. For each sample, a biotin-labelled cRNA target was synthesized starting from 2-5 µg of RNA. Double-stranded cDNA synthesis was performed with the GIBCO SuperScript Custom cDNA Synthesis Kit, and biotin-labelled antisense RNA was transcribed in vitro using Ambion's In Vitro Transcription System, including Bio-16-UTP and Bio-11-CTP (Enzo) in the reaction. All steps of the labelling protocol followed the procedures recommended by Affymetrix. The size and the accuracy of quantitation of targets were checked using the Agilent 2100 Bioanalyzer system.

Hybridization procedures and parameters

Hybridisation mix for target dilution (100 mM MES, 1 M [Na+], 20 mM EDTA, 0.01% Tween 20) was prepared as indicated by Affymetrix, including premixed biotin-labelled control oligo B2 and bioB, bioC, bioD and cre controls at final concentrations of 50 pM, 1.5 pM, 5 pM, 25 pM and 100 pM, respectively. Targets were diluted in hybridisation buffer at a concentration of 150 µg/ml and denatured at 99 °C prior to introduction into the GeneChip cartridge. Each biotin-labelled target was then hybridized to an HG-U133A array. Hybridisations were performed for hours at 45 °C in a rotisserie oven. GeneChip cartridges were washed and stained in the Affymetrix fluidics station following the EukGE-WS2 standard protocol (including Antibody Amplification):

1. Wash 10 cycles of 2 mixes/cycle with Wash Buffer A (6X SSPE, 0.01% Tween 20) at 25 °C
2. Wash 4 cycles of 15 mixes/cycle with Wash Buffer B (100 mM MES, 0.1 M [Na+], 0.01% Tween 20) at 50 °C
3. Stain the probe array for 10 minutes in SAPE solution (10 µg/ml SAPE in 100 mM MES, 1 M [Na+], 0.05% Tween 20, 2 mg/ml BSA) at 25 °C
4. Wash 10 cycles of 4 mixes/cycle with Wash Buffer A at 25 °C
5. Stain the probe array for 10 minutes in antibody solution (Normal Goat IgG 0.1 mg/ml, Biotinylated antibody 3 µg/ml, 100 mM MES, 1 M [Na+], 0.05% Tween 20, 2 mg/ml BSA) at 25 °C
6. Stain the probe array for 10 minutes in SAPE solution at 25 °C
7. Final Wash 15 cycles of 4 mixes/cycle with Wash Buffer A at 30 °C
8. Scan the GeneChip.

Arrays were scanned using an Affymetrix GeneArray Scanner with default parameters. The resulting images were analysed using Affymetrix Microarray Suite version 5 (MASv5).

Measurement data and specifications

Absolute analysis was performed for each chip with MASv5 software using default parameters, scaling all images to a value of 500. Report files were extracted for each chip, and the performance of labelled targets was evaluated on the basis of several values (scaling factor, background and noise values, % present calls, average signal value, etc.). The median value for all scaling factors was 3.51, with a standard deviation of 1.4.

Elaboration of results

Absolute files were processed with GeneSpring v 6.1 (Silicon Genetics). We divided the 78 AML samples into two groups of 39 cases each: the first group ("training set"; 39 cases) was used to identify genes that function as putative predictors of NPM status, whereas the second group ("test set"; 39 cases) was used to assess the validity of the identified predictors (see text, Table 1 and Table S1 for patient characteristics).

To analyze the training set, unsupervised hierarchical clustering (according to Pearson's correlation) was performed on 7,197 selected genes that are expressed (scored as present or marginal in >1 patient) and whose expression levels change 2-fold (compared to the median expression level) in at least 4 of 39 patients (10.3%). The strongest clustering parameter was NPM localization (Fig. 1A). Similar results (not shown) were obtained after changing the group of genes used in the clustering algorithm (e.g., by altering the minimum number of patients required by one or both criteria) and/or the sample composition of the training set, indicating that the separation of NPMc+ from NPMc- AMLs is a stable feature.

We then performed an analysis of variance (1-way ANOVA) with the aim of identifying the genes that best discriminate the NPMc+ and NPMc- AML subgroups. We used a parametric test (Welch t-test) with a p-value cut-off of 0.05 and a multiple testing correction (MTC) based on a 5% Benjamini and Hochberg False Discovery Rate, and obtained 369 probe sets, corresponding to 330 non-redundant genes (Table S2). A more conservative MTC (i.e. Bonferroni) yielded a list of 29 genes, corresponding to the subset of the 369 previously identified probe sets with the lowest p-values.

To cross-check whether the predictors identified by 1-way ANOVA are reliable, we used an independent method to reanalyze the results obtained with the training set. We used the SAM software, applying a fold-change cut-off greater than 2 and a q-value (which represents the probability of false regulation) of <2%, and identified 160 probe sets, corresponding to 140 non-redundant genes, as putative predictors of NPM status (Table S3). Of these, 104 were present in the list of 369 probe sets identified by 1-way ANOVA; 9 additional predictor genes were shared by the two lists, albeit identified by different probe sets, for a global overlap of 70% (113 of 160). To further validate the reliability of these putative predictors, we performed leave-one-out cross-validation of the training set using both lists of overlapping predictors (comprising 104 and 113 probe sets, respectively), with 38/39 cases correctly predicted in both cases (data not shown).

We used the 369 putative predictors to study an independent set of 39 AML ("test set", Table 1 and Table S1), including 29 NPMc+ and 10 NPMc- AML. Like the training set, this group of patients was heterogeneous for FAB subtype and FLT3 mutations, but 6/29 NPMc+ cases presented rare chromosomal abnormalities (add1, del11, inv3, del(9), and trisomy 8 in two cases). Efficient segregation of NPMc+ from NPMc- cases in the hierarchical clustering of the test set was likewise obtained using the 160 gene list generated by SAM (data not shown).

Validation of results

Validation of microarray data was performed by RT-PCR using TaqMan GeneExpression Assays and a PE ABI PRISM 7900 Real-Time PCR system. Each sample was run in triplicate and mean values were used for further calculations. 18S rRNA was used as endogenous control to calibrate the amount of input RNA across different samples. Expression values for each gene in each sample are expressed as 2^(-ΔCT) (ΔCT = difference between the mean threshold cycle for each specific gene and for the 18S control ribosomal RNA gene). Figure 1D shows expression levels calculated as deviation from the median. Results largely overlap with those obtained for the same data sets in microarray experiments (Figure 1C).
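The two gene-selection criteria applied before clustering (present or marginal in more than one patient, and a 2-fold change relative to the median in at least 4 of 39 patients) can be illustrated with a minimal Python sketch. This is not the GeneSpring implementation: the function name `passes_filter`, its parameters, and the 'P'/'M'/'A' call encoding are hypothetical, and the sketch assumes that a 2-fold change counts in either direction (up or down) and that signal values are positive.

```python
from statistics import median


def passes_filter(signals, calls, min_expressed=2, fold=2.0, min_changed=4):
    """Return True if one probe set meets both selection criteria.

    signals: per-patient expression values (positive floats)
    calls:   per-patient detection calls, 'P', 'M' or 'A' (assumed encoding)
    """
    # Criterion 1: scored present or marginal in more than one patient.
    expressed = sum(1 for c in calls if c in ("P", "M"))
    if expressed < min_expressed:
        return False

    # Criterion 2: at least `min_changed` patients differ from the median
    # expression level by a factor of `fold` or more. Counting both up- and
    # down-regulation is an assumption of this sketch.
    med = median(signals)
    changed = sum(1 for s in signals if s >= fold * med or s <= med / fold)
    return changed >= min_changed
```

Applying such a function to every probe set on the array would yield the reduced gene list passed to the clustering step; lowering `min_changed` or `min_expressed` corresponds to the altered criteria mentioned in the text.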
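The statistical selection of predictors (Welch t-test followed by a Benjamini and Hochberg or Bonferroni correction) can be sketched in the same way. The code below is a simplified illustration using only the Python standard library, not the procedure as implemented in GeneSpring; all function names are hypothetical. It computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom, but does not convert them to p-values, since that requires a t-distribution (available, for example, through `scipy.stats.ttest_ind` with `equal_var=False`). The correction functions take precomputed p-values as input.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


def bonferroni(pvalues, alpha=0.05):
    """Return a list of booleans using the Bonferroni threshold alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]
```

Because the Bonferroni threshold is never larger than the Benjamini and Hochberg threshold at any rank, every test passing Bonferroni also passes the FDR correction, which is consistent with the 29 genes being a subset of the 369 probe sets with the lowest p-values.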
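The RT-PCR expression value defined under Validation of results, 2^(-ΔCT) with ΔCT computed from the mean threshold cycles of the triplicates, reduces to a short calculation. The following is a minimal sketch; the function name `relative_expression` and its argument names are hypothetical.

```python
from statistics import mean


def relative_expression(gene_ct, control_ct):
    """Return 2^(-dCT) from replicate threshold cycles.

    gene_ct:    threshold cycles of the gene of interest (e.g. a triplicate)
    control_ct: threshold cycles of the 18S rRNA endogenous control
    """
    # dCT = mean CT of the gene minus mean CT of the 18S control.
    delta_ct = mean(gene_ct) - mean(control_ct)
    return 2.0 ** (-delta_ct)
```

A gene whose mean threshold cycle is 10 cycles later than that of the 18S control has ΔCT = 10 and therefore an expression value of 2^(-10), roughly 0.001 relative to the control.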