Background Analysis and Cross Hybridization. Application

Pius Brzoska, Ph.D.

Abstract

Microarray technology provides a powerful tool with which to study the coordinate expression of thousands of genes in a single hybridization reaction. However, probe specificity directly influences the accuracy of transcript quantitation. It is therefore essential to account for non-specific binding and cross-hybridization when analyzing microarray data. Agilent's in situ microarrays with 25mer probes and unique control probes allow assessment of specific expression profiles of cellular transcripts. Experiments presented here show that Agilent's 25mer probe design and expression system offer a robust platform for the analysis of specific gene expression. The experiments also demonstrate the utility of Agilent's positive and negative controls in evaluating microarray data. In particular, two types of negative control probes allow the assessment of general background hybridization and transcript-specific cross-hybridization.

Introduction

As microarray technology becomes increasingly prevalent in biotechnological research and development, there is a growing need for high-throughput and automated analytical systems. Successful development of such systems requires a thorough understanding of the effect of non-specific binding and cross-hybridization on the signal intensity assigned to each array feature.

Agilent Technologies has developed an in situ microarray platform consisting of 25mer probes synthesized directly on the microarray. These oligomers are derived from the nucleotide sequences of specific gene transcripts. The microarrays enable high-throughput gene expression analysis while offering a high level of automation. With the in situ array platform, the user can accurately assess the validity of microarray data by using Agilent's automated Feature Extraction software program in combination with specific positive, negative, and deletion control array features (1,2).

Accurate estimation of feature signal intensity requires choosing an appropriate background from which to estimate the noise of the system. Agilent's Feature Extraction software allows the user to choose either local or global background during feature analysis. Local background is the non-specific binding of labeled target to the array in the immediate vicinity of the feature. Global background includes non-specific binding to the substrate as well as non-specific binding to a 25mer. One advantage of global background subtraction is that it allows the subtraction of a set of negative control features developed by Agilent specifically for this purpose. In the studies presented in this note, this proprietary set of negative control features was used to control for background noise.

Finally, in addition to the set of global negative control features, the hybridization signal from each 25mer oligonucleotide is controlled for by a specific deletion control probe. This deletion control probe is identical to the original probe except that the nucleotide at position 12 has been deleted. In this note, we describe and analyze the contribution of the various background signals to the specific hybridization signal. We also demonstrate the utility of the global negative control strategy.

Methods

An 8455-feature in situ-synthesized array was made by inkjet printing at Agilent's manufacturing facility. Of these features, 7560 represented human sequences and 895 were control probes. The array consisted of 1260 transcript-specific probes representing 380 human genes from the RefSeq database (3). Probes (also called 25mers) are oligonucleotides of 25 bases in length.
The array included 992 optimized probes (representing 331 genes), selected via Agilent's iterative optimization process, in which probes are evaluated for specific signal intensity. Optimized probes bind to specific, expressed transcripts and have a high signal-to-noise ratio. A further 268 probes (representing 49 genes) had not been optimized. Each probe was printed in triplicate on the microarray along with a corresponding deletion control probe designed to measure cross-hybridization. Each deletion control probe matches the transcript probe exactly, except that the nucleotide at position 12 of the perfect match probe has been deleted. The arrays also included positive and negative control features for measuring background hybridization.

Hybridization was performed using 2µ of Cyanine 3- and Cyanine 5-labeled target sample cDNA generated using Agilent's Linear Amplification kit and 200 ng of HeLa cell mRNA (Clontech). The amplification yield was 120-fold. Hybridization was performed using Agilent's hybridization chambers, hybridization oven, and protocols. The resulting slides were washed according to Agilent's protocols and scanned with Agilent's high-resolution scanner. Feature extraction was performed using Agilent's Feature Extraction software.
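The deletion control design described in the Methods can be illustrated with a short Python sketch. It assumes that position 12 is counted from 1 at the first base of the probe as written (the note does not state the counting convention), and the 25mer sequence is made up for illustration; it is not an Agilent probe.

```python
def deletion_control(probe: str, position: int = 12) -> str:
    """Return the probe with the nucleotide at 1-based `position` removed.

    Assumption: positions are counted from 1 at the start of the sequence
    as written; the source note does not specify the convention.
    """
    if not 1 <= position <= len(probe):
        raise ValueError("position must lie within the probe")
    return probe[: position - 1] + probe[position:]


# Hypothetical 25mer, for illustration only.
perfect_match = "ACGTACGTACGTACGTACGTACGTA"
control = deletion_control(perfect_match)

print(len(perfect_match), len(control))  # a 25mer yields a 24mer control
```

Because the deleted base sits near the middle of the probe, the control keeps the same flanking sequence on both sides, which is what lets it report binding that does not depend on an exact match.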

Results and Discussion

The experiments described here were performed on an in situ microarray containing probes optimized to detect transcripts expressed in a panel of human cell lines. These probes showed high expression levels and a high signal-to-background ratio. In addition, the arrays contained several probes that were not optimized. The probes matching the transcripts are referred to as perfect match probes. For each probe on the array, a probe-specific deletion control was also included; this deletion control is identical to the perfect match 25mer probe, except that the base at position 12 has been deleted. Hybridization was further controlled by the global negative and positive controls. The global negative control is a set of proprietary probes that do not hybridize, while the positive control is an endogenous probe that always hybridizes. The array consisted of 1260 probes in triplicate with their corresponding deletion controls, representing 380 genes. Of these, 992 probes (331 genes) were optimized and 268 probes (49 genes) were not.

Background hybridization is derived both from non-specific binding and from cross-hybridization of probes to homologous gene transcripts. Figure 1 shows a typical hybridization of HeLa cells in a self vs. self experiment.

Table 1 summarizes the probe specificities in data from two representative in situ microarray experiments. Agilent's Feature Extraction software extracts two types of background binding information from the microarray data. The first type identifies features whose signal intensity is significantly above the global background, that is, above the signal of the negative control features. This can be interpreted as binding of transcripts to a probe, whether or not the transcripts match the probe sequence exactly.
Significant signal intensities are determined by means of a two-sided t-test, in which the mean global background signal is compared to the feature intensity in each channel. This analysis determines whether the signal from a perfect match probe is higher than the background signal. The second type of background binding information identifies perfect match probes whose signal intensity is significantly above that of their deletion control probe, which can be interpreted as binding of a transcript that perfectly matches the probe sequence. An algorithm in the Feature Extraction software compares the signal intensity of a feature to the signal intensity of its corresponding deletion control. In the experiments presented here, global background subtraction using Agilent's proprietary negative control features was chosen. Table 1 lists optimized and non-optimized probes separately.
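The two comparisons described above (perfect match against global background, and perfect match against its deletion control) can be sketched in Python as follows. This is not the algorithm used by the Feature Extraction software, which the note does not describe in detail. It assumes Welch's unequal-variance form of the two-sided t-test, a 0.01 significance level, and made-up intensity values; the function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def welch_t_test(sample, reference):
    """Return (t, two-sided p) for Welch's t-test; both inputs need n >= 2
    and must not both have zero variance."""
    va = variance(sample) / len(sample)
    vb = variance(reference) / len(reference)
    t = (mean(sample) - mean(reference)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sample) - 1)
                           + vb ** 2 / (len(reference) - 1))
    return t, two_sided_p(t, df)


# Made-up intensities for one channel, for illustration only.
negative_controls = [410, 455, 390, 520, 480, 430, 470, 445]
perfect_match = [2150, 2300, 2080]   # triplicate features
deletion_ctrl = [460, 510, 430]      # matching deletion controls

ALPHA = 0.01  # assumed significance level
t_bg, p_bg = welch_t_test(perfect_match, negative_controls)
t_del, p_del = welch_t_test(perfect_match, deletion_ctrl)
above_background = t_bg > 0 and p_bg < ALPHA
above_deletion = t_del > 0 and p_del < ALPHA
print(above_background, above_deletion)
```

A probe that passes the first test but fails the second would count as significant over global background in Table 1 without counting as expressed over its deletion control.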

Table 1: Expression in typical self vs. self experiments. Each entry is given as optimized/not optimized.
Columns: Experiment; number of probes total; number of probes with significant signal intensities; number of probes expressed over DelCtrl.
1 992/ / / / / /166

More than 90% of the probes have a significant signal over global background. [ ]% of the optimized probes and 60% of the non-optimized probes show a significant signal over the corresponding deletion control. The remaining probes either have a very low expression signal or an unusually high deletion control signal. A high deletion control signal could be the result of cross-hybridization to a related transcript sequence.

Probe signal intensity distribution curve

We compared the distribution of signal intensities yielded by the different probe types in a typical experiment: perfect match probes, deletion control probes, and positive and negative control probes (Figure 2). Deletion controls and negative controls show a similar distribution, with most probes providing signals of less than 600. This result indicates that the majority of the deletion control features do not hybridize to sample transcripts. However, some of the deletion control probes bind with a higher signal intensity than the generic negative controls, as evidenced by a shoulder on the deletion control curve [signal intensity range ]. The positive controls show a bell-shaped distribution with no overlap with the negative controls. The perfect match probes show signals comparable to and greater than the positive control signal intensities.

Figure 2: Distribution of controls and perfect match probes.

The distribution curves indicate that the binding properties of the negative controls and of the majority of deletion controls are indistinguishable.
However, a small fraction of the deletion controls (~20%) shows a high signal [greater than 600], indicating that these probes and their corresponding perfect match probes might not be a good choice for monitoring specific gene expression. This type of analysis allows the user to eliminate poor probes from the study or, if such probes are included, to interpret the resulting data judiciously.

High deletion control signals of positive and significant probes are not due to cross-hybridization

One possible hypothesis to explain the high signal from some deletion control probes is that the probes are binding to homologous transcripts.
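The probe screening suggested by the Figure 2 discussion can be sketched in Python as follows. The cutoff of 600 is the value quoted in the text; the probe names and intensities are made up, and the function name is hypothetical rather than part of the Feature Extraction software.

```python
HIGH_DELETION_SIGNAL = 600  # cutoff quoted in the text for a high deletion control signal


def flag_probes(signals, threshold=HIGH_DELETION_SIGNAL):
    """Split probe ids into (keep, review) by deletion control signal.

    `signals` maps a probe id to (perfect_match_signal, deletion_control_signal).
    Probes whose deletion control exceeds `threshold` go to `review`.
    """
    keep, review = [], []
    for probe_id, (_pm_signal, dc_signal) in signals.items():
        (review if dc_signal > threshold else keep).append(probe_id)
    return keep, review


# Made-up signals, for illustration only.
signals = {
    "probe_A": (5200, 380),
    "probe_B": (4100, 910),
    "probe_C": (760, 450),
    "probe_D": (3000, 1250),
    "probe_E": (1900, 590),
}
keep, review = flag_probes(signals)
fraction_flagged = len(review) / len(signals)
print(keep, review, fraction_flagged)
```

Flagged probes can then either be dropped from the analysis or kept and interpreted with caution, as the text recommends.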

Blast Score Distribution

Figure 3: Distribution of deletion controls by similarity to other genes.

Agilent's probe design process, however, controls for cross-hybridization to other known members of the respective genome. To underscore this point, the computed similarities of deletion control probes with high signal were compared to those of deletion controls with low signal. The probes were compared using BLAST against NCBI's UniGene Unique, a non-redundant yet comprehensive cDNA database. The scores of the hits were summed, and the distributions for the high and low signal deletion controls are shown in Figure 3. The two distribution curves are nearly identical and do not differ significantly.

To increase our confidence that the deletion controls function as expected, we performed microarray analysis of the yeast genome, using knockout mutants and various control probes as described above. The knockout gene was part of a gene family whose members show a high degree of homology to each other. Perfect match and deletion control probes that corresponded to the knockout gene were synthesized. In a hybridization experiment comparing yeast knockout mutants with their corresponding wild-type control strains, we found that deletion controls reflect the expression levels correctly. Neither the deletion control probe nor the perfect match probe revealed any signal in the knockout mutants. The knockout mutant with spiked-in wild-type mRNA showed a hybridization signal against the perfect match probe, but not against the deletion control (data not shown).

The data indicate that cross-hybridization of known transcripts is not the reason for the high signal of the deletion controls. Unknown genes that are not represented in the UniGene database might cause the signal.

Literature:
1) Lee PS, Lee KH. Genomic analysis. Curr Opin Biotechnol Apr;11(2): Review.
2) van Hal NL, Vorst O, van Houwelingen AM, Kok EJ, Peijnenburg A, Aharoni A, van Tunen AJ, Keijer J. The application of DNA microarrays in gene expression analysis.
3)

Publication EN June 2001