Detection and Restoration of Hybridization Problems in Affymetrix GeneChip Data by Parametric Scanning

Genome Informatics 17(2): 100-109 (2006)

Tomokazu Konishi
Faculty of Bioresource Sciences, Akita Prefectural University, Shimo-Shinjyo, Akita, Japan

Abstract

Gene expression microarray data often include defects caused by uneven hybridization and dust contamination. Such defects should be removed prior to analysis to prevent loss of analytical accuracy and false positive results. This paper presents a parameter-scanning algorithm that detects such defects on the basis of the character of the data distributions. The cell data are scanned thoroughly using a window algorithm, and windows with an index value greater than a threshold are recognized as defects and removed from the array data. The index is found from the differences between the target and an ideal standard of hybridization obtained as a trimmed mean among experiments, and represents the statistical center of the differences in each section. The threshold is derived from a screening level designated by the operator, but has only a limited effect on the effectiveness of data cancellation. The validity of the algorithm and the effects of data cancellation are tested using GeneChip data obtained from a series of experiments. The algorithm is demonstrated to greatly improve the reproducibility of measurements while removing only a small number of faultless data.

Keywords: microarray data analysis, removal of errors, type I error, false positive, flag, data filter

1 Introduction

Hybridization is the basis of gene expression microarray analysis and, while widely used, is not free from technical problems. For example, some hybridizations form a doughnut-like geometric pattern around the center of chip images [11]. Such patterns often result in reduced signals from certain areas of the chip, appearing similar to surface scratching that may be attributed to the entrainment of dust.
Although analytical programs that identify such problems have been proposed, the methods are destructive, resulting in the total cancellation of the array chip data when large defects are present [7, 14]. The dChip package [10] implements several automated algorithms for recognizing and removing outliers during model-based data normalization [6]. The algorithms find patterns in the responses among perfect match (PM) and mismatch (MM) probes for each gene, and cells and probe sets that disagree with the resultant patterns are identified as outliers. However, this approach is based on a series of mathematical models derived from a very simplified view of both the biological fundamentals and the composition of the data. Furthermore, the appropriateness of the models and the calculation methods is difficult to check rigorously, as there is no objective indicator of how well the models, which inevitably contain parameters for handling noise, describe the experimental system.

One of the reasons why the recognition of hybridization flaws remains ad hoc is that such problems, even if occupying a large proportion of the chip area, are believed to be harmless to the signal or the scaled probe value, which reflects the transcript level. In a GeneChip, a transcript is measured by approximately ten pairs of adjacent PM and MM cells, with the pairs dispersed across the chip [12]. Thus, a failure will simultaneously ruin both the PM and MM probe of the relevant pair, but will not ruin more than one probe pair for a gene. The signal is found by several calculation algorithms based on different philosophies, and most pay attention to outliers caused by such probe failures. For example, Affymetrix MAS5 [12] finds the signal as a weighted trimmed mean among probe pairs, while RMA [3] finds the signal by a median polish of PM values.

It is desirable, however, that problems be recognized and removed from the data prior to analysis in order to prevent loss of accuracy in the signal data. Trimmed means and medians are robust only if the outliers occur in both directions (i.e. positive and negative) at the same frequency. This is rarely the case in practice, as problems often produce outliers that reflect their cause. For example, bright spots will appear if the problem is caused by fluorescent material, while dark spots will appear if the chip surface has been damaged. These types of defects will affect the results by breaking the robustness of the calculations. Such defects also have a direct effect on analyses when the target is not the gene signal but the cell data, as in the case of analyzing processing variants of mRNA. Microarray preparation problems thus present a barrier to progress in advanced analyses of GeneChip data.

This article introduces a method that detects such problems as local tendencies in the cell data when each array is compared with an ideal standard of hybridization. Cells at the identified locations are cancelled before data normalization. The cancellations do not affect the original distribution of the array data, since they are independent of signal intensity. Consequently, the remaining data can be used for analysis. The following section explains the algorithm that finds and removes the problems. Problems are distinguished from biological effects by means of the data distribution. The algorithm is based on several verifiable assumptions, the appropriateness of which is tested with GeneChip data in the Results section.
2 Methods

2.1 Algorithm

The proposed parametric scanning algorithm for identifying microarray problems is as follows. An ideal standard array is prepared, and indices representing the size of distinct regions in each chip are determined. Regions with indices larger than a threshold value in reference to the standard are recognized as problem areas.

The standard is found as a set of trimmed means among hybridizations. The experiments are simply normalized by dividing by the respective median values (including both PM and MM cells) and taking logarithms. The trimmed means of the data for each cell in the array are then calculated, and the resulting set of means is adopted as the ideal standard of hybridization. If the means are calculated from a sufficiently large number of arrays, the values can be considered stable and suitable for a standard. No particular distribution is expected of the ideal standard.

Differences between the simply normalized array data and the standard are then found for each cell. These differences may represent both biological responses and experimental noise. The distribution of the differences is expected to be approximately normal, since the logarithms of biological changes that are appropriately measured and normalized obey a normal distribution [4, 5]. The differences are therefore z-normalized using robust estimators of the distribution parameters, and the distributions are checked on quantile-quantile (QQ) plots.

The indices are found from the medians of the z-normalized differences among neighboring cells on an array. The matrix of the differences is rearranged to reflect the physical order of the chip, and data are collected via a moving window that simulates scanning through a pseudo image of the chip to find the medians. The window median is robust to biological responses, since neighboring cells on a chip do not have biological relationships.
In contrast, experimental problems that hide or add signals within the window will affect the window median. The window medians will obey a normal distribution in a strict sense, according to the effect described by the central limit theorem. Although this model does not expect particular distributions for problems, affected windows will produce outliers in the normal distribution of the matrix medians. The indices are found by normalizing the matrix medians.

There is a difficulty in this normalization: the width of the distribution of matrix medians is not robust to problems. Indeed, the width may increase with the number of problems. If the distribution is simply z-normalized, the number of recognized problems will be reduced. However, this effect can be readily avoided by finding the width from that of the distribution of the differences among cells. In principle, a width of 0.25 was predicted in the present study for the medians of a window of 25 cells (see the simulation section of the data supplement [16]). Here, the width of the distribution of cell differences is robust with respect to problems, since large problems will produce outliers that do not affect the distribution at the central quantiles. In practice, the distributions for cells are not perfectly normal, having long tails possibly due to systematic additive noise in the data. However, the proper width can be estimated robustly from the proper quantiles. Consequently, the effect of the problems can be excluded by estimating the width of the distribution of indices according to the distribution for cells.

Systematic noise as well as hybridization problems may raise the compensation constant from 0.25 to somewhat larger values. In this article, a constant of 0.31 was used, obtained as the mode in actual measurements and being smaller than many other values that may have been affected by many problems (Figure 1). All indices were adjusted by dividing by this constant.

The threshold is derived from a test level decided prior to the operation, similar to screening levels in other statistical tests. The parametric nature of the data handling makes it possible to estimate how many indices will be larger (and smaller) than a given level among half a million results. The program will ask the operator how many windows should be expected.
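The two numerical ingredients described above, the width of the distribution of window medians and the threshold implied by an expected number of windows, can be illustrated with a short calculation. The following is a minimal Python sketch, not the R function of the data supplement. The function names, the two-sided form of the test, and the use of half a million as the total number of windows are assumptions made for illustration only.

```python
import random
from statistics import NormalDist, median, stdev


def simulated_median_sd(window_cells=25, trials=20000, seed=1):
    """Estimate the standard deviation of the median of `window_cells`
    independent standard-normal values by simple simulation."""
    rng = random.Random(seed)
    medians = [
        median(rng.gauss(0.0, 1.0) for _ in range(window_cells))
        for _ in range(trials)
    ]
    return stdev(medians)


def threshold_for_expectation(expected_windows, total_windows):
    """Cut-off t for which total_windows * P(|Z| > t) equals
    expected_windows under a standard normal distribution.
    The two-sided form is an assumption of this sketch."""
    one_sided_tail = expected_windows / (2.0 * total_windows)
    return NormalDist().inv_cdf(1.0 - one_sided_tail)


def adjusted_index(window_median, width=0.31):
    """Divide a window median by the width constant (0.31 in the text)."""
    return window_median / width


if __name__ == "__main__":
    print("simulated width for 25 cells:", round(simulated_median_sd(), 3))
    for expected in (2, 20):
        t = threshold_for_expectation(expected, 500_000)
        print("expectation", expected, "-> threshold", round(t, 2))
```

Under these assumptions the simulated width comes out close to the predicted 0.25, and a tenfold increase in the expectation lowers the threshold only modestly, because the tails of the normal distribution fall off steeply.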
If an array is problem-free, the expected number of windows will still be recognized, owing to the random neighboring of biological responses on the chip. In practice, the affected indices will not obey the normal distribution and will more likely take values that exceed the threshold.

2.2 Program

A program for the parametric scanning method is available in the data supplement [16] in the form of a function for R [8]. The function requires the library "affy" [1], which is available from BioC [14]. An outsourcing service is also available as a part of data normalization [17].

Figure 1: Histogram of standard deviations for medians of moving windows. The mode is 0.31, larger than the expected value of 0.25.

Figure 2: Coincidences between two sets of ideal standards for leaves analyzed by two different laboratories.
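The scanning steps of Section 2.1 can be sketched end to end on a small synthetic chip. This is a minimal Python sketch under stated assumptions, not the R function of the data supplement. The trimming fraction, the robust estimators (median and interquartile range), the window step of one cell, and all function names are hypothetical choices, since the text does not specify them. Only the 25-cell window and the constant 0.31 come from the text.

```python
import math
import random
from statistics import median, quantiles

WINDOW = 5      # 5 x 5 = 25 cells per window, as in the text
WIDTH = 0.31    # width constant quoted in the text


def trimmed_mean(values, trim=0.2):
    """Mean after dropping a fraction `trim` from each end (assumed value)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] or ordered
    return sum(kept) / len(kept)


def simple_normalize(chip):
    """Divide every cell by the chip median, then take logarithms."""
    med = median(v for row in chip for v in row)
    return [[math.log(v / med) for v in row] for row in chip]


def ideal_standard(log_chips):
    """Cell-wise trimmed mean across hybridizations."""
    rows, cols = len(log_chips[0]), len(log_chips[0][0])
    return [[trimmed_mean([chip[r][c] for chip in log_chips])
             for c in range(cols)] for r in range(rows)]


def robust_z(diff):
    """z-normalize using the median and IQR / 1.349 (one robust choice)."""
    flat = [v for row in diff for v in row]
    centre = median(flat)
    q1, _, q3 = quantiles(flat, n=4)
    scale = (q3 - q1) / 1.349
    return [[(v - centre) / scale for v in row] for row in diff]


def window_indices(z, window=WINDOW, width=WIDTH):
    """Median of every window position, divided by the width constant."""
    rows, cols = len(z), len(z[0])
    indices = {}
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            cells = [z[i][j] for i in range(r, r + window)
                     for j in range(c, c + window)]
            indices[(r, c)] = median(cells) / width
    return indices


def scan(chips, target, threshold):
    """Top-left corners of windows whose |index| exceeds the threshold."""
    logs = [simple_normalize(chip) for chip in chips]
    std = ideal_standard(logs)
    t = logs[target]
    diff = [[t[r][c] - std[r][c] for c in range(len(std[0]))]
            for r in range(len(std))]
    indices = window_indices(robust_z(diff))
    return sorted(pos for pos, v in indices.items() if abs(v) > threshold)


def demo_chips(seed=0, size=20, n_chips=8):
    """Synthetic data: chip 0 carries a dark 7 x 7 patch at rows/cols 5-11."""
    rng = random.Random(seed)
    base = [[math.exp(rng.gauss(6, 1)) for _ in range(size)] for _ in range(size)]
    chips = [[[v * math.exp(rng.gauss(0, 0.2)) for v in row] for row in base]
             for _ in range(n_chips)]
    for r in range(5, 12):
        for c in range(5, 12):
            chips[0][r][c] *= 0.2
    return chips


if __name__ == "__main__":
    chips = demo_chips()
    print("flagged on damaged chip:", len(scan(chips, 0, 4.0)))
    print("flagged on clean chip:", len(scan(chips, 1, 4.0)))
```

In this toy setting the flagged windows cluster on the simulated dark patch, while a clean chip compared against the same standard yields no windows at the same threshold. Real arrays are far larger, so a practical implementation would need a more efficient window median than this direct loop.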

2.3 Data Source and Data Processing

Arabidopsis GeneChip data were obtained from TAIR [13]. Leaf data from two research groups were used in the comparison of the ideal standards of hybridization: 15 arrays for the rosette leaf used in drawing expression maps [9], and 18 arrays of day-old control plants in infection experiments by Dr. F. Ausubel's group [13]. Human data [2] were obtained from the public domain resource at RCAST, University of Tokyo [15]. PM data for the arrays were normalized according to the three-parameter method [4].

3 Results

3.1 Verification of Assumptions

3.1.1 Stability of Hybridization Standard

The method compares each datum with the ideal standard of hybridization, which should represent a stable pattern of the sample tissue. If the pattern is truly stable, it will coincide with that of other standards determined using different sets of data on identical tissue. To confirm this coincidence, the standards obtained using data from two research groups were compared. Both groups determined the transcriptome of leaves, one as part of an atlas of plants and the other as a control for infection experiments. Standards were obtained as trimmed means of the median-normalized log data. The results were compared on a scatter plot of 1,000 corresponding cell data (Figure 2). The coincidence between laboratories was thus confirmed. Some other examples of inter- and intralaboratory comparisons are presented in the data supplement [16], showing similar correspondences. Such coincidences cannot be obtained by chance; for example, standards found from different tissues have different tendencies, which appear as wide scatter in the plot (data supplement [16]).
Such tendencies may show a tissue dependence of the standard, and attention should be paid to this in practical use of the program (see Discussion).

3.1.2 Normality of Differences between Array Data and the Standard

The proposed method assumes that the differences between each datum and the ideal standard of hybridization will be distributed normally in a rough sense. This assumption was confirmed by means of QQ plots of the data distribution. The distributions had long tails, which may reflect the systematic additive noise of measurement. However, all of the distributions were coincident with the theoretical values from -1.5 to 1.5 (Figure 3 and data supplement [16]), indicating that more than 85% of the data obeyed the normal distribution. As problems and noise influence the distribution, hybridizations with large problems had a narrower range of coincidence, as observed in the case shown in Figure 3 (ATGE 14C).

3.1.3 Normality of Distribution of Indices

The method also assumes that the indices, which are derived from the medians of the moving windows, will be distributed normally when large problems are not present. This assumption was also confirmed by means of QQ plots (Figure 4 and data supplement [16]). The distributions observed were roughly normal, as expected from the central limit theorem. The standard deviation of 0.31, determined from many hybridizations (Figure 1), afforded good compensation for the width of the distribution and the slope of the plot (Figure 4, panels at the left and the center). As expected, the width of the distribution increased with the severity of the problems (Figure 4, ATGE 14C, right).
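The statement in Section 3.1.2 that coincidence between -1.5 and 1.5 covers more than 85% of the data follows directly from the standard normal distribution, and the QQ-plot coordinates themselves are simple to compute. The following minimal Python sketch illustrates both. The function names and the plotting positions (i + 0.5) / n are assumptions of this sketch, not details taken from the paper or its supplement.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def central_fraction(z):
    """Fraction of a standard normal distribution lying between -z and z."""
    return _STANDARD_NORMAL.cdf(z) - _STANDARD_NORMAL.cdf(-z)


def qq_points(values):
    """Pairs (theoretical quantile, observed value) for a normal QQ plot.

    The plotting positions (i + 0.5) / n are one common convention and
    are an assumption here.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(_STANDARD_NORMAL.inv_cdf((i + 0.5) / n), v)
            for i, v in enumerate(ordered)]


if __name__ == "__main__":
    print("fraction within +/-1.5:", round(central_fraction(1.5), 3))
    for theoretical, observed in qq_points([0.3, -1.2, 0.0, 2.4, -0.5]):
        print(round(theoretical, 2), observed)
```

Points that follow the y = x line over the central range, with departures only at the extremes, correspond to the long-tailed but centrally normal distributions described above.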

Figure 3: Distribution of differences between hybridizations and standards. The straight line at y = x denotes the normal distribution. Data are denser at the center of the plots. Only 2.3%, 0.1%, and 0.003% of data have z-scores exceeding 2, 3 and 4, respectively.

Figure 4: Distribution of index values.

3.2 Confirmation of Method

3.2.1 Improvement of Reproducibility in Repeated Experiments

If parametric scanning effectively eliminates problems from the data, it will reduce the fluctuations found in duplicate experiments. This effect was checked using sets of repeated measurements on Arabidopsis leaves [13]. Before and after cancellation, PM data were normalized using the SuperNORM algorithm [17], which is based on the three-parameter method [4]. The resultant z-scores were compared on scatter plots (Figure 5), from which it is clear that the proposed method eliminated the dispersion found in the plots (Figure 5, left) and achieved the expected reproducibility (center).

As in other statistical tests, some clean and faultless data were also cancelled by parametric scanning. In a sense, this is the cost of finding something by means of statistical tests. However, in this algorithm, the number of cancelled clean data was not large. The nature of the cancelled data was checked from the reproducibility of experiments (Figure 5, right). The number of data on the plots increased as the quality of hybridization decreased. The cancelled data were not narrowly concentrated on the y = x line, but were instead dispersed (Figure 5, right). Coincidence was observed only when many cell data were cancelled (Figure 5, lower right), and the data concentrated on the y = x line were only a limited part of the cancelled data.

Some of the fluctuations found in the examples shown in Figure 5 were critically large. Such examples were not exceptions among the many examinations. Figure 6 compares the numbers of cancelled data under different expectations. It is evident that the extreme examples were not drawn from outliers. Other examples are available in the data supplement [16].

The improvement of reproducibility was further checked from the reductions in the standard deviations of the differences in z-scores between the corresponding PM cells of paired hybridizations. To minimize the effect of additive noise and saturation of measurements, the standard deviations were calculated using normalized values (0 to 1). The effect was checked on a scatter plot (Figure 7), which clearly shows that parametric scanning reduces the standard deviation of the differences among the obtained z-scores.

Figure 5: Reproducibility in repeated experiments. A combination of experiments is shown in each row. Left: original data. Center: remaining data. Right: cancelled data. PM data (n = 10,000) randomly selected from the indicated pairs of arrays are shown. The expectation value for the cancellation was 2 windows.
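The reproducibility measure used in Section 3.2.1, the standard deviation of z-score differences between corresponding PM cells of two hybridizations, can be written compactly. The following is a minimal Python sketch. It assumes that "normalized values (0 to 1)" means restricting the comparison to cells whose z-scores lie between 0 and 1 in both arrays, and that cancelled cells are represented as None. Both are interpretations for illustration, and the function name is hypothetical.

```python
from statistics import stdev


def paired_difference_sd(z_a, z_b, low=0.0, high=1.0):
    """Standard deviation of z_a - z_b over cells that are present in
    both arrays and whose z-scores both lie within [low, high]."""
    diffs = [
        a - b
        for a, b in zip(z_a, z_b)
        if a is not None and b is not None
        and low <= a <= high and low <= b <= high
    ]
    if len(diffs) < 2:
        raise ValueError("need at least two usable cell pairs")
    return stdev(diffs)


if __name__ == "__main__":
    first = [0.1, 0.5, 0.9, 2.0, None]
    second = [0.2, 0.4, 0.9, 0.5, 0.3]
    print(paired_difference_sd(first, second))
```

Computing this value for each pair of repeated hybridizations before and after cancellation gives the kind of comparison summarized in Figure 7.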

Figure 6: Numbers of cancelled cells at expectations of 2 and 20 windows (50 and 500 cells, respectively). Data sources: rectangles [9], circles [13] and triangles [2].

Figure 7: Standard deviations of differences among cell data in reproducibility measurements.

3.2.2 Comparison with Other Algorithms

The method was evaluated against the same sets of arrays treated using other automated methods in the dChip package [10], rather than against new experimental data. All the spikes and outliers recognized by dChip were cancelled using the PM-only model, and the data were normalized in an identical manner. As shown in Figure 8, dChip gave lower reproducibility (Figure 8, left), showing weaker detection power. This does not necessarily mean that dChip preserves faultless data; it cancelled the complete set of cells for certain genes ( % of the total), while no gene was totally cancelled by the parametric scan (see data supplement [16]). For such genes, no information is retained for analysis.

3.2.3 Sensitivity of Threshold Parameter

The number of data actually cancelled in each hybridization was not clearly dependent on the threshold parameter, which is a test level decided by the operator. The number of cancelled data was much larger than the expectation estimated from the threshold parameter (Figure 6), reaching as high as a quarter of the total number of cells (tens of thousands), even when the expectation was 50 cells (2 windows). However, the number of cancelled cells did not increase tenfold when the expectation was increased from 2 to 20. The relationship between the expectation and the actual number of cancellations became poorer as the number of cancelled data increased. Processing of data obtained from three different laboratories suggested a stable relationship between the cancelled windows at the two expectations (Figure 6).
It should be noted that the expected numbers, which appear at (1.7, 2.7) in the plot, approximately satisfy the extrapolated relationship (Figure 6). The number of cancelled data may depend on the quality of hybridization, as the number of cancellations was observed to be higher when major problems were found (Figure 5). The cancelled windows often formed clusters in the chip, suggesting a single cause within each cluster (Figure 9). Such clusters were found regardless of the value of the expectation parameter. The frequencies and areas of cancellation differed among data from the different laboratories (Figure 6). The values for data measured in one particular laboratory (triangles in the figure) were clearly larger than those from the other laboratories.

Many of the clusters may represent polishing of the chip surface or uneven hybridization. It is likely that the differences in the frequencies of problems are due to differences in protocols and skills in wet experiments, which will differ according to the laboratory and the time of preparation. These problems were highlighted by high index values, producing many cancelled windows in the case of severely defective cells even when the expectation was rather small.

The results above are considered evidence of the insensitivity of parametric scanning to the value of the expectation parameter; that is, the proposed method appears to have good fidelity with respect to problem detection. Such insensitivity implies objectivity in the algorithm, since the threshold is the only parameter subject to operator selection.

Figure 8: Reproducibility in data treated by the dChip package. Results using the PM-only models are shown. The corresponding original data are presented in Figure 5 (left). Left: remaining data. Right: cancelled data. PM data (n = 10,000) randomly selected from the indicated pairs of arrays are shown. Results using PM-MM models are presented in the data supplement [16].

4 Discussion

On the basis of the observations above, the proposed method is recommended for practical use on all GeneChip expression data prior to normalization. The assumptions in the approach were validated through analysis of data distributions, and the only arbitrary parameter was shown to have a limited effect on the results. Furthermore, through tests in many additional experiments (not shown), the parametric scanning method has been found to be very effective in eliminating hybridization problems. The appropriateness of the method can be checked in every analysis, with the data required for the checking process supplied by the software (data supplement [16]).

The numbers of cancelled data are always larger than the expectation, suggesting that most hybridizations have problems of some sort. The problems detected had patterns indicative of surface polishing, uneven hybridization, and errors in the fabricated cell structure. Symmetric patterns of clusters surrounding the center of the chip (Figure 9, lower right) can be identified as polishing artifacts [11]. In such a case, the signals in the affected area are always distinctively lower, and detection is thus insensitive to the expectation value. Cases with an advanced degree of surface polishing will form the common doughnut-like cluster pattern. In contrast, clusters of indefinite shape are more likely indicative of uneven hybridization. Within such a cluster, the data tend to increase or decrease, producing dispersion in the scatter plot of experimental reproducibility (Figure 5). Such unevenness can be derived from several sources, and some of the distinctive regions are insensitive to the expectation value while some are not (Figure 9, ATGE_14_C). The differences in sensitivity correspond to differences in the magnitude of the defect. Defects detected as smaller clusters or isolated windows may have been formed by dust. Again, some of these features are distinct while others are not.

Errors in the chip structure can be identified as repeated clusters in the same parts of multiple chips, forming regular shapes often surrounded by straight lines. Many such defects are not problems but control cells designed and placed on the chip, although some may be caused by problems, appearing in all chips with similar batch numbers (i.e. the same manufacturing lot). Such problems might be caused by product errors that have not been detected in quality control and can result in serious problems. In the case shown in Figure 5, the huge upward dispersion is attributed to this sort of failure (Figure 9, lower left).

Figure 9: Positions of cancelled windows in a chip. Four typical examples at the indicated expectations are shown. Upper left: hybridization with relatively small numbers of cancellations. Upper right: uneven hybridization. Lower left: regular shapes with straight boundaries. Lower right: clusters at symmetric positions.

The proposed method will reduce false positives in microarray data analyses. Such errors are not unique to microarray analyses, but the multiplicity of tests conducted using microarrays increases the seriousness of the errors. Multiplicity arises from the comprehensiveness of the microarray and other post-genomic analyses, which generate distinctively different targets of analysis compared to conventional methods that measure only a limited number of gene products. In such hyper-multiple comparisons, a large number of false positives will hinder analysis, producing both intra- and interlaboratory contradictions in the observations. For example, permitting type I error at a probability of 1%, half a million double-sided tests will produce 10,000 errors. Ignoring hybridization problems will greatly increase this expectation (Figure 5). Additionally, such problems will affect data normalization and the summarized data for genes. Consequently, hybridization problems should be detected and eliminated before normalization.

The proposed method will rescue clean data from the failure-free regions of a hybridization, and the data remaining after cancellation can be normalized and used for further analysis.
The resultant data set showed fair coincidence with the corresponding pairs in reproducibility experiments (Figure 5, center). The total cost of experiments will be reduced in comparison to an ad hoc approach of cancelling genes in arrays and/or entire arrays.

The R program in the data supplement [16] will be affected by the tissue effect [10] in the discovery of the ideal standard of hybridization. That is, the standards will differ according to the differentiation of cells in the sample. Such an effect will occur when a small number of arrays is treated together with a large number of arrays on a different tissue. Additionally, treating data using fewer than four arrays is not encouraged, since the standard cannot be considered stable. The stability of the standard can be checked using the approach shown in Figure 2, and the tissue effect can be noticed as a marked increase in cancellations without the clusters of cancelled windows found in Figure 9. Such problems can be avoided by finding the standard separately from the recognition process. In practice, two alternative ways can be employed to discover the ideal standard: using randomly selected samples among various tissues from many arrays, or finding tissue-specific standards and using these for the corresponding arrays.

References

[1] Gautier, L., Cope, L., Bolstad, B. M., and Irizarry, R. A., affy analysis of Affymetrix GeneChip data at the probe level, Bioinformatics, 20: ,
[2] Ge, X., Yamamoto, S., Tsutsumi, S., Midorikawa, Y., Ihara S., Wang S., and Aburatani H., Interpreting expression profiles of cancers by genome-wide survey of breadth of expression in normal tissues, Genomics, 86: ,
[3] Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., and Speed, T. P., Summaries of Affymetrix GeneChip probe level data, Nucleic Acids Res., 31:e15,
[4] Konishi, T., Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment, BMC Bioinformatics, 5:5,
[5] Konishi, T., A thermodynamic model of transcriptome formation, Nucleic Acids Res., 33: ,
[6] Li, C. and Wong, W., Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection, Proc. Natl. Acad. Sci. USA, 98:31-36,
[7] Psarros, M., Heber, S., Sick, M., Thoppae, G., Harshman, K., and Sick, B., RACE: Remote Analysis Computation for gene Expression data, Nucleic Acids Res., 33:W638-W643,
[8] R Development Core Team, R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria,
[9] Schmid, M., Davison, T. S., Henz, S. R., Pape, U. J., Demar, M., Vingron, M., Scholkopf, B., Weigel, D., and Lohmann, J., A gene expression map of Arabidopsis development, Nat. Genet., 37: ,
[10] [11] [12] [13] [14] [15] [16] [17]


New Statistical Algorithms for Monitoring Gene Expression on GeneChip Probe Arrays GENE EXPRESSION MONITORING TECHNICAL NOTE New Statistical Algorithms for Monitoring Gene Expression on GeneChip Probe Arrays Introduction Affymetrix has designed new algorithms for monitoring GeneChip

More information

Introduction to Bioinformatics and Gene Expression Technology

Introduction to Bioinformatics and Gene Expression Technology Vocabulary Introduction to Bioinformatics and Gene Expression Technology Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 1.1 Gene: Genetics: Genome: Genomics: hereditary DNA

More information

Lecture #1. Introduction to microarray technology

Lecture #1. Introduction to microarray technology Lecture #1 Introduction to microarray technology Outline General purpose Microarray assay concept Basic microarray experimental process cdna/two channel arrays Oligonucleotide arrays Exon arrays Comparing

More information

Outline. Array platform considerations: Comparison between the technologies available in microarrays

Outline. Array platform considerations: Comparison between the technologies available in microarrays Microarray overview Outline Array platform considerations: Comparison between the technologies available in microarrays Differences in array fabrication Differences in array organization Applications of

More information

FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY

FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY K. L. Knudtson 1, C. Griffin 2, A. I. Brooks 3, D. A. Iacobas 4, K. Johnson 5, G. Khitrov 6,

More information

Humboldt Universität zu Berlin. Grundlagen der Bioinformatik SS Microarrays. Lecture

Humboldt Universität zu Berlin. Grundlagen der Bioinformatik SS Microarrays. Lecture Humboldt Universität zu Berlin Microarrays Grundlagen der Bioinformatik SS 2017 Lecture 6 09.06.2017 Agenda 1.mRNA: Genomic background 2.Overview: Microarray 3.Data-analysis: Quality control & normalization

More information

Gene Signal Estimates from Exon Arrays

Gene Signal Estimates from Exon Arrays Gene Signal Estimates from Exon Arrays I. Introduction: With exon arrays like the GeneChip Human Exon 1.0 ST Array, researchers can examine the transcriptional profile of an entire gene (Figure 1). Being

More information

affy: Built-in Processing Methods

affy: Built-in Processing Methods affy: Built-in Processing Methods Ben Bolstad October 30, 2017 Contents 1 Introduction 2 2 Background methods 2 2.1 none...................................... 2 2.2 rma/rma2...................................

More information

MulCom: a Multiple Comparison statistical test for microarray data in Bioconductor.

MulCom: a Multiple Comparison statistical test for microarray data in Bioconductor. MulCom: a Multiple Comparison statistical test for microarray data in Bioconductor. Claudio Isella, Tommaso Renzulli, Davide Corà and Enzo Medico May 3, 2016 Abstract Many microarray experiments compare

More information

Oligonucleotide microarray data are not normally distributed

Oligonucleotide microarray data are not normally distributed Oligonucleotide microarray data are not normally distributed Johanna Hardin Jason Wilson John Kloke Abstract Novel techniques for analyzing microarray data are constantly being developed. Though many of

More information

FEATURE-LEVEL EXPLORATION OF THE CHOE ET AL. AFFYMETRIX GENECHIP CONTROL DATASET

FEATURE-LEVEL EXPLORATION OF THE CHOE ET AL. AFFYMETRIX GENECHIP CONTROL DATASET Johns Hopkins University, Dept. of Biostatistics Working Papers 3-17-2006 FEATURE-LEVEL EXPLORATION OF THE CHOE ET AL. AFFYETRIX GENECHIP CONTROL DATASET Rafael A. Irizarry Johns Hopkins Bloomberg School

More information

Identifying Candidate Informative Genes for Biomarker Prediction of Liver Cancer

Identifying Candidate Informative Genes for Biomarker Prediction of Liver Cancer Identifying Candidate Informative Genes for Biomarker Prediction of Liver Cancer Nagwan M. Abdel Samee 1, Nahed H. Solouma 2, Mahmoud Elhefnawy 3, Abdalla S. Ahmed 4, Yasser M. Kadah 5 1 Computer Engineering

More information

Microarray Technique. Some background. M. Nath

Microarray Technique. Some background. M. Nath Microarray Technique Some background M. Nath Outline Introduction Spotting Array Technique GeneChip Technique Data analysis Applications Conclusion Now Blind Guess? Functional Pathway Microarray Technique

More information

Intro to Microarray Analysis. Courtesy of Professor Dan Nettleton Iowa State University (with some edits)

Intro to Microarray Analysis. Courtesy of Professor Dan Nettleton Iowa State University (with some edits) Intro to Microarray Analysis Courtesy of Professor Dan Nettleton Iowa State University (with some edits) Some Basic Biology Genes are DNA sequences that code for proteins. (e.g. gene lengths perhaps 1000

More information

Bioinformatics III Structural Bioinformatics and Genome Analysis. PART II: Genome Analysis. Chapter 7. DNA Microarrays

Bioinformatics III Structural Bioinformatics and Genome Analysis. PART II: Genome Analysis. Chapter 7. DNA Microarrays Bioinformatics III Structural Bioinformatics and Genome Analysis PART II: Genome Analysis Chapter 7. DNA Microarrays 7.1 Motivation 7.2 DNA Microarray History and current states 7.3 DNA Microarray Techniques

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 14: Microarray Some slides were adapted from Dr. Luke Huan (University of Kansas), Dr. Shaojie Zhang (University of Central Florida), and Dr. Dong Xu and

More information

6. GENE EXPRESSION ANALYSIS MICROARRAYS

6. GENE EXPRESSION ANALYSIS MICROARRAYS 6. GENE EXPRESSION ANALYSIS MICROARRAYS BIOINFORMATICS COURSE MTAT.03.239 16.10.2013 GENE EXPRESSION ANALYSIS MICROARRAYS Slides adapted from Konstantin Tretyakov s 2011/2012 and Priit Adlers 2010/2011

More information

Quality Control Assessment in Genotyping Console

Quality Control Assessment in Genotyping Console Quality Control Assessment in Genotyping Console Introduction Prior to the release of Genotyping Console (GTC) 2.1, quality control (QC) assessment of the SNP Array 6.0 assay was performed using the Dynamic

More information

STATC 141 Spring 2005, April 5 th Lecture notes on Affymetrix arrays. Materials are from

STATC 141 Spring 2005, April 5 th Lecture notes on Affymetrix arrays. Materials are from STATC 141 Spring 2005, April 5 th Lecture notes on Affymetrix arrays Materials are from http://www.ohsu.edu/gmsr/amc/amc_technology.html The GeneChip high-density oligonucleotide arrays are fabricated

More information

Technical Note. Performance Review of the GeneChip AutoLoader for the Affymetrix GeneChip Scanner Introduction

Technical Note. Performance Review of the GeneChip AutoLoader for the Affymetrix GeneChip Scanner Introduction GeneChip AutoLoader AFFYMETRIX PRODUCT FAMILY > > Technical Note Performance Review of the GeneChip AutoLoader for the Affymetrix GeneChip ner 3000 Designed for use with the GeneChip ner 3000, the GeneChip

More information

Bioinformatics and Genomics: A New SP Frontier?

Bioinformatics and Genomics: A New SP Frontier? Bioinformatics and Genomics: A New SP Frontier? A. O. Hero University of Michigan - Ann Arbor http://www.eecs.umich.edu/ hero Collaborators: G. Fleury, ESE - Paris S. Yoshida, A. Swaroop UM - Ann Arbor

More information

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer T. M. Murali January 31, 2006 Innovative Application of Hierarchical Clustering A module map showing conditional

More information

SPH 247 Statistical Analysis of Laboratory Data

SPH 247 Statistical Analysis of Laboratory Data SPH 247 Statistical Analysis of Laboratory Data April 14, 2015 SPH 247 Statistical Analysis of Laboratory Data 1 Basic Design of Expression Arrays For each gene that is a target for the array, we have

More information

3.1.4 DNA Microarray Technology

3.1.4 DNA Microarray Technology 3.1.4 DNA Microarray Technology Scientists have discovered that one of the differences between healthy and cancer is which genes are turned on in each. Scientists can compare the gene expression patterns

More information

Predicting Microarray Signals by Physical Modeling. Josh Deutsch. University of California. Santa Cruz

Predicting Microarray Signals by Physical Modeling. Josh Deutsch. University of California. Santa Cruz Predicting Microarray Signals by Physical Modeling Josh Deutsch University of California Santa Cruz Predicting Microarray Signals by Physical Modeling p.1/39 Collaborators Shoudan Liang NASA Ames Onuttom

More information

Use of DNA microarrays, wherein the expression levels of

Use of DNA microarrays, wherein the expression levels of Modeling of DNA microarray data by using physical properties of hybridization G. A. Held*, G. Grinstein, and Y. Tu IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598 Communicated by Charles

More information

Gene Expression Technology

Gene Expression Technology Gene Expression Technology Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Gene expression Gene expression is the process by which information from a gene

More information

Gene Expression Data Analysis (I)

Gene Expression Data Analysis (I) Gene Expression Data Analysis (I) Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Bioinformatics tasks Biological question Experiment design Microarray experiment

More information

Gene Regulation Solutions. Microarrays and Next-Generation Sequencing

Gene Regulation Solutions. Microarrays and Next-Generation Sequencing Gene Regulation Solutions Microarrays and Next-Generation Sequencing Gene Regulation Solutions The Microarrays Advantage Microarrays Lead the Industry in: Comprehensive Content SurePrint G3 Human Gene

More information

Comparison of Affymetrix GeneChip Expression Measures

Comparison of Affymetrix GeneChip Expression Measures Johns Hopkins University, Dept. of Biostatistics Working Papers 9-1-2005 Comparison of Affymetrix GeneChip Expression Measures Rafael A. Irizarry Johns Hopkins Bloomberg School of Public Health, Department

More information

A learned comparative expression measure for Affymetrix GeneChip DNA microarrays

A learned comparative expression measure for Affymetrix GeneChip DNA microarrays Proceedings of the Computational Systems Bioinformatics Conference, August 8-11, 2005, Stanford, CA. pp. 144-154. A learned comparative expression measure for Affymetrix GeneChip DNA microarrays Will Sheffler

More information

Estimating Cell Cycle Phase Distribution of Yeast from Time Series Gene Expression Data

Estimating Cell Cycle Phase Distribution of Yeast from Time Series Gene Expression Data 2011 International Conference on Information and Electronics Engineering IPCSIT vol.6 (2011) (2011) IACSIT Press, Singapore Estimating Cell Cycle Phase Distribution of Yeast from Time Series Gene Expression

More information

Quality Measures for CytoChip Microarrays

Quality Measures for CytoChip Microarrays Quality Measures for CytoChip Microarrays How to evaluate CytoChip Oligo data quality in BlueFuse Multi software. Data quality is one of the most important aspects of any microarray experiment. This technical

More information

SIMS2003. Instructors:Rus Yukhananov, Alex Loguinov BWH, Harvard Medical School. Introduction to Microarray Technology.

SIMS2003. Instructors:Rus Yukhananov, Alex Loguinov BWH, Harvard Medical School. Introduction to Microarray Technology. SIMS2003 Instructors:Rus Yukhananov, Alex Loguinov BWH, Harvard Medical School Introduction to Microarray Technology. Lecture 1 I. EXPERIMENTAL DETAILS II. ARRAY CONSTRUCTION III. IMAGE ANALYSIS Lecture

More information

Introduction to BioMEMS & Medical Microdevices DNA Microarrays and Lab-on-a-Chip Methods

Introduction to BioMEMS & Medical Microdevices DNA Microarrays and Lab-on-a-Chip Methods Introduction to BioMEMS & Medical Microdevices DNA Microarrays and Lab-on-a-Chip Methods Companion lecture to the textbook: Fundamentals of BioMEMS and Medical Microdevices, by Prof., http://saliterman.umn.edu/

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a

More information

Gene Expression Analysis Superior Solutions for any Project

Gene Expression Analysis Superior Solutions for any Project Gene Expression Analysis Superior Solutions for any Project Find Your Perfect Match ArrayXS Global Array-to-Go Focussed Comprehensive: detect the whole transcriptome reliably Certified: discover exceptional

More information

Reliable classification of two-class cancer data using evolutionary algorithms

Reliable classification of two-class cancer data using evolutionary algorithms BioSystems 72 (23) 111 129 Reliable classification of two-class cancer data using evolutionary algorithms Kalyanmoy Deb, A. Raji Reddy Kanpur Genetic Algorithms Laboratory (KanGAL), Indian Institute of

More information

Estoril Education Day

Estoril Education Day Estoril Education Day -Experimental design in Proteomics October 23rd, 2010 Peter James Note Taking All the Powerpoint slides from the Talks are available for download from: http://www.immun.lth.se/education/

More information

Exploration, normalization, and summaries of high density oligonucleotide array probe level data

Exploration, normalization, and summaries of high density oligonucleotide array probe level data Biostatistics (2003), 4, 2,pp. 249 264 Printed in Great Britain Exploration, normalization, and summaries of high density oligonucleotide array probe level data RAFAEL A. IRIZARRY Department of Biostatistics,

More information

Exploration and Analysis of DNA Microarray Data

Exploration and Analysis of DNA Microarray Data Exploration and Analysis of DNA Microarray Data Dhammika Amaratunga Senior Research Fellow in Nonclinical Biostatistics Johnson & Johnson Pharmaceutical Research & Development Javier Cabrera Associate

More information

DNA Arrays Affymetrix GeneChip System

DNA Arrays Affymetrix GeneChip System DNA Arrays Affymetrix GeneChip System chip scanner Affymetrix Inc. hybridization Affymetrix Inc. data analysis Affymetrix Inc. mrna 5' 3' TGTGATGGTGGGAATTGGGTCAGAAGGACTGTGGGCGCTGCC... GGAATTGGGTCAGAAGGACTGTGGC

More information

A GENOTYPE CALLING ALGORITHM FOR AFFYMETRIX SNP ARRAYS

A GENOTYPE CALLING ALGORITHM FOR AFFYMETRIX SNP ARRAYS Bioinformatics Advance Access published November 2, 2005 The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

More information

TAGUCHI APPROACH TO DESIGN OPTIMIZATION FOR QUALITY AND COST: AN OVERVIEW. Resit Unal. Edwin B. Dean

TAGUCHI APPROACH TO DESIGN OPTIMIZATION FOR QUALITY AND COST: AN OVERVIEW. Resit Unal. Edwin B. Dean TAGUCHI APPROACH TO DESIGN OPTIMIZATION FOR QUALITY AND COST: AN OVERVIEW Resit Unal Edwin B. Dean INTRODUCTION Calibrations to existing cost of doing business in space indicate that to establish human

More information

A Greedy Algorithm for Minimizing the Number of Primers in Multiple PCR Experiments

A Greedy Algorithm for Minimizing the Number of Primers in Multiple PCR Experiments A Greedy Algorithm for Minimizing the Number of Primers in Multiple PCR Experiments Koichiro Doi Hiroshi Imai doi@is.s.u-tokyo.ac.jp imai@is.s.u-tokyo.ac.jp Department of Information Science, Faculty of

More information

Expression Array System

Expression Array System Integrated Science for Gene Expression Applied Biosystems Expression Array System Expression Array System SEE MORE GENES The most complete, most sensitive system for whole genome expression analysis. The

More information

Soybean Microarrays. An Introduction. By Steve Clough. November Common Microarray platforms

Soybean Microarrays. An Introduction. By Steve Clough. November Common Microarray platforms Soybean Microarrays Microarray construction An Introduction By Steve Clough November 2005 Common Microarray platforms cdna: spotted collection of PCR products from different cdna clones, each representing

More information

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques)

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques) Microarrays and Transcript Profiling Gene expression patterns are traditionally studied using Northern blots (DNA-RNA hybridization assays). This approach involves separation of total or polya + RNA on

More information

Feature Selection of Gene Expression Data for Cancer Classification: A Review

Feature Selection of Gene Expression Data for Cancer Classification: A Review Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 50 (2015 ) 52 57 2nd International Symposium on Big Data and Cloud Computing (ISBCC 15) Feature Selection of Gene Expression

More information

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1 Human SNP haplotypes Statistics 246, Spring 2002 Week 15, Lecture 1 Human single nucleotide polymorphisms The majority of human sequence variation is due to substitutions that have occurred once in the

More information

Analysis of Microarray Data

Analysis of Microarray Data Analysis of Microarray Data Lecture 3: Visualization and Functional Analysis George Bell, Ph.D. Senior Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Review

More information

ADVANCED STATISTICAL METHODS FOR GENE EXPRESSION DATA

ADVANCED STATISTICAL METHODS FOR GENE EXPRESSION DATA ADVANCED STATISTICAL METHODS FOR GENE EXPRESSION DATA Veera Baladandayuthapani & Kim-Anh Do University of Texas M.D. Anderson Cancer Center Houston, Texas, USA veera@mdanderson.org Course Website: http://odin.mdacc.tmc.edu/

More information

HELP Microarray Analytical Tools

HELP Microarray Analytical Tools HELP Microarray Analytical Tools Reid F. Thompson October 30, 2017 Contents 1 Introduction 2 2 Changes for HELP in current BioC release 3 3 Data import and Design information 4 3.1 Pair files and probe-level

More information

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H Introduction to ChIP Seq data analyses Acknowledgement: slides taken from Dr. H Wu @Emory ChIP seq: Chromatin ImmunoPrecipitation it ti + sequencing Same biological motivation as ChIP chip: measure specific

More information

Validation Study of FUJIFILM QuickGene System for Affymetrix GeneChip

Validation Study of FUJIFILM QuickGene System for Affymetrix GeneChip Validation Study of FUJIFILM QuickGene System for Affymetrix GeneChip Reproducibility of Extraction of Genomic DNA from Whole Blood samples in EDTA using FUJIFILM membrane technology on the QuickGene-810

More information

Inherent variation in the reactions, type of enzymes used. Depends on the type of labeling and procedures, as well as the age of the labels.

Inherent variation in the reactions, type of enzymes used. Depends on the type of labeling and procedures, as well as the age of the labels. 332 Experimental design, analysis of variance and slide quality assessment in gene expression arrays Sorin Draghici*, Alexander Kuklin, Bruce Hoff & Soheil Shams Address BioDiscovery Inc 11150 West Olympic

More information

Probe-Level Analysis of Affymetrix GeneChip Microarray Data

Probe-Level Analysis of Affymetrix GeneChip Microarray Data Probe-Level Analysis of Affymetrix GeneChip Microarray Data Ben Bolstad http://www.stat.berkeley.edu/~bolstad Michigan State University February 15, 2005 Outline for Today's Talk A brief introduction to

More information

Quantitative PCR Analysis of Meat Speciation Data Using the 2 - Cq Method

Quantitative PCR Analysis of Meat Speciation Data Using the 2 - Cq Method Quantitative PCR Analysis of Meat Speciation Data Using the 2 - Cq Method Abstract In recent years the food production industry has been placed under intense scrutiny. This has led to an increased pressure

More information

Analysis of a Tiling Regulation Study in Partek Genomics Suite 6.6

Analysis of a Tiling Regulation Study in Partek Genomics Suite 6.6 Analysis of a Tiling Regulation Study in Partek Genomics Suite 6.6 The example data set used in this tutorial consists of 6 technical replicates from the same human cell line, 3 are SP1 treated, and 3

More information

Methods of Biomaterials Testing Lesson 3-5. Biochemical Methods - Molecular Biology -

Methods of Biomaterials Testing Lesson 3-5. Biochemical Methods - Molecular Biology - Methods of Biomaterials Testing Lesson 3-5 Biochemical Methods - Molecular Biology - Chromosomes in the Cell Nucleus DNA in the Chromosome Deoxyribonucleic Acid (DNA) DNA has double-helix structure The

More information

Comparison of Normalization Methods in Microarray Analysis

Comparison of Normalization Methods in Microarray Analysis Comparison of Normalization Methods in Microarray Analysis Comparison of Normalization Methods in Micro array Analysis By Rong Yang, B.S. A Project Submitted to the School of Graduate Studies in Partial

More information

Introduction to Assay Development

Introduction to Assay Development Introduction to Assay Development A poorly designed assay can derail a drug discovery program before it gets off the ground. While traditional bench-top assays are suitable for basic research and target

More information

DNA Microarray Technology

DNA Microarray Technology CHAPTER 1 DNA Microarray Technology All living organisms are composed of cells. As a functional unit, each cell can make copies of itself, and this process depends on a proper replication of the genetic

More information

Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks

Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks BIOINFORMATICS Vol. 23 ISMB/ECCB 2007, pages i282 i288 doi:10.1093/bioinformatics/btm201 Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks Wei Keat

More information

Gene Expression Profiling and Validation Using Agilent SurePrint G3 Gene Expression Arrays

Gene Expression Profiling and Validation Using Agilent SurePrint G3 Gene Expression Arrays Gene Expression Profiling and Validation Using Agilent SurePrint G3 Gene Expression Arrays Application Note Authors Bahram Arezi, Nilanjan Guha and Anne Bergstrom Lucas Agilent Technologies Inc. Santa

More information

An Automatic Microarray Image Gridding Technique Based on Continuous Wavelet Transform

An Automatic Microarray Image Gridding Technique Based on Continuous Wavelet Transform An Automatic Microarray Image Gridding Technique Based on Continuous Wavelet Transform Emmanouil Athanasiadis 1, Dionisis Cavouras 2, Panagiota Spyridonos 1, Ioannis Kalatzis 2, and George Nikiforidis

More information

Measuring transcriptomes with RNA-Seq. BMI/CS 776 Spring 2016 Anthony Gitter

Measuring transcriptomes with RNA-Seq. BMI/CS 776  Spring 2016 Anthony Gitter Measuring transcriptomes with RNA-Seq BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu Overview RNA-Seq technology The RNA-Seq quantification problem Generative

More information

Calculation of Spot Reliability Evaluation Scores (SRED) for DNA Microarray Data

Calculation of Spot Reliability Evaluation Scores (SRED) for DNA Microarray Data Protocol Calculation of Spot Reliability Evaluation Scores (SRED) for DNA Microarray Data Kazuro Shimokawa, Rimantas Kodzius, Yonehiro Matsumura, and Yoshihide Hayashizaki This protocol was adapted from

More information

What you still might want to know about microarrays. Brixen 2011 Wolfgang Huber EMBL

What you still might want to know about microarrays. Brixen 2011 Wolfgang Huber EMBL What you still might want to know about microarrays Brixen 2011 Wolfgang Huber EMBL Brief history Late 1980s: Lennon, Lehrach: cdnas spotted on nylon membranes 1990s: Affymetrix adapts microchip production

More information

New Stringent Two-Color Gene Expression Workflow Enables More Accurate and Reproducible Microarray Data

New Stringent Two-Color Gene Expression Workflow Enables More Accurate and Reproducible Microarray Data Application Note GENOMICS INFORMATICS PROTEOMICS METABOLOMICS A T C T GATCCTTC T G AAC GGAAC T AATTTC AA G AATCTGATCCTTG AACTACCTTCCAAGGTG New Stringent Two-Color Gene Expression Workflow Enables More

More information

Enhanced Biclustering on Expression Data

Enhanced Biclustering on Expression Data Enhanced Biclustering on Expression Data Jiong Yang Haixun Wang Wei Wang Philip Yu UIUC IBM T. J. Watson UNC Chapel Hill IBM T. J. Watson jioyang@cs.uiuc.edu haixun@us.ibm.com weiwang@cs.unc.edu psyu@us.ibm.com

More information

Discovery of Transcription Factor Binding Sites with Deep Convolutional Neural Networks

Discovery of Transcription Factor Binding Sites with Deep Convolutional Neural Networks Discovery of Transcription Factor Binding Sites with Deep Convolutional Neural Networks Reesab Pathak Dept. of Computer Science Stanford University rpathak@stanford.edu Abstract Transcription factors are

More information

An overview of image-processing methods for Affymetrix GeneChips

An overview of image-processing methods for Affymetrix GeneChips An overview of image-processing methods for Affymetrix GeneChips Jose M. Arteaga-Salas 1,$, Harry Zuzan 2,$, William B. Langdon 1,3, Graham J. G. Upton 1 and Andrew P. Harrison 1,3,* 1 Department of Mathematical

More information

Overview Three-Sigma Quality

Overview Three-Sigma Quality Manufacturing Environment Reliability: The Other Dimension of Quality Today s manufacturers face: William Q. Meeker Iowa State University Fall Technical Conference Youden Memorial Address October 7, 00

More information

measuring gene expression December 5, 2017

measuring gene expression December 5, 2017 measuring gene expression December 5, 2017 transcription a usually short-lived RNA copy of the DNA is created through transcription RNA is exported to the cytoplasm to encode proteins some types of RNA

More information

less sensitive than RNA-seq but more robust analysis pipelines expensive but quantitiatve standard but typically not high throughput

less sensitive than RNA-seq but more robust analysis pipelines expensive but quantitiatve standard but typically not high throughput Chapter 11: Gene Expression The availability of an annotated genome sequence enables massively parallel analysis of gene expression. The expression of all genes in an organism can be measured in one experiment.

More information

Guidelines for setting up microrna profiling experiments v2.0

Guidelines for setting up microrna profiling experiments v2.0 Guidelines for setting up microrna profiling experiments v2.0 December 2010 Table of contents 2 Experimental setup................................................ 3 Single-color experiments........................................

More information

Enhancers mutations that make the original mutant phenotype more extreme. Suppressors mutations that make the original mutant phenotype less extreme

Enhancers mutations that make the original mutant phenotype more extreme. Suppressors mutations that make the original mutant phenotype less extreme Interactomics and Proteomics 1. Interactomics The field of interactomics is concerned with interactions between genes or proteins. They can be genetic interactions, in which two genes are involved in the

More information

Biological immune systems

Biological immune systems Immune Systems 1 Introduction 2 Biological immune systems Living organism must protect themselves from the attempt of other organisms to exploit their resources Some would-be exploiter (pathogen) is much

More information

Getting Started with OptQuest

Getting Started with OptQuest Getting Started with OptQuest What OptQuest does Futura Apartments model example Portfolio Allocation model example Defining decision variables in Crystal Ball Running OptQuest Specifying decision variable

More information

For research use only. Not for use in diagnostic procedures. AFFYMETRIX UK Ltd., AFFYMETRIX, INC.

For research use only. Not for use in diagnostic procedures. AFFYMETRIX UK Ltd., AFFYMETRIX, INC. AFFYMETRIX, INC. 3380 Central Expressway Santa Clara, CA 95051 USA Tel: 1-888-362-2447 (1-888-DNA-CHIP) Fax: 1-408-731-5441 sales@affymetrix.com support@affymetrix.com AFFYMETRIX UK Ltd., Voyager, Mercury

More information

Bioinformatics Advice on Experimental Design

Bioinformatics Advice on Experimental Design Bioinformatics Advice on Experimental Design Where do I start? Please refer to the following guide to better plan your experiments for good statistical analysis, best suited for your research needs. Statistics

More information

Supplemental Information. Boundary Formation through a Direct. Threshold-Based Readout. of Mobile Small RNA Gradients

Supplemental Information. Boundary Formation through a Direct. Threshold-Based Readout. of Mobile Small RNA Gradients Developmental Cell, Volume 43 Supplemental Information Boundary Formation through a Direct Threshold-Based Readout of Mobile Small RNA Gradients Damianos S. Skopelitis, Anna H. Benkovics, Aman Y. Husbands,

More information

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University Machine learning applications in genomics: practical issues & challenges Yuzhen Ye School of Informatics and Computing, Indiana University Reference Machine learning applications in genetics and genomics

More information

DEPArray Technology. Sorting and Recovery of Rare Cells

DEPArray Technology. Sorting and Recovery of Rare Cells DEPArray Technology Sorting and Recovery of Rare Cells Delivering pure, single, viable cells The DEPArray system from Silicon Biosystems is the only automated instrument that can identify, quantify, and

More information