Heterogeneity of Variance in Gene Expression Microarray Data
David M. Rocke
Department of Applied Science and Division of Biostatistics
University of California, Davis
March 15, 2003

Abstract

Motivation: One important problem in the analysis of gene expression microarray data is that the variation in expression under constant conditions is not stable from gene to gene. Recently, variance-stabilizing transformations have been developed that can remove the systematic dependence of the variance on the mean, but it appears that there is still considerable variance heterogeneity that can interfere with global analysis of expression data.

Results: We develop a method consisting of a variance-stabilizing data transformation followed by empirical Bayes estimation of gene-specific variances. The method is more powerful than using data from each gene alone, but does not suffer from the bias caused by the use of global error models.

Availability: R code will be available from the author by e-mail or on the website.

Contact: dmrocke@ucdavis.edu
1. Introduction

Consider a set of microarray experiments of n arrays each with p genes. For each gene considered separately we entertain a statistical model which is linear in a set of factors or variables that are attached to the arrays, so that the statistical model is common to all genes. Assume that the expression data have been transformed so that the variances neither increase nor decrease systematically with the mean expression of the gene.

Given a statistical hypothesis framed within the linear model for each gene, there is almost always an exact or approximate F-test, in which the numerator can be calculated from the cell means of the data for a particular gene, and the denominator (if the test is conducted in isolation for each gene) is a function of the deviations of the data from the cell means (Kerr 2003).

An alternative approach with variance-stabilized data is to obtain the numerator of the test from the particular gene, but obtain the denominator from a global error model (in this case, constant variance). This increases the power of the tests considerably because the variance estimates will be based on thousands of points, not just a few. However, it introduces possible biases if the variances are not truly homogeneous (Kerr 2003; Kerr, Martin, and Churchill 2000).

A compromise between power and bias may be obtained by using variance estimates for the denominators of the F-tests that are themselves a compromise between the gene-specific variance and the global variance.

2. A Motivating Example

We consider an experiment in which cell lines in four conditions are to be compared. There are two observations for each of the four conditions, each consisting of an Affymetrix U95A GeneChip for one sample. For the sake of illustration, we will consider the MAS 4.0 average difference summary, one main advantage of which is that it does not artificially compress the low-level data.
One goal of the analysis is to determine which genes are differentially expressed among the four conditions. A standard approach, if we consider only one gene, would be to perform a one-way analysis of variance (ANOVA). However, a standard assumption of that analysis is that the variance at the different levels is the same. In the case of microarray data, there is a strong dependence of the variance on the mean, as is shown for these data by Figures 1 and 2, which plot the difference of the replicates in each gene-by-group combination against their sum. This type of variability can be removed by the generalized log (glog) transform introduced independently by Durbin et al. (2002), Hawkins (2002), Huber et al. (2002), and Munson (2001), and further developed in Durbin and Rocke (2003a; 2003b), Geller et al. (2003) and Rocke and Durbin (2003a; 2003b). Figure 3 shows the same sum/difference data after transforming by the glog with a parameter of λ = 1225 estimated by maximum likelihood (Durbin and Rocke 2003a).
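As an illustration of the transform just described, the following is a minimal sketch of a glog function. It assumes the commonly used form ln(z + sqrt(z^2 + λ)); the function name `glog` and the numerically stable branch for negative inputs are choices made here for illustration, not taken from the author's R code.

```python
import math


def glog(z: float, lam: float) -> float:
    """Generalized log, assumed form: ln(z + sqrt(z^2 + lam)), with lam > 0.

    Defined for every real z, so negative background-corrected
    intensities do not need to be discarded.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    root = math.sqrt(z * z + lam)
    if z >= 0:
        return math.log(z + root)
    # For negative z, use (z + root) * (-z + root) = lam to avoid
    # subtracting two nearly equal numbers.
    return math.log(lam) - math.log(-z + root)


LAMBDA = 1225.0  # parameter value quoted in the text for this data set

# Near zero the transform is roughly linear; for large z it behaves like ln(2z).
for intensity in (-100.0, 0.0, 100.0, 10000.0):
    print(intensity, glog(intensity, LAMBDA), math.log(2 * max(intensity, 1.0)))
```

For intensities that are large relative to sqrt(λ) the transform is close to an ordinary logarithm, while near zero it is approximately linear, which is why it can stabilize the variance across the whole intensity range.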
MSE Source       TWER     FWER     FDR
Gene-Specific    [...]    1        18
Global           [...]    [...]    [...]
Posterior        [...]    [...]    [...]

Table 1: Number of genes out of 12,625 significant at the 5% level for three methods of estimating the MSE in a microarray experiment. Column 2 gives the number significant by the raw p-values, with test-wise error rate (TWER) 5%. Column 3 gives the number significant with family-wise error rate (FWER) 5% using the Bonferroni inequality, and column 4 is the number of genes nominated as significant by the false-discovery-rate (FDR) method of Benjamini and Hochberg (1995; see also Reiner et al. 2003).

At this point, one could reasonably perform an ANOVA for gene i using the model

\[ z_{ijk} = \beta_j + \epsilon_{jk}, \tag{2.1} \]

where here the $z_{ijk}$ are additively-normalized, glog-transformed expression values. In this way, we obtain 12,625 F-tests of the null hypothesis of equal expression for all groups, in which we compare the mean square for groups from gene i ($MSG_i$) to the mean square for error from gene i ($MSE_i$) by referring the ratio $MSG_i/MSE_i$ to an F distribution with 3 and 4 degrees of freedom. This procedure should be valid, and after an adjustment for multiplicity, the results could be used directly. Figure 4 gives a histogram of the 12,625 p-values showing that certainly some of them represent real effects. The first line of figures in Table 1 shows that, at the 5% level, 1 gene is significant using the Bonferroni method, and 18 are significant using the FDR method of Benjamini and Hochberg (1995; see also Reiner et al. 2003).

A possible objection to this procedure is that we are losing power by not employing information from other genes. If we employ the perspective of Kerr, Martin, and Churchill (2000), we could estimate the model

\[ z_{ijk} = \mu_i + n_k + \beta_{ij} + \epsilon_{ijk}, \tag{2.2} \]

where the $z_{ijk}$ are glog-transformed (unnormalized) expression values, the normalization is part of the ANOVA (the $n_k$ terms), and the group effects are in the gene-by-group interaction terms $\beta_{ij}$ (Kerr 2003).
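The bookkeeping behind models (2.1) and (2.2) and the multiplicity adjustments of Table 1 can be sketched in a few lines. This is an illustrative sketch, not the author's code: all function names are hypothetical, the p-values are assumed to be computed elsewhere, and the parameter count for model (2.2) assumes one mean per gene, array effects with one constraint, and gene-by-group effects with one constraint per gene.

```python
from typing import List, Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, float, float, int, int]:
    """Return (MSG, MSE, F, df_groups, df_error) for a single gene."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_groups = len(groups) - 1
    df_error = n_total - len(groups)
    msg = ss_groups / df_groups
    mse = ss_error / df_error
    return msg, mse, msg / mse, df_groups, df_error


def bonferroni_count(pvalues: Sequence[float], level: float = 0.05) -> int:
    """Number of tests rejected when each p-value is compared with level/m."""
    m = len(pvalues)
    return sum(1 for p in pvalues if p <= level / m)


def benjamini_hochberg_count(pvalues: Sequence[float], level: float = 0.05) -> int:
    """Number of tests rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    ranked: List[float] = sorted(pvalues)
    rejected = 0
    for rank, p in enumerate(ranked, start=1):
        if p <= level * rank / m:
            rejected = rank  # keep the largest rank that passes
    return rejected


def global_residual_df(n_genes: int, n_arrays: int, n_groups: int) -> int:
    """Residual df for model (2.2) under the parameter count assumed above."""
    n_obs = n_genes * n_arrays
    n_params = n_genes + (n_arrays - 1) + n_genes * (n_groups - 1)
    return n_obs - n_params


# Four conditions with two replicates each give 3 and 4 df for one gene.
example_gene = [[1.0, 1.2], [2.0, 2.2], [3.0, 3.2], [4.0, 4.2]]
print(one_way_anova(example_gene))
print(global_residual_df(n_genes=12625, n_arrays=8, n_groups=4))
```

With 12,625 genes, 8 arrays and 4 groups, the count for the global model leaves 101,000 - 50,507 residual degrees of freedom, compared with only 4 for each gene-specific test.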
This analysis gives us another mean square for error that we could use as a denominator, in which case the F-statistics for each gene separately would have 3 and 50,493 df. Figure 5 shows the histogram of the p-values using this method. The excess of very small F-statistics is a sign that the model is incorrect. In this case, the assumption that all genes have the same MSE is almost certainly false. Use of an average MSE, when smaller or larger ones would be more appropriate, will lead to an excess of p-values at both ends. In the second line of figures in Table 1, the number of genes nominated as significant is much greater for each of the three methods than when the gene-specific MSE
is used. It is likely that some of these are mistakes, being due to a large true gene-specific MSE being coupled with the use of an average MSE as a denominator instead of an unbiased gene-specific MSE estimate.

The average value over all 12,625 genes of the MSE is [...], which is also the residual MSE from the global model. If the 4 df estimates from each gene had the distribution predicted from normality and constant true variance, the variance of these MSE estimates across genes would be $2\sigma^4/\nu = (0.1017)^2/2$ = [...]. Instead, it is [...], nearly 10 times the size it should be. Of the two simple explanations for this, nonnormality and heterogeneity of variance, the latter is the simpler possibility. We now proceed to account for this situation using a standard empirical Bayes estimate for the individual gene MSE.

3. The Modeling Setup

Given n genes indexed by i, suppose that the true variance of the effect of interest for gene i is $\sigma_i^2$. For each i we obtain a $\nu$ degree-of-freedom estimate $s_i^2$ of $\sigma_i^2$. We will work in the Gaussian framework for convenience, in which case we may assume that $s_i^2$ has a gamma distribution with parameters $\tau$ (the mean) and $a = \nu/2$ (the shape parameter). Again for simplicity, we treat the case where $\nu$ is constant across genes. Though the case where $\nu$ varies is not conceptually more difficult, the computations are more complex.

We model these individual values $\sigma_i^2 = \tau_i$ as random with an inverse gamma distribution with parameters $\alpha$ and $\eta = \alpha\beta$. Note that $\eta$ is the mean of the inverse of $\tau$ (the reciprocal variance $1/\tau$ is sometimes called the precision). With this as a prior distribution, and an observed value $x$ of $s_i^2$, the posterior distribution for $\tau$ is proportional to

\[ \frac{e^{-1/(\tau\tilde{\beta})}}{\tau^{\nu/2+\alpha+1}} \tag{3.1} \]

where

\[ \tilde{\beta} = \frac{2}{x\nu + 2\alpha/\eta}. \tag{3.2} \]

Thus, the posterior distribution is inverse gamma, like the prior, with parameters

\[ \tilde{\alpha} = \nu/2 + \alpha \tag{3.3} \]
\[ \tilde{\beta} = \frac{2}{x\nu + 2\alpha/\eta} \tag{3.4} \]
\[ \tilde{\eta} = \tilde{\alpha}\tilde{\beta} = \frac{\nu + 2\alpha}{x\nu + 2\alpha/\eta}. \tag{3.5} \]

Also

\[ \frac{1}{\tilde{\eta}} = \frac{x\nu + 2\alpha/\eta}{\nu + 2\alpha} \tag{3.6} \]
\[ \phantom{\frac{1}{\tilde{\eta}}} = x\left(\frac{\nu}{\nu + 2\alpha}\right) + \frac{1}{\eta}\left(\frac{2\alpha}{\nu + 2\alpha}\right). \tag{3.7} \]
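For readers who want the intermediate step, the kernel in (3.1) can be recovered by multiplying the gamma likelihood by the inverse gamma prior. The following is a sketch under the parametrization stated in the text, writing $x$ for the observed $s_i^2$ and $\tilde{\beta}$ for the posterior scale parameter (the tilde notation is a labeling choice made here); constants not involving $\tau$ are dropped.

```latex
% Likelihood: s_i^2 = x is gamma with mean \tau and shape \nu/2 (scale 2\tau/\nu)
p(x \mid \tau) \;\propto\; \tau^{-\nu/2}\, \exp\!\left(-\frac{\nu x}{2\tau}\right)

% Prior: \tau is inverse gamma with parameters \alpha and \beta = \eta/\alpha
p(\tau) \;\propto\; \tau^{-(\alpha+1)}\, \exp\!\left(-\frac{\alpha}{\eta\,\tau}\right)

% Posterior kernel: product of the two
p(\tau \mid x) \;\propto\; \tau^{-(\nu/2+\alpha+1)}\,
    \exp\!\left(-\frac{1}{\tau}\cdot\frac{x\nu + 2\alpha/\eta}{2}\right)
  \;=\; \frac{e^{-1/(\tau\tilde{\beta})}}{\tau^{\nu/2+\alpha+1}},
\qquad \tilde{\beta} = \frac{2}{x\nu + 2\alpha/\eta}
```

Matching the exponent of $\tau$ gives the posterior shape $\nu/2 + \alpha$, and matching the exponential term gives the posterior scale, which is why the posterior stays in the inverse gamma family.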
Now $x$ here is an observed value of $s_i^2$, and $1/\eta$ is the reciprocal of the mean prior precision, which is thus an estimate of the center of the prior distribution for $\tau_i = \sigma_i^2$. Also, $\nu$ is the degrees of freedom of $s_i^2$ and $2\alpha$ is the equivalent degrees of freedom of the prior. Thus, the posterior estimate of the variance used here will be a weighted average of the individual variance and the prior mean reciprocal precision, each weighted by its degrees of freedom.

This method of estimation of a variance using an inverse gamma conjugate prior is completely standard (Carlin and Louis 2000; Gelman et al. 1995), and has been used previously in a microarray context by Baldi and Long (2001). The first two references give more detail on the derivation of the posterior in this case.

4. Empirical Estimation of the Prior

To complete the empirical Bayes estimation procedure, we need to specify how we estimate the parameters of the prior from the ensemble of variances. If each observed variance $s_i^2$ has a gamma distribution $F_i$ with parameters $\tau$ and $a = \nu/2$, and if the prior distribution $G$ of $\tau$ is inverse gamma with parameters $\alpha$ and $\beta$, then

\[ E(s_i^2) = \frac{1}{\beta(\alpha-1)}, \qquad V(s_i^2) = \frac{2(\alpha-1)/\nu + 1}{\beta^2(\alpha-1)^2(\alpha-2)}. \tag{4.1} \]

If an ensemble of variances has mean $M$ and variance $V$, then a method-of-moments estimate of $\alpha$ and $\beta$ is given by solving

\[ M = \frac{1}{\beta(\alpha-1)}, \qquad V = \frac{2(\alpha-1)/\nu + 1}{\beta^2(\alpha-1)^2(\alpha-2)} \tag{4.2} \]

for $\alpha$ and $\beta$. This leads to

\[ \hat{\alpha} = \frac{M^2(1 - 2/\nu) + 2V}{V - 2M^2/\nu}, \qquad \hat{\beta} = \frac{1}{M(\hat{\alpha}-1)} \tag{4.3} \]

as method-of-moments estimates. If the variances were homogeneous, then we would have $V \approx 2M^2/\nu$. If either the denominator or the numerator of $\hat{\alpha}$ is negative, that is presumably a sign that there is not an important amount of heterogeneity in the variances. However, usually both will be bounded well away from zero.
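The estimators in (4.3) and the weighted average in (3.7) can be sketched together as follows. This is a minimal illustration under the formulas above, not the author's R implementation; the function names are hypothetical, and the numbers in the usage lines are synthetic values chosen so that the arithmetic is easy to follow.

```python
from typing import Tuple


def method_of_moments_prior(mean_mse: float, var_mse: float, nu: float) -> Tuple[float, float]:
    """Estimate (alpha, beta) of the inverse gamma prior from equation (4.3).

    mean_mse and var_mse are the mean M and variance V of the ensemble of
    gene-specific MSEs; nu is the degrees of freedom of each MSE.
    """
    numerator = mean_mse ** 2 * (1.0 - 2.0 / nu) + 2.0 * var_mse
    denominator = var_mse - 2.0 * mean_mse ** 2 / nu
    if numerator <= 0 or denominator <= 0:
        # V is no larger than expected under homogeneity (V ~ 2 M^2 / nu).
        raise ValueError("no evidence of variance heterogeneity; prior not estimable")
    alpha = numerator / denominator
    beta = 1.0 / (mean_mse * (alpha - 1.0))
    return alpha, beta


def posterior_variance(s2: float, nu: float, alpha: float, beta: float) -> float:
    """Posterior variance estimate 1/eta-tilde from equation (3.7).

    A weighted average of the gene-specific s2 (weight nu) and the prior
    center 1/eta with eta = alpha * beta (weight 2 * alpha).
    """
    eta = alpha * beta
    prior_df = 2.0 * alpha
    weight_gene = nu / (nu + prior_df)
    return weight_gene * s2 + (1.0 - weight_gene) / eta


# Synthetic check: alpha = 3, beta = 2, nu = 4 imply M = 0.25 and V = 0.125
# through (4.1), so the estimator should recover alpha and beta.
alpha_hat, beta_hat = method_of_moments_prior(mean_mse=0.25, var_mse=0.125, nu=4)
print(alpha_hat, beta_hat)

# A gene whose own MSE is above the prior center is pulled toward it.
print(posterior_variance(s2=0.5, nu=4, alpha=alpha_hat, beta=beta_hat))
```

Because the weight on the gene-specific estimate is ν/(ν + 2α), a small estimated α (strong heterogeneity) leaves each gene's MSE nearly untouched, while a large α pulls all estimates toward the common center.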
5. The Example Continued

For the example data set, the mean of the 12,625 values of the residual MSE is [...] and the variance of the same collection is [...]. Using (4.3), we obtain $\hat{\alpha}$ = [...], $\hat{\beta}$ = [...], $\hat{\eta}$ = [...], and $1/\hat{\eta}$ = [...]. The degrees of freedom of the prior is $2\alpha = 4.615$, so for each gene i, we obtain an 8.6 df MSE estimate by taking a weighted average of the 4 df MSE from the ANOVA of that gene (with weight 4/8.6) and the prior best estimate (with weight 4.6/8.6). Figure 6 shows the histogram of the p-values obtained by this method, which shows no sign of distortion at the high p-value end.

Comparing the three methods shown in Table 1, we see that the global MSE estimate rejects the most genes, but Figure 5 shows that these rejections cannot be trusted. The posterior best estimate MSE identifies a much larger number of genes as differentially expressed than using 4 df gene-specific MSEs, without apparent signs of problems with maintaining the size of the tests.

6. Concluding Remarks

Bayesian and empirical Bayesian methods are frequently proposed for the analysis of microarray data (for example, Baldi and Long 2001; Broët et al. 2002; Efron et al. 2002; Ibrahim et al. 2002; Newton et al. 2001, 2003; Theilhaber et al. 2001). What is proposed here is a sort of minimal empirical Bayesian approach. We do not need to put a prior distribution on the mean expression across genes or on the probability of positive expression, since this is handled by the multiplicity-adjusted F-tests.

Our approach resembles most closely the treatment in Baldi and Long (2001). However, their use of the log transform resulted in substantial dependence of the variance on the mean, whereas by use of the glog transform, we have removed at least most of this dependence. This makes the Bayesian model fit the data better than in their case.

We have written code in the R language (Ihaka and Gentleman 1996) that implements many of the required calculations in standard situations.
The code will be available from the author by e-mail or on the website.

Acknowledgements

The research reported in this paper was supported by grants from the National Science Foundation (ACI [...] and DMS [...]) and the National Institute of Environmental Health Sciences, National Institutes of Health (P43 ES04699).
Appendix: The Gamma and Inverse Gamma Distributions

The gamma distribution with parameters $\alpha$ and $\beta$ has density

\[ f_X(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\beta^{\alpha}}. \tag{A.1} \]

The first two moments are given by

\[ E(X) = \alpha\beta = \tau \tag{A.2} \]
\[ V(X) = \alpha\beta^2 = \tau^2/\alpha. \tag{A.3} \]

The inverse gamma distribution with parameters $\alpha$ and $\beta$ is the distribution of $Y = 1/X$, where $X$ is gamma distributed with parameters $\alpha$ and $\beta$. The density of $Y$ is

\[ f_Y(y) = \frac{e^{-1/(y\beta)}}{\Gamma(\alpha)\beta^{\alpha} y^{\alpha+1}}. \tag{A.4} \]

The first two moments are given by

\[ E(Y) = \frac{1}{\beta(\alpha-1)} \tag{A.5} \]
\[ V(Y) = \frac{1}{\beta^2(\alpha-1)^2(\alpha-2)}. \tag{A.6} \]

We will re-parametrize in terms of $\alpha$ and $\eta = \alpha\beta$, which is the mean of the reciprocal of the inverse gamma variate. We then have that the density is

\[ f_Y(y) = \frac{e^{-\alpha/(y\eta)}}{\Gamma(\alpha)(\eta/\alpha)^{\alpha} y^{\alpha+1}}. \tag{A.7} \]

The first two moments are given in this parametrization by

\[ E(Y) = \frac{\alpha}{\eta(\alpha-1)} \tag{A.8} \]
\[ V(Y) = \frac{\alpha^2}{\eta^2(\alpha-1)^2(\alpha-2)}. \tag{A.9} \]

References

Baldi, P. and Long, A.D. (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inference of gene changes, Bioinformatics, 17.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate, Journal of the Royal Statistical Society, Series B, 57.

Broët, P., Richardson, S., and Radvanyi, F. (2002) Bayesian hierarchical model for identifying changes in gene expression from microarray experiments, Journal of Computational Biology, 9.

Carlin, B.P. and Louis, T.A. (2000) Bayes and Empirical Bayes Methods for Data Analysis, Second Edition, New York: Chapman and Hall.

Durbin, B.P., Hardin, J.S., Hawkins, D.M., and Rocke, D.M. (2002) A variance-stabilizing transformation for gene-expression microarray data, Bioinformatics, 18, S105-S110.

Durbin, B. and Rocke, D.M. (2003a) Estimation of transformation parameters for microarray data, Bioinformatics, in press.

Durbin, B. and Rocke, D.M. (2003b) Exact and approximate variance-stabilizing transformations for two-color microarrays, submitted for publication.

Efron, B., Tibshirani, R., Storey, J.D., and Tusher, V. (2002) Empirical Bayes analysis of a microarray experiment, Journal of the American Statistical Association, 96.

Geller, S.C., Gregg, J.P., Hagerman, P.J., and Rocke, D.M. (2003) Transformation and normalization of oligonucleotide microarray data, submitted for publication.

Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (1995) Bayesian Data Analysis, New York: Chapman and Hall.

Hawkins, D.M. (2002) Diagnostics for conformity of paired quantitative measurements, Statistics in Medicine, 21.

Holder, D., Raubertas, R.F., Pikounis, V.B., Svetnik, V., and Soper, K. (2001) Statistical analysis of high density oligonucleotide arrays: A SAFER approach, GeneLogic Workshop on Low Level Analysis of Affymetrix GeneChip Data.

Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A., and Vingron, M. (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Bioinformatics, 18, S96-S104.

Ibrahim, J.G., Chen, M.-H., and Gray, R.J. (2002) Bayesian models for gene expression with microarray data, Journal of the American Statistical Association, 97.

Ihaka, R. and Gentleman, R. (1996) R: A language for data analysis and graphics, Journal of Computational and Graphical Statistics, 5. (See [...])

Kerr, M.K. (2003) Linear models for microarray data analysis: Hidden similarity and differences, University of Washington Biostatistics Working Paper 190.

Kerr, M.K., Martin, M., and Churchill, G.A. (2000) Analysis of variance for gene expression microarray data, Journal of Computational Biology, 7.

Munson, P. (2001) A consistency test for determining the significance of gene expression changes on replicate samples and two convenient variance-stabilizing transformations, GeneLogic Workshop on Low Level Analysis of Affymetrix GeneChip Data.

Newton, M.A., Kendziorski, C.M., Richmond, C.S., Blattner, F.R., and Tsui, K.W. (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data, Journal of Computational Biology, 8.

Newton, M.A., Noueiry, A., Sarkar, D., and Ahlquist, P. (2003) Detecting differential gene expression with a semiparametric hierarchical mixture model, manuscript.

Reiner, A., Yekutieli, D., and Benjamini, Y. (2003) Identifying differentially expressed genes using false discovery rate controlling procedures, Bioinformatics, 19.

Rocke, D. and Durbin, B. (2001) A model for measurement error for gene expression arrays, Journal of Computational Biology, 8.

Rocke, D. and Durbin, B. (2003) Approximate variance-stabilizing transformations for gene-expression microarray data, Bioinformatics, in press.

Theilhaber, J., Bushnell, S., Jackson, A., and Fuchs, R. (2001) Bayesian estimation of fold changes in the analysis of gene expression: The PFOLD algorithm, Journal of Computational Biology, 8.
List of Figures

1. Absolute difference in replicates versus the sum for the 12,625 × 4 gene-by-group combinations.
2. Absolute difference in replicates versus the rank of the sum for the 12,625 × 4 gene-by-group combinations.
3. Absolute difference in replicates versus the rank of the sum for the 12,625 × 4 gene-by-group combinations after transformation by the glog with λ = 1225.
4. Histogram of p-values for 12,625 F-tests using gene-specific MSE.
5. Histogram of p-values for 12,625 F-tests using global MSE.
6. Histogram of p-values for 12,625 F-tests using posterior best-estimate MSE.
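[Figure 1. Panel title: Raw Data. Axes: Difference, Sum.]
[Figure 2. Panel title: Raw Data. Axes: Difference, Rank of Sum.]
[Figure 3. Panel title: Glog of Data. Axes: Difference, Rank of Sum.]
[Figure 4. Panel title: Histogram of Gene-Specific p-values. Axes: Frequency, Raw p-values.]
[Figure 5. Panel title: Histogram of Global p-values. Axes: Frequency, Raw p-values.]
[Figure 6. Panel title: Histogram of Posterior p-values. Axes: Frequency, Raw p-values.]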
More informationTHREE LEVEL HIERARCHICAL BAYESIAN ESTIMATION IN CONJOINT PROCESS
Please cite this article as: Paweł Kopciuszewski, Three level hierarchical Bayesian estimation in conjoint process, Scientific Research of the Institute of Mathematics and Computer Science, 2006, Volume
More informationFrom CEL files to lists of interesting genes. Rafael A. Irizarry Department of Biostatistics Johns Hopkins University
From CEL files to lists of interesting genes Rafael A. Irizarry Department of Biostatistics Johns Hopkins University Contact Information e-mail Personal webpage Department webpage Bioinformatics Program
More informationSome Statistical Issues in Microarray Gene Expression Data
From the SelectedWorks of Jeffrey S. Morris June, 2006 Some Statistical Issues in Microarray Gene Expression Data Matthew S. Mayo, University of Kansas Medical Center Byron J. Gajewski, University of Kansas
More informationAnalysis of Microarray Data
Analysis of Microarray Data Lecture 1: Experimental Design and Data Normalization George Bell, Ph.D. Senior Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Introduction
More informationCombining ANOVA and PCA in the analysis of microarray data
Combining ANOVA and PCA in the analysis of microarray data Lutgarde Buydens IMM, Analytical chemistry Radboud University Nijmegen, the Netherlands Scientific Staff: PhD students: External PhD: Post doc:
More informationMultiple Testing in RNA-Seq experiments
Multiple Testing in RNA-Seq experiments O. Muralidharan et al. 2012. Detecting mutations in mixed sample sequencing data using empirical Bayes. Bernd Klaus Institut für Medizinische Informatik, Statistik
More informationRaking and Selection of Differentially Expressed Genes from Microarray Data
Proceedings of the 6 WSEAS International Conference on Mathematical Biology and Ecology, Miami, Florida, USA, January 8-, 6 (pp4-45) Raking and Selection of Differentially Expressed Genes from Microarray
More informationStatistical Methods in Bioinformatics
Statistical Methods in Bioinformatics CS 594/680 Arnold M. Saxton Department of Animal Science UT Institute of Agriculture Bioinformatics: Interaction of Biology/Genetics/Evolution/Genomics Computer Science/Algorithms/Database
More informationGene expression data analysis in clinical cancer research
Gene expression data analysis in clinical cancer research L analisi dell espressione genica nella ricerca oncologica Philippe Broët 1 INSERM U47 and Faculty of Medicine Paris-Sud broet@vjf.inserm.fr Summary:
More informationA note on oligonucleotide expression values not being normally distributed
Biostatistics (2009), 10, 3, pp. 446 450 doi:10.1093/biostatistics/kxp003 Advance Access publication on March 10, 2009 A note on oligonucleotide expression values not being normally distributed JOHANNA
More informationParameter Estimation for the Exponential-Normal Convolution Model
Parameter Estimation for the Exponential-Normal Convolution Model Monnie McGee & Zhongxue Chen cgee@smu.edu, zhongxue@smu.edu. Department of Statistical Science Southern Methodist University ENAR Spring
More informationAdjusting batch effects in microarray expression data using empirical Bayes methods
Biostatistics (2007), 8, 1, pp. 118 127 doi:10.1093/biostatistics/kxj037 Advance Access publication on April 21, 2006 Adjusting batch effects in microarray expression data using empirical Bayes methods
More informationAffymetrix GeneChip Arrays. Lecture 3 (continued) Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy
Affymetrix GeneChip Arrays Lecture 3 (continued) Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy Affymetrix GeneChip Design 5 3 Reference sequence TGTGATGGTGGGGAATGGGTCAGAAGGCCTCCGATGCGCCGATTGAGAAT
More informationUniversity of Groningen
University of Groningen Evaluation of an Affymetrix High-density Oligonucleotide Microarray Platform as a Measurement System van den Heuvel, Edwin; Geeven, Geert; Bauerschmidt, Susanne; Polman, Jan E.M.
More informationFeature selection methods for SVM classification of microarray data
Feature selection methods for SVM classification of microarray data Mike Love December 11, 2009 SVMs for microarray classification tasks Linear support vector machines have been used in microarray experiments
More informationAnalysis of Microarray Data
Analysis of Microarray Data Lecture 1: Experimental Design and Data Normalization George Bell, Ph.D. Senior Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Introduction
More informationBioconductor Project
Bioconductor Project Bioconductor Project Working Papers Year 2004 Paper 7 Differential Expression with the Bioconductor Project Anja von Heydebreck Wolfgang Huber Robert Gentleman Department of Computational
More informationComparison of Affymetrix GeneChip Expression Measures
Johns Hopkins University, Dept. of Biostatistics Working Papers 9-1-2005 Comparison of Affymetrix GeneChip Expression Measures Rafael A. Irizarry Johns Hopkins Bloomberg School of Public Health, Department
More informationFACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY
FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY K. L. Knudtson 1, C. Griffin 2, A. I. Brooks 3, D. A. Iacobas 4, K. Johnson 5, G. Khitrov 6,
More informationNature Biotechnology: doi: /nbt Supplementary Figure 1. MBQC base beta diversity, major protocol variables, and taxonomic profiles.
Supplementary Figure 1 MBQC base beta diversity, major protocol variables, and taxonomic profiles. A) Multidimensional scaling of MBQC sample Bray-Curtis dissimilarities (see Fig. 1). Labels indicate centroids
More informationAnalysis of a Proposed Universal Fingerprint Microarray
Analysis of a Proposed Universal Fingerprint Microarray Michael Doran, Raffaella Settimi, Daniela Raicu, Jacob Furst School of CTI, DePaul University, Chicago, IL Mathew Schipma, Darrell Chandler Bio-detection
More informationMixed effects model for assessing RNA degradation in Affymetrix GeneChip experiments
Mixed effects model for assessing RNA degradation in Affymetrix GeneChip experiments Kellie J. Archer, Ph.D. Suresh E. Joel Viswanathan Ramakrishnan,, Ph.D. Department of Biostatistics Virginia Commonwealth
More informationMicroarray Technique. Some background. M. Nath
Microarray Technique Some background M. Nath Outline Introduction Spotting Array Technique GeneChip Technique Data analysis Applications Conclusion Now Blind Guess? Functional Pathway Microarray Technique
More informationHidden Markov Models for Microarray Time Course Data in Multiple Biological Conditions
Hidden Markov Models for Microarray Time Course Data in Multiple Biological Conditions Ming YUAN and Christina KENDZIORSKI Among the first microarray experiments were those measuring expression over time,
More informationCHAPTER 8 PERFORMANCE APPRAISAL OF A TRAINING PROGRAMME 8.1. INTRODUCTION
168 CHAPTER 8 PERFORMANCE APPRAISAL OF A TRAINING PROGRAMME 8.1. INTRODUCTION Performance appraisal is the systematic, periodic and impartial rating of an employee s excellence in matters pertaining to
More informationA learned comparative expression measure for Affymetrix GeneChip DNA microarrays
Proceedings of the Computational Systems Bioinformatics Conference, August 8-11, 2005, Stanford, CA. pp. 144-154. A learned comparative expression measure for Affymetrix GeneChip DNA microarrays Will Sheffler
More informationExpression summarization
Expression Quantification: Affy Affymetrix Genechip is an oligonucleotide array consisting of a several perfect match (PM) and their corresponding mismatch (MM) probes that interrogate for a single gene.
More informationAS A SERVICE TO THE RESEARCH COMMUNITY, GENOME BIOLOGY PROVIDES A 'PREPRINT' DEPOSITORY
This information has not been peer-reviewed. Responsibility for the findings rests solely with the author(s). Deposited research article A non-parametric approach for identifying differentially expressed
More informationRNA
RNA sequencing Michael Inouye Baker Heart and Diabetes Institute Univ of Melbourne / Monash Univ Summer Institute in Statistical Genetics 2017 Integrative Genomics Module Seattle @minouye271 www.inouyelab.org
More informationBayesian Variable Selection and Data Integration for Biological Regulatory Networks
Bayesian Variable Selection and Data Integration for Biological Regulatory Networks Shane T. Jensen Department of Statistics The Wharton School, University of Pennsylvania stjensen@wharton.upenn.edu Gary
More informationRunning head: Empirical estimates suggest most published research is true
Running head: Empirical estimates suggest most published research is true Title: Empirical estimates suggest most published medical research is true Authors: Leah R. Jager 1 and Jeffrey T. Leek 2 * Affiliations:
More informationThe essentials of microarray data analysis
The essentials of microarray data analysis (from a complete novice) Thanks to Rafael Irizarry for the slides! Outline Experimental design Take logs! Pre-processing: affy chips and 2-color arrays Clustering
More informationIntroduction to gene expression microarray data analysis
Introduction to gene expression microarray data analysis Outline Brief introduction: Technology and data. Statistical challenges in data analysis. Preprocessing data normalization and transformation. Useful
More informationA Discussion of Statistical Methods for Design and Analysis of Microarray Experiments for Plant Scientists
The Plant Cell, Vol. 18, 2112 2121, September 2006, www.plantcell.org ª 2006 American Society of Plant Biologists SPECIAL SERIES ON LARGE-SCALE BIOLOGY A Discussion of Statistical Methods for Design and
More informationEstoril Education Day
Estoril Education Day -Experimental design in Proteomics October 23rd, 2010 Peter James Note Taking All the Powerpoint slides from the Talks are available for download from: http://www.immun.lth.se/education/
More informationNear-Balanced Incomplete Block Designs with An Application to Poster Competitions
Near-Balanced Incomplete Block Designs with An Application to Poster Competitions arxiv:1806.00034v1 [stat.ap] 31 May 2018 Xiaoyue Niu and James L. Rosenberger Department of Statistics, The Pennsylvania
More informationGCTA/GREML. Rebecca Johnson. March 30th, 2017
GCTA/GREML Rebecca Johnson March 30th, 2017 1 / 12 Motivation for method We know from twin studies and other methods that genetic variation contributes to complex traits like height, BMI, educational attainment,
More informationSAS Microarray Solution for the Analysis of Microarray Data. Susanne Schwenke, Schering AG Dr. Richardus Vonk, Schering AG
for the Analysis of Microarray Data Susanne Schwenke, Schering AG Dr. Richardus Vonk, Schering AG Overview Challenges in Microarray Data Analysis Software for Microarray Data Analysis SAS Scientific Discovery
More informationComparative Analysis using the Illumina DASL assay with FFPE tissue. Wendell Jones, PhD Vice President, Statistics and Bioinformatics
TM Comparative Analysis using the Illumina DASL assay with FFPE tissue Wendell Jones, PhD Vice President, Statistics and Bioinformatics Background EA has examined several protocol assay possibilities for
More informationAnalysis of Microarray Data
Analysis of Microarray Data Lecture 3: Visualization and Functional Analysis George Bell, Ph.D. Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Review Visualizing
More informationPreprocessing of Microarray data I: Normalization and Missing Values. For example, a list of possible sources in spotted arrays:
Normalization: Preprocessing of Microarray data I: Normalization and Missing Values Comparability across two (experimental condition vs control) or more (many experimental conditions) sets of measurements.
More informationMixture modeling for genome-wide localization of transcription factors
Mixture modeling for genome-wide localization of transcription factors Sündüz Keleş Department of Statistics and Department of Biostatistics & Medical Informatics 1300 University Avenue, 1245B Medical
More informationDiscriminant models for high-throughput proteomics mass spectrometer data
Proteomics 2003, 3, 1699 1703 DOI 10.1002/pmic.200300518 1699 Short Communication Parul V. Purohit David M. Rocke Center for Image Processing and Integrated Computing, University of California, Davis,
More informationMicroarray Gene Expression Analysis at CNIO
Microarray Gene Expression Analysis at CNIO Orlando Domínguez Genomics Unit Biotechnology Program, CNIO 8 May 2013 Workflow, from samples to Gene Expression data Experimental design user/gu/ubio Samples
More informationBioinformatics Advance Access published February 10, A New Summarization Method for Affymetrix Probe Level Data
Bioinformatics Advance Access published February 10, 2006 BIOINFORMATICS A New Summarization Method for Affymetrix Probe Level Data Sepp Hochreiter, Djork-Arné Clevert, and Klaus Obermayer Department of
More informationSECTION 11 ACUTE TOXICITY DATA ANALYSIS
SECTION 11 ACUTE TOXICITY DATA ANALYSIS 11.1 INTRODUCTION 11.1.1 The objective of acute toxicity tests with effluents and receiving waters is to identify discharges of toxic effluents in acutely toxic
More information