Gene expression data analysis in clinical cancer research

Philippe Broët¹
INSERM U472 and Faculty of Medicine Paris-Sud
broet@vjf.inserm.fr

¹ 16 Avenue Paul Vaillant Couturier, Villejuif, France. This work was carried out with Sylvia Richardson and Alex Lewin, Department of Public Health, Imperial College, Norfolk Place, London W2 1PG, United Kingdom.

Summary: In association studies that use transcriptome-oriented biotechnologies, where the goal is to identify genes whose expression changes are correlated with a bio-clinical factor, one of the main problems is identifying those genes while taking into account the multiplicity of the comparisons performed. The two main criteria used in multiple comparison procedures are the FWER (Family Wise Error Rate) and the FDR (False Discovery Rate). Numerous procedures currently exist to control (or estimate) these error criteria. Nevertheless, these procedures only partially meet the needs of clinical cancer research. In this context, we present a method based on Bayesian mixture models that makes it possible to compute the FDR for any set of genes. An example based on real breast cancer data is presented.

Keywords: Bayesian mixture model, Clinical research, FDR, Microarray data analysis, Oncology.

1. Introduction

Transcriptome-oriented biotechnologies now allow researchers to measure and compare the expression of thousands of mRNAs in parallel. Typically, these data consist of measurements of gene expression under various experimental or biological conditions and can provide information on the complex transcriptional activity of the biological system under study (Schena, 2000). Alongside the rapid development of this genomic technology, research into ways of interpreting the vast and rich body of data it generates has become an active area. The interest of biostatisticians in this new challenge is underscored by the growing number of articles recently published in the scientific literature.

From well-designed experiments, research scientists pose questions related to comparison, prediction and clustering problems. For class comparison, the aim is to select relevant genes based on the relationship between their expression measurements and a response variable. For class prediction, the main interest is in deriving predictors defined as a linear or non-linear combination of gene expression measurements. For class discovery, the major objective is to find new sub-classes of a disease entity that could help future clinical and fundamental research.

For class comparison, research into ways of identifying gene expression changes in microarray experiments while accounting for false conclusions has become an active area. Up to now, statistical procedures have mostly relied on the multiple comparisons framework to control false positive conclusions (Hochberg and Tamhane, 1987). In this framework, two quantities have been considered: the familywise error rate (FWER) and the false discovery rate (FDR). The FWER, the oldest criterion considered in multiple comparisons, is defined as the probability of at least one false positive conclusion over all the true null hypotheses (a null hypothesis corresponds to the absence of a relationship between a gene's expression measurement and the response variable). The most classical methods are the Bonferroni and Šidák procedures (Hochberg and Tamhane, 1987). However, as argued by Benjamini and Hochberg (1995), controlling the FWER in multiple testing settings may not always be appropriate. As a less stringent alternative, they introduced the false discovery rate (FDR), the expected proportion of erroneously rejected null hypotheses among all rejected ones. The main appeal of the FDR is that it leads to more powerful procedures than those relying on the FWER. Moreover, the FDR seems well suited to genomic and post-genomic biotechnologies, which are mostly used for exploratory data analysis and screening. Based on this concept, Benjamini and Hochberg initially developed a step-up procedure that controls the FDR at a prespecified level under the assumption of independent tests (Benjamini and Hochberg, 1995). An extension to dependent tests has since been proposed (Benjamini and Yekutieli, 2001). In the same spirit, seminal work has been done on estimating the FDR, or the pFDR as defined by Storey (2001), nonparametrically (for some key contributions, see Storey and Tibshirani, 2003; Tusher et al., 2001; Efron et al., 2001). A drawback of these latter procedures is that they focus only on protecting against false positive conclusions.
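To make the contrast between FWER and FDR control concrete, here is a minimal Python sketch of the three classical procedures mentioned above: Bonferroni and Šidák (FWER) and the Benjamini–Hochberg step-up procedure (FDR, assuming independent tests). The function names and the toy p-values are illustrative only and are not part of the original analysis.

```python
def bonferroni(pvals, alpha):
    """FWER control: reject H0_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def sidak(pvals, alpha):
    """FWER control for independent tests: reject when p_i <= 1 - (1 - alpha)^(1/m)."""
    m = len(pvals)
    threshold = 1.0 - (1.0 - alpha) ** (1.0 / m)
    return [p <= threshold for p in pvals]


def benjamini_hochberg(pvals, q):
    """FDR control (independent tests): step-up procedure.

    Sort the p-values, find the largest rank k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank  # keep the largest rank satisfying the bound
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.008, 0.02, 0.035, 0.2]  # toy p-values, not from the paper
    print(bonferroni(p, 0.05))
    print(sidak(p, 0.05))
    print(benjamini_hochberg(p, 0.05))
```

On these toy values the two FWER procedures reject two hypotheses while the step-up FDR procedure rejects four, which illustrates the gain in power discussed above.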
However, in the exploratory and screening context of most microarray data analyses, investigators may be seriously concerned that such methods ignore false negatives and lead them to discard too large a proportion of meaningful experimental information. Indeed, a large variation in gene expression does not necessarily translate into a major role in the biological process under study, and vice versa. This is especially true for microarray experiments in oncology, where the top genes (ranked by p-value or gene statistic) are not necessarily key genes, whereas other interesting genes (related to a biological pathway or a drug target) may exhibit smaller transcriptional variations. In this setting, finite mixture modelling offers a flexible framework (see the numerous illustrations in McLachlan and Peel, 2000) and allows inference from either a frequentist or a Bayesian approach (for a few examples, see Pan et al., 2003; Broët et al., 2002). In this work we present a fully Bayesian mixture model that pays particular attention to modelling the alternative hypothesis in order to obtain good estimates of the FDR and of its dual quantity, the FNR (false non-discovery rate) as defined by Genovese and Wasserman (2002). Moreover, it allows us to estimate the FDR and FNR for any subset of genes, something that classical approaches, which only consider monotone rejection regions, cannot provide. We illustrate our approach by reanalysing a breast cancer dataset (Hedenfalk et al., 2001) in which the aim is to select relevant genes in a multi-class response experiment comparing BRCA1-related, BRCA2-related and sporadic breast cancers.

2. Bayesian mixture modelling approach

2.1 Gene-based statistic

In this subsection, we define a gene-based statistic for multi-class response experiments. Let X_ijk denote the measurement for the i-th gene (i = 1, ..., I) in the j-th sample (j = 1, ..., J_k) of the k-th class (k = 1, ..., K), and let N = J_1 + ... + J_K be the total number of samples. The gene-based statistic D_i used in our model-based approach is a transformation of the gene statistic F_i, which under H_0 (truly unmodified expression) follows a Fisher distribution F^{K-1}_{N-K} with (K - 1) and (N - K) degrees of freedom:

$$D_i = \frac{\left(1 - \frac{2}{9(N-K)}\right) F_i^{1/3} - \left(1 - \frac{2}{9(K-1)}\right)}{\sqrt{\frac{2}{9(N-K)}\, F_i^{2/3} + \frac{2}{9(K-1)}}}$$

This transformation normalizes the distribution of F_i (Johnson and Kotz, 1970). Under H_0, D_i is approximately standard normal, whereas otherwise it follows a more complex, non-centred distribution. Note that the non-centred D_i values summarize different patterns of gene expression change across the conditions. Thus, the marginal distribution of D_i is a mixture of distributions corresponding to modified and unmodified gene expression measurements across the classes.

2.2 Model

Our purpose is to model the mixture distribution of D_i and to estimate, for each gene, the posterior probability, conditional on the observed data, that it belongs to the null component representing no difference across the classes. Our modelling approach assumes that the marginal density of D_i can be written as

$$f(d_i) = \sum_{g=0}^{G} w_g \, f(d_i \mid \mu_g, \sigma_g^2)$$

where the f(· | µ_g, σ_g²) are Gaussian densities with unknown parameters (µ_g, σ_g²) for the g-th component of the mixture. The quantities w_g are the mixing proportions, with 0 ≤ w_g ≤ 1 and Σ_{g=0}^{G} w_g = 1. Here, we define g = 0 to be the unmodified component, which has no expression change across the conditions and a centred normal distribution. The number G of modified components in the mixture is treated as unknown, since the alternative is expected to have a complex distribution summarizing various patterns of gene expression. The prior distribution for G is Poisson with parameter m, with m chosen small so as to encourage a parsimonious number of fitted components.
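A minimal Python sketch of the gene-based statistic follows. It assumes, as the degrees of freedom (K − 1, N − K) suggest, that F_i is the usual one-way analysis-of-variance F statistic comparing the K classes for gene i; the helper names `anova_f` and `normalized_d` are hypothetical. The second function applies the cube-root transformation D_i given above.

```python
import math


def anova_f(groups):
    """One-way ANOVA F statistic for one gene (assumed form of F_i).

    groups: list of K lists, the expression values of this gene in each class.
    Under H0 (and the usual normality assumptions) F ~ Fisher(K - 1, N - K).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def normalized_d(f, k, n):
    """Cube-root normalizing transformation of an F(K - 1, N - K) statistic.

    Under H0 the result is approximately standard normal; larger F gives larger D.
    """
    a1 = 2.0 / (9.0 * (k - 1))  # 2 / (9 (K - 1))
    a2 = 2.0 / (9.0 * (n - k))  # 2 / (9 (N - K))
    c = f ** (1.0 / 3.0)        # F^(1/3)
    numerator = (1.0 - a2) * c - (1.0 - a1)
    denominator = math.sqrt(a2 * c * c + a1)
    return numerator / denominator


if __name__ == "__main__":
    # Toy setting: K = 3 classes, N = 30 samples. The 95th percentile of
    # F(2, 27) is about 3.35, so D should come out close to 1.645.
    print(round(normalized_d(3.35, k=3, n=30), 3))
```

The quantile check in the usage line is a quick way to see that the transformation maps the F scale onto an approximately standard normal scale.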
The mean parameter of the unmodified component, µ_0, was set to 0, and we impose that µ_0 < µ_1 < ... < µ_G. We remark that under H_0 the F_i follow central Fisher distributions F^{K-1}_{N-K}, whereas under the alternative they follow non-central Fisher distributions F^{K-1}_{N-K}(η), where η is the non-centrality parameter. The prior distributions specify that µ_g (g ≠ 0), σ_g² and w_g are all drawn independently, from uniform, gamma and Dirichlet priors respectively. As usual for mixture models, we introduce for each gene an unobserved (latent) categorical variable L_i taking the values 0, ..., G with probabilities w_0, ..., w_G respectively (McLachlan and Peel, 2000); L_i ≠ 0 indicates that gene i does not belong to the null component. A joint posterior distribution for all unknowns is formed, and inference is undertaken by simulating realizations from it with a reversible-jump Metropolis–Hastings algorithm similar to the one used in Broët et al. (2002) and Richardson and Green (1997). The full output of the Bayesian analysis includes the posterior distribution of G as well as our main quantities of interest, the posterior probabilities p_0i = P(L_i = 0 | data) for each gene. The p_0i are estimated within the algorithm as the number of sweeps in which L_i = 0 divided by the length of the simulation run. Note that these probabilities are integrated over the range of normal mixtures (with different G) used to fit the marginal density of D_i, a unique feature of our model. From these posterior probabilities we can obtain model-based estimates of the false discovery and false non-discovery rates, conditional on the data.
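The latent-variable bookkeeping can be sketched in Python. For a fixed set of mixture parameters, the allocation probability of gene i to component g is proportional to w_g times the normal density of d_i; p_0i is then estimated, as described above, by the proportion of sweeps in which L_i = 0. This sketch does not implement the reversible-jump sampler: in the actual model G, µ_g, σ_g² and w_g change from sweep to sweep, which is what integrates p_0i over mixtures. All names and numbers are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma2):
    """Gaussian density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def allocation_probs(d, weights, means, variances):
    """P(L_i = g | d_i, w, mu, sigma^2) for g = 0..G, parameters held fixed."""
    dens = [w * normal_pdf(d, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return [x / total for x in dens]


def estimate_p0(allocation_draws):
    """Monte Carlo estimate of p_0i = P(L_i = 0 | data).

    allocation_draws[t][i] is the label L_i sampled at sweep t; p_0i is the
    number of sweeps with L_i = 0 divided by the number of sweeps.
    """
    n_sweeps = len(allocation_draws)
    n_genes = len(allocation_draws[0])
    return [
        sum(1 for draw in allocation_draws if draw[i] == 0) / n_sweeps
        for i in range(n_genes)
    ]


if __name__ == "__main__":
    # Illustrative fixed parameters: null N(0, 1) plus one modified component N(3, 1).
    weights, means, variances = [0.6, 0.4], [0.0, 3.0], [1.0, 1.0]
    d_values = [0.1, 1.5, 4.2]  # toy D_i values, not from the paper
    rng = random.Random(1)
    draws = []
    for _ in range(5000):
        draw = []
        for d in d_values:
            probs = allocation_probs(d, weights, means, variances)
            draw.append(rng.choices(range(len(probs)), weights=probs)[0])
        draws.append(draw)
    print([round(p, 2) for p in estimate_p0(draws)])
```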

2.3 The analysis of the Hedenfalk breast cancer dataset

Dataset

We analysed the publicly available cDNA microarray dataset from the breast cancer study conducted by Hedenfalk et al. (2001). The aim of that study was to determine global gene-expression patterns in breast cancer tissues from patients with BRCA1-related cancer, BRCA2-related cancer and sporadic breast cancer. The initial dataset consists of gene expression ratios, obtained by dividing the fluorescent intensities of a tumour sample by those of a common reference sample; for each gene, a log-expression ratio was available. Here, we focus on the subset of 471 genes with a nominal denomination (ESTs and unknown genes were excluded). We consider each log-ratio measurement to be an additive sum of four terms: (i) a gene effect, (ii) a differential effect between the tumour sample and the reference sample co-hybridized on a given array, (iii) a gene × cell line interaction effect, which reflects gene-specific differential expression among the three tumour classes, and (iv) an error term. As the term of interest is the interaction term, we estimate it through a classical analysis of variance model; in practice, row and column effects are subtracted.

Results

The mixture integrated over different numbers of components provides a good semi-parametric fit to the gene-based statistics. This dataset appears to contain a large number of differentially expressed genes (the Bayes estimate of the proportion of truly modified genes is 48%). The Bayes rule with the mixture model would give a list of 995 genes, which is too many for practical purposes. Considering the ordered p_0i, our method provides FDR estimates of 1.6% and 6.1% for lists of 96 and 384 genes (corresponding to classical 96- and 384-well plates), whereas the corresponding FNR estimates are 39% and 31.6%.
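Subtracting row and column effects is the classical way to isolate the interaction term in an additive two-way layout. A minimal Python sketch follows, assuming rows are genes and columns are arrays of log-ratios, and that the effects are estimated by simple means (the paper does not say which estimator was used). The function name `double_center` is hypothetical.

```python
def double_center(matrix):
    """Remove row (gene) and column (array) effects from a genes x arrays matrix.

    Returns r_ij = x_ij - mean_i. - mean_.j + mean_.., the residual that
    retains the gene-specific class effect (interaction) plus error.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [
        [matrix[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n_cols)]
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    # Toy log-ratios for 3 genes on 4 arrays (made-up numbers).
    x = [
        [0.2, 0.5, -0.1, 0.3],
        [1.1, 0.9, 1.4, 1.0],
        [-0.4, -0.2, -0.6, 0.1],
    ]
    for row in double_center(x):
        print([round(v, 3) for v in row])
```

Every row and column of the result sums to zero; a purely additive matrix (gene effect plus array effect, no interaction) is mapped to all zeros, so whatever remains is the structure the per-gene F statistic is meant to test.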
In contrast, if the investigator is interested in studying a biological function, the FDR and FNR can be obtained from the individual p_0i. As an example, we consider three subsets of genes defined by known biological functions: apoptosis, cyclins and cell cycle regulation, and cytoskeleton. This gave lists of 6, 1 and 5 genes of interest, respectively. The estimated FDRs were 85% for apoptosis, 10% for cyclins and cell cycle regulation, and 87% for cytoskeleton. These results suggest that, compared with the other biological functions considered, gene expression in the cyclin and cell cycle regulation pathway differs across the three tumour classes, which may lead the investigator to focus preferentially on genes involved in the cell cycle.

3. Discussion

Our fully Bayesian normal mixture model is flexible, since the number of components is treated as an unknown parameter, and it can be seen as a parsimonious, semi-parametric representation of a complex mixture density. In this context, a mixture model-based approach such as the one presented here seems well suited to multi-class comparison experiments. The FDR and FNR estimates obtained with our mixture model are generally accurate over a range of cases. When there is substantial overlap between the profiles of truly modified and unmodified genes, these estimates outperform those obtained from classical nonparametric approaches (such as the q-value of Storey, 2003). Moreover, our approach estimates, for each gene, the posterior probability of belonging to the null component, integrated over all possible mixture models. This allows the FDR and FNR to be estimated for any subset of genes, which classical nonparametric approaches (such as Storey's q-value or SAM; Tusher et al., 2001) cannot provide.
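Given the posterior probabilities p_0i, a commonly used model-based estimator (assumed here; the paper does not write out its formulas) takes the FDR of a gene list S as the average of p_0i over S, and the FNR as the average of 1 − p_0i over the genes left out. The Python sketch below applies it to a top-k list ordered by p_0i (as for the 96- and 384-gene lists), to an arbitrary subset (as for the biological-function lists), and to the Bayes rule, taken here to mean selecting genes with p_0i < 0.5 under symmetric 0–1 loss. Function names and numbers are illustrative.

```python
def fdr_fnr(p0, selected):
    """Model-based FDR and FNR for an arbitrary list of gene indices.

    FDR = mean of p0_i over selected genes (expected share of null genes).
    FNR = mean of (1 - p0_i) over non-selected genes (expected share of
    modified genes that were missed). Empty sets are given a rate of 0.
    """
    chosen = set(selected)
    inside = [p0[i] for i in chosen]
    outside = [1.0 - p0[i] for i in range(len(p0)) if i not in chosen]
    fdr = sum(inside) / len(inside) if inside else 0.0
    fnr = sum(outside) / len(outside) if outside else 0.0
    return fdr, fnr


def top_k(p0, k):
    """Indices of the k genes with the smallest posterior null probability."""
    return sorted(range(len(p0)), key=lambda i: p0[i])[:k]


def bayes_rule(p0):
    """Genes more likely modified than not (assumed: symmetric 0-1 loss)."""
    return [i for i, p in enumerate(p0) if p < 0.5]


if __name__ == "__main__":
    p0 = [0.01, 0.05, 0.2, 0.6, 0.9]  # toy posterior null probabilities
    print(fdr_fnr(p0, top_k(p0, 2)))   # ordered list of the two best genes
    print(fdr_fnr(p0, [2, 4]))         # a 'pathway' subset, not contiguous in rank
    print(bayes_rule(p0))
```

Because the estimator only averages per-gene probabilities, the selected set need not be a tail of the ranking, which is what makes pathway-defined gene lists possible.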

We applied the model to a cDNA microarray dataset from a breast cancer study. When comparing, for example, three subsets of genes defined by their biological functions, our results suggested that the transcriptional expression of genes involved in kinase and cell cycle pathways differs between BRCA1-related, BRCA2-related and sporadic tumours. In summary, we think this modelling approach provides an efficient way of obtaining the FDR and FNR and of analysing subsets of genes that are particularly relevant in clinical cancer research.

References

Benjamini, Y., Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57.
Benjamini, Y., Yekutieli, D. (2001) The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29.
Broët, P., Richardson, S., Radvanyi, F. (2002) Bayesian hierarchical model for identifying changes in gene expression from microarray experiments. J. Comput. Biol., 9.
Efron, B., Tibshirani, R., Storey, J., Tusher, V. (2001) Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96.
Genovese, C., Wasserman, L. (2002) Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society, Series B, 64.
Hedenfalk, I., Duggan, D., Chen, Y. et al. (2001) Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med., 344.
Hochberg, Y., Tamhane, A. (1987) Multiple comparison procedures. Wiley, New York.
Johnson, N.L., Kotz, S. (1970) Continuous univariate distributions, Vol. 2. Wiley, New York.
McLachlan, G., Peel, D. (2000) Finite mixture models. Wiley, New York.
Pan, W., Lin, J., Le, C. (2003) A mixture model approach to detecting differentially expressed genes with microarray data. Funct. Integr. Genomics, 3, 117–124.
Richardson, S., Green, P.J. (1997) On Bayesian analysis of mixtures with an unknown number of components. J. R. Statist. Soc. B, 59.
Schena, M. (2000) Microarray Biochip Technology. Eaton.
Storey, J.D. (2001) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64.
Storey, J.D., Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA, 100.
QVALUE: The manual. jstorey/qvalue/manual.pdf
Storey, J.D. (2003) The positive false discovery rate: a Bayesian interpretation and the q-value. The Annals of Statistics, 31.
Tusher, V., Tibshirani, R., Chu, G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98.

Introduction to microarrays

Introduction to microarrays Bayesian modelling of gene expression data Alex Lewin Sylvia Richardson (IC Epidemiology) Tim Aitman (IC Microarray Centre) Philippe Broët (INSERM, Paris) In collaboration with Anne-Mette Hein, Natalia

More information

STATISTICAL CHALLENGES IN GENE DISCOVERY

STATISTICAL CHALLENGES IN GENE DISCOVERY STATISTICAL CHALLENGES IN GENE DISCOVERY THROUGH MICROARRAY DATA ANALYSIS 1 Central Tuber Crops Research Institute,Kerala, India 2 Dept. of Statistics, St. Thomas College, Pala, Kerala, India email:sreejyothi

More information

Methods for comparing multiple microbial communities. james robert white, October 1 st, 2007

Methods for comparing multiple microbial communities. james robert white, October 1 st, 2007 Methods for comparing multiple microbial communities. james robert white, whitej@umd.edu Advisor: Mihai Pop, mpop@umiacs.umd.edu October 1 st, 2007 Abstract We propose the development of new software to

More information

Heterogeneity of Variance in Gene Expression Microarray Data

Heterogeneity of Variance in Gene Expression Microarray Data Heterogeneity of Variance in Gene Expression Microarray Data DavidM.Rocke Department of Applied Science and Division of Biostatistics University of California, Davis March 15, 2003 Motivation Abstract

More information

Gene Expression Data Analysis

Gene Expression Data Analysis Gene Expression Data Analysis Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu BMIF 310, Fall 2009 Gene expression technologies (summary) Hybridization-based

More information

Bayesian Robust Inference for Differential Gene Expression in Microarrays with Multiple Samples

Bayesian Robust Inference for Differential Gene Expression in Microarrays with Multiple Samples Biometrics 62, 10 18 March 2006 DOI: 10.1111/j.1541-0420.2005.00397.x Bayesian Robust Inference for Differential Gene Expression in Microarrays with Multiple Samples Raphael Gottardo, 1, Adrian E. Raftery,

More information

Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter

Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter VizX Labs, LLC Seattle, WA 98119 Abstract Oligonucleotide microarrays were used to study

More information

BIOINFORMATICS ORIGINAL PAPER

BIOINFORMATICS ORIGINAL PAPER BIOINFORMATICS ORIGINAL PAPER Vol. 21 no. 13 2005, pages 3017 3024 doi:10.1093/bioinformatics/bti448 Gene expression False discovery rate, sensitivity and sample size for microarray studies Yudi Pawitan

More information

Bootstrapping Cluster Analysis: Assessing the Reliability of Conclusions from Microarray Experiments

Bootstrapping Cluster Analysis: Assessing the Reliability of Conclusions from Microarray Experiments Bootstrapping Cluster Analysis: Assessing the Reliability of Conclusions from Microarray Experiments M. Kathleen Kerr The Jackson Laboratory Bar Harbor, Maine 469 U.S.A. mkk@jax.org Gary A. Churchill 1

More information

Comparison of Microarray Pre-Processing Methods

Comparison of Microarray Pre-Processing Methods Comparison of Microarray Pre-Processing Methods K. Shakya, H. J. Ruskin, G. Kerr, M. Crane, J. Becker Dublin City University, Dublin 9, Ireland Abstract Data pre-processing in microarray technology is

More information

A survey of statistical software for analysing RNA-seq data

A survey of statistical software for analysing RNA-seq data A survey of statistical software for analysing RNA-seq data Dexiang Gao, 1,5* Jihye Kim, 2 Hyunmin Kim, 4 Tzu L. Phang, 3 Heather Selby, 2 Aik Choon Tan 2,5 and Tiejun Tong 6** 1 Department of Pediatrics,

More information

Gene Expression Data Analysis (I)

Gene Expression Data Analysis (I) Gene Expression Data Analysis (I) Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Bioinformatics tasks Biological question Experiment design Microarray experiment

More information

ROAD TO STATISTICAL BIOINFORMATICS CHALLENGE 1: MULTIPLE-COMPARISONS ISSUE

ROAD TO STATISTICAL BIOINFORMATICS CHALLENGE 1: MULTIPLE-COMPARISONS ISSUE CHAPTER1 ROAD TO STATISTICAL BIOINFORMATICS Jae K. Lee Department of Public Health Science, University of Virginia, Charlottesville, Virginia, USA There has been a great explosion of biological data and

More information

MulCom: a Multiple Comparison statistical test for microarray data in Bioconductor.

MulCom: a Multiple Comparison statistical test for microarray data in Bioconductor. MulCom: a Multiple Comparison statistical test for microarray data in Bioconductor. Claudio Isella, Tommaso Renzulli, Davide Corà and Enzo Medico May 3, 2016 Abstract Many microarray experiments compare

More information

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods EPP 245/298 Statistical Analysis of Laboratory Data October 11, 2005 1 The

More information

CS-E5870 High-Throughput Bioinformatics Microarray data analysis

CS-E5870 High-Throughput Bioinformatics Microarray data analysis CS-E5870 High-Throughput Bioinformatics Microarray data analysis Harri Lähdesmäki Department of Computer Science Aalto University September 20, 2016 Acknowledgement for J Salojärvi and E Czeizler for the

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 14: Microarray Some slides were adapted from Dr. Luke Huan (University of Kansas), Dr. Shaojie Zhang (University of Central Florida), and Dr. Dong Xu and

More information

Nima Hejazi. Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi. nimahejazi.org github/nhejazi

Nima Hejazi. Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi. nimahejazi.org github/nhejazi Data-Adaptive Estimation and Inference in the Analysis of Differential Methylation for the annual retreat of the Center for Computational Biology, given 18 November 2017 Nima Hejazi Division of Biostatistics

More information

Seven Keys to Successful Microarray Data Analysis

Seven Keys to Successful Microarray Data Analysis Seven Keys to Successful Microarray Data Analysis Experiment Design Platform Selection Data Management System Access Differential Expression Biological Significance Data Publication Type of experiment

More information

Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison. CodeLink compatible

Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison. CodeLink compatible Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison CodeLink compatible Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood

More information

Microarray probe expression measures, data normalization and statistical validation

Microarray probe expression measures, data normalization and statistical validation Comparative and Functional Genomics Comp Funct Genom 2003; 4: 442 446. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.312 Conference Review Microarray probe expression

More information

Statistical Issues in Microarray Data and Data Analysis

Statistical Issues in Microarray Data and Data Analysis Statistical Issues in Microarray Data and Data Analysis Outline Background Reliability of Microarray Technology MAQC experimental design and data Analysis of MAQC data Selection of Differential Expressed

More information

Multiple Testing in RNA-Seq experiments

Multiple Testing in RNA-Seq experiments Multiple Testing in RNA-Seq experiments O. Muralidharan et al. 2012. Detecting mutations in mixed sample sequencing data using empirical Bayes. Bernd Klaus Institut für Medizinische Informatik, Statistik

More information

Supervised Learning from Micro-Array Data: Datamining with Care

Supervised Learning from Micro-Array Data: Datamining with Care November 18, 2002 Stanford Statistics 1 Supervised Learning from Micro-Array Data: Datamining with Care Trevor Hastie Stanford University November 18, 2002 joint work with Robert Tibshirani, Balasubramanian

More information

Nonparametric Stepwise Procedure for Identification of Minimum Effective Dose (MED)

Nonparametric Stepwise Procedure for Identification of Minimum Effective Dose (MED) International Journal of Statistics and Systems ISSN 097-675 Volume, Number (06), pp. 77-88 Research India Publications http://www.ripublication.com Nonparametric Stepwise Procedure for Identification

More information

II. METHODS. A. DGE/RNA-seq data

II. METHODS. A. DGE/RNA-seq data Differential expression analysis of digital gene expression data: RNA-tag filtering, comparison of t-type tests and their genome-wide co-expression based adjustments Yinglei Lai Department of Statistics

More information

David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis

David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis Outline RNA-Seq for differential expression analysis Statistical methods for RNA-Seq: Structure

More information

Some Statistical Issues in Microarray Gene Expression Data

Some Statistical Issues in Microarray Gene Expression Data From the SelectedWorks of Jeffrey S. Morris June, 2006 Some Statistical Issues in Microarray Gene Expression Data Matthew S. Mayo, University of Kansas Medical Center Byron J. Gajewski, University of Kansas

More information

Lab 1: A review of linear models

Lab 1: A review of linear models Lab 1: A review of linear models The purpose of this lab is to help you review basic statistical methods in linear models and understanding the implementation of these methods in R. In general, we need

More information

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods BST 226 Statistical Methods for Bioinformatics January 8, 2014 1 The -Omics

More information

Review Statistical tests for differential expression in cdna microarray experiments Xiangqin Cui and Gary A Churchill

Review Statistical tests for differential expression in cdna microarray experiments Xiangqin Cui and Gary A Churchill Review Statistical tests for differential expression in cdna microarray experiments Xiangqin Cui and Gary A Churchill Address: The Jackson Laboratory, 600 Main Street, Bar Harbor, Maine 04609, USA. Correspondence:

More information

Exploration and Analysis of DNA Microarray Data

Exploration and Analysis of DNA Microarray Data Exploration and Analysis of DNA Microarray Data Dhammika Amaratunga Senior Research Fellow in Nonclinical Biostatistics Johnson & Johnson Pharmaceutical Research & Development Javier Cabrera Associate

More information

Bioinformatics : Gene Expression Data Analysis

Bioinformatics : Gene Expression Data Analysis 05.12.03 Bioinformatics : Gene Expression Data Analysis Aidong Zhang Professor Computer Science and Engineering What is Bioinformatics Broad Definition The study of how information technologies are used

More information

Feature selection methods for SVM classification of microarray data

Feature selection methods for SVM classification of microarray data Feature selection methods for SVM classification of microarray data Mike Love December 11, 2009 SVMs for microarray classification tasks Linear support vector machines have been used in microarray experiments

More information

Analysis of a Proposed Universal Fingerprint Microarray

Analysis of a Proposed Universal Fingerprint Microarray Analysis of a Proposed Universal Fingerprint Microarray Michael Doran, Raffaella Settimi, Daniela Raicu, Jacob Furst School of CTI, DePaul University, Chicago, IL Mathew Schipma, Darrell Chandler Bio-detection

More information

Analysis of Cancer Gene Expression Profiling in DNA Microarray Data using Clustering Technique

Analysis of Cancer Gene Expression Profiling in DNA Microarray Data using Clustering Technique Analysis of Cancer Gene Expression Profiling in DNA Microarray Data using Clustering Technique 1 C. Premalatha, 2 D. Devikanniga 1, 2 Assistant Professor, Department of Information Technology Sri Ramakrishna

More information

Improving statistical inference for gene expression profiling data by borrowing information

Improving statistical inference for gene expression profiling data by borrowing information Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2010 Improving statistical inference for gene expression profiling data by borrowing information Long Qu Iowa

More information

Facilitating Antibacterial Drug Development: Bayesian vs Frequentist Methods

Facilitating Antibacterial Drug Development: Bayesian vs Frequentist Methods Facilitating Antibacterial Drug Development: Bayesian vs Frequentist Methods Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington The Brookings Institution May 9, 2010 First:

More information

Package IsoGeneGUI. December 9, Type Package

Package IsoGeneGUI. December 9, Type Package Type Package Package IsoGeneGUI December 9, 2018 Title A graphical user interface to conduct a dose-response analysis of microarray data Version 2.18.0 Date 2015-04-09 Author Setia Pramana, Dan Lin, Philippe

More information

ALLEN Human Brain Atlas

ALLEN Human Brain Atlas TECHNICAL WHITE PAPER: MICROARRAY DATA NORMALIZATION The is a publicly available online resource of gene expression information in the adult human brain. Comprising multiple datasets from various projects

More information

Microarrays: since we use probes we obviously must know the sequences we are looking at!

Microarrays: since we use probes we obviously must know the sequences we are looking at! These background are needed: 1. - Basic Molecular Biology & Genetics DNA replication Transcription Post-transcriptional RNA processing Translation Post-translational protein modification Gene expression

More information

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods SPH 247 Statistical Analysis of Laboratory Data April 21, 2015 1 The -Omics

More information

3.1.4 DNA Microarray Technology

3.1.4 DNA Microarray Technology 3.1.4 DNA Microarray Technology Scientists have discovered that one of the differences between healthy and cancer is which genes are turned on in each. Scientists can compare the gene expression patterns

More information

SAS Microarray Solution for the Analysis of Microarray Data. Susanne Schwenke, Schering AG Dr. Richardus Vonk, Schering AG

SAS Microarray Solution for the Analysis of Microarray Data. Susanne Schwenke, Schering AG Dr. Richardus Vonk, Schering AG for the Analysis of Microarray Data Susanne Schwenke, Schering AG Dr. Richardus Vonk, Schering AG Overview Challenges in Microarray Data Analysis Software for Microarray Data Analysis SAS Scientific Discovery

More information

Cell Lines, Microarrays, Drugs and Disease: Trying to Predict Response to Chemotherapy

Cell Lines, Microarrays, Drugs and Disease: Trying to Predict Response to Chemotherapy Cell Lines, Microarrays, Drugs and Disease: Trying to Predict Response to Chemotherapy Keith Baggerly, Ph.D Associate Professor Department of Bioinformatics and Computational Biology M. D. Anderson Cancer

More information

Exploration and Analysis of DNA Microarray Data

Exploration and Analysis of DNA Microarray Data Exploration and Analysis of DNA Microarray Data Dhammika Amaratunga Senior Research Fellow in Nonclinical Biostatistics Johnson & Johnson Pharmaceutical Research & Development Javier Cabrera Associate

More information

V10-8. Gene Expression

V10-8. Gene Expression V10-8. Gene Expression - Regulation of Gene Transcription at Promoters - Experimental Analysis of Gene Expression - Statistics Primer - Preprocessing of Data - Differential Expression Analysis Fri, May

More information

Total RNA was isolated using the TRIZOL reagent according to the manufacturer s

Total RNA was isolated using the TRIZOL reagent according to the manufacturer s RNA extraction Total RNA was isolated using the TRIZOL reagent according to the manufacturer s instructions (Invitrogen, Carlsbad, CA). RNA integrity for each sample was confirmed with the Agilent 2100

More information

Dose-Response Modeling of Gene Expression Data in Microarray Experiments

Dose-Response Modeling of Gene Expression Data in Microarray Experiments Dose-Response Modeling of Gene Expression Data in Microarray Experiments Setia Pramana Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Universiteit Hasselt, Diepenbeek, Belgium

More information

Our view on cdna chip analysis from engineering informatics standpoint

Our view on cdna chip analysis from engineering informatics standpoint Our view on cdna chip analysis from engineering informatics standpoint Chonghun Han, Sungwoo Kwon Intelligent Process System Lab Department of Chemical Engineering Pohang University of Science and Technology

More information

Optimal alpha reduces error rates in gene expression studies: a meta-analysis approach

Optimal alpha reduces error rates in gene expression studies: a meta-analysis approach Mudge et al. BMC Bioinformatics (2017) 18:312 DOI 10.1186/s12859-017-1728-3 METHODOLOGY ARTICLE Open Access Optimal alpha reduces error rates in gene expression studies: a meta-analysis approach J. F.

More information

Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models

Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models Haiyan Huang Department of Statistics, UC Berkeley Feb 7, 2018 Background Background High dimensionality (p >> n) often results

More information

Disclaimer This presentation expresses my personal views on this topic and must not be interpreted as the regulatory views or the policy of the FDA

Disclaimer This presentation expresses my personal views on this topic and must not be interpreted as the regulatory views or the policy of the FDA On multiplicity problems related to multiple endpoints of controlled clinical trials Mohammad F. Huque, Ph.D. Div of Biometrics IV, Office of Biostatistics OTS, CDER/FDA JSM, Vancouver, August 2010 Disclaimer

More information

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA advanced analysis of gene expression microarray data aidong zhang State University of New York at Buffalo, USA World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI Contents

More information

Significance testing for small microarray experiments

Significance testing for small microarray experiments CHAPTER 8 Significance testing for small microarray experiments Charles Kooperberg, Aaron Aragaki, Charles C. Carey, and Suzannah Rutherford 8.1 Introduction When a study has many degrees of freedom it

More information

arxiv: v1 [stat.me] 13 Apr 2013

arxiv: v1 [stat.me] 13 Apr 2013 arxiv:1304.3838v1 [stat.me] 13 Apr 2013 Article type: Overview Identification of significant features in DNA microarray data 2DPP Eric Bair Departments of Endodontics and Biostatistics Univ. of North Carolina

More information

Application of Whole-Genome Prediction Methods for Genome-Wide Association Studies: A Bayesian Approach

Application of Whole-Genome Prediction Methods for Genome-Wide Association Studies: A Bayesian Approach Application of Whole-Genome Prediction Methods for Genome-Wide Association Studies: A Bayesian Approach Rohan Fernando, Ali Toosi, Anna Wolc, Dorian Garrick, and Jack Dekkers Data that are collected for

More information

Design and Analysis of Microarray Experiments for Pharmacogenomics

Design and Analysis of Microarray Experiments for Pharmacogenomics Chapter 7 Design and Analysis of Microarray Experiments for Pharmacogenomics 7.1 7.2 Potential uses of biomarkers............................................. Clinical uses of genetic profiling.........................................

More information

Workshop on Data Science in Biomedicine

Workshop on Data Science in Biomedicine Workshop on Data Science in Biomedicine July 6 Room 1217, Department of Mathematics, Hong Kong Baptist University 09:30-09:40 Welcoming Remarks 9:40-10:20 Pak Chung Sham, Centre for Genomic Sciences, The

More information

Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics

Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics abedi777@ymail.com Outlines Technology Basic concepts Data analysis Printed Microarrays In Situ-Synthesized

More information

Microarray Experiment Design

Microarray Experiment Design Microarray Experiment Design Samples used, extract preparation and labelling: AML blasts were isolated from bone marrow by centrifugation on a Ficoll- Hypaque gradient. Total RNA was extracted using TRIzol

More information

Some observations on experimental design of microarray experiments

Some observations on experimental design of microarray experiments 16 Technical Paper Some observations on experimental design of microarray experiments Lara Lusa Abstract. Gene-expression microarrays measure simultaneously the expression of thousands of genes and are

More information

Introduction to Microarray Analysis

Introduction to Microarray Analysis Introduction to Microarray Analysis Methods Course: Gene Expression Data Analysis -Day One Rainer Spang Microarrays Highly parallel measurement devices for gene expression levels 1. How does the microarray

More information

A comparison of methods for differential expression analysis of RNA-seq data

A comparison of methods for differential expression analysis of RNA-seq data Soneson and Delorenzi BMC Bioinformatics 213, 14:91 RESEARCH ARTICLE A comparison of methods for differential expression analysis of RNA-seq data Charlotte Soneson 1* and Mauro Delorenzi 1,2 Open Access

More information

Statistical signal detection in Clinical Trial data

Statistical signal detection in Clinical Trial data Statistical signal detection in Clinical Trial data Andreas Brueckner Christiane Ahlers, Anngret Mallick, Nils Opitz, Vlasta Pinkston, Bruno Tran, Janet Scott, Harry Southworth, Bruno Tran, Lionel Van

More information

Experimental Design for Gene Expression Microarray. Jing Yi 18 Nov, 2002

Experimental Design for Gene Expression Microarray. Jing Yi 18 Nov, 2002 Experimental Design for Gene Expression Microarray Jing Yi 18 Nov, 2002 Human Genome Project The HGP continued emphasis is on obtaining by 2003 a complete and highly accurate reference sequence(1 error

More information

Microarray Informatics

Microarray Informatics Microarray Informatics Donald Dunbar MSc Seminar 31 st January 2007 Aims To give a biologist s view of microarray experiments To explain the technologies involved To describe typical microarray experiments

More information

Ning Tang ALL RIGHTS RESERVED

Ning Tang ALL RIGHTS RESERVED 2014 Ning Tang ALL RIGHTS RESERVED ROBUST GENE SET ANALYSIS AND ROBUST GENE EXPRESSION By NING TANG A dissertation submitted to the Graduate School New Brunswick Rutgers, The State University of New Jersey

More information

Modeling & Simulation in pharmacogenetics/personalised medicine

Modeling & Simulation in pharmacogenetics/personalised medicine Modeling & Simulation in pharmacogenetics/personalised medicine Julie Bertrand MRC research fellow UCL Genetics Institute 07 September, 2012 jbertrand@uclacuk WCOP 07/09/12 1 / 20 Pharmacogenetics Study

More information

Evaluating Diagnostic Tests in the Absence of a Gold Standard

Evaluating Diagnostic Tests in the Absence of a Gold Standard Evaluating Diagnostic Tests in the Absence of a Gold Standard Nandini Dendukuri Departments of Medicine & Epidemiology, Biostatistics and Occupational Health, McGill University; Technology Assessment Unit,

More information

A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET

A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET 1 J.JEYACHIDRA, M.PUNITHAVALLI, 1 Research Scholar, Department of Computer Science and Applications,

More information

Evaluation of Some Statistical Methods for the Identification of Differentially Expressed Genes

Evaluation of Some Statistical Methods for the Identification of Differentially Expressed Genes Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 3-24-2015 Evaluation of Some Statistical Methods for the Identification of Differentially

More information

Microarray data analysis: from disarray to consolidation and consensus

Microarray data analysis: from disarray to consolidation and consensus Microarray data analysis: from disarray to consolidation and consensus David B. Allison*, Xiangqin Cui*, Grier P. Page* and Mahyar Sabripour* Abstract In just a few years, microarrays have gone from obscurity

More information

Page 78

Page 78 A Case Study for Radiation Therapy Dose Finding Utilizing Bayesian Sequential Trial Design Author s Details: (1) Fuyu Song and (2)(3) Shein-Chung Chow 1 Peking University Clinical Research Institute, Peking

More information

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H Introduction to ChIP Seq data analyses Acknowledgement: slides taken from Dr. H Wu @Emory ChIP seq: Chromatin ImmunoPrecipitation it ti + sequencing Same biological motivation as ChIP chip: measure specific

More information

Comparative analysis of RNA-Seq data with DESeq2

Comparative analysis of RNA-Seq data with DESeq2 Comparative analysis of RNA-Seq data with DESeq2 Simon Anders EMBL Heidelberg Two applications of RNA-Seq Discovery find new transcripts find transcript boundaries find splice junctions Comparison Given

More information

Data-Adaptive Estimation and Inference in the Analysis of Differential Methylation

Data-Adaptive Estimation and Inference in the Analysis of Differential Methylation Data-Adaptive Estimation and Inference in the Analysis of Differential Methylation for the annual retreat of the Center for Computational Biology, given 18 November 2017 Nima Hejazi Division of Biostatistics

More information

review Expression Microarrays Tiling genomic microarrays Sequencing methods Riassunto puntate precedenti RNA transcripts

review Expression Microarrays Tiling genomic microarrays Sequencing methods Riassunto puntate precedenti RNA transcripts Riassunto puntate precedenti Expression Microarrays Tiling genomic microarrays Sequencing methods RNA transcripts Depend on kind of RNA prep from cells: Total RNA Poly(A) + fraction Long RNA Small RNA.bound

More information

Machine Learning in Computational Biology CSC 2431

Machine Learning in Computational Biology CSC 2431 Machine Learning in Computational Biology CSC 2431 Lecture 9: Combining biological datasets Instructor: Anna Goldenberg What kind of data integration is there? What kind of data integration is there? SNPs

More information

COS 597c: Topics in Computational Molecular Biology. DNA arrays. Background

COS 597c: Topics in Computational Molecular Biology. DNA arrays. Background COS 597c: Topics in Computational Molecular Biology Lecture 19a: December 1, 1999 Lecturer: Robert Phillips Scribe: Robert Osada DNA arrays Before exploring the details of DNA chips, let s take a step

More information

Non-parametric optimal design in dose finding studies

Non-parametric optimal design in dose finding studies Biostatistics (2002), 3, 1,pp. 51 56 Printed in Great Britain Non-parametric optimal design in dose finding studies JOHN O QUIGLEY Department of Mathematics, University of California, San Diego, CA 92093,

More information

Microarray Informatics

Microarray Informatics Microarray Informatics Donald Dunbar MSc Seminar 4 th February 2009 Aims To give a biologistʼs view of microarray experiments To explain the technologies involved To describe typical microarray experiments

More information

Time-series microarray data simulation modeled with a case-control label

Time-series microarray data simulation modeled with a case-control label Time-series microarray data simulation modeled with a case-control label Y.J. Liu and J.Y. Zhang School of Computer Science and Technology, Xidian University, Xi an, China Corresponding author: J.Y. Zhang

More information

Finding molecular signatures from gene expression data: review and a new proposal

Finding molecular signatures from gene expression data: review and a new proposal Finding molecular signatures from gene expression data: review and a new proposal Ramón Díaz-Uriarte rdiaz@cnio.es http://bioinfo.cnio.es/ rdiaz Unidad de Bioinformática Centro Nacional de Investigaciones

More information

ChIP-seq data analysis with Chipster. Eija Korpelainen CSC IT Center for Science, Finland

ChIP-seq data analysis with Chipster. Eija Korpelainen CSC IT Center for Science, Finland ChIP-seq data analysis with Chipster Eija Korpelainen CSC IT Center for Science, Finland chipster@csc.fi What will I learn? Short introduction to ChIP-seq Analyzing ChIP-seq data Central concepts Analysis

More information

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques)

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques) Microarrays and Transcript Profiling Gene expression patterns are traditionally studied using Northern blots (DNA-RNA hybridization assays). This approach involves separation of total or polya + RNA on

More information

Introduction to Quantitative Genomics / Genetics

Introduction to Quantitative Genomics / Genetics Introduction to Quantitative Genomics / Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics September 10, 2008 Jason G. Mezey Outline History and Intuition. Statistical Framework. Current

More information

SIMS2003. Instructors:Rus Yukhananov, Alex Loguinov BWH, Harvard Medical School. Introduction to Microarray Technology.

SIMS2003. Instructors:Rus Yukhananov, Alex Loguinov BWH, Harvard Medical School. Introduction to Microarray Technology. SIMS2003 Instructors:Rus Yukhananov, Alex Loguinov BWH, Harvard Medical School Introduction to Microarray Technology. Lecture 1 I. EXPERIMENTAL DETAILS II. ARRAY CONSTRUCTION III. IMAGE ANALYSIS Lecture

More information

Detecting Outliers in Exponentiated Pareto Distribution

Detecting Outliers in Exponentiated Pareto Distribution Journal of Sciences, Islamic Republic of Iran 28(3): 267-272 (207) University of Tehran, ISSN 06-04 http://jsciences.ut.ac.ir Detecting Outliers in Exponentiated Pareto Distribution M. Jabbari Nooghabi

More information

Sample Size and Power Calculation for High Order Crossover Designs

Sample Size and Power Calculation for High Order Crossover Designs Sample Size and Power Calculation for High Order Crossover Designs Roger P. Qu, Ph.D Department of Biostatistics Forest Research Institute, New York, NY, USA 1. Introduction Sample size and power calculation

More information

DNA Microarrays and Computational Analysis of DNA Microarray. Data in Cancer Research

DNA Microarrays and Computational Analysis of DNA Microarray. Data in Cancer Research DNA Microarrays and Computational Analysis of DNA Microarray Data in Cancer Research Mario Medvedovic, Jonathan Wiest Abstract 1. Introduction 2. Applications of microarrays 3. Analysis of gene expression

More information

RECENT developments in methods for controlling tested are truly false, the FDR procedure will identify a

RECENT developments in methods for controlling tested are truly false, the FDR procedure will identify a Copyright 2003 by the Genetics Society of America Note False Discovery Rate in Linkage and Association Genome Screens for Complex Disorders Chiara Sabatti,*,1 Susan Service and Nelson Freimer *Departments

More information

The samr Package. R topics documented: October 7, Title SAM: Significance Analysis of Microarrays. Version 1.20

The samr Package. R topics documented: October 7, Title SAM: Significance Analysis of Microarrays. Version 1.20 The samr Package October 7, 2005 Title SAM: Significance Analysis of Microarrays Version 1.20 Author R. Tibshirani, G. Chu, T. Hastie, Balasubramanian Narasimhan Description Significance Analysis of Microarrays

More information

Big Data. Methodological issues in using Big Data for Official Statistics

Big Data. Methodological issues in using Big Data for Official Statistics Giulio Barcaroli Istat (barcarol@istat.it) Big Data Effective Processing and Analysis of Very Large and Unstructured data for Official Statistics. Methodological issues in using Big Data for Official Statistics

More information

Introduction to Bioinformatics. Fabian Hoti 6.10.

Introduction to Bioinformatics. Fabian Hoti 6.10. Introduction to Bioinformatics Fabian Hoti 6.10. Analysis of Microarray Data Introduction Different types of microarrays Experiment Design Data Normalization Feature selection/extraction Clustering Introduction

More information

The samr Package. June 7, 2007

The samr Package. June 7, 2007 The samr Package June 7, 2007 Title SAM: Significance Analysis of Microarrays Version 1.25 Author R. Tibshirani, G. Chu, T. Hastie, Balasubramanian Narasimhan Description Significance Analysis of Microarrays

More information

Inherent variation in the reactions, type of enzymes used. Depends on the type of labeling and procedures, as well as the age of the labels.

Inherent variation in the reactions, type of enzymes used. Depends on the type of labeling and procedures, as well as the age of the labels. 332 Experimental design, analysis of variance and slide quality assessment in gene expression arrays Sorin Draghici*, Alexander Kuklin, Bruce Hoff & Soheil Shams Address BioDiscovery Inc 11150 West Olympic

More information

A robust statistical procedure to discover expression biomarkers using microarray genomic expression data *

A robust statistical procedure to discover expression biomarkers using microarray genomic expression data * Zou et al. / J Zhejiang Univ SCIENCE B 006 7(8):603-607 603 Journal of Zhejiang University SCIENCE B ISSN 1673-1581 (Print); ISSN 186-1783 (Online) www.zju.edu.cn/jzus; www.springerlink.com E-mail: jzus@zju.edu.cn

More information

STATISTICAL ANALYSIS OF 70-MER OLIGONUCLEOTIDE MICROARRAY DATA FROM POLYPLOID EXPERIMENTS USING REPEATED DYE-SWAPS

STATISTICAL ANALYSIS OF 70-MER OLIGONUCLEOTIDE MICROARRAY DATA FROM POLYPLOID EXPERIMENTS USING REPEATED DYE-SWAPS STATISTICAL ANALYSIS OF 7-MER OLIGONUCLEOTIDE MICROARRAY DATA FROM POLYPLOID EXPERIMENTS USING REPEATED DYE-SWAPS Hongmei Jiang 1, Jianlin Wang, Lu Tian, Z. Jeffrey Chen, and R.W. Doerge 1 1 Department

More information

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer T. M. Murali January 31, 2006 Innovative Application of Hierarchical Clustering A module map showing conditional

More information