Basic aspects of Microarray Data Analysis

1 Hospital Universitari Vall d'Hebron Institut de Recerca - VHIR. Institut d'Investigació Sanitària de l'Instituto de Salud Carlos III (ISCIII). Basic aspects of Microarray Data Analysis. Expression Data Analysis Course. Ricardo Gonzalo Sanz, ricardo.gonzalo@vhir.org, 13/11/13

2 1 Introduction. 2 Software Installation. OneChannel GUI. 3 Quality control. 4 Normalization. 5 Filtering. 6 Statistical inference of differential expression. 7 Clustering. 8 Annotation. 9 Biological interpretation. Extracted from Raffaele Calogero's course slides

3 1 Introduction. Before beginning the analysis. Any analysis of microarray data is useless if: there is no clear biological question to be investigated; the biological experiments are not carefully designed (EXPERIMENTAL DESIGN) to minimize error sources: human intervention, reagent lots, equipment, etc.

4 1 Introduction. Experimental Design: Experiments should be designed with many replicates (>3). Time course experiments should be designed with many time points (>4). Investigate part of the experiment by microarrays and use the rest for further validation. Discuss the experiment with the statistician/bioinformatician involved in the data analysis.

5 1 Introduction. Experimental Design: Experiments involving various samples and conditions need to be carefully designed to avoid unwanted effects. [Figure: assignment of samples C1, C2, T1 and T2 to Day 1 and Day 2.]

6 1 Introduction. To pool or not to pool? The basic assumption underlying sample pooling is biological averaging: the expression from a pooled sample averages out the expression from the individual contributing samples. Bioinformatics 2004, 20:3318

7 1 Introduction. To pool or not to pool? It is impossible to associate the gene expression from the pooled sample with the individual phenotypic information, which makes certain statistical inferences or predictions for individuals unfeasible. Conclusions: The researcher has to be cautious about designing a pooled experiment. Pooling of samples is recommended when there is not enough RNA from each individual sample to run an array.

8 1 Introduction. [Workflow diagram: Biological question; Experimental design; Microarray experiment; Image analysis; QC (PASS / FAILED); Normalization; Analysis (Estimation, Testing, Clustering, Discrimination); Biological verification and interpretation.]

9 2 Software Installation. OneChannel GUI. oneChannelGUI is a graphical interface to Bioconductor libraries devoted to the analysis of data derived from single-channel platforms. It can analyze 3' IVT, Exon and Gene arrays, and also RNA-seq data.

10 2 Software Installation. OneChannel GUI. Open R. On the USB stick you will find a folder called onechannelgui; copy it to a known location on your computer. Open the script inside the copied onechannelgui folder, place the cursor on the first line and press Control+R to run it line by line. Then wait: the installation takes a long time.

12 3 Quality control. Was the experiment a success? Microarray experiments generate huge quantities of data, and it is hard to decide whether things are all right just by looking at the numbers. Standard statistical approach: use plots to check the quality. Plots show all the data together, highlight structures, and may help to detect problems ("unusual patterns").

13 3 Quality control. Diagnostics plots for microarrays: Microarray data are usually considered at two levels. 1. Low level: data coming directly from the scanner. 2. High level: data processed from the low-level data (expression values, normalized or not; fitted probe-level model, PLM). Some plots are specific to some types of array or to one level. Any previous classification may be misleading.

14 3 Quality control. Diagnostics plots for microarrays: Low level: layout image; degradation plots (only in 3' IVT); histogram/density plots; PCA; boxplot. High level: MA plots; model-based plots (NUSE, RLE, ...); PCA; boxplot.

15 3 Quality control. Diagnostics plots for microarrays. Low level. Layout image.

16 3 Quality control. Diagnostics plots for microarrays. Low level. RNA degradation plot.

17 3 Quality control. Diagnostics plots for microarrays. Low level. Histogram/density plot.

18 3 Quality control. Diagnostics plots for microarrays. Low level. Boxplot.

19 3 Quality control. Diagnostics plots for microarrays. Low level. PCA. Principal component analysis involves a mathematical procedure that transforms a number of correlated variables into a (smaller) number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible. Each succeeding component accounts for as much of the remaining variability as possible. The components can be thought of as axes in n-dimensional space, where n is the number of components. Each axis represents a different trend in the data.
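
The description above can be illustrated with a minimal sketch that extracts only the first principal component by power iteration on the covariance matrix. The toy matrix (4 arrays by 3 genes), the function names and the iteration count are assumptions made for illustration; they do not come from the course material, and a real analysis would use an established PCA routine.

```python
import math

def center_columns(rows):
    """Subtract each column mean so that every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of centered data (variables in columns)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=500):
    """Power iteration: leading eigenvector and eigenvalue of a covariance matrix.

    Assumes the uniform start vector is not orthogonal to the leading eigenvector.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigenvalue

# Hypothetical toy data: 4 arrays (rows) x 3 genes (columns), log2 scale.
data = [[2.0, 4.1, 1.0], [3.0, 6.0, 1.2], [4.0, 7.9, 0.9], [5.0, 10.0, 1.1]]
centered = center_columns(data)
cov = covariance(centered)
pc1, var1 = first_component(cov)

# Share of the total variance captured by the first component.
explained = var1 / sum(cov[i][i] for i in range(len(cov)))
# Coordinates of each array along the first component.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
print(round(explained, 3), [round(s, 2) for s in scores])
```

In this toy matrix the first two genes rise together across arrays, so almost all of the variability lies along a single axis; in a QC setting, arrays whose scores separate by batch rather than by condition would be a warning sign.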

20 3 Quality control. Diagnostics plots for microarrays. Low level. PCA.

21 3 Quality control. Diagnostics plots for microarrays. Low level. PCA.

22 3 Quality control. Diagnostics plots for microarrays. High level. RLE (Relative Log Expression). RLE values are computed for each probe set by comparing the expression value on each array against the median expression value for that probe set across all arrays. Assuming that most genes do not change in expression across arrays, most of these RLE values should ideally be near 0. Boxplots of these values, one per array, provide a quality assessment tool.
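
As a rough sketch of the RLE calculation just described, assuming log2 expression values stored as one list per array (the array names and numbers below are invented for illustration):

```python
from statistics import median

def rle(expr):
    """expr maps array name -> list of log2 expression values (same probe-set order).

    Returns, for each array, the value minus the per-probe-set median across arrays.
    """
    arrays = list(expr)
    n_probesets = len(expr[arrays[0]])
    med = [median(expr[a][i] for a in arrays) for i in range(n_probesets)]
    return {a: [expr[a][i] - med[i] for i in range(n_probesets)] for a in arrays}

# Hypothetical data: array C is shifted upwards by about one log2 unit.
expr = {
    "A": [7.0, 8.0, 9.0, 10.0],
    "B": [7.1, 7.9, 9.0, 10.2],
    "C": [8.0, 9.0, 10.0, 11.0],
}
rle_values = rle(expr)
# One summary number per array; a boxplot would show the whole distribution.
rle_medians = {a: median(v) for a, v in rle_values.items()}
print(rle_medians)
```

An array whose RLE distribution is clearly off zero, like C here, or much wider than the others, is a candidate for closer inspection.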

23 3 Quality control. Diagnostics plots for microarrays. High level. RLE.

24 3 Quality control. Diagnostics plots for microarrays. High level. NUSE (Normalized Unscaled Standard Error). Normalized Unscaled Standard Errors (NUSE) can also be used for assessing quality. The standard error estimates obtained for each gene on each array from fitPLM are taken and standardized across arrays so that the median standard error for that gene is 1 across all arrays. This process accounts for differences in variability between genes. An array where the SEs are elevated relative to the other arrays is typically of lower quality. Boxplots of these values, separated by array, can be used to compare arrays.
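
The standardization step can be sketched as follows. The standard errors below are made-up numbers standing in for the output of a probe-level model fit; the fit itself is not reproduced here.

```python
from statistics import median

def nuse(se):
    """se maps array name -> list of per-gene standard errors (same gene order).

    Each standard error is divided by the median standard error of that gene
    across arrays, so the per-gene median becomes 1.
    """
    arrays = list(se)
    n_genes = len(se[arrays[0]])
    med = [median(se[a][g] for a in arrays) for g in range(n_genes)]
    return {a: [se[a][g] / med[g] for g in range(n_genes)] for a in arrays}

# Hypothetical standard errors; array C is noisier than A and B.
se = {
    "A": [0.10, 0.20, 0.05],
    "B": [0.11, 0.19, 0.05],
    "C": [0.20, 0.45, 0.09],
}
nuse_values = nuse(se)
nuse_medians = {a: median(v) for a, v in nuse_values.items()}
print(nuse_medians)
```

Arrays A and B have a median NUSE of about 1, while C sits well above 1, which is the pattern the slide associates with a lower-quality array.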

25 3 Quality control. Diagnostics plots for microarrays. High level. NUSE.

26 3 Quality control. Diagnostics plots for microarrays. High level. MA plots. MA plots allow pairwise comparison of the log-intensity of each array with a reference array and the identification of intensity-dependent biases. The Y axis of the plot shows the log-ratio of the intensity of one array to that of the reference median array, called 'M', while the X axis shows the average log-intensity of both arrays, called 'A'. Normalization is expected to correct intensity-dependent biases: plotting these graphs before and after normalization allows the efficiency of this correction to be checked. Most probe levels are not expected to differ much between arrays, so we expect an MA plot centered on the M=0 line from low to high intensities.
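
A minimal sketch of how M and A could be computed against a median reference array. The raw intensities and chip names are hypothetical, and the plotting itself is left out.

```python
import math
from statistics import median

def ma_values(array, reference):
    """Return (M, A) for one array against a reference, both on the log2 scale."""
    m = [x - r for x, r in zip(array, reference)]        # log-ratio
    a = [(x + r) / 2 for x, r in zip(array, reference)]  # average log-intensity
    return m, a

# Hypothetical raw intensities for three probes on three chips.
raw = {
    "chip1": [100.0, 400.0, 1600.0],
    "chip2": [110.0, 380.0, 1500.0],
    "chip3": [200.0, 800.0, 3200.0],
}
log2_expr = {name: [math.log2(v) for v in values] for name, values in raw.items()}

# Reference array: per-probe median of the log2 intensities across chips.
reference = [median(col) for col in zip(*log2_expr.values())]

m1, a1 = ma_values(log2_expr["chip1"], reference)
m3, a3 = ma_values(log2_expr["chip3"], reference)
print([round(v, 2) for v in m1], [round(v, 2) for v in m3])
```

Here chip3 is roughly twice as bright as the reference at every intensity, so its M values sit near 1 instead of 0; after a successful normalization the cloud should move back to the M=0 line.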

27 3 Quality control. Diagnostics plots for microarrays. High level. MA plots.

29 4 Normalization. Why normalization? To remove systematic biases: sample preparation, variability in hybridization, scanner settings, experimenter bias. To achieve a measurement scale that has the same origin for all spots, uses the same unit for all arrays, and has a linear relationship with RNA. Why not normalization? To cure poor data.

30 4 Normalization. General steps: Background correction (correcting the scale origin for spots). Normalization (standardizing the scale unit, i.e. rescaling). Probe-level intensity calculation (summarizing the information from several spots into a single measure for each gene).

31 4 Normalization. Different methods exist: The RMA methodology (Irizarry et al., 2003) performs background correction, normalization, and summarization in a modular way. RMA does not take non-specific probe hybridization into account in the probe set background calculation. GCRMA is a version of RMA with a background correction component that makes use of probe sequence information (Wu et al., 2004). The PLIER (Probe Logarithmic Intensity Error) method produces an improved signal by accounting for experimentally observed patterns in probe behavior and by handling error appropriately at low and high signal values.
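
To make the normalization step more concrete, here is a hedged sketch of quantile normalization, the normalization used within RMA. It covers only that one step (no background correction or summarization), ignores ties, and uses invented intensities.

```python
def quantile_normalize(columns):
    """columns: one list of intensities per array, all of equal length.

    Every array is forced onto the same distribution: the mean of the sorted
    values across arrays. Ties are not handled specially in this sketch.
    """
    n = len(columns[0])
    sorted_cols = [sorted(c) for c in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for c in columns:
        order = sorted(range(n), key=lambda i: c[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities for 4 probes on 3 arrays.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
for original, new in zip(arrays, normalized):
    print(original, "->", [round(v, 3) for v in new])
```

After this step every array has exactly the same set of values, and only the ranking of the probes within each array is preserved.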

32 4 Normalization.

33 5 Filtering. In a microarray experiment only a few hundred or a few thousand genes change their expression between conditions. Genes that do not change introduce noise, so it is better to remove them before the statistical analysis is done. The researcher is interested in keeping the number of tests/genes as low as possible while keeping the interesting genes in the selected subset. If the truly differentially expressed genes are overrepresented among those selected in the filtering step, the FDR associated with a certain threshold of the test statistic will be lowered by the filtering.

34 5 Filtering. Different types of filtering exist: Annotation features (specific): specific gene features (e.g. GO term, presence of transcriptional regulatory elements in promoters, etc.). Signal features (non-specific): percentage of intensities greater than a user-defined value; interquartile range (IQR) greater than a defined value.

35 5 Filtering. Annotation filtering. In transcriptional studies focusing on genes characterized by a specific feature (e.g. transcription factor elements in promoters), the best filtering approach is to select only those genes linked to the feature of interest. For example: identification of genes modulated by estradiol:ER or IGF1 by direct binding to Estrogen-Responsive Elements (ERE): HGU133plus2: probe sets Entrez Genes. HGU133plus2 with ERE in putative promoter regions: 6764 probe sets, 3058 Entrez Genes.

36 5 Filtering. Annotation filtering. How? Data derived from a dedicated annotation data set can be used for functional filtering. The Ingenuity Pathways Knowledge Base is the world's largest curated database of biological networks, created from millions of individually modeled relationships. The Ingenuity Pathways Analysis software (IPA) identifies relations between genes.

37 5 Filtering. Signal filtering. The premise of this technique is the removal of genes that are deemed not expressed or unchanged according to some specific criterion under the control of the user. The aim of non-specific filtering is to remove genes that, e.g. due to their low overall intensity or variability, are unlikely to carry information about the phenotypes under investigation.
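
A non-specific filter of this kind might look like the sketch below. The thresholds, the gene names and the choice to require both criteria at once are assumptions for illustration; the course does not prescribe specific values.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range of one gene's log2 expression values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def fraction_above(values, threshold):
    """Fraction of arrays on which the gene exceeds an intensity threshold."""
    return sum(v > threshold for v in values) / len(values)

def signal_filter(expr, min_iqr=0.5, min_intensity=7.0, min_fraction=0.25):
    """Keep genes that are both variable enough and expressed on enough arrays."""
    return [gene for gene, values in expr.items()
            if iqr(values) >= min_iqr
            and fraction_above(values, min_intensity) >= min_fraction]

# Hypothetical log2 expression values across six arrays.
expr = {
    "geneA": [6.0, 6.1, 5.9, 6.0, 6.05, 5.95],      # low and flat
    "geneB": [7.5, 7.6, 9.8, 10.1, 7.4, 9.9],       # expressed and variable
    "geneC": [11.0, 11.1, 10.9, 11.0, 11.05, 10.95],  # high but flat
}
kept = signal_filter(expr)
print(kept)
```

Neither criterion looks at the sample labels, which is what keeps the filter non-specific: only overall intensity and variability are used.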

38 5 Filtering. Signal filtering. [Figure panels: "/42 SpikeIn, Enrichment: 100%" and "/42 SpikeIn, Enrichment: 401%".]

40 6 Statistical inference of differential expression. Class comparison problem: identify genes whose expression is significantly associated with different conditions: treatment, cell type, ... (qualitative variables); dose, time, ... (quantitative variables). Estimate effects/differences between groups, typically using log-ratios, i.e. the difference on the log scale: log(x) - log(y) [= log(x/y)].

41 6 Statistical inference of differential expression. But what is a significant change? It depends on the variability within groups, which may differ from gene to gene. Fold change alone is not sufficient to indicate the significance of expression changes; it has to be supported by statistical information. To assess the statistical significance of differences, conduct a statistical test for each gene.

42 6 Statistical inference of differential expression. Which situations can we find? Indirect comparisons: two groups, two samples, unpaired. E.g. 10 individuals: 5 with diabetes, 5 healthy; one sample from each individual. Typically: two-sample t-test or similar. Direct comparisons: two groups, two samples, paired. E.g. 6 individuals with brain stroke; two samples from each: one from the healthy region (region 1) and one from the affected region (region 2). Typically: one-sample t-test on the individual differences between conditions (also called a paired t-test) or similar.
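
The two situations lead to different test statistics, as the following sketch shows for a single gene. It computes only the t statistics (classical pooled-variance and paired forms), not p-values, and the expression values are invented.

```python
import math
from statistics import mean, variance

def two_sample_t(x, y):
    """Student's two-sample t statistic with pooled variance (unpaired groups)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))

def paired_t(x, y):
    """Paired t statistic: a one-sample t statistic on the within-individual differences."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / math.sqrt(variance(d) / len(d))

# Hypothetical log2 expression of one gene.
diabetic = [8.1, 8.4, 7.9, 8.6, 8.2]   # 5 individuals
healthy = [7.0, 7.3, 6.9, 7.4, 7.1]    # 5 different individuals
t_unpaired = two_sample_t(diabetic, healthy)

affected = [9.2, 8.7, 9.9, 9.1, 8.8, 9.5]  # region 2, 6 individuals
control = [8.1, 7.9, 8.6, 8.2, 7.8, 8.3]   # region 1, same 6 individuals
t_paired = paired_t(affected, control)
print(round(t_unpaired, 2), round(t_paired, 2))
```

Because both inputs are already on the log2 scale, the numerators are log-ratios. The paired version removes between-individual variability, which is why pairing should be respected when the design provides it.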

43 6 Statistical inference of differential expression. Some issues in gene selection. Gene expression values have peculiarities that have to be dealt with. Some are related to small sample sizes: variance instability (very low variances produce a high t statistic); non-normality of the data. Others are related to the large number of variables: multiple testing. The standard t-test is not strictly correct here; it is better to use a moderated t-test.

44 6 Statistical inference of differential expression. To know whether a gene is differentially expressed, we need to assign a p-value to each contrast: genes with p-values falling below a prescribed level may be regarded as significant. But what happens when you repeat the same test thousands of times? Consider more than one test at once. Two tests, each at the 5% level: the probability of getting at least one false positive is now 1 - 0.95*0.95 = 0.0975. Three tests: 1 - 0.95^3 = 0.1426. n tests: 1 - 0.95^n, which converges towards 1 as n increases. MULTIPLE TESTING PROBLEM
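
The growth of the false-positive probability with the number of tests, 1 - (1 - alpha)^n, can be checked numerically. The sketch assumes independent tests, each run at level alpha = 0.05; the function name is an illustrative choice.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n in (1, 2, 3, 10, 100, 1000):
    print(n, round(familywise_error(n), 4))
```

With the thousands of genes on a typical array, at least one false positive is practically guaranteed if each gene is tested at the 5% level without correction.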

45 6 Statistical inference of differential expression. MULTIPLE TESTING PROBLEM. The type I error (false positives) needs to be controlled. FDR: controls the proportion of false positives among the genes declared significant. If you can tolerate more false positives, you will get fewer false negatives. No information lost.
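
The slide does not say which FDR procedure is meant. A common choice is the Benjamini-Hochberg adjustment, sketched here as an assumption, with made-up p-values:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
pvals = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(pvals)
significant = [i for i, q in enumerate(adj) if q <= 0.05]
print([round(q, 4) for q in adj], significant)
```

Calling genes with an adjusted p-value below 0.05 significant aims to keep the expected proportion of false positives among the selected genes at or below 5%.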

46 6 Statistical inference of differential expression. Once the statistical analysis has been performed, a nice (or not) Top Table is obtained, with these columns: Gene Description, Average intensity, P-values, AffyID, Gene Symbol, Log2 FC, T statistic, Log-odds statistic.

47 6 Statistical inference of differential expression. Visualization of the statistical inference: Venn diagrams and volcano plots.

48 7 Clustering. Types: Supervised clustering tries to find the best partition for data that belong to a known set of classes. Unsupervised clustering tries to define the number and the size of the classes into which the transcription profiles can be fitted.

49 7 Clustering. Distances: The ability to calculate a distance (or similarity, its inverse) between two expression vectors is fundamental to clustering algorithms. Distance between vectors is the basis upon which decisions are made when grouping similar patterns of expression. Different types of distances: Euclidean distance, Manhattan distance, Mahalanobis distance. Different distances can produce different sample groupings.
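
Two of the distances named above are simple enough to sketch directly (Mahalanobis is omitted because it also needs a covariance matrix). The expression vectors are invented to show how the choice of distance can change which profile counts as nearest.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Sum of absolute coordinate-wise differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Hypothetical expression profiles over four genes.
query = [0.0, 0.0, 0.0, 0.0]
one_big_change = [3.0, 0.0, 0.0, 0.0]
many_small_changes = [1.0, 1.0, 1.0, 1.0]

print(euclidean(query, one_big_change), euclidean(query, many_small_changes))
print(manhattan(query, one_big_change), manhattan(query, many_small_changes))
```

Under the Euclidean distance the profile with many small changes is the closer one, while under the Manhattan distance the profile with one large change is closer.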

50 7 Clustering. Hierarchical Clustering (HCL). HCL is an agglomerative/divisive clustering method. The iterative process continues until all groups are connected in a hierarchical tree. Samples that are more similar to each other are placed closer together in the tree.
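
The agglomerative variant can be sketched in a few lines. Average linkage with Euclidean distance is an assumption made for this example (the slide does not name a linkage), and the sample profiles are invented.

```python
import math
from itertools import combinations

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(profiles):
    """Repeatedly merge the two closest clusters until one remains.

    Returns the merges in order as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], profiles))
        merges.append((a, b, average_linkage(a, b, profiles)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical expression profiles for four samples over three genes.
profiles = {
    "ctrl1": [1.0, 1.1, 0.9],
    "ctrl2": [1.1, 1.0, 1.0],
    "trt1": [3.0, 3.2, 2.9],
    "trt2": [3.1, 3.0, 3.0],
}
merges = agglomerate(profiles)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

The order and height of the merges are what a dendrogram draws: the two controls join first, then the two treated samples, and the two groups are connected last at a much larger distance.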

51 7 Clustering. Heatmaps. They allow quick visualization of the expression patterns that may exist among samples.

52 8 Annotation. Relation between probe sets and genes: An important issue in microarray data analysis is the specific association of probe identifiers with annotated transcripts in the genome. A critical point in annotation is the way in which the association between probes and genes is produced. For Affymetrix arrays, NetAffx (from the Affymetrix web page) is usually used.

53 9 Biological interpretation. The goal of the Gene Ontology (GO) Consortium is to produce a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing. For genes and gene products, GO is an initiative designed to address the problem of defining a common set of terms and descriptions for basic biological functions. GO provides a restricted vocabulary as well as clear indications of the relationships between terms.

54 9 Biological interpretation. GENE ONTOLOGY. The Gene Ontology (GO) Consortium produces three independent ontologies for gene products. The three ontologies are: molecular function, defined as the biochemical activity or action of the gene product (MF 7220); biological process, interpreted as a biological objective to which the gene product contributes (BP 9529); cellular component, a component of a cell that is part of some larger object or structure (CC 1536).

55 9 Biological interpretation. The Graph Structure of GO. The GO ontologies are structured as directed acyclic graphs (DAGs) that represent a network in which each term may be a child of one or more parents. "GO node" is interchangeable with "GO term". Child terms are more specific than their parents: the term "transmembrane receptor protein tyrosine kinase" is a child of both "transmembrane receptor" and "protein tyrosine kinase".
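
The parent/child structure can be represented as a mapping from each term to its parents. In the sketch below only the first entry comes from the slide; the higher-level links are hypothetical simplifications added so that the traversal has something to walk through, and they do not reproduce the real GO graph.

```python
def ancestors(term, parents):
    """All terms reachable by following parent links upwards in the DAG."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

# Child -> list of parents. Only the first entry is taken from the slide;
# the remaining links are hypothetical placeholders.
parents = {
    "transmembrane receptor protein tyrosine kinase":
        ["transmembrane receptor", "protein tyrosine kinase"],
    "transmembrane receptor": ["receptor"],
    "protein tyrosine kinase": ["protein kinase"],
    "receptor": ["molecular function"],
    "protein kinase": ["molecular function"],
}

term = "transmembrane receptor protein tyrosine kinase"
print(sorted(ancestors(term, parents)))
```

Because a term can have more than one parent, a gene annotated to a specific term is implicitly annotated to every ancestor along all paths, which matters when counting genes per GO term.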

56 9 Biological interpretation. GO structure Graph of GO relationships for the term: transcription factor (GO: )


More information

ChIP-seq and RNA-seq. Farhat Habib

ChIP-seq and RNA-seq. Farhat Habib ChIP-seq and RNA-seq Farhat Habib fhabib@iiserpune.ac.in Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions

More information

Exercise on Microarray data analysis

Exercise on Microarray data analysis Exercise on Microarray data analysis Aim The aim of this exercise is to introduce basic data analysis of transcriptome data using the statistical software R. The exercise is divided in two parts. First,

More information

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer T. M. Murali January 31, 2006 Innovative Application of Hierarchical Clustering A module map showing conditional

More information

Exploration, Normalization, Summaries, and Software for Affymetrix Probe Level Data

Exploration, Normalization, Summaries, and Software for Affymetrix Probe Level Data Exploration, Normalization, Summaries, and Software for Affymetrix Probe Level Data Rafael A. Irizarry Department of Biostatistics, JHU March 12, 2003 Outline Review of technology Why study probe level

More information

DNA Microarray Data Oligonucleotide Arrays

DNA Microarray Data Oligonucleotide Arrays DNA Microarray Data Oligonucleotide Arrays Sandrine Dudoit, Robert Gentleman, Rafael Irizarry, and Yee Hwa Yang Bioconductor Short Course 2003 Copyright 2002, all rights reserved Biological question Experimental

More information

Release Notes. JMP Genomics. Version 3.1

Release Notes. JMP Genomics. Version 3.1 JMP Genomics Version 3.1 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP. A Business Unit of SAS SAS Campus Drive

More information

Comparative Analysis using the Illumina DASL assay with FFPE tissue. Wendell Jones, PhD Vice President, Statistics and Bioinformatics

Comparative Analysis using the Illumina DASL assay with FFPE tissue. Wendell Jones, PhD Vice President, Statistics and Bioinformatics TM Comparative Analysis using the Illumina DASL assay with FFPE tissue Wendell Jones, PhD Vice President, Statistics and Bioinformatics Background EA has examined several protocol assay possibilities for

More information

Gene expression: Microarray data analysis. Copyright notice. Outline: microarray data analysis. Schedule

Gene expression: Microarray data analysis. Copyright notice. Outline: microarray data analysis. Schedule Gene expression: Microarray data analysis Copyright notice Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics by Jonathan Pevsner (ISBN -47-4-8). Copyright

More information

New Statistical Algorithms for Monitoring Gene Expression on GeneChip Probe Arrays

New Statistical Algorithms for Monitoring Gene Expression on GeneChip Probe Arrays GENE EXPRESSION MONITORING TECHNICAL NOTE New Statistical Algorithms for Monitoring Gene Expression on GeneChip Probe Arrays Introduction Affymetrix has designed new algorithms for monitoring GeneChip

More information

Affymetrix GeneChip Arrays. Lecture 3 (continued) Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy

Affymetrix GeneChip Arrays. Lecture 3 (continued) Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy Affymetrix GeneChip Arrays Lecture 3 (continued) Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy Affymetrix GeneChip Design 5 3 Reference sequence TGTGATGGTGGGGAATGGGTCAGAAGGCCTCCGATGCGCCGATTGAGAAT

More information

FEATURE-LEVEL EXPLORATION OF THE CHOE ET AL. AFFYMETRIX GENECHIP CONTROL DATASET

FEATURE-LEVEL EXPLORATION OF THE CHOE ET AL. AFFYMETRIX GENECHIP CONTROL DATASET Johns Hopkins University, Dept. of Biostatistics Working Papers 3-17-2006 FEATURE-LEVEL EXPLORATION OF THE CHOE ET AL. AFFYETRIX GENECHIP CONTROL DATASET Rafael A. Irizarry Johns Hopkins Bloomberg School

More information

Gene List Enrichment Analysis

Gene List Enrichment Analysis Outline Gene List Enrichment Analysis George Bell, Ph.D. BaRC Hot Topics March 16, 2010 Why do enrichment analysis? Main types Selecting or ranking genes Annotation sources Statistics Remaining issues

More information

Bioinformatics : Gene Expression Data Analysis

Bioinformatics : Gene Expression Data Analysis 05.12.03 Bioinformatics : Gene Expression Data Analysis Aidong Zhang Professor Computer Science and Engineering What is Bioinformatics Broad Definition The study of how information technologies are used

More information

CS-E5870 High-Throughput Bioinformatics Microarray data analysis

CS-E5870 High-Throughput Bioinformatics Microarray data analysis CS-E5870 High-Throughput Bioinformatics Microarray data analysis Harri Lähdesmäki Department of Computer Science Aalto University September 20, 2016 Acknowledgement for J Salojärvi and E Czeizler for the

More information

Analysis of microarray data

Analysis of microarray data BNF078 Fall 2006 Analysis of microarray data Markus Ringnér Computational Biology and Biological Physics Department of Theoretical Physics Lund University markus@thep.lu.se 046-2229337 1 Contents Preface

More information

Annotation and Function of Switch-like Genes in Health and Disease. A Thesis. Submitted to the Faculty. Drexel University. Adam M.

Annotation and Function of Switch-like Genes in Health and Disease. A Thesis. Submitted to the Faculty. Drexel University. Adam M. Annotation and Function of Switch-like Genes in Health and Disease A Thesis Submitted to the Faculty of Drexel University by Adam M. Ertel in partial fulfillment of the requirements for the degree of Doctor

More information

The first and only fully-integrated microarray instrument for hands-free array processing

The first and only fully-integrated microarray instrument for hands-free array processing The first and only fully-integrated microarray instrument for hands-free array processing GeneTitan Instrument Transform your lab with a GeneTitan Instrument and experience the unparalleled power of streamlining

More information

A Parallel Approach to Microarray Preprocessing and Analysis

A Parallel Approach to Microarray Preprocessing and Analysis A Parallel Approach to Microarray and Patrick Breheny 2007 Outline The Central Dogma Purifying and labeling RNA Measuring of the amount of RNA corresponding to specific genes requires a number of steps,

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org kcoombes@mdanderson.org

More information

Multivariate Methods to detecting co-related trends in data

Multivariate Methods to detecting co-related trends in data Multivariate Methods to detecting co-related trends in data Canonical correlation analysis Partial least squares Co-inertia analysis Classical CCA and PLS require n>p. Can apply Penalized CCA and sparse

More information

DETERMINING SIGNIFICANT FOLD DIFFERENCES IN GENE EXPRESSION ANALYSIS

DETERMINING SIGNIFICANT FOLD DIFFERENCES IN GENE EXPRESSION ANALYSIS DETERMINING SIGNIFICANT FOLD DIFFERENCES IN GENE EXPRESSION ANALYSIS A. J. BUTTE 1, J. YE, G. NIEDERFELLNER 3, K. RETT 3, H. U. HÄRING 3, M. F. WHITE, I. S. KOHANE 1 1 Children s Hospital Informatics Program,

More information

Technical Note. GeneChip 3 IVT PLUS Reagent Kit vs. GeneChip 3 IVT Express Reagent Kit Comparison. Introduction:

Technical Note. GeneChip 3 IVT PLUS Reagent Kit vs. GeneChip 3 IVT Express Reagent Kit Comparison. Introduction: Technical Note GeneChip 3 IVT PLUS Reagent Kit vs. GeneChip 3 IVT Express Reagent Kit Comparison Introduction: Affymetrix has launched a new 3 IVT PLUS Reagent Kit which creates hybridization ready target

More information

From hybridization theory to microarray data analysis: performance evaluation

From hybridization theory to microarray data analysis: performance evaluation RESEARCH ARTICLE Open Access From hybridization theory to microarray data analysis: performance evaluation Fabrice Berger * and Enrico Carlon * Abstract Background: Several preprocessing methods are available

More information

A WEB-BASED TOOL FOR GENOMIC FUNCTIONAL ANNOTATION, STATISTICAL ANALYSIS AND DATA MINING

A WEB-BASED TOOL FOR GENOMIC FUNCTIONAL ANNOTATION, STATISTICAL ANALYSIS AND DATA MINING A WEB-BASED TOOL FOR GENOMIC FUNCTIONAL ANNOTATION, STATISTICAL ANALYSIS AND DATA MINING D. Martucci a, F. Pinciroli a,b, M. Masseroli a a Dipartimento di Bioingegneria, Politecnico di Milano, Milano,

More information

Final exam: Introduction to Bioinformatics and Genomics DUE: Friday June 29 th at 4:00 pm

Final exam: Introduction to Bioinformatics and Genomics DUE: Friday June 29 th at 4:00 pm Final exam: Introduction to Bioinformatics and Genomics DUE: Friday June 29 th at 4:00 pm Exam description: The purpose of this exam is for you to demonstrate your ability to use the different biomolecular

More information

Computational Biology I

Computational Biology I Computational Biology I Microarray data acquisition Gene clustering Practical Microarray Data Acquisition H. Yang From Sample to Target cdna Sample Centrifugation (Buffer) Cell pellets lyse cells (TRIzol)

More information

Introduction to Bioinformatics and Gene Expression Technology

Introduction to Bioinformatics and Gene Expression Technology Vocabulary Introduction to Bioinformatics and Gene Expression Technology Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 1.1 Gene: Genetics: Genome: Genomics: hereditary DNA

More information

Genomic data visualisation

Genomic data visualisation Genomic data visualisation Dr Jason Wong Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW Day 1 Thursday 29 th January 2016 Structure of human genome Consist

More information

Expression summarization

Expression summarization Expression Quantification: Affy Affymetrix Genechip is an oligonucleotide array consisting of a several perfect match (PM) and their corresponding mismatch (MM) probes that interrogate for a single gene.

More information

ChIP-seq and RNA-seq

ChIP-seq and RNA-seq ChIP-seq and RNA-seq Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions (ChIPchromatin immunoprecipitation)

More information

Basic GO Usage. R. Gentleman. October 13, 2014

Basic GO Usage. R. Gentleman. October 13, 2014 Basic GO Usage R. Gentleman October 13, 2014 Introduction In this vignette we describe some of the basic characteristics of the data available from the Gene Ontology (GO), (The Gene Ontology Consortium,

More information

Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX

Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX Technical Overview Introduction RNA Sequencing (RNA-Seq) is one of the most commonly used next-generation sequencing (NGS)

More information

Upstream/Downstream Relation Detection of Signaling Molecules using Microarray Data

Upstream/Downstream Relation Detection of Signaling Molecules using Microarray Data Vol 1 no 1 2005 Pages 1 5 Upstream/Downstream Relation Detection of Signaling Molecules using Microarray Data Ozgun Babur 1 1 Center for Bioinformatics, Computer Engineering Department, Bilkent University,

More information

Expression data analysis with Chipster. Eija Korpelainen, Massimiliano Gentile

Expression data analysis with Chipster. Eija Korpelainen, Massimiliano Gentile Expression data analysis with Chipster Eija Korpelainen, Massimiliano Gentile chipster@csc.fi Understanding data analysis - why? Bioinformaticians might not always be available when needed Biologists know

More information

Humboldt Universität zu Berlin. Grundlagen der Bioinformatik SS Microarrays. Lecture

Humboldt Universität zu Berlin. Grundlagen der Bioinformatik SS Microarrays. Lecture Humboldt Universität zu Berlin Microarrays Grundlagen der Bioinformatik SS 2017 Lecture 6 09.06.2017 Agenda 1.mRNA: Genomic background 2.Overview: Microarray 3.Data-analysis: Quality control & normalization

More information

CodeLink Human Whole Genome Bioarray

CodeLink Human Whole Genome Bioarray CodeLink Human Whole Genome Bioarray 55,000 human gene targets on a single bioarray The CodeLink Human Whole Genome Bioarray comprises one of the most comprehensive coverages of the human genome, as it

More information

Microarray analysis challenges.

Microarray analysis challenges. Microarray analysis challenges. While not quite as bad as my hobby of ice climbing you, need the right equipment! T. F. Smith Bioinformatics Boston Univ. Experimental Design Issues Reference and Controls

More information

Microarray Technique. Some background. M. Nath

Microarray Technique. Some background. M. Nath Microarray Technique Some background M. Nath Outline Introduction Spotting Array Technique GeneChip Technique Data analysis Applications Conclusion Now Blind Guess? Functional Pathway Microarray Technique

More information

Package TIN. March 19, 2019

Package TIN. March 19, 2019 Type Package Title Transcriptome instability analysis Version 1.14.0 Date 2014-07-14 Package TIN March 19, 2019 Author Bjarne Johannessen, Anita Sveen and Rolf I. Skotheim Maintainer Bjarne Johannessen

More information

A Genetic Algorithm Approach to DNA Microarrays Analysis of Pancreatic Cancer

A Genetic Algorithm Approach to DNA Microarrays Analysis of Pancreatic Cancer A Genetic Algorithm Approach to DNA Microarrays Analysis of Pancreatic Cancer Nicolae Teodor MELITA 1, Stefan HOLBAN 2 1 Politehnica University of Timisoara, Faculty of Automation and Computers, Bd. V.

More information

Mixture modeling for genome-wide localization of transcription factors

Mixture modeling for genome-wide localization of transcription factors Mixture modeling for genome-wide localization of transcription factors Sündüz Keleş 1,2 and Heejung Shim 1 1 Department of Statistics 2 Department of Biostatistics & Medical Informatics University of Wisconsin,

More information

2007/04/21.

2007/04/21. 2007/04/21 hmwu@stat.sinica.edu.tw http://idv.sinica.edu.tw/hmwu 1 GeneChip Expression Array Design Assay and Analysis Flow Chart Quality Assessment Low Level Analysis (from probe level data to expression

More information

Exercise1 ArrayExpress Archive - High-throughput sequencing example

Exercise1 ArrayExpress Archive - High-throughput sequencing example ArrayExpress and Atlas practical: querying and exporting gene expression data at the EBI Gabriella Rustici gabry@ebi.ac.uk This practical will introduce you to the data content and query functionality

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics If the 19 th century was the century of chemistry and 20 th century was the century of physic, the 21 st century promises to be the century of biology...professor Dr. Satoru

More information

ALLEN Human Brain Atlas

ALLEN Human Brain Atlas TECHNICAL WHITE PAPER: MICROARRAY DATA NORMALIZATION The is a publicly available online resource of gene expression information in the adult human brain. Comprising multiple datasets from various projects

More information

Introduction to microarrays. Overview The analysis process Limitations Extensions (NGS)

Introduction to microarrays. Overview The analysis process Limitations Extensions (NGS) Introduction to microarrays Overview The analysis process Limitations Extensions (NGS) Outline An overview (a review) of microarrays Experiments with microarrays The data analysis process Microarray limitations

More information

Preprocessing Affymetrix GeneChip Data. Affymetrix GeneChip Design. Terminology TGTGATGGTGGGGAATGGGTCAGAAGGCCTCCGATGCGCCGATTGAGAAT

Preprocessing Affymetrix GeneChip Data. Affymetrix GeneChip Design. Terminology TGTGATGGTGGGGAATGGGTCAGAAGGCCTCCGATGCGCCGATTGAGAAT Preprocessing Affymetrix GeneChip Data Credit for some of today s materials: Ben Bolstad, Leslie Cope, Laurent Gautier, Terry Speed and Zhijin Wu Affymetrix GeneChip Design 5 3 Reference sequence TGTGATGGTGGGGAATGGGTCAGAAGGCCTCCGATGCGCCGATTGAGAAT

More information

Gene Expression Data Analysis (I)

Gene Expression Data Analysis (I) Gene Expression Data Analysis (I) Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Bioinformatics tasks Biological question Experiment design Microarray experiment

More information

Computational Approaches to Analysis of DNA Microarray Data

Computational Approaches to Analysis of DNA Microarray Data 2006 IMI and Schattauer GmbH 91 Computational pproaches to nalysis of DN Microarray Data J. Quackenbush Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department

More information

Rafael A Irizarry, Department of Biostatistics JHU

Rafael A Irizarry, Department of Biostatistics JHU Getting Usable Data from Microarrays it s not as easy as you think Rafael A Irizarry, Department of Biostatistics JHU rafa@jhu.edu http://www.biostat.jhsph.edu/~ririzarr http://www.bioconductor.org Acknowledgements

More information

Transcriptome Assembly, Functional Annotation (and a few other related thoughts)

Transcriptome Assembly, Functional Annotation (and a few other related thoughts) Transcriptome Assembly, Functional Annotation (and a few other related thoughts) Monica Britton, Ph.D. Sr. Bioinformatics Analyst June 23, 2017 Differential Gene Expression Generalized Workflow File Types

More information

RNA-Seq Analysis. Simon Andrews, Laura v

RNA-Seq Analysis. Simon Andrews, Laura v RNA-Seq Analysis Simon Andrews, Laura Biggins simon.andrews@babraham.ac.uk @simon_andrews v2018-10 RNA-Seq Libraries rrna depleted mrna Fragment u u u u NNNN Random prime + RT 2 nd strand synthesis (+

More information