Analysis of Microarray Data

Similar documents
Outline. Analysis of Microarray Data. Most important design question. General experimental issues

Analysis of Microarray Data

Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX

Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison. CodeLink compatible

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

Stefano Monti. Workshop Format

EECS730: Introduction to Bioinformatics

Analysis of a Tiling Regulation Study in Partek Genomics Suite 6.6

Gene Expression Data Analysis (I)

BABELOMICS: Microarray Data Analysis

Exercise1 ArrayExpress Archive - High-throughput sequencing example

NCBI web resources I: databases and Entrez

IPA Advanced Training Course

Gene Regulation Solutions. Microarrays and Next-Generation Sequencing

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

Microarray Technique. Some background. M. Nath

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

What we ll do today. Types of stem cells. Do engineered ips and ES cells have. What genes are special in stem cells?

Exploration and Analysis of DNA Microarray Data

Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison

Do engineered ips and ES cells have similar molecular signatures?

Functional analysis using EBI Metagenomics

Designing Complex Omics Experiments

B I O I N F O R M A T I C S

Functional Enrichment Analysis & Candidate Gene Ranking

Microarray Gene Expression Analysis at CNIO

ProtPlot A Tissue Molecular Anatomy Program Java-based Data Mining Tool: Screen Shots

Using 2-way ANOVA to dissect gene expression following myocardial infarction in mice

Gene-centered resources at NCBI

less sensitive than RNA-seq but more robust analysis pipelines expensive but quantitiatve standard but typically not high throughput

Engineering Genetic Circuits

DNA Microarray Data Oligonucleotide Arrays

Introduction to Microarray Data Analysis and Gene Networks. Alvis Brazma European Bioinformatics Institute

Gene Signal Estimates from Exon Arrays

Protein Bioinformatics Part I: Access to information

Types of Databases - By Scope

RNA Sequencing Analyses & Mapping Uncertainty

Introduction to Bioinformatics. Fabian Hoti 6.10.

Gene Annotation and Gene Set Analysis

Identifying Candidate Informative Genes for Biomarker Prediction of Liver Cancer

Bioinformatics for Proteomics. Ann Loraine

Humboldt Universität zu Berlin. Grundlagen der Bioinformatik SS Microarrays. Lecture

Package GeneExpressionSignature

SIMS2003. Instructors:Rus Yukhananov, Alex Loguinov BWH, Harvard Medical School. Introduction to Microarray Technology.

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015

Bioinformatics Analysis of Nano-based Omics Data

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Why learn sequence database searching? Searching Molecular Databases with BLAST

Compendium of Immune Signatures Identifies Conserved and Species-Specific Biology in Response to Inflammation

Le proteine regolative variano nei vari tipi cellulari e in funzione degli stimoli ambientali

Overview of Health Informatics. ITI BMI-Dept

1. Introduction. 1 Page 1

Regulation of eukaryotic transcription:

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

Introduction to microarrays. Overview The analysis process Limitations Extensions (NGS)

DNA Microarrays Introduction Part 2. Todd Lowe BME/BIO 210 April 11, 2007

Expression Array System

Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis. Jenny Wu

Risk Management User Guide

LIR 832: MINITAB WORKSHOP

Microarray Data Mining and Gene Regulatory Network Analysis

MATH 5610, Computational Biology

Lecture #1. Introduction to microarray technology

Long and short/small RNA-seq data analysis

Introduction to genome biology

Exploring Similarities of Conserved Domains/Motifs

Gene Expression Technology

Application Note. NGS Analysis of B-Cell Receptors & Antibodies by AptaAnalyzer -BCR

Post-assembly Data Analysis

Data Sheet. GeneChip Human Genome U133 Arrays

massir: MicroArray Sample Sex Identifier

ELE4120 Bioinformatics. Tutorial 5

Training materials.

Technical Note. Performance Review of the GeneChip AutoLoader for the Affymetrix GeneChip Scanner Introduction

USDA NRSP-8 BIOINFORMATICS COORDINATION PROGRAM ACTIVITIES

MOLECULAR BIOLOGY OF EUKARYOTES 2016 SYLLABUS


Hands-On Four Investigating Inherited Diseases

Technical note: Molecular Index counting adjustment methods

Additional file 2. Figure 1: Receiver operating characteristic (ROC) curve using the top

MulCom: a Multiple Comparison statistical test for microarray data in Bioconductor.

TSSpredator User Guide v 1.00

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Entrez Gene: gene-centered information at NCBI

The Five Key Elements of a Successful Metabolomics Study

DNA Arrays Affymetrix GeneChip System

Supporting Information

Copy Number Analysis in Partek Genomics Suite 6.6

3.1.4 DNA Microarray Technology

ALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG

Machine Learning in Computational Biology CSC 2431

Lees J.A., Vehkala M. et al., 2016 In Review

Protein-Protein-Interaction Networks. Ulf Leser, Samira Jaeger

Jobs and Skills in the Bay Area

SPH 247 Statistical Analysis of Laboratory Data

RNA spike-in controls & analysis methods for trustworthy genome-scale measurements

Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017

COMPUTATIONAL ANALYSIS OF MICROARRAY DATA

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques)

Transcription:

Analysis of Microarray Data Lecture 3: Visualization and Functional Analysis George Bell, Ph.D. Senior Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute

Outline Review Visualizing all the data What to do with a set of interesting genes? Basic annotation Comparing lists Genome mapping Obtaining and analyzing promoters Gene Ontology and pathway analysis Other expression data 2

Generic Microarray Pipeline Design experiment Prepare samples and perform hybridizations Quantify scanned slide image Calculate expression values Normalize Handle low-level expression values Merge data for replicates Determine differentially expressed genes Cluster interesting data not covered in course 3

Review Preliminary filtering? Measuring differential expression: Correcting for multiple hypothesis testing Fold change, t-test, ANOVA Bonferroni, False Discovery Rate, etc. Filtering; identifying interesting genes Distance measures for clustering Clustering/segmentation types and methods What is the best analysis pipeline? Why are you doing the experiment? Are you being reasonable with the statistics? 4

Why draw figures? Get a global perspective of the experiments Quality control: check for low-quality data and errors Compare raw and normalized data Compare controls: are they homogeneous? Help decide how to filter data Look at a subset of data in detail 5

Intensity histogram Median = 6.6 Median = 100 6

Intensity histogram Most genes have low expression levels Using log 2 scale to transform data more normal distribution more helpful interpretation One way to observe overall intensity of chip How to choose genes with no expression? 7

Intensity scatterplot 8

Intensity scatterplot Compares intensity on two colors or chips Genes with similar expression are on the diagonal Use log-transformed expression values Genes with lower expression noisier expression harder to call significant 9

R-I and M-A plots 10

R-I and M-A plots Compares intensity on two colors or chips Like an intensity scatterplot rotated 45º R (ratio) = log(chip1 / chip2) I (intensity) = log(chip1 * chip2) M = log 2 (chip1 / chip2) A = ½(log 2 (chip1*chip2)) Popularized with lowess normalization Easier to intrepret than an intensity scatterplot 11

Volcano plot 12

Volcano plot Scatterplot showing differential expression statistics and fold change Visualize effects of filtering genes by both measures Using fold change vs. statistical measures for differential expression produce very different results 13

Boxplots Raw and median-normalized log 2 (expression values) 14

Boxplots Display summary statistics about the distribution of each chip: Median Quartiles (25% and 75% percentiles) Extreme values (>3 quartiles from median) Note that mean-normalized chips wouldn t have the same median Easy in R; much harder to do in Excel 15

Chip images Affymetrix U95A chip hybridized with fetal brain Image generated from.cel file Helpful for quality control 16

Heatmaps experiments genes 17

Using distance measurements Genes with most similar profiles to GPR37 18

Functional Analysis: intro After data is normalized, compared, filtered, clustered, and differentially expressed genes are found, what happens next? Driven by experimental questions Specificity of hypothesis testing increases power of statistical tests One general question: what s special about the differentially expressed genes? 19

Annotation using sequence databases Gene data can be translated into IDs from a wide variety of sequence databases: LocusLink, Ensembl, UniGene, RefSeq, genome databases Each database in turn links to a lot of different types of data Use Excel or programming tools to do this quickly Web links, instead of actual data, can also be used. What s the difference between these databases? How can all this data be integrated? 20

Venn diagrams Show intersection(s) between at least 2 sets Typical figure More informative figure 21

Mapping genes to the genome Genomic locations of differentially expressed genes Human genome, May 2004 22

Promoter extraction Prerequisite of any promoter analysis Requires a sequenced genome and a complete, mapped cdna sequence Promoter is defined in this context as upstream regulatory sequence Extract genomic DNA using a genome browser: UCSC, Ensembl, NCBI, GBrowse, etc. Functional promoter needs to be determined experimentally 23

Promoter analysis TRANSFAC contains curated binding data Transcription factor binding sites can be predicted matrix (probabilities of each nt at each site) pattern (fuzzy consensus of binding site) Functional sites tend to be evolutionarily conserved Functional promoter activity needs to be verified experimentally 24

Gene Ontology GO is a systematic way to describe protein (gene) function GO comprises ontologies and annotations The ontologies: Molecular function Biological process Cellular component Ontologies are like hierarchies except that a child can have more than one parent. Annotation sources: publications (TAS), bioinformatics (IEA), genetics (IGI), assays (IDA), phenotypes (IMP), etc. 25

Gene Ontology enrichment analysis Unbiased method to ask question, What s so special about my set of genes? Obtain GO annotation (most specific term(s)) for genes in your set Climb an ontology to get all parents (more general, induced terms) Look at occurrence of each term in your set compared to terms in population (all genes or all genes on your chip) Are some terms over-represented? Ex: sample:10/100 pop1: 600/6000 pop2: 15/6000 26

Pathway enrichment analysis Unbiased method to ask question, Is my set of genes especially involved in specific pathways? First step: Link genes to pathways Are some pathways over-represented? Caveats What is meant by pathway"? Multiple DBs with varied annotations Annotations are very incomplete 27

Enrichment analysis on sorted expression data Input 1: complete sorted gene list no threshold value or definition of significance Input 2: set of biologically meaningful gene sets pathway, genome location, function,... Is the rank of genes from any gene set in your sorted list non-random? Example: GSEA Broad Institute 28

Comparisons with other expression studies Array repositories: GEO (NCBI), ArrayExpress (EBI), WADE (WIBR) Search for genes, chips, types of experiments, species View or download data Normalize but still expect noise Check medians and distribution of data It s much easier to make comparisons within an experiment than between experiments 29

Summary Plots: histogram, scatter, R-I, volcano, box Other visualizations: whole chip, heatmaps, bar graphs, Venn diagrams Annotation to sequence DBs Genome mapping Promoter extraction and analysis GO and pathway enrichment analysis Comparison with published studies 30

More information Course page: http://jura.wi.mit.edu/bio/education/bioinfo2006/arrays/ Bioconductor short courses: http://www.bioconductor.org/ BaRC analysis tools: http://jura.wi.mit.edu/bioc/tools/ Gene Ontology Consortium website: http://www.geneontology.org/ Causton HC et al. Microarray Gene Expression Data Analysis: A Beginner s Guide. Blackwell, 2003. Parmigiana G et al. The Analysis of Gene Expression Data: Methods and Software. Springer, 2003. 31

Exercises Graphing all data Scatterplot R-I (M-A) plot Volcano plot Functional analysis Annotation Comparisons Genome mapping Promoter extraction and analysis GO and pathway analysis Using other expression studies 32