BABELOMICS: Microarray Data Analysis

Similar documents
GEPAS and Babelomics. Ignacio Medina David Montaner Bioinformatics Department Functional Genomics Unit

Bioinformatica. Centro de Investigación Príncipe Felipe (CIPF), and Functional genomics node, (INB),

Herramientas para el diseño y el análisis de datos de paneles de genes

Biology 644: Bioinformatics

MS bioinformatics analysis for proteomics. Protein anotations

MicroRNA sequencing (mirnaseq)

Analysis of Microarray Data

Genomes: What we know and what we don t know

Exploration and Analysis of DNA Microarray Data

Introduction to Microarray Analysis

Course Presentation. Ignacio Medina Presentation

Measuring and Understanding Gene Expression

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

Analysis of Microarray Data

A NGS data analysis course

Introduction to Microarray Data Analysis and Gene Networks. Alvis Brazma European Bioinformatics Institute

3.1.4 DNA Microarray Technology

6. GENE EXPRESSION ANALYSIS MICROARRAYS

Complete draft sequence 2001

Intro to Microarray Analysis. Courtesy of Professor Dan Nettleton Iowa State University (with some edits)

Introduction to Bioinformatics and Gene Expression Technology

High performance sequencing and gene expression quantification

Upstream/Downstream Relation Detection of Signaling Molecules using Microarray Data

Annotation and the analysis of annotation terms. Brian J. Knaus USDA Forest Service Pacific Northwest Research Station

Videos. Lesson Overview. Fermentation

Basics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility

Gene expression analysis: Introduction to microarrays

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis

Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017

Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics

Announcement Structure Analysis

Videos. Bozeman Transcription and Translation: Drawing transcription and translation:

Chapter 2: Access to Information

Data Mining for Biological Data Analysis

Machine Learning. HMM applications in computational biology

Identifying Candidate Informative Genes for Biomarker Prediction of Liver Cancer

Understanding protein lists from proteomics studies. Bing Zhang Department of Biomedical Informatics Vanderbilt University

Expressed genes profiling (Microarrays) Overview Of Gene Expression Control Profiling Of Expressed Genes

Microarrays & Gene Expression Analysis

Suberoylanilide Hydroxamic Acid Treatment Reveals. Crosstalks among Proteome, Ubiquitylome and Acetylome

BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP

Introduction to BIOINFORMATICS

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

Variant calling in NGS experiments

Agilent GeneSpring GX 10: Beyond. Pam Tangvoranuntakul Product Manager, GeneSpring October 1, 2008

The University of California, Santa Cruz (UCSC) Genome Browser

Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis. Jenny Wu

Lecture #1. Introduction to microarray technology

Measuring gene expression (Microarrays) Ulf Leser

Bioinformatics for Cell Biologists

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies

Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX

Standard Data Analysis Report Agilent Gene Expression Service

Outline. Array platform considerations: Comparison between the technologies available in microarrays

Array-Ready Oligo Set for the Rat Genome Version 3.0

Ensembl workshop. Thomas Randall, PhD bioinformatics.unc.edu. handouts, papers, datasets

Functional Annotation and Prioritization of Whole Exome and Whole Genome Sequencing Variants. Mulin Jun Li

DNA Structure and Analysis. Chapter 4: Background

Human Genomics. Higher Human Biology

Feature Selection of Gene Expression Data for Cancer Classification: A Review

Welcome! Introduction to High Throughput Genomics December Norwegian Microarray Consortium FUGE Bioinformatics platform

Microarray Informatics

EECS730: Introduction to Bioinformatics

Gene expression. What is gene expression?

CS-E5870 High-Throughput Bioinformatics Microarray data analysis

Analyzing Gene Set Enrichment

Functional microrna targets in protein coding sequences. Merve Çakır

Make the protein through the genetic dogma process.

Measuring gene expression

Agilent Genomics Software Future Directions

Introduction to Bioinformatics

Introduction to Genome Biology

Outline. Analysis of Microarray Data. Most important design question. General experimental issues

ChIP-seq and RNA-seq. Farhat Habib

What is Bioinformatics?

Bioinformatics: Microarray Technology. Assc.Prof. Chuchart Areejitranusorn AMS. KKU.

Introduction to BioMEMS & Medical Microdevices DNA Microarrays and Lab-on-a-Chip Methods

Microarray Informatics

Bioinformatics. Outline of lecture

Deoxyribonucleic Acid DNA

Gene expression analysis. Biosciences 741: Genomics Fall, 2013 Week 5. Gene expression analysis

Protein Synthesis: From Gene RNA Protein Trait

The first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks.

Lesson Overview. Studying the Human Genome. Lesson Overview Studying the Human Genome

A Microarray Analysis Teaching Module. for Hamilton College. July 2008 Megan Cole Post-doctoral Associate Whitehead Institute, MIT

Ingenuity Pathway Analysis (IPA )

Shannon pipeline plug-in: For human mrna splicing mutations CLC bio Genomics Workbench plug-in CLC bio Genomics Server plug-in Features and Benefits

Multiple choice questions (numbers in brackets indicate the number of correct answers)

The study of the structure, function, and interaction of cellular proteins is called. A) bioinformatics B) haplotypics C) genomics D) proteomics

DNA/RNA MICROARRAYS NOTE: USE THIS KIT WITHIN 6 MONTHS OF RECEIPT.

Microarrays The technology

SolCAP. Executive Commitee : David Douches Walter De Jong Robin Buell David Francis Alexandra Stone Lukas Mueller AllenVan Deynze

Marker types. Potato Association of America Frederiction August 9, Allen Van Deynze

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28.

CAP BIOINFORMATICS Su-Shing Chen CISE. 10/5/2005 Su-Shing Chen, CISE 1

Gene Expression Technology

From genome-wide association studies to disease relationships. Liqing Zhang Department of Computer Science Virginia Tech

Bioinformatics for Proteomics. Ann Loraine

Transcription:

BABELOMICS: Microarray Data Analysis Madrid, 21 June 2010 Martina Marbà mmarba@cipf.es Bioinformatics and Genomics Department Centro de Investigación Príncipe Felipe (CIPF) (Valencia, Spain)

DNA Microarrays Paradigm of High Throughput Technologies Yield concentration measurements for : genes, SNP, exons, mrna... Measure cells in different biological conditions In a genomic scale Allow us conducting biological experiments So... How do they work?

Central Dogma of Molecular Biology CentralDogma3 For a cell, at a particular time, thousands of mrna are created and sent out of the nucleus to be translated into proteins. Protein concentration regulates biological systems.

DNA Microarrays

Biological Sample We want to know which genes are expressed under particular biological conditions. We can extract all mrna molecules that are being translated within the cells and provide an expression level indicator of its concentration in the biological sample.

RNA Extraction

Labelling the Sample Fluorescent Dye

Hybridization

Expression Measurement If the fluorescent label is attached to one spot we know that the particular complementary gene transcript was present in our cell sample. The greater the fluorescence the greater the concentration of the transcript.

Scanning the Microarray Measuring the intensity of the fluorescence in each spot We get a measurement of the hybridization in each spot of the microarray

The Data For each biological sample (individual) We get intensity measurements for thousands of genetic transcripts. The measured intensity is used as an indicator of gene expression. 200000_s_at 134.4 200001_at 586.5 200002_at 1868.4 200003_s_at 1232.7 200004_at 1071.6 200005_at 312.8 200006_at 1712.6 200007_at 606.5 200008_s_at 421.9 200009_at 395.6 200010_at 1228.6 200011_s_at 132.5 200012_x_at 2606.3 200013_at 1572.9 200014_s_at 138.7 200015_s_at 124.1 200016_x_at 1058.7 200017_at 889.4 200018_at 3964.2 200019_s_at 1069.9 200020_at 212.1 200021_at 1018.1 200022_at 1254.8 200023_s_at 1202.8 200024_at 2460.6

Several Microarrays

Data Matrix

How to treat the data??

Babelomics Pipeline

Definitions Babelomics One of the most complete integrated packages of tools for microarray data analysis available over the web A complete suite of web tools for functional analysis of genome-scale experiments

Babelomics Web http://babelomics.bioinfo.cipf.es http://www.babelomics.org

Questions Is there any significant difference in gene expression between tumor and healthy cells? Can we detect group of genes with similar expression profiles? Can we classify new biological samples based on gene expression patterns? Is there any significant functional enrichment in my gene list? Are these genes involved in the same disease?

Modules Gene Expression Analysis Normalization Differential Expression Clustering Predictors Functional Analysis FatiGO Fatiscan

Gene Expression Analysis Gene Expression Analysis

Normalization

Differential expression

Predictors

Clustering

Functional Analysis Fuctional Analysis

From data to functional interpretation Gene Expression Analysis Microarray Data Differential expression Preprocessing (Normalization, Scaling, ) Tab matrix file Predictors Clustering Genes differentially expressed Predicting genes Genes with same expression patterns Functional Analysis

Functional Analysis Babelomics, a suite of web tools for: statistical test, multiple test corrections, blast,... List of genes or ids, ie: differentially expressed genes,... Integrated Biological DB of Functional Annotation (GO, KEGG, InterPro, Ensembl, SwisProt, Transcription Factors, MicroRNA, Cisred, Biocarta, Bioentities Literature)

FatiGO

FatiGO features Compares two lists of genes. Compares one list of genes against the rest of the genome. One statistical test (Fisher s exact) for each Block of annotation. Multiple testing context. Filtering of annotation is convenient (the less tests the best correction).

FatiGO test One Gene List (A) The other list (B) Are this two groups of genes carrying out different biological roles? Biosynthesis 60% Biosynthesis 20% Sporulation Sporulation 20% Genes in group A have significantly to do with biosynthesis, but not with sporulation. Biosynthesis No biosynthesis A 6 4 20% B 2 8 We do this for each GO, mirna, Interpro,...!!!

FatiScan

FatiScan Interpret a ranked list of genes. There is not need for choosing a cut off. All information is included. One statistical test for each Block of annotation. Multiple testing context (hundreds of annotation). Filtering of annotation is convenient (the less tests the best correction).

FatiScan Testing along an ordered list Annotation label A List of genes Index ranking genes according to some biological aspect under study. + A B C Annotation label B Annotation label C Block of genes enriched in the annotation A Database that stores gene class membership information. Annotation C is homogeneously distributed along the list FatiScan searches over the whole ordered list, trying to find runs of functionally related genes. - Block of genes enriched in the annotation B

FatiScan results Significant results: Enriched class Annotated genes par GO from each list

Babelomics Running an Example...

Example Experiment description: Rheumatoid Arthritis and Osteoarthritis 15 Affymetrix (HG U133A), 38,500 genes Human 3 classes: control, osteoarthritis and rheumatoid arthritis

Example Rheumatoid Arthritis and Osteoarthritis Analysis: Normalization Differential expression FatiScan

Input form Normalization Select your data > browser server > select your data : affy_arthritis Analysis : RMA Job name : normalization_arthritis

Normalization Results

Normalization Results

Input form Differential expression (ND vs RA) Dataset file name : rma.summary.txt Class : CLASS [ND, RA] Test : t Multiple-test correction : fdr Adjusted p-value : 0.05 Job name : Diffexpr_2class_arthritis

Diff Expression Results

Diff Expression Results

Input form FatiScan Species : hsa Duplicates management : Never Fisher exact test : Over represented terms in list 2 Organism: Homo sapiens Data bases: GO biological process Job name : fatiscan_bottomlist_arhritis

FatiScan Results

Acknowledgements Supervisor: Joaquín Dopazo The GEPAS team: Fatima Al-Shahrour Eva Alloza José Carbonell Ana Conesa Pablo Escobar Francisco García Stefan Göetz Martina Marbà Ignacio Medina Pablo Mínguez David Montaner Luis Pulido Javier Santoyo Joaquín Tárraga http://bioinfo.cipf.es