Gene Expression Data Analysis


1 Gene Expression Data Analysis
Bing Zhang, Department of Biomedical Informatics, Vanderbilt University
BMIF 310, Fall 2009

2 Gene expression technologies (summary)
Hybridization-based approaches
- Printed arrays
  - cDNA arrays: customizable, high array variation
- Synthesized oligo arrays
  - Affymetrix arrays: high density, low array variation
  - Classic arrays: probes on the 3' UTR
  - Exon arrays: probes on all known exons
  - Tiling arrays: probes spread across the genomic sequence
Sequencing-based approaches
- Traditional Sanger sequencing-based approaches
  - Serial analysis of gene expression (SAGE): ~10 bp tag at the 3' end
- 2nd-generation sequencing-based approaches
  - RNA-Seq: high-throughput, unbiased profiling

3 Bioinformatics tasks
[Workflow diagram; its elements, in the order extracted:]
- Biological question
- Experiment design
- Microarray experiment
- Image analysis
- Normalization
- Data mining
- Experimental verification
- Data storage
- Data integration
- Data visualization
- Differential expression
- Clustering
- Classification
- Network analysis
- Biological interpretation
- Hypothesis

4 Well begun is half done
- A clearly defined biological question
- Good control of potential sources of variation (biological and technical)
- A statistically sound microarray experimental arrangement (replicates)
- Compliance with the standard for microarray information collection (MIAME)

5 Image analysis
Analysis of the image of the scanned array to extract an intensity for each spot or feature on the array.
- Gridding: align a grid to the spots
- Segmentation: identify the shape of each spot
- Intensity extraction: extract the intensity of each spot and, potentially, of its surrounding background
- Background correction: subtract the background signal from the spot intensity to get a more accurate estimate of the biological signal from the spot

6 Garbage in, garbage out
- Remove bad arrays
- Remove poor-quality spots
- Remove data points with a low signal-to-noise ratio
- Remove data points with too many missing values
[Figure: example of a bad array]

7 Normalization
The purpose of normalization is to remove systematic variation in a microarray experiment that affects the measured gene expression levels.
Sources of systematic variation
- Unequal quantities of starting RNA
- Differences in labelling and detection efficiencies
- Topographical slide variation
- Scanner-introduced bias

8 Normalization methods
- Global normalization: multiply each array by a constant so that the mean (or median) intensity is the same for every array
- Quantile normalization: match the percentiles of each array
- Adjust using a nonlinear smoothing curve
- Adjust the arrays using control or housekeeping genes that are expected to have the same intensity level across all samples
- Adjust using spike-in controls
[Figure panels: No normalization; Global normalization; Quantile normalization]
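The first two methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular package: each array is a plain list of intensities for the same genes in the same order, the common target for global normalization is taken to be the mean of the per-array medians, and ties in quantile normalization are broken by position. The function names are hypothetical.

```python
from statistics import mean, median


def global_normalize(arrays):
    """Scale each array by a constant so that all arrays share one median."""
    medians = [median(a) for a in arrays]
    target = mean(medians)  # assumption: common target = mean of the medians
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]


def quantile_normalize(arrays):
    """Force every array onto the same distribution (matched percentiles)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean intensity across arrays at each rank.
    reference = [mean(s[r] for s in sorted_arrays) for r in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # gene indices, low to high
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result


arrays = [[2.0, 4.0, 6.0], [10.0, 30.0, 20.0]]
scaled = global_normalize(arrays)
matched = quantile_normalize(arrays)
```

After global normalization only the medians agree; after quantile normalization the sorted values of every array are identical, which is a much stronger adjustment.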

9 Get to know your data matrix
Rows are genes and columns are samples; the numeric entries did not survive extraction.

ID        Samp 1   Samp 2   Samp 3   ...   Samp m-1   Samp m
Gene 1
Gene 2
...
Gene n-1
Gene n


11 Differential gene expression: n-fold change
Arbitrarily selected fold-change cut-offs, usually 2-fold.
Pros
- Intuitive and easily visualised
- Simple and rapid
Cons
- Statistically inefficient
- Magnitude does not necessarily indicate importance
- Often too restrictive
MVA plot
- M: log ratio, log2(A/B)
- A: average log intensity, (1/2) log2(A*B)
Inside the logarithms, A and B denote the two intensities being compared.
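As a small worked example of these quantities, the sketch below computes M and A for one pair of intensities and applies a symmetric fold-change cut-off. The function names and the example intensities are illustrative assumptions; intensities are assumed to be positive.

```python
import math


def ma_values(a, b):
    """Return (M, A) for two positive intensities a and b."""
    m = math.log2(a / b)               # M: log ratio
    avg = 0.5 * math.log2(a * b)       # A: average log intensity
    return m, avg


def passes_fold_change(a, b, cutoff=2.0):
    """True if expression changes at least `cutoff`-fold, up or down."""
    return abs(math.log2(a / b)) >= math.log2(cutoff)


m, avg = ma_values(8.0, 2.0)           # a 4-fold increase
up = passes_fold_change(8.0, 2.0)      # 4-fold up
down = passes_fold_change(2.0, 8.0)    # 4-fold down
small = passes_fold_change(3.0, 2.0)   # only 1.5-fold
```

Working on the log scale makes up- and down-regulation symmetric: a 2-fold increase and a 2-fold decrease give M = +1 and M = -1.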

12 Differential gene expression: statistical tests
Test for significant change between repeated measurements of a variable in two groups or in multiple groups.
Procedure: calculate a test statistic, select a cut-off value, and decide whether to reject the null hypothesis.
Methods
- Two independent groups
  - Student's t-test: parametric
  - Mann-Whitney U test: nonparametric
- Two or more independent groups
  - ANOVA (analysis of variance): parametric
  - Kruskal-Wallis test: nonparametric
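To make the "calculate a statistic" step concrete, here is a minimal sketch of the pooled-variance Student's t statistic for one gene measured in two independent groups. It stops at the statistic and its degrees of freedom; turning these into a p-value needs the t distribution, which a statistics library would normally supply. The function name and the example values are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(group1, group2):
    """Two-sample Student's t statistic (equal variances assumed) and its df."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance of the two groups.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, df


# Hypothetical log-expression values of one gene in two groups.
t_stat, df = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

In a microarray analysis this calculation is repeated once per gene, which is what creates the multiple-testing problem.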

13 Correction for multiple testing
Why?
- In an experiment with a 10,000-gene array in which the significance level is set at 0.05, about 10,000 x 0.05 = 500 genes would be expected to be called significant even if none is differentially expressed
- Unadjusted p-values are likely to exaggerate Type I errors (false positives)
Methods
- Control the family-wise error rate (FWER), the probability of at least one Type I error in the entire set (family) of hypotheses tested, e.g. the standard Bonferroni correction: uncorrected p-value x number of genes tested
- Control the false discovery rate (FDR), the expected proportion of false positives among the rejected hypotheses, e.g. the Benjamini and Hochberg correction
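Both corrections can be written as adjusted p-values, as in the sketch below. It is a minimal illustration with hypothetical function names and made-up p-values. The Benjamini-Hochberg version uses the usual step-up form: multiply each p-value by m/rank, then enforce monotonicity from the largest p-value downwards.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p x number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that adjusted values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]   # made-up p-values for four genes
fwer = bonferroni(raw)
fdr = benjamini_hochberg(raw)
```

For the same input, the Bonferroni values are never smaller than the Benjamini-Hochberg values, which reflects that controlling the FWER is the stricter criterion.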


15 What is clustering?
- Clustering algorithms are methods that divide a set of n objects (genes or samples) into g groups so that within-group similarities are larger than between-group similarities
- Clustering is an unsupervised technique: it does not require incorporating any prior knowledge into the process

16 Why clustering?
- Exploratory data analysis, providing rough maps and suggesting directions for further study
- Representing distances among high-dimensional expression profiles in a concise, visually effective way, such as a tree or dendrogram
- Identifying candidate subgroups in complex data, e.g. identification of novel sub-types in cancer or of co-expressed genes

17 Clustering methods
- Hierarchical clustering: generate a hierarchy of clusters going from 1 cluster to n clusters
- Partitioning: divide the data into g groups using some reallocation algorithm, e.g. K-means
- Fuzzy clustering: each object has a set of weights suggesting the probability of it belonging to each cluster

18 Hierarchical clustering
- Agglomerative clustering (bottom-up): start with n groups, join the two closest, and continue
- Divisive clustering (top-down): start with 1 group, split into 2, then into 3, ..., into n
- Both require a distance measurement
  - Between two objects
  - Between clusters

19 Between-object distance measurement
- Euclidean distance
  - Focuses on the absolute expression value
- Pearson correlation coefficient
  - Focuses on the shape of the expression profile
  - Parametric: assumes normally distributed data that follow a linear regression model
- Spearman correlation coefficient
  - Focuses on the shape of the expression profile
  - Non-parametric: makes no distributional assumption
  - Less sensitive than Pearson
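The three measures can be compared with a short sketch. Expression profiles are assumed to be equal-length lists of numbers, the helper names are hypothetical, and Spearman is computed as the Pearson correlation of the ranks, with tied values given the average of their ranks. A correlation r is commonly turned into a distance as 1 - r.

```python
import math
from statistics import mean


def euclidean(x, y):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient of two profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical profiles: same shape at a different scale, and a monotone
# but non-linear profile.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]
gene_c = [1.0, 4.0, 9.0, 100.0]
```

Here `gene_a` and `gene_b` are far apart in Euclidean terms yet perfectly correlated, while `gene_c` has a Spearman correlation of 1 with `gene_a` but a lower Pearson correlation because the relationship is not linear.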

20 Different measurement, different distance
[Figure: expression profiles of GeneA (blue), GeneB (pink), GeneC (green) and GeneD (red)]
Most similar profile to GeneA based on different distance measurements:
- Euclidean: GeneB
- Pearson: GeneC
- Spearman: GeneD

21 Between-cluster distance measurement
All three are computed from the pairwise distances between the members of one cluster and the members of the other.
- Single linkage: the smallest of all pairwise distances
- Complete linkage: the largest of all pairwise distances
- Average linkage: the average of all pairwise distances
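A naive agglomerative (bottom-up) procedure built on these linkage rules might look like the sketch below. It is an illustration only: it recomputes every cluster distance at each step, so it is suitable for toy inputs, and the function names and the one-dimensional example points are assumptions.

```python
def linkage_distance(cluster1, cluster2, dist, method="average"):
    """Distance between two clusters under the chosen linkage rule."""
    pairwise = [dist(a, b) for a in cluster1 for b in cluster2]
    if method == "single":
        return min(pairwise)
    if method == "complete":
        return max(pairwise)
    if method == "average":
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, dist, method="average"):
    """Bottom-up clustering; returns the merges as (cluster, cluster, height)."""
    clusters = [[p] for p in points]   # start with n singleton groups
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two closest clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage_distance(clusters[i], clusters[j], dist, method)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def absolute_difference(a, b):
    return abs(a - b)


points = [1.0, 2.0, 10.0]
heights = {
    method: [h for _, _, h in agglomerate(points, absolute_difference, method)]
    for method in ("single", "complete", "average")
}
```

The recorded merge heights are what a dendrogram draws as the height of each join; with these points the last join sits at 8.0, 9.0 or 8.5 depending on the linkage.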

22 Hierarchical clustering: dendrogram
- Output of a hierarchical clustering
- Tree structure with the genes or samples as the leaves
- The height of the join indicates the distance between the left branch and the right branch
Problems
- Hard to define distinct clusters


24 What is classification?
- Classification algorithms are methods that assign objects to predefined classes
- Classification is a supervised technique: it requires training data and predefined classes
- Two-step process
  - Model construction: describe a set of predetermined classes using training data
  - Model application: classify new objects into the predefined classes

25 Classification methods
- K-nearest neighbor
- Decision tree
- Support vector machine
- Naïve Bayes classifier
- Artificial neural network

26 Feature selection
Microarray data are characterized by a large number of variables (genes) relative to very few observations (samples). We therefore need to select a subset of genes that are likely to be predictive, i.e. highly related to particular classes for classification.

27 Model construction
A classification algorithm builds a classifier (model) from the training data.

Training data
Sample  GeneA  GeneB  Tumor
A       H      H      N
B       H      L      Y
C       L      L      N
D       H      L      Y
E       L      L      N
F       L      H      N

Classifier (model)
IF GeneA = H AND GeneB = L THEN Tumor = yes

28 Model application
New object
Sample  GeneA  GeneB  Tumor
Z       H      L      ?

Classifier (model)
IF GeneA = H AND GeneB = L THEN Tumor = yes

Sample Z = Tumor? Yes
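The rule from the slides can be written as a one-line function and checked against the six training samples before it is applied to sample Z. The data layout and the function name are illustrative assumptions; the rule and the values come from the slides.

```python
# Training samples from the slides: (GeneA, GeneB, Tumor).
training = {
    "A": ("H", "H", "N"),
    "B": ("H", "L", "Y"),
    "C": ("L", "L", "N"),
    "D": ("H", "L", "Y"),
    "E": ("L", "L", "N"),
    "F": ("L", "H", "N"),
}


def classify(gene_a, gene_b):
    """IF GeneA = H AND GeneB = L THEN Tumor = yes."""
    return "Y" if gene_a == "H" and gene_b == "L" else "N"


# Model construction check: the rule reproduces every training label.
training_errors = [
    sample
    for sample, (gene_a, gene_b, tumor) in training.items()
    if classify(gene_a, gene_b) != tumor
]

# Model application: the new sample Z has GeneA = H and GeneB = L.
prediction_z = classify("H", "L")
```

A rule that fits all training samples is not guaranteed to generalize, so its performance on the training data alone should not be read as its true error.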

29 K-nearest neighbor
- Objects are points in an n-dimensional space
- Compute the distance between the new case and all learning cases
- Return the most common value among the k learning cases nearest to the new case
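These three steps translate almost directly into code. The sketch below assumes Euclidean distance, numeric feature tuples and a hypothetical toy data set with two genes per sample; `knn_predict` is an illustrative name, not a library function.

```python
import math
from collections import Counter


def knn_predict(train, new_case, k=3):
    """train: list of (features, label) pairs; new_case: a feature tuple."""
    # Sort the learning cases by their distance to the new case.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], new_case))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Return the most common label among the k nearest cases.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical expression values of two genes in six labelled samples.
train = [
    ((1.0, 1.0), "N"), ((1.0, 2.0), "N"), ((2.0, 1.0), "N"),
    ((8.0, 8.0), "Y"), ((8.0, 9.0), "Y"), ((9.0, 8.0), "Y"),
]
low = knn_predict(train, (2.0, 2.0))
high = knn_predict(train, (7.0, 7.0))
```

Choosing an odd k avoids tied votes when there are two classes.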

30 Over-fitting and cross-validation
Over-fitting
- The classifier is very effective at classifying the training samples but not accurate enough for new samples
Cross-validation
- Hold-out
  - Split the data into training and testing data
  - Learn with the training data and estimate the true error with the testing data
- N-fold
  - Randomly split the data into training and testing data n times
  - Learn with the training data and estimate the true error with the testing data in each split separately
  - Average the test performance
- Leave-one-out
  - Leave one case out for testing
  - Learn with the remaining data and estimate the true error with the held-out case
  - Average the test performance
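Leave-one-out is the simplest of these schemes to write down. The sketch below uses a 1-nearest-neighbor classifier purely as a stand-in model; any function that takes training data and a case and returns a label could be passed instead. All names and data are hypothetical.

```python
import math


def nearest_neighbor(train, case):
    """Stand-in model: label of the single closest training case."""
    return min(train, key=lambda item: math.dist(item[0], case))[1]


def leave_one_out_accuracy(data, classify=nearest_neighbor):
    """Hold out each case in turn, learn on the rest, average the results."""
    correct = 0
    for i, (features, label) in enumerate(data):
        training = data[:i] + data[i + 1:]   # everything except case i
        if classify(training, features) == label:
            correct += 1
    return correct / len(data)


data = [
    ((1.0, 1.0), "N"), ((1.0, 2.0), "N"), ((2.0, 1.0), "N"),
    ((8.0, 8.0), "Y"), ((8.0, 9.0), "Y"), ((9.0, 8.0), "Y"),
]
accuracy = leave_one_out_accuracy(data)
```

Because every case is tested by a model that never saw it, the averaged accuracy is a less optimistic estimate than accuracy measured on the training data.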


33 Importance of biological interpretation
- Normalizing, filtering, clustering and visualizing identify sets of genes of potential interest
- These are numerical techniques and do not reveal the biological implications encoded in expression data
- Evaluating the functional significance of large, heterogeneous and noisy sets of genes is a big challenge

34 Gene Ontology
- A structured, precisely defined, common, controlled vocabulary for describing the roles of genes and gene products
- Three major categories describe the attributes of a gene product: biological process, molecular function and cellular component
- The concepts in each category are held within a directed acyclic graph (DAG)

35 Gene Ontology Tree Machine (GOTM)
- A web-based tool for the analysis and visualization of sets of genes identified from high-throughput technologies
- User-friendly data navigation and visualization
- Statistical analysis suggesting biological areas that warrant further study

36 GOTM
[Figure: observed versus expected number of "mitotic cell cycle" genes, up-regulated gene set compared with a random gene set; observed = 24, p = 1.92e-34; the expected count did not survive extraction]
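The slides do not say which test produces the p-value. A common choice for this kind of observed-versus-expected comparison is the hypergeometric test, sketched below as an assumption rather than as a description of GOTM's internals. The function names and the small numbers in the example are made up and are not the counts behind the figure.

```python
from math import comb


def expected_count(total_genes, category_genes, selected_genes):
    """Number of category genes expected in a random selection."""
    return selected_genes * category_genes / total_genes


def hypergeom_enrichment_p(total_genes, category_genes, selected_genes, observed):
    """P(X >= observed) when selected_genes are drawn without replacement."""
    upper = min(category_genes, selected_genes)
    favourable = sum(
        comb(category_genes, k)
        * comb(total_genes - category_genes, selected_genes - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(total_genes, selected_genes)


# Toy example: 10 genes on the array, 4 in the category, 3 selected,
# 2 of the selected genes fall in the category.
expected = expected_count(10, 4, 3)
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

A category is flagged as enriched when the observed count is well above the expected count and the p-value is small; with many categories tested, the multiple-testing corrections described earlier apply here as well.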



Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison Thank you for waiting. The presentation will be starting in a few minutes at 9AM Pacific Daylight

More information

Preprocessing Methods for Two-Color Microarray Data

Preprocessing Methods for Two-Color Microarray Data Preprocessing Methods for Two-Color Microarray Data 1/15/2011 Copyright 2011 Dan Nettleton Preprocessing Steps Background correction Transformation Normalization Summarization 1 2 What is background correction?

More information

Analysis of microarray data

Analysis of microarray data BNF078 Fall 2006 Analysis of microarray data Markus Ringnér Computational Biology and Biological Physics Department of Theoretical Physics Lund University markus@thep.lu.se 046-2229337 1 Contents Preface

More information

Our view on cdna chip analysis from engineering informatics standpoint

Our view on cdna chip analysis from engineering informatics standpoint Our view on cdna chip analysis from engineering informatics standpoint Chonghun Han, Sungwoo Kwon Intelligent Process System Lab Department of Chemical Engineering Pohang University of Science and Technology

More information

FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY

FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY K. L. Knudtson 1, C. Griffin 2, A. I. Brooks 3, D. A. Iacobas 4, K. Johnson 5, G. Khitrov 6,

More information

Microarray Data Analysis in GeneSpring GX 11. Month ##, 200X

Microarray Data Analysis in GeneSpring GX 11. Month ##, 200X Microarray Data Analysis in GeneSpring GX 11 Month ##, 200X Agenda Genome Browser GO GSEA Pathway Analysis Network building Find significant pathways Extract relations via NLP Data Visualization Options

More information

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification Final Project Report Alexander Herrmann Advised by Dr. Andrew Gentles December

More information

Computational Approaches to Analysis of DNA Microarray Data

Computational Approaches to Analysis of DNA Microarray Data 2006 IMI and Schattauer GmbH 91 Computational pproaches to nalysis of DN Microarray Data J. Quackenbush Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department

More information

Using 2-way ANOVA to dissect gene expression following myocardial infarction in mice

Using 2-way ANOVA to dissect gene expression following myocardial infarction in mice Using 2-way ANOVA to dissect gene expression following myocardial infarction in mice Thank you for waiting. The presentation will be starting in a few minutes at 9AM Pacific Daylight Time. During this

More information

Exam 1 from a Past Semester

Exam 1 from a Past Semester Exam from a Past Semester. Provide a brief answer to each of the following questions. a) What do perfect match and mismatch mean in the context of Affymetrix GeneChip technology? Be as specific as possible

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Vocabulary Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 Gene: Genetics: Genome: Genomics: hereditary

More information

BIOSTATISTICS AND MEDICAL INFORMATICS (B M I)

BIOSTATISTICS AND MEDICAL INFORMATICS (B M I) Biostatistics and Medical Informatics (B M I) 1 BIOSTATISTICS AND MEDICAL INFORMATICS (B M I) B M I/POP HLTH 451 INTRODUCTION TO SAS PROGRAMMING FOR 2 credits. Use of the SAS programming language for the

More information

V10-8. Gene Expression

V10-8. Gene Expression V10-8. Gene Expression - Regulation of Gene Transcription at Promoters - Experimental Analysis of Gene Expression - Statistics Primer - Preprocessing of Data - Differential Expression Analysis Fri, May

More information

ALLEN Human Brain Atlas

ALLEN Human Brain Atlas TECHNICAL WHITE PAPER: MICROARRAY DATA NORMALIZATION The is a publicly available online resource of gene expression information in the adult human brain. Comprising multiple datasets from various projects

More information

ChIP-seq and RNA-seq. Farhat Habib

ChIP-seq and RNA-seq. Farhat Habib ChIP-seq and RNA-seq Farhat Habib fhabib@iiserpune.ac.in Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions

More information

Affymetrix probe-set remapping and probe-level filtering leads to dramatic improvements in gene expression measurement accuracy

Affymetrix probe-set remapping and probe-level filtering leads to dramatic improvements in gene expression measurement accuracy Affymetrix probe-set remapping and probe-level filtering leads to dramatic improvements in gene expression measurement accuracy Mariano Javier Alvarez a,*, Pavel Sumazin a,* and Andrea Califano a,b (a)

More information

Data Mining and Applications in Genomics

Data Mining and Applications in Genomics Data Mining and Applications in Genomics Lecture Notes in Electrical Engineering Volume 25 For other titles published in this series, go to www.springer.com/series/7818 Sio-Iong Ao Data Mining and Applications

More information

RNA-Sequencing analysis

RNA-Sequencing analysis RNA-Sequencing analysis Markus Kreuz 25. 04. 2012 Institut für Medizinische Informatik, Statistik und Epidemiologie Content: Biological background Overview transcriptomics RNA-Seq RNA-Seq technology Challenges

More information

Optimal alpha reduces error rates in gene expression studies: a meta-analysis approach

Optimal alpha reduces error rates in gene expression studies: a meta-analysis approach Mudge et al. BMC Bioinformatics (2017) 18:312 DOI 10.1186/s12859-017-1728-3 METHODOLOGY ARTICLE Open Access Optimal alpha reduces error rates in gene expression studies: a meta-analysis approach J. F.

More information

DNA Microarrays and Computational Analysis of DNA Microarray. Data in Cancer Research

DNA Microarrays and Computational Analysis of DNA Microarray. Data in Cancer Research DNA Microarrays and Computational Analysis of DNA Microarray Data in Cancer Research Mario Medvedovic, Jonathan Wiest Abstract 1. Introduction 2. Applications of microarrays 3. Analysis of gene expression

More information

Nima Hejazi. Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi. nimahejazi.org github/nhejazi

Nima Hejazi. Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi. nimahejazi.org github/nhejazi Data-Adaptive Estimation and Inference in the Analysis of Differential Methylation for the annual retreat of the Center for Computational Biology, given 18 November 2017 Nima Hejazi Division of Biostatistics

More information

Microarray data analysis: from disarray to consolidation and consensus

Microarray data analysis: from disarray to consolidation and consensus Microarray data analysis: from disarray to consolidation and consensus David B. Allison*, Xiangqin Cui*, Grier P. Page* and Mahyar Sabripour* Abstract In just a few years, microarrays have gone from obscurity

More information

CS-E5870 High-Throughput Bioinformatics Microarray data analysis

CS-E5870 High-Throughput Bioinformatics Microarray data analysis CS-E5870 High-Throughput Bioinformatics Microarray data analysis Harri Lähdesmäki Department of Computer Science Aalto University September 20, 2016 Acknowledgement for J Salojärvi and E Czeizler for the

More information

Microarray analysis of gene expression in male germ cell tumors

Microarray analysis of gene expression in male germ cell tumors Microarray analysis of gene expression in male germ cell tumors Microarray analysis of gene expression in male germ cell tumors General microarry data analysis workflow From raw data to biological significance

More information

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods BST 226 Statistical Methods for Bioinformatics January 8, 2014 1 The -Omics

More information

Lecture 2: March 8, 2007

Lecture 2: March 8, 2007 Analysis of DNA Chips and Gene Networks Spring Semester, 2007 Lecture 2: March 8, 2007 Lecturer: Rani Elkon Scribe: Yuri Solodkin and Andrey Stolyarenko 1 2.1 Low Level Analysis of Microarrays 2.1.1 Introduction

More information

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H Introduction to ChIP Seq data analyses Acknowledgement: slides taken from Dr. H Wu @Emory ChIP seq: Chromatin ImmunoPrecipitation it ti + sequencing Same biological motivation as ChIP chip: measure specific

More information

RNA-Seq Analysis. Simon Andrews, Laura v

RNA-Seq Analysis. Simon Andrews, Laura v RNA-Seq Analysis Simon Andrews, Laura Biggins simon.andrews@babraham.ac.uk @simon_andrews v2018-10 RNA-Seq Libraries rrna depleted mrna Fragment u u u u NNNN Random prime + RT 2 nd strand synthesis (+

More information

Agilent GeneSpring GX 10: Beyond. Pam Tangvoranuntakul Product Manager, GeneSpring October 1, 2008

Agilent GeneSpring GX 10: Beyond. Pam Tangvoranuntakul Product Manager, GeneSpring October 1, 2008 Agilent GeneSpring GX 10: Gene Expression and Beyond Pam Tangvoranuntakul Product Manager, GeneSpring October 1, 2008 GeneSpring GX 10 in the News Our Goals for GeneSpring GX 10 Goal 1: Bring back GeneSpring

More information

RNA-Seq analysis using R: Differential expression and transcriptome assembly

RNA-Seq analysis using R: Differential expression and transcriptome assembly RNA-Seq analysis using R: Differential expression and transcriptome assembly Beibei Chen Ph.D BICF 12/7/2016 Agenda Brief about RNA-seq and experiment design Gene oriented analysis Gene quantification

More information

Feature Selection of Gene Expression Data for Cancer Classification: A Review

Feature Selection of Gene Expression Data for Cancer Classification: A Review Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 50 (2015 ) 52 57 2nd International Symposium on Big Data and Cloud Computing (ISBCC 15) Feature Selection of Gene Expression

More information

Supervised Learning from Micro-Array Data: Datamining with Care

Supervised Learning from Micro-Array Data: Datamining with Care November 18, 2002 Stanford Statistics 1 Supervised Learning from Micro-Array Data: Datamining with Care Trevor Hastie Stanford University November 18, 2002 joint work with Robert Tibshirani, Balasubramanian

More information

RNA

RNA RNA sequencing Michael Inouye Baker Heart and Diabetes Institute Univ of Melbourne / Monash Univ Summer Institute in Statistical Genetics 2017 Integrative Genomics Module Seattle @minouye271 www.inouyelab.org

More information

Outline. Array platform considerations: Comparison between the technologies available in microarrays

Outline. Array platform considerations: Comparison between the technologies available in microarrays Microarray overview Outline Array platform considerations: Comparison between the technologies available in microarrays Differences in array fabrication Differences in array organization Applications of

More information

Integrative Genomics 1a. Introduction

Integrative Genomics 1a. Introduction 2016 Course Outline Integrative Genomics 1a. Introduction ggibson.gt@gmail.com http://www.cig.gatech.edu 1a. Experimental Design and Hypothesis Testing (GG) 1b. Normalization (GG) 2a. RNASeq (MI) 2b. Clustering

More information

Bioinformatics for Biologists

Bioinformatics for Biologists Bioinformatics for Biologists Microarray Data Analysis: Lecture 2. Fran Lewitter, Ph.D. Director Bioinformatics and Research Computing Whitehead Institute Outline Introduction Working with microarray data

More information

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques)

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques) Microarrays and Transcript Profiling Gene expression patterns are traditionally studied using Northern blots (DNA-RNA hybridization assays). This approach involves separation of total or polya + RNA on

More information