Bioinformatics : Gene Expression Data Analysis


1 Bioinformatics : Gene Expression Data Analysis Aidong Zhang Professor Computer Science and Engineering

2 What is Bioinformatics Broad Definition: the study of how information technologies are used to solve problems in biology. Narrow Definition: the creation and management of biological databases in support of genomic sequences. Oxford English Dictionary (proposed): conceptualizing biology in terms of molecules and applying information techniques to understand and organize the information associated with these molecules, on a large scale.

3 Aims of Bioinformatics Simplest: organize data in a way that allows researchers to access information and submit new entries as they are produced. Higher: develop tools and resources that aid in the analysis of data. Advanced: use these tools to analyze the data and interpret the results in a biologically meaningful manner.

4 Subjects of Bioinformatics
Data source | Data size | Topics
Raw DNA sequence | 8.2 million sequences (9.5 billion bases) | Separating regions; gene product prediction
Protein sequence | 300,000 sequences (~300 amino acids each) | Sequence comparison, alignments, identification
Macromolecular structure | 13,000 structures (~1,000 atomic coordinates each) | Structure prediction, 3D alignment; protein geometry measurements; molecular simulations
Genomes | 40 complete genomes (1.6 million to 3 billion bases each) | Phylogenetic analysis; genomic-scale censuses; linkage analysis
Gene expression | ~20 time point measurements for ~6,000 genes | Clustering, correlating patterns, mapping data to sequence, structural and biochemical data
Literature | 11 million citations | Digital libraries; knowledge databases
Metabolic pathways | - | Pathway simulations

5 Figure taken from

6 DNA Microarray Experiments

7 Gene Expression Data Gene Expression Data Matrix: each row represents a gene $G_i$; each column represents an experimental condition $S_j$; each cell $X_{ij}$ is a real value representing the expression level of gene $G_i$ under condition $S_j$, where $X_{ij} > 0$ means over-expressed and $X_{ij} < 0$ means under-expressed. A time-series gene expression data matrix typically contains $O(10^3)$ genes and $O(10)$ time points.
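
The matrix layout described on this slide can be illustrated with a minimal sketch. It assumes NumPy and a made-up 3-gene by 4-condition matrix; the values and the names `X`, `n_over`, and `n_under` are illustrative and do not come from the slides.

```python
import numpy as np

# Hypothetical expression matrix: rows are genes G_i, columns are conditions S_j.
X = np.array([
    [ 1.2,  0.8, -0.1,  2.0],   # gene G_1
    [-0.5, -1.3, -0.7,  0.4],   # gene G_2
    [ 0.0,  0.3, -2.1, -0.9],   # gene G_3
])

over = X > 0    # X_ij > 0: over-expressed
under = X < 0   # X_ij < 0: under-expressed

# Number of conditions in which each gene is over- or under-expressed.
n_over = over.sum(axis=1)
n_under = under.sum(axis=1)

# The same matrix supports both views of the data:
genes_as_objects = X      # each row is a gene described by its conditions
samples_as_objects = X.T  # each row is a sample described by its genes
```

Transposing the matrix is all it takes to switch between the gene-space and sample-space views, which is why the two analyses can share one data structure.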

8 Gene Expression Data [Figure: gene expression matrix with genes as rows and samples (sample 1, sample 2, sample 3) as columns, entries $X_{11}$ to $X_{33}$] Asymmetric dimensionality: 10 ~ 100 samples/conditions versus 1000 ~ genes. Two-way analysis: sample space and gene space.

9 Microarray Data Analysis Analysis from two angles: sample as object, gene as attribute; or gene as object, sample/condition as attribute.

10 Challenges of Gene Data Analysis (1) Gene space: automatically identify clusters of genes that express similar patterns in the data set. The method should be robust to a large amount of noise, effective at handling highly intersected clusters, and able to visualize the clustering results.

11 Co-expressed Genes [Figure: Gene Expression Data Matrix, Gene Expression Patterns, Co-expressed Genes] Why look for co-expressed genes? Co-expression indicates co-function; co-expression also indicates co-regulation.

12 Challenges of Gene Data Analysis (2) Sample space: unsupervised sample clustering presents interesting but also very challenging problems. The sample space and gene space are of very different dimensionality ($10^1 \sim 10^2$ samples versus $10^3 \sim 10^4$ genes). There is a high percentage of irrelevant or redundant genes. People usually have little knowledge about how to construct an informative gene space.

13 Sample Clustering Gene expression data clustering

14 Microarray Data Analysis [Figure: workflow diagram with boxes labeled Microarray Data, Microarray Images, Gene Expression Matrices, Gene Expression Data Analysis, Sample Clusters, Gene Expression Patterns, Important Patterns, and Visualization]

15 Our Approaches Density-based approach: recognizes a dense area as a cluster and organizes the cluster structure of a data set into a hierarchical tree. (1) Calculate the density of each data object based on its neighboring data distribution. (2) Construct the "attraction" relationship between data objects according to object density. (3) Organize the attraction relationships into the "attraction tree". (4) Summarize the attraction tree by a hierarchical "density tree". (5) Derive clusters from the density tree.
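
The density and attraction steps can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: the density definition (neighbor count within a hypothetical `radius`), the tie-breaking rule, and the function names are all assumptions, and the hierarchical "density tree" summary is replaced by simply labeling each object with the root of its attraction tree.

```python
import numpy as np

def attraction_tree(points, radius):
    """Return (density, parent); parent[i] is the object attracting i, or -1 for a root."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all objects.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    # Assumed density: number of other objects within `radius`.
    density = (dist <= radius).sum(axis=1) - 1
    parent = np.full(len(pts), -1)
    for i in range(len(pts)):
        # Candidate attractors: neighbors that are denser (ties go to the lower index).
        denser = [j for j in range(len(pts))
                  if j != i and dist[i, j] <= radius
                  and (density[j], -j) > (density[i], -i)]
        if denser:
            # Each object is attracted by its nearest denser neighbor.
            parent[i] = min(denser, key=lambda j: dist[i, j])
    return density, parent

def clusters_from_tree(parent):
    """Label every object with the root of its attraction tree."""
    def root(i):
        while parent[i] != -1:
            i = parent[i]
        return int(i)
    return [root(i) for i in range(len(parent))]
```

Because every object points only to a strictly "higher" object in the (density, index) ordering, the parent links cannot form a cycle, so each object reaches exactly one root.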

16 Our Approaches (2) Interrelated dimensional clustering automatically performs two tasks: detection of meaningful sample patterns, and selection of the significant genes that manifest the empirical pattern.

17 Our Approaches (3) Visualization tool: offers insightful information and detects the structure of the data set. Three aspects: explorative, confirmative, representative. Microarray analysis status: numerical methods are dominant; visualization serves as graphical presentation of the major clustering methods. Visualization applied: global visualization (TreeView), Sammon's mapping.

18 VizStruct Architecture Explorative Visualization Sample space Confirmative Visualization Gene space

19 VizStruct - Dimension Tour Interactively adjust dimension parameters, manually or automatically. May cause false clusters to break. Creates dynamic visualization.

20 Visualized Results for a Time Series Data Set

21 Elements of Clustering Feature selection: select properly the features on which clustering is to be performed. Clustering algorithm: criteria (e.g. objective function) and proximity measure (e.g. Euclidean distance, Pearson correlation coefficient). Cluster validation: the assessment of clustering results. Interpretation of the results.
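
The choice of proximity measure matters for expression profiles. The sketch below, which assumes NumPy and three made-up gene profiles (`g1`, `g2`, `g3` are illustrative), compares the two measures named on this slide.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def pearson_correlation(a, b):
    """Pearson correlation coefficient between two expression profiles."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c)))

g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [10.0, 20.0, 30.0, 40.0]   # same shape as g1, ten times the magnitude
g3 = [4.0, 3.0, 2.0, 1.0]       # mirror image of g1
```

Here `g1` and `g2` are perfectly correlated yet far apart in Euclidean terms, while `g1` and `g3` are close in Euclidean terms yet perfectly anti-correlated. Pearson correlation therefore groups genes by the shape of their pattern, and Euclidean distance by magnitude.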

22 Supervised Analysis Select training samples (hold out ...). Sort genes (t-test, ranking ...). Select informative genes (top 50 ~ 200). Cluster or classify based on the informative genes. [Figure: samples of Class 1 and Class 2 over genes g_1, g_2, ..., g_4131]
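
The gene-sorting step can be sketched as below. The slide only says "t-test"; the Welch form of the statistic, the function name `rank_genes_by_t`, and the 0/1 label encoding are assumptions made for illustration.

```python
import numpy as np

def rank_genes_by_t(X, labels, top_k):
    """Rank genes (rows of X) by |t| between class 0 and class 1 samples (columns)."""
    X = np.asarray(X, float)
    labels = np.asarray(labels)
    a = X[:, labels == 0]
    b = X[:, labels == 1]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    # Welch standard error: per-class sample variances scaled by class sizes.
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    se = np.maximum(se, 1e-12)  # guard against genes with zero variance
    t = mean_diff / se
    order = np.argsort(-np.abs(t))  # most discriminative genes first
    return order[:top_k], t
```

For example, `rank_genes_by_t(X, labels, top_k=50)` would return the indices of the 50 genes whose class means differ most relative to their within-class spread. This ranking requires the class labels, which is exactly what the unsupervised setting lacks.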

23 Unsupervised Analysis Microarray data analysis methods can be divided into two categories: supervised and unsupervised analysis. We will focus on unsupervised sample classification, which assumes that no membership information has been assigned to any sample. Since the initial biological identification of sample classes has been slow, typically evolving through years of hypothesis-driven research, automatically discovering sample patterns is a significant contribution to microarray data analysis. Unsupervised sample classification is much more complex than the supervised case. Many mature statistical methods, such as the t-test, Z-score, and Markov filter, cannot be applied unless the phenotypes of the samples are known in advance.

24 Problem Statement Given a data matrix M in which the number of samples and the number of genes differ by orders of magnitude ($|G| \gg |S|$), and the number of sample categories K, the goal is to find K mutually exclusive groups of the samples matching their empirical types, thereby discovering their meaningful pattern, and to find the set of genes which manifests the meaningful pattern.

25 Problem Statement [Figure: expression matrix over the samples, with informative genes (gene 1 to gene 4) and noninformative genes (gene 5 to gene 8)]

26 Problem Statement (2) [Figure: expression matrix over the samples, with informative genes (gene 1 to gene 3) and noninformative genes (gene 4 to gene 7)]

27 Problem Statement (3) [Figure: two panels showing Class 1, Class 2, and Class 3 over gene a to gene f]

28 Related Work New tools using traditional methods. Tools: TreeView, CLUTO, CIT, CNIO, GeneSpring, J-Express, CLUSFAVOR. Methods: SOM, K-means, hierarchical clustering, graph-based clustering, PCA. Their similarity measures, based on the full gene space, are disturbed by the high percentage of noise.

29 Related Work (2) Clustering with feature selection (CLIFF, leaf ordering, two-way ordering): 1. Filtering the invariant genes: Bayes model, rank variance, PCA. 2. Partitioning the samples: Ncut, Min-Max Cut. 3. Pruning genes based on the partition: Markov blanket filter, t-test, leaf ordering.

30 Related Work (3) Subspace clustering: bi-clustering, δ-clustering

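31 Intra-pattern-steadiness
Variance of a single gene:
$$Var(i, y) = \frac{1}{|S_y| - 1} \sum_{j \in S_y} \left( w_{i,j} - \bar{w}_{i,S_y} \right)^2$$
Average row variance:
$$R(x, y) = \frac{1}{|G_x|} \sum_{i \in G_x} Var(i, y) = \frac{1}{|G_x| \, (|S_y| - 1)} \sum_{i \in G_x} \sum_{j \in S_y} \left( w_{i,j} - \bar{w}_{i,S_y} \right)^2$$
Here $\bar{w}_{i,S_y}$ is the mean of gene $i$ over the samples in $S_y$. We require each gene to show either all "on" or all "off" within each sample class.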
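
As a rough sketch of the two quantities on this slide, the functions below compute the variance of one gene within a sample group and the average row variance of a gene-set by sample-group block. They assume NumPy, a matrix `W` with genes as rows and samples as columns, and the sample variance (denominator $|S_y| - 1$); the function names are illustrative.

```python
import numpy as np

def row_variance(W, i, sample_idx):
    """Var(i, y): sample variance of gene i over the samples in group S_y."""
    return float(np.var(W[i, sample_idx], ddof=1))

def average_row_variance(W, gene_idx, sample_idx):
    """R(x, y): mean of Var(i, y) over the genes i in G_x."""
    block = W[np.ix_(gene_idx, sample_idx)]  # sub-matrix G_x by S_y
    return float(np.var(block, axis=1, ddof=1).mean())
```

A gene that is steadily "on" or "off" inside a class contributes a variance near zero, so a small average row variance indicates a steady block.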

32 Intra-pattern-consistency(2) Measurement residue Data(A) Data(B) MSR ARV*

33 Inter-pattern-divergence In our model, both "intra-pattern-steadiness" and "inter-pattern-dissimilarity" on the same gene are reflected.
Average block distance:
$$D(x, (y, y')) = \frac{\sum_{i \in G_x} \left| \bar{w}_{i,S_y} - \bar{w}_{i,S_{y'}} \right|}{|G_x|}$$

34 Pattern Quality The purpose of pattern discovery is to identify the empirical pattern where the patterns inside each class are steady and the divergence between each pair of classes is large.
$$\Omega = \frac{1}{\sum_{S_{y_1}, S_{y_2}} \dfrac{R(x, y_1) + R(x, y_2)}{D(x, (y_1, y_2))}}$$
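
The pattern quality can be sketched as follows, under a stated assumption about the formula: the code reads Ω as the reciprocal of the sum, over pairs of sample groups, of (summed within-group variance) divided by (between-group distance). That reading, the use of unordered pairs, and the names `consistency`, `divergence`, and `pattern_quality` are assumptions rather than the authors' confirmed definition.

```python
import numpy as np
from itertools import combinations

def consistency(W, genes, samples):
    """R(x, y): average row variance of the block (genes by samples)."""
    return float(np.var(W[np.ix_(genes, samples)], axis=1, ddof=1).mean())

def divergence(W, genes, s1, s2):
    """D(x, (y, y')): mean absolute difference of the per-gene group means."""
    m1 = W[np.ix_(genes, s1)].mean(axis=1)
    m2 = W[np.ix_(genes, s2)].mean(axis=1)
    return float(np.abs(m1 - m2).mean())

def pattern_quality(W, genes, groups):
    """Assumed reading: Omega = 1 / sum over group pairs of (R_1 + R_2) / D."""
    total = 0.0
    for s1, s2 in combinations(groups, 2):
        r = consistency(W, genes, s1) + consistency(W, genes, s2)
        # Note: undefined when two groups have identical means (D == 0).
        total += r / divergence(W, genes, s1, s2)
    # Note: undefined when every block is perfectly steady (total == 0).
    return 1.0 / total
```

Under this reading, adding a noisy gene that does not separate the groups raises the variance term and lowers the distance term, so Ω drops. That is the behavior the slide asks for: steady classes and large divergence give high quality.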

35 Pattern Quality (2) [Table: Con, Div, and Ω for Data(A), Data(B), and Data(C)]

36 The Problem Input: 1. m samples, each measured by n-dimensional genes; 2. the number of sample categories K. Output: a K-partition of the samples (empirical pattern) and a subset of genes (informative space) such that the pattern quality of the partition projected on the gene subset reaches the highest value.

37 Strategy Start with a random K-partition of samples and a subset of genes as the candidate informative space. Iteratively adjust the partition and the gene set toward the optimal solution. Basic elements: A state: a partition of samples $\{S_1, S_2, \ldots, S_K\}$; a set of genes $G' \subseteq G$; the corresponding pattern quality Ω. An adjustment: for a gene $g_i \notin G'$, insert it into $G'$; for a gene $g_i \in G'$, remove it from $G'$; for a sample $s_i$ in group $S'$, move it to another group.

38 Strategy (2) Iteratively adjust the partition and the gene set toward the optimal pattern: for each gene, try the possible insert/remove; for each sample, try the best movement.
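
The iterative adjustment can be sketched as a greedy search. This is a minimal illustration under several assumptions: the quality function is passed in as a hypothetical `quality(genes, assign)` callable, only improving adjustments are accepted, and the function name and parameters are invented for the example.

```python
import random

def iterative_search(n_genes, n_samples, k, quality, n_iter=20, seed=0):
    """Greedy search over (gene subset, sample partition) states."""
    rng = random.Random(seed)
    # Random initial state: a gene subset and a K-partition of the samples.
    genes = {g for g in range(n_genes) if rng.random() < 0.5} or {0}
    assign = [rng.randrange(k) for _ in range(n_samples)]
    best = quality(genes, assign)
    for _ in range(n_iter):
        improved = False
        # For each gene, try the possible insert/remove.
        for g in range(n_genes):
            cand = genes ^ {g}          # toggles membership of gene g
            if not cand:
                continue                # never leave the gene set empty
            q = quality(cand, assign)
            if q > best:
                genes, best, improved = cand, q, True
        # For each sample, try the best movement to another group.
        for s in range(n_samples):
            for grp in range(k):
                if grp == assign[s]:
                    continue
                cand = assign[:s] + [grp] + assign[s + 1:]
                q = quality(genes, cand)
                if q > best:
                    assign, best, improved = cand, q, True
        if not improved:
            break                       # no adjustment helps: local optimum
    return genes, assign, best
```

A purely greedy search like this stops at the first local optimum. Accepting an occasional worsening adjustment is one way to escape such optima.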

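39 Improvement
Data standardization: the original gene intensity values are converted to relative values
$$w'_{i,j} = \frac{w_{i,j} - \bar{w}_i}{\sigma_i}, \quad \text{where } \bar{w}_i = \frac{\sum_{j=1}^{m} w_{i,j}}{m}, \quad \sigma_i = \sqrt{\frac{\sum_{j=1}^{m} \left( w_{i,j} - \bar{w}_i \right)^2}{m - 1}}$$
Random order.
Conduct a negative action with a probability (simulated annealing):
$$p = \exp\left( \frac{\Delta\Omega}{\Omega \times T(i)} \right), \quad T(0) = 1, \quad T(i) = \frac{1}{1 + i}$$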
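
The standardization and the annealing acceptance rule can be sketched as below. The code assumes NumPy, a matrix with genes as rows, a row-wise z-score with the sample standard deviation, and an acceptance probability of exp(ΔΩ / (Ω · T(i))) with T(i) = 1 / (1 + i). That reading of the garbled slide formula, and all function names, are assumptions.

```python
import math
import numpy as np

def standardize_rows(W):
    """Convert each gene's intensities to relative values (row-wise z-score)."""
    W = np.asarray(W, float)
    mean = W.mean(axis=1, keepdims=True)
    std = W.std(axis=1, ddof=1, keepdims=True)  # sample standard deviation
    return (W - mean) / std

def temperature(i):
    """Cooling schedule: T(0) = 1 and T(i) = 1 / (1 + i)."""
    return 1.0 / (1.0 + i)

def accept_probability(delta_omega, omega, i):
    """Probability of conducting an adjustment at iteration i."""
    if delta_omega >= 0:
        return 1.0  # improving adjustments are always conducted
    # Negative actions become less likely as the temperature falls.
    return math.exp(delta_omega / (omega * temperature(i)))
```

Early iterations tolerate quality losses fairly often, while later iterations behave almost greedily, which is the usual motivation for an annealing schedule.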

40 Experimental Results Data sets: Multiple-sclerosis data: MS-IFN, 4132 * 28 (14 MS vs. 14 IFN); MS-CON, 4132 * 30 (15 MS vs. 15 Control). Leukemia data: 7129 * 38 (27 ALL vs. 11 AML); 7129 * 34 (20 ALL vs. 14 AML). Colon cancer data: 2000 * 62 (22 normal vs. 40 tumor colon tissue). Hereditary breast cancer data: 3226 * 22 (7 BRCA1, 8 BRCA2, 7 sporadic).

41 Experimental Results (2) Multiple-sclerosis data [Table: CNIO, CIT, CLUSFAVOR, CLUTO, J-Express, Delta, and EPD* compared on MS_IFN and MS_CON]

42 Interrelated Dimensional Clustering The approach is applied to classifying multiple-sclerosis patients and IFN-drug-treated patients. (A) shows the distribution of the original 28 samples; each point represents a sample, mapped from the sample's intensity vector over 4132 genes. (B) shows the distribution of the 28 samples on 2015 genes. (C) shows the distribution of the 28 samples on 312 genes. (D) shows the distribution of the same 28 samples after using our approach, which reduces the 4132 genes to 96 genes.

43 Experimental Results (3) Leukemia data [Table: CNIO, CIT, CLUSFAVOR, CLUTO, J-Express, Delta, and EPD* compared on the leukemia data]

44 Experimental Results (4) Colon & Breast data [Table: CNIO, CIT, CLUSFAVOR, CLUTO, J-Express, Delta, and EPD* compared on the Colon and Breast data]

45 Applications Gene function: co-expressed genes in the same cluster tend to share common roles in cellular processes, and genes of unrelated sequence but similar function cluster tightly together. A similar tendency was observed in both yeast data and human data. Gene regulation: by searching for common DNA sequences at the promoter regions of genes within the same cluster, regulatory motifs specific to each gene cluster are identified. Cancer prediction. Normal vs. tumor tissue classification. Drug treatment evaluation.

46 Summary We have developed advanced approaches for gene expression data analysis which work more effectively than traditional analysis approaches. This research area is exciting and challenging, with many interesting research issues.


Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction Conformational Analysis 2 Conformational Analysis Properties of molecules depend on their three-dimensional

More information

Microarray Data Analysis in GeneSpring GX 11. Month ##, 200X

Microarray Data Analysis in GeneSpring GX 11. Month ##, 200X Microarray Data Analysis in GeneSpring GX 11 Month ##, 200X Agenda Genome Browser GO GSEA Pathway Analysis Network building Find significant pathways Extract relations via NLP Data Visualization Options

More information

Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models

Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models Haiyan Huang Department of Statistics, UC Berkeley Feb 7, 2018 Background Background High dimensionality (p >> n) often results

More information

2. Materials and Methods

2. Materials and Methods Identification of cancer-relevant Variations in a Novel Human Genome Sequence Robert Bruggner, Amir Ghazvinian 1, & Lekan Wang 1 CS229 Final Report, Fall 2009 1. Introduction Cancer affects people of all

More information

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools News About NCBI Site Map

More information

Lab 1: A review of linear models

Lab 1: A review of linear models Lab 1: A review of linear models The purpose of this lab is to help you review basic statistical methods in linear models and understanding the implementation of these methods in R. In general, we need

More information

Multiple Testing in RNA-Seq experiments

Multiple Testing in RNA-Seq experiments Multiple Testing in RNA-Seq experiments O. Muralidharan et al. 2012. Detecting mutations in mixed sample sequencing data using empirical Bayes. Bernd Klaus Institut für Medizinische Informatik, Statistik

More information

Introduction to Microarray Analysis

Introduction to Microarray Analysis Introduction to Microarray Analysis Methods Course: Gene Expression Data Analysis -Day One Rainer Spang Microarrays Highly parallel measurement devices for gene expression levels 1. How does the microarray

More information

Lecture 11 Microarrays and Expression Data

Lecture 11 Microarrays and Expression Data Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 11 Microarrays and Expression Data Genetic Expression Data Microarray experiments Applications Expression

More information

Analysis of microarray data

Analysis of microarray data BNF078 Fall 2006 Analysis of microarray data Markus Ringnér Computational Biology and Biological Physics Department of Theoretical Physics Lund University markus@thep.lu.se 046-2229337 1 Contents Preface

More information

Statistical Methods for Network Analysis of Biological Data

Statistical Methods for Network Analysis of Biological Data The Protein Interaction Workshop, 8 12 June 2015, IMS Statistical Methods for Network Analysis of Biological Data Minghua Deng, dengmh@pku.edu.cn School of Mathematical Sciences Center for Quantitative

More information

Why learn sequence database searching? Searching Molecular Databases with BLAST

Why learn sequence database searching? Searching Molecular Databases with BLAST Why learn sequence database searching? Searching Molecular Databases with BLAST What have I cloned? Is this really!my gene"? Basic Local Alignment Search Tool How BLAST works Interpreting search results

More information

Introduction to Bioinformatics. Fabian Hoti 6.10.

Introduction to Bioinformatics. Fabian Hoti 6.10. Introduction to Bioinformatics Fabian Hoti 6.10. Analysis of Microarray Data Introduction Different types of microarrays Experiment Design Data Normalization Feature selection/extraction Clustering Introduction

More information

296 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 10, NO. 3, JUNE 2006

296 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 10, NO. 3, JUNE 2006 296 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 10, NO. 3, JUNE 2006 An Evolutionary Clustering Algorithm for Gene Expression Microarray Data Analysis Patrick C. H. Ma, Keith C. C. Chan, Xin Yao,

More information

Comparative Genomics. Page 1. REMINDER: BMI 214 Industry Night. We ve already done some comparative genomics. Loose Definition. Human vs.

Comparative Genomics. Page 1. REMINDER: BMI 214 Industry Night. We ve already done some comparative genomics. Loose Definition. Human vs. Page 1 REMINDER: BMI 214 Industry Night Comparative Genomics Russ B. Altman BMI 214 CS 274 Location: Here (Thornton 102), on TV too. Time: 7:30-9:00 PM (May 21, 2002) Speakers: Francisco De La Vega, Applied

More information

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes Coalescence Scribe: Alex Wells 2/18/16 Whenever you observe two sequences that are similar, there is actually a single individual

More information

A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data Thesis by Heba Abusamra In Partial Fulfillment of the Requirements For the Degree of Master of Science King

More information

Machine Learning. HMM applications in computational biology

Machine Learning. HMM applications in computational biology 10-601 Machine Learning HMM applications in computational biology Central dogma DNA CCTGAGCCAACTATTGATGAA transcription mrna CCUGAGCCAACUAUUGAUGAA translation Protein PEPTIDE 2 Biological data is rapidly

More information

Classification and Learning Using Genetic Algorithms

Classification and Learning Using Genetic Algorithms Sanghamitra Bandyopadhyay Sankar K. Pal Classification and Learning Using Genetic Algorithms Applications in Bioinformatics and Web Intelligence With 87 Figures and 43 Tables 4y Spri rineer 1 Introduction

More information

Analysis of Microarray Data

Analysis of Microarray Data Analysis of Microarray Data Lecture 3: Visualization and Functional Analysis George Bell, Ph.D. Senior Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Review

More information

Computational Approaches to Analysis of DNA Microarray Data

Computational Approaches to Analysis of DNA Microarray Data 2006 IMI and Schattauer GmbH 91 Computational pproaches to nalysis of DN Microarray Data J. Quackenbush Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department

More information

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 6, NO. 1, JANUARY-MARCH 2009 1 Fuzzy-Adaptive-Subspace-Iteration-Based Two-Way Clustering of Microarray Data Jahangheer Shaik and

More information

Uncovering differentially expressed pathways with protein interaction and gene expression data

Uncovering differentially expressed pathways with protein interaction and gene expression data The Second International Symposium on Optimization and Systems Biology (OSB 08) Lijiang, China, October 31 November 3, 2008 Copyright 2008 ORSC & APORC, pp. 74 82 Uncovering differentially expressed pathways

More information

Computational Biology I

Computational Biology I Computational Biology I Microarray data acquisition Gene clustering Practical Microarray Data Acquisition H. Yang From Sample to Target cdna Sample Centrifugation (Buffer) Cell pellets lyse cells (TRIzol)

More information

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist Whole Transcriptome Analysis of Illumina RNA- Seq Data Ryan Peters Field Application Specialist Partek GS in your NGS Pipeline Your Start-to-Finish Solution for Analysis of Next Generation Sequencing Data

More information

Basic principles of NMR-based metabolomics

Basic principles of NMR-based metabolomics Basic principles of NMR-based metabolomics Professor Dan Stærk Bioanalytical Chemistry and Metabolomics research group Natural Products and Peptides research section Department of Drug Design and Pharmacology

More information

Gene expression: Microarray data analysis. Copyright notice. Outline: microarray data analysis. Schedule

Gene expression: Microarray data analysis. Copyright notice. Outline: microarray data analysis. Schedule Gene expression: Microarray data analysis Copyright notice Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics by Jonathan Pevsner (ISBN -47-4-8). Copyright

More information

Suberoylanilide Hydroxamic Acid Treatment Reveals. Crosstalks among Proteome, Ubiquitylome and Acetylome

Suberoylanilide Hydroxamic Acid Treatment Reveals. Crosstalks among Proteome, Ubiquitylome and Acetylome Suberoylanilide Hydroxamic Acid Treatment Reveals Crosstalks among Proteome, Ubiquitylome and Acetylome in Non-Small Cell Lung Cancer A549 Cell Line Quan Wu 1, Zhongyi Cheng 2, Jun Zhu 3, Weiqing Xu 1,

More information

Tool for the identification of differentially expressed genes using a user-defined threshold

Tool for the identification of differentially expressed genes using a user-defined threshold Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 8-18-2006 Tool for the identification of differentially expressed genes using a user-defined threshold Renikko

More information

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University Machine learning applications in genomics: practical issues & challenges Yuzhen Ye School of Informatics and Computing, Indiana University Reference Machine learning applications in genetics and genomics

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Vocabulary Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 Gene: Genetics: Genome: Genomics: hereditary

More information

Network System Inference

Network System Inference Network System Inference Francis J. Doyle III University of California, Santa Barbara Douglas Lauffenburger Massachusetts Institute of Technology WTEC Systems Biology Final Workshop March 11, 2005 What

More information

COS 597c: Topics in Computational Molecular Biology. DNA arrays. Background

COS 597c: Topics in Computational Molecular Biology. DNA arrays. Background COS 597c: Topics in Computational Molecular Biology Lecture 19a: December 1, 1999 Lecturer: Robert Phillips Scribe: Robert Osada DNA arrays Before exploring the details of DNA chips, let s take a step

More information

BioInformatics and Computational Molecular Biology. Course Website

BioInformatics and Computational Molecular Biology. Course Website BioInformatics and Computational Molecular Biology Course Website http://bioinformatics.uchc.edu What is Bioinformatics Bioinformatics upgrades the information content of biological measurements. Discovery

More information

Computational methods in bioinformatics: Lecture 1

Computational methods in bioinformatics: Lecture 1 Computational methods in bioinformatics: Lecture 1 Graham J.L. Kemp 2 November 2015 What is biology? Ecosystem Rain forest, desert, fresh water lake, digestive tract of an animal Community All species

More information

Measuring gene expression (Microarrays) Ulf Leser

Measuring gene expression (Microarrays) Ulf Leser Measuring gene expression (Microarrays) Ulf Leser This Lecture Gene expression Microarrays Idea Technologies Problems Quality control Normalization Analysis next week! 2 http://learn.genetics.utah.edu/content/molecules/transcribe/

More information

CS4220: Knowledge Discovery Methods for Bioinformatics Unit 4: Batch Effects. Wong Limsoon

CS4220: Knowledge Discovery Methods for Bioinformatics Unit 4: Batch Effects. Wong Limsoon : Knowledge Discovery Methods for Bioinformatics Unit 4: Batch Effects Wong Limsoon 2 Plan Batch effects Visualization Normalization PC1 removal Batch effect-resistant feature selection Batch effect-resistant

More information

Database Searching and BLAST Dannie Durand

Database Searching and BLAST Dannie Durand Computational Genomics and Molecular Biology, Fall 2013 1 Database Searching and BLAST Dannie Durand Tuesday, October 8th Review: Karlin-Altschul Statistics Recall that a Maximal Segment Pair (MSP) is

More information