Bioinformatics : Gene Expression Data Analysis
1 Bioinformatics : Gene Expression Data Analysis Aidong Zhang Professor Computer Science and Engineering
2 What is Bioinformatics Broad Definition The study of how information technologies are used to solve problems in biology Narrow Definition The creation and management of biological databases in support of genomic sequences Oxford English Dictionary (proposed) Conceptualizing biology in terms of molecules and applying information techniques to understand and organize the information associated with these molecules, on a large scale
3 Aims of Bioinformatics Simplest: organize data in a way that allows researchers to access information and submit new entries as they are produced. Higher: develop tools and resources that aid in the analysis of data. Advanced: use these tools to analyze the data and interpret the results in a biologically meaningful manner.
4 Subjects of Bioinformatics
Data source | Data size | Topics
Raw DNA sequence | 8.2 million sequences (9.5 billion bases) | Separating regions; gene product prediction
Protein sequence | 300,000 sequences (~300 amino acids each) | Sequence comparison, alignments, identification
Macromolecular structure | 13,000 structures (~1,000 atomic coordinates each) | Structure prediction, 3D alignment; protein geometry measurements; molecular simulations
Genomes | 40 complete genomes (1.6 million to 3 billion bases each) | Phylogenetic analysis; genomic-scale censuses; linkage analysis
Gene expression | ~20 time point measurements for ~6,000 genes | Clustering, correlating patterns, mapping data to sequence, structural and biochemical data
Literature | 11 million citations | Digital libraries; knowledge databases
Metabolic pathways | (not given) | Pathway simulations
5 Figure taken from
6 DNA Microarray Experiments
7 Gene Expression Data Gene Expression Data Matrix: each row represents a gene $G_i$; each column represents an experimental condition $S_j$; each cell $X_{ij}$ is a real value representing the expression level of gene $G_i$ under condition $S_j$. $X_{ij} > 0$: over-expressed; $X_{ij} < 0$: under-expressed. A time-series gene expression data matrix typically contains $O(10^3)$ genes and $O(10)$ time points.
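A minimal sketch of the matrix layout described on this slide, assuming the cell values are already signed (for example log-ratios), so that the sign alone separates over-expression from under-expression. The gene names, condition names, and numbers are made up for illustration.

```python
# Toy gene expression matrix: rows are genes G_i, columns are conditions S_j.
# All names and values here are illustrative assumptions, not real data.
genes = ["G1", "G2", "G3"]
conditions = ["S1", "S2", "S3", "S4"]
X = [
    [0.8, 1.2, -0.1, 0.0],
    [-1.5, -0.7, 0.3, 0.9],
    [0.0, 0.2, -0.4, -2.1],
]


def expression_state(x):
    """Label one cell X_ij by the sign of its expression level."""
    if x > 0:
        return "over"
    if x < 0:
        return "under"
    return "unchanged"


# states[i][j] is the label for gene i under condition j.
states = [[expression_state(x) for x in row] for row in X]

for gene, row in zip(genes, states):
    print(gene, row)
```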
8 Gene Expression Data Rows are genes and columns are samples (sample 1, sample 2, sample 3):
X_11 X_12 X_13
X_21 X_22 X_23
X_31 X_32 X_33
Asymmetric dimensionality: 10 ~ 100 samples/conditions versus 1000 ~ genes. Two-way analysis: sample space and gene space.
9 Microarray Data Analysis Analysis from two angles: (1) sample as object, gene as attribute; (2) gene as object, sample/condition as attribute.
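The two angles amount to reading the same matrix by rows or by columns. A small sketch, assuming a gene-by-sample layout and toy values:

```python
# Rows are genes, columns are samples (toy values, assumed for illustration).
X = [
    [0.8, 1.2, -0.1],
    [-1.5, -0.7, 0.3],
    [0.0, 0.2, -0.4],
    [1.1, 0.9, -0.2],
]

# Gene as object, sample as attribute: each object is a row of X.
gene_objects = [list(row) for row in X]

# Sample as object, gene as attribute: each object is a column of X.
sample_objects = [list(col) for col in zip(*X)]

print(len(gene_objects), "gene objects with", len(gene_objects[0]), "attributes")
print(len(sample_objects), "sample objects with", len(sample_objects[0]), "attributes")
```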
10 Challenges of Gene Data Analysis (1) Gene space: automatically identify clusters of genes that express similar patterns in the data set. The method should be robust to a large amount of noise, handle highly intersected clusters effectively, and allow the clustering results to be visualized.
11 Co-expressed Genes Gene Expression Data Matrix, Gene Expression Patterns, Co-expressed Genes. Why look for co-expressed genes? Co-expression indicates co-function; co-expression also indicates co-regulation.
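One common way to look for co-expressed genes is to compare expression profiles by Pearson correlation. The sketch below assumes that measure, a hypothetical cut-off of 0.9, and toy time-series profiles; none of these come from the slides.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Toy time-series profiles (assumed values).
profiles = {
    "geneA": [0.1, 0.5, 1.0, 0.4, 0.0],
    "geneB": [0.3, 1.1, 2.1, 0.9, 0.1],  # same shape as geneA, larger amplitude
    "geneC": [1.0, 0.6, 0.1, 0.5, 0.9],  # roughly the opposite shape
}

THRESHOLD = 0.9  # assumed cut-off for calling two genes co-expressed

names = sorted(profiles)
coexpressed = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if pearson(profiles[a], profiles[b]) >= THRESHOLD
]
print(coexpressed)
```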
12 Challenges of Gene Data Analysis (2) Sample space: unsupervised sample clustering presents interesting but also very challenging problems. The sample space and gene space are of very different dimensionality ($10^1 \sim 10^2$ samples versus $10^3 \sim 10^4$ genes). There is a high percentage of irrelevant or redundant genes. People usually have little knowledge about how to construct an informative gene space.
13 Sample Clustering Gene expression data clustering
14 Microarray Data Analysis [Diagram labels: Microarray Images, Microarray Data, Gene Expression Matrices, Gene Expression Data Analysis, Sample Clusters, Gene Expression Patterns, Important patterns, Visualization]
15 Our Approaches Density-based approach: recognizes a dense area as a cluster, and organizes the cluster structure of a data set into a hierarchical tree. Steps: calculate the density of each data object based on its neighboring data distribution; construct the "attraction" relationship between data objects according to object density; organize the attraction relationship into the "attraction tree"; summarize the attraction tree by a hierarchical "density tree"; derive clusters from the density tree.
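The slide does not give the density or attraction formulas, so the following is only a rough sketch of the first steps under assumptions of my own: density is the number of objects within a fixed `radius`, each object is attracted to its nearest denser object, and an attraction longer than `max_link` is cut. It stops at the attraction tree and reads clusters off the tree roots; the density-tree summary is not modeled.

```python
import math


def attraction_clusters(points, radius, max_link):
    """Label each point by the root of its attraction tree (assumed rules)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: density = number of objects within `radius` (itself included).
    density = [sum(1 for d in row if d <= radius) for row in dist]

    # Steps 2-3: each object is attracted by its nearest denser object.
    # Ties in density are broken by index so the relation has no cycles.
    parent = list(range(n))
    for i in range(n):
        denser = [j for j in range(n) if (density[j], -j) > (density[i], -i)]
        if denser:
            j = min(denser, key=lambda k: dist[i][k])
            if dist[i][j] <= max_link:
                parent[i] = j

    # Derive clusters: follow the attraction links up to the root.
    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(n)]


points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
labels = attraction_clusters(points, radius=0.5, max_link=1.0)
print(labels)
```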
16 Our Approaches (2) Interrelated dimensional clustering automatically performs two tasks: detection of meaningful sample patterns, and selection of the genes that are significant to the empirical pattern.
17 Our Approaches (3) Visualization tool: offers insightful information and detects the structure of the data set. Three aspects: explorative, confirmative, representative. Microarray analysis status: numerical methods are dominant; visualization serves as a graphical presentation of the major clustering methods. Visualization applied: global visualization (TreeView), Sammon's mapping.
18 VizStruct Architecture Explorative Visualization Sample space Confirmative Visualization Gene space
19 VizStruct - Dimension Tour Interactively adjust dimension parameters Manually or automatically May cause false clusters to break Create dynamic visualization
20 Visualized Results for a Time Series Data Set
21 Elements of Clustering Feature selection: properly select the features on which clustering is to be performed. Clustering algorithm: criteria (e.g. objective function) and proximity measure (e.g. Euclidean distance, Pearson correlation coefficient). Cluster validation: the assessment of clustering results. Interpretation of the results.
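The choice of proximity measure matters for expression profiles. As a hedged illustration with toy values, two genes with the same shape but different magnitude are far apart under Euclidean distance yet identical under a correlation-based distance (here taken as 1 minus the Pearson coefficient, which is one common convention and an assumption on my part):

```python
import math


def euclidean(u, v):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_distance(u, v):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)


g1 = [1.0, 2.0, 3.0, 2.0]
g2 = [10.0, 20.0, 30.0, 20.0]  # same shape, ten times the magnitude

print("Euclidean:", euclidean(g1, g2))
print("Pearson distance:", pearson_distance(g1, g2))
```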
22 Supervised Analysis Select training samples (hold out). Sort genes (t-test, ranking). Select informative genes (top 50 ~ 200). Clustering or classification based on the informative genes. [Figure: samples of Class 1 and Class 2 with genes $g_1, g_2, \ldots, g_{4131}$]
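The gene-sorting and selection steps can be sketched as follows, assuming a Welch two-sample t-statistic as the ranking score and a toy matrix with known class labels. The hold-out step is omitted, and `k=2` stands in for the "top 50 ~ 200" of a real data set.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Welch two-sample t-statistic for one gene across two classes."""
    return (mean(a) - mean(b)) / math.sqrt(
        variance(a) / len(a) + variance(b) / len(b)
    )


def top_informative_genes(X, labels, k):
    """Rank genes (rows of X) by |t| and return the indices of the top k."""
    scores = []
    for i, row in enumerate(X):
        a = [x for x, c in zip(row, labels) if c == 1]
        b = [x for x, c in zip(row, labels) if c == 2]
        scores.append((abs(t_statistic(a, b)), i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]


# Toy data: 4 genes x 6 samples, three samples per class (assumed values).
labels = [1, 1, 1, 2, 2, 2]
X = [
    [2.0, 2.1, 1.9, -2.0, -1.9, -2.1],  # clearly separates the classes
    [0.1, -0.2, 0.3, 0.0, 0.2, -0.1],   # noise
    [-1.0, -1.2, -0.9, 1.1, 1.0, 0.8],  # separates the classes
    [0.5, -0.5, 0.0, 0.4, -0.6, 0.1],   # noise
]

selected = top_informative_genes(X, labels, k=2)
print(selected)
```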
23 Unsupervised Analysis Microarray data analysis methods can be divided into two categories: supervised and unsupervised analysis. We focus on unsupervised sample classification, which assumes that no class membership information is assigned to any sample. Since the initial biological identification of sample classes has been slow, typically evolving through years of hypothesis-driven research, automatically discovering sample patterns is a significant contribution to microarray data analysis. Unsupervised sample classification is much more complex than the supervised case: many mature statistical methods, such as the t-test, Z-score, and Markov filter, cannot be applied unless the phenotypes of the samples are known in advance.
24 Problem Statement Given a data matrix M in which the number of genes and the number of samples differ by orders of magnitude ($|G| \gg |S|$), and the number of sample categories K, the goal is to find K mutually exclusive groups of samples that match their empirical types, thereby discovering the meaningful pattern, and to find the set of genes that manifests this pattern.
25 Problem Statement samples Informative Genes gene 1 gene 2 gene 3 gene 4 Noninformative Genes gene 5 gene 6 gene 7 gene 8
26 Problem Statement (2) samples Informative Genes gene 1 gene 2 gene 3 Noninformative Genes gene 4 gene 5 gene 6 gene 7
27 Problem Statement (3) Class 1 Class 2 Class3 Class 1 Class 2 Class3 gene a gene b gene c gene d gene e gene f
28 Related Work New tools using traditional methods: TreeView, CLUTO, CIT, CNIO, GeneSpring, J-Express, CLUSFAVOR. Traditional methods: SOM, K-means, hierarchical clustering, graph-based clustering, PCA. Their similarity measures, computed over the full gene space, are distorted by the high percentage of noise.
29 Related Work (2) Clustering with feature selection (CLIFF, leaf ordering, two-way ordering): 1. Filtering the invariant genes: Bayes model, rank variance, PCA. 2. Partitioning the samples: Ncut, Min-Max Cut. 3. Pruning genes based on the partition: Markov blanket filter, t-test, leaf ordering.
30 Related Work (3) Subspace clustering: bi-clustering, δ-clustering.
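31 Intra-pattern-steadiness
Variance of a single gene $i$ within sample class $S_y$:
$$\mathrm{Var}(i, y) = \frac{1}{|S_y| - 1} \sum_{j \in S_y} \left( w_{i,j} - \bar{w}_{i,S_y} \right)^2$$
Average row variance over the gene set $G_x$:
$$R(x, y) = \frac{1}{|G_x|} \sum_{i \in G_x} \mathrm{Var}(i, y) = \frac{1}{|G_x| \, (|S_y| - 1)} \sum_{i \in G_x} \sum_{j \in S_y} \left( w_{i,j} - \bar{w}_{i,S_y} \right)^2$$
Here $\bar{w}_{i,S_y}$ is the mean of gene $i$ over the samples in $S_y$. We require each gene to be either all "on" or all "off" within each sample class.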
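A small numeric sketch of the two quantities on this slide, reading Var(i, y) as the sample variance of gene i over the samples of class S_y and R(x, y) as its average over the genes in G_x. The matrix `W` and the index lists are toy assumptions.

```python
def row_variance(W, i, S_y):
    """Var(i, y): sample variance of gene i over the samples in class S_y."""
    vals = [W[i][j] for j in S_y]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


def avg_row_variance(W, G_x, S_y):
    """R(x, y): Var(i, y) averaged over the genes in G_x."""
    return sum(row_variance(W, i, S_y) for i in G_x) / len(G_x)


# Toy matrix: 3 genes x 6 samples (assumed values).
W = [
    [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],  # steady inside each class
    [2.0, 2.0, 2.0, 0.0, 0.0, 0.0],  # steady inside each class
    [1.0, 4.0, 2.0, 5.0, 0.0, 3.0],  # fluctuates inside each class
]
S1, S2 = [0, 1, 2], [3, 4, 5]

print(avg_row_variance(W, [0, 1], S1))  # steady genes give a small value
print(avg_row_variance(W, [2], S1))     # the noisy gene gives a large value
```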
32 Intra-pattern-consistency (2) [Table: the measurements residue, MSR and ARV* compared on Data(A) and Data(B)]
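33 Inter-pattern-divergence
In our model, both "intra-pattern-steadiness" and "inter-pattern-dissimilarity" on the same gene are reflected. Average block distance:
$$D(x, (y, y')) = \frac{\sum_{i \in G_x} \left| \bar{w}_{i,S_y} - \bar{w}_{i,S_{y'}} \right|}{|G_x|}$$
where $\bar{w}_{i,S_y}$ is the mean of gene $i$ over the samples in class $S_y$.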
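As a hedged illustration, the average block distance can be computed by taking, for each gene in G_x, the absolute difference between its mean in class S_y and its mean in class S_y', and then averaging over the genes. The matrix and class index lists are toy assumptions.

```python
def class_mean(W, i, S_y):
    """Mean of gene i over the samples in class S_y."""
    return sum(W[i][j] for j in S_y) / len(S_y)


def avg_block_distance(W, G_x, S_y, S_y2):
    """D(x, (y, y')): mean absolute gap between class means over genes in G_x."""
    gaps = (abs(class_mean(W, i, S_y) - class_mean(W, i, S_y2)) for i in G_x)
    return sum(gaps) / len(G_x)


# Toy matrix: 2 genes x 6 samples (assumed values).
W = [
    [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],
    [2.0, 2.0, 2.0, 0.0, 0.0, 0.0],
]
S1, S2 = [0, 1, 2], [3, 4, 5]

print(avg_block_distance(W, [0, 1], S1, S2))
```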
34 Pattern Quality
The purpose of pattern discovery is to identify the empirical pattern where the patterns inside each class are steady and the divergence between each pair of classes is large.
$$\Omega = \frac{1}{\displaystyle\sum_{S_{y_1}, S_{y_2}} \frac{R(x, y_1) + R(x, y_2)}{D(x, (y_1, y_2))}}$$
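The following sketch assumes the pattern quality is read as Ω = 1 / Σ over class pairs of (R(x, y1) + R(x, y2)) / D(x, (y1, y2)), so that steady classes (small R) and well separated classes (large D) give a large Ω. That reading, the helper names, and the toy matrix are assumptions for illustration.

```python
import itertools


def class_stats(values):
    """Mean and sample variance of one gene inside one sample class."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pattern_quality(W, genes, partition):
    """Assumed Omega = 1 / sum over class pairs of (R_a + R_b) / D_ab."""
    total = 0.0
    for a, b in itertools.combinations(partition, 2):
        r_a = r_b = d_ab = 0.0
        for i in genes:
            mean_a, var_a = class_stats([W[i][j] for j in a])
            mean_b, var_b = class_stats([W[i][j] for j in b])
            r_a += var_a / len(genes)
            r_b += var_b / len(genes)
            d_ab += abs(mean_a - mean_b) / len(genes)
        if d_ab == 0:
            return 0.0  # classes are indistinguishable on these genes
        total += (r_a + r_b) / d_ab
    return float("inf") if total == 0 else 1.0 / total


# Toy matrix: genes 0 and 1 are informative, gene 2 is noise (assumed values).
W = [
    [1.0, 1.2, 0.8, 5.0, 5.1, 4.9],
    [2.0, 2.1, 1.9, 0.0, 0.1, -0.1],
    [1.0, 4.0, 2.0, 5.0, 0.0, 3.0],
]
partition = [[0, 1, 2], [3, 4, 5]]

q_informative = pattern_quality(W, [0, 1], partition)
q_all = pattern_quality(W, [0, 1, 2], partition)
print(q_informative, q_all)
```

In this toy case, dropping the noisy gene raises the quality, which is the behavior the gene-selection step relies on.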
35 Pattern Quality (2) [Table: Con, Div and Ω compared on Data(A), Data(B) and Data(C)]
36 The Problem Input: 1. m samples, each measured on n genes; 2. the number of sample categories K. Output: a K-partition of the samples (the empirical pattern) and a subset of genes (the informative space) such that the pattern quality of the partition projected onto the gene subset is the highest.
37 Strategy Start with a random K-partition of the samples and a subset of genes as the candidate informative space. Iteratively adjust the partition and the gene set toward the optimal solution. Basic elements: A state: a partition of samples $\{S_1, S_2, \ldots, S_K\}$; a set of genes $G' \subseteq G$; the corresponding pattern quality $\Omega$. An adjustment: for a gene $g_i \notin G'$, insert it into $G'$; for a gene $g_i \in G'$, remove it from $G'$; for a sample $s_i$ in group $S'$, move it to another group.
38 Strategy (2) Iteratively adjust the partition and the gene set toward the optimal pattern: for each gene, try the possible insert or remove; for each sample, try the best move to another group.
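A simplified sketch of this iterative adjustment, under several assumptions: the quality function is a stand-in for Ω (one over the summed ratio of within-group variance to between-group mean gap), only improving adjustments are accepted, the first improving adjustment is taken rather than the best one, and every group keeps at least two samples. The function names and toy data are hypothetical.

```python
import itertools
import random


def class_stats(values):
    """Mean and sample variance of one gene inside one sample group."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def quality(W, genes, groups):
    """Stand-in for Omega; the 1/|G'| factors of R and D cancel in the ratio."""
    total = 0.0
    for a, b in itertools.combinations(groups, 2):
        spread = gap = 0.0
        for i in genes:
            mean_a, var_a = class_stats([W[i][j] for j in a])
            mean_b, var_b = class_stats([W[i][j] for j in b])
            spread += var_a + var_b
            gap += abs(mean_a - mean_b)
        if gap == 0:
            return 0.0
        total += spread / gap
    return float("inf") if total == 0 else 1.0 / total


def neighbours(genes, groups, n_genes):
    """Yield every state reachable by one adjustment."""
    for g in range(n_genes):  # insert or remove one gene
        if g not in genes:
            yield genes | {g}, groups
        elif len(genes) > 1:
            yield genes - {g}, groups
    for src, dst in itertools.permutations(range(len(groups)), 2):
        if len(groups[src]) > 2:  # keep at least two samples per group
            for s in groups[src]:
                moved = [set(grp) for grp in groups]
                moved[src].remove(s)
                moved[dst].add(s)
                yield genes, moved


def search(W, k, seed=0, max_rounds=50):
    """Greedy search from a random K-partition and a random gene subset."""
    rng = random.Random(seed)
    n_genes, n_samples = len(W), len(W[0])
    order = list(range(n_samples))
    rng.shuffle(order)
    groups = [set(order[g::k]) for g in range(k)]
    genes = set(rng.sample(range(n_genes), max(1, n_genes // 2)))
    best = start = quality(W, genes, groups)
    for _ in range(max_rounds):
        improved = False
        for cand_genes, cand_groups in neighbours(genes, groups, n_genes):
            q = quality(W, cand_genes, cand_groups)
            if q > best:
                genes, groups, best, improved = cand_genes, cand_groups, q, True
                break  # rescan the neighbourhood from the new state
        if not improved:
            break
    return genes, groups, start, best


# Toy matrix: 4 genes x 6 samples (assumed values).
W = [
    [1.0, 1.2, 0.8, 5.0, 5.1, 4.9],
    [2.0, 2.1, 1.9, 0.0, 0.1, -0.1],
    [1.0, 4.0, 2.0, 5.0, 0.0, 3.0],
    [0.3, 2.5, 1.1, 0.2, 2.4, 1.0],
]
genes, groups, start, best = search(W, k=2)
print(sorted(genes), [sorted(g) for g in groups], start, best)
```

A purely greedy search like this can stop at a local optimum, which is what the probabilistic acceptance of negative adjustments on the next slide is meant to address.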
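39 Improvement
Data standardization: the original gene intensity values are converted to relative values
$$w'_{i,j} = \frac{w_{i,j} - \bar{w}_i}{\sigma_i}, \quad \text{where } \bar{w}_i = \frac{1}{m} \sum_{j=1}^{m} w_{i,j}, \quad \sigma_i = \sqrt{\frac{\sum_{j=1}^{m} \left( w_{i,j} - \bar{w}_i \right)^2}{m - 1}}$$
Random order.
Conduct a negative action with a probability
$$p = \exp\left( \frac{\Delta\Omega}{\Omega \times T(i)} \right)$$
Simulated annealing: $T(0) = 1$; $T(i) = \dfrac{1}{1 + i}$.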
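A minimal sketch of the two improvements, assuming the standardization is the usual z-score per gene (sample standard deviation with m - 1 in the denominator) and that a quality-decreasing adjustment with change ΔΩ < 0 is accepted with probability exp(ΔΩ / (Ω · T(i))), with T(i) = 1 / (1 + i). The function names are hypothetical and Ω is assumed to be positive.

```python
import math
import random


def standardize(row):
    """Map one gene's intensities to relative values (zero mean, unit std)."""
    m = len(row)
    mean = sum(row) / m
    sigma = math.sqrt(sum((w - mean) ** 2 for w in row) / (m - 1))
    return [(w - mean) / sigma for w in row]


def temperature(i):
    """Cooling schedule T(i) = 1 / (1 + i), so T(0) = 1."""
    return 1.0 / (1 + i)


def acceptance_probability(delta, omega, i):
    """Probability of conducting an adjustment that changes Omega by delta."""
    if delta >= 0:
        return 1.0
    return math.exp(delta / (omega * temperature(i)))


def accept(delta, omega, i, rng=random):
    """Draw once: True if the adjustment at iteration i should be conducted."""
    return rng.random() < acceptance_probability(delta, omega, i)


z = standardize([2.0, 4.0, 6.0])
print(z)
print(acceptance_probability(-1.0, 2.0, 0))  # early: negative moves are likely
print(acceptance_probability(-1.0, 2.0, 9))  # later: negative moves are rare
```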
40 Experimental Results Data Sets: Multiple-sclerosis data MS-IFN : 4132 * 28 (14 MS vs. 14 IFN) MS-CON : 4132 * 30 (15 MS vs. 15 Control) Leukemia data 7129 * 38 (27 ALL vs. 11 AML) 7129 * 34 (20 ALL vs. 14 AML) Colon Cancer data 2000 * 62 (22 normal vs. 40 tumor colon tissue) Hereditary breast cancer data 3226 * 22 ( 7 BRCA1, 8 BRCA2, 7 Sporadics)
41 Experimental Results (2) Multiple-sclerosis data [Table: CNIO, CIT, CLUSFAVOR, Cluto, J-Express, Delta and EPD* compared on MS_IFN and MS_CON]
42 Interrelated Dimensional Clustering The approach is applied to classifying multiple-sclerosis patients and IFN-drug treated patients. (A) shows the distribution of the original 28 samples. Each point represents a sample, mapped from the intensity vector of the sample's 4132 genes. (B) shows the distribution of the 28 samples on 2015 genes. (C) shows the distribution of the 28 samples on 312 genes. (D) shows the distribution of the same 28 samples after applying our approach, which reduces the 4132 genes to 96 genes.
43 Experimental Results (3) Leukemia data [Table: CNIO, CIT, CLUSFAVOR, Cluto, J-Express, Delta and EPD* compared on the leukemia data sets]
44 Experimental Results (4) Colon & Breast data [Table: CNIO, CIT, CLUSFAVOR, Cluto, J-Express, Delta and EPD* compared on the Colon and Breast data sets]
45 Applications Gene Function Co-expressed genes in the same cluster tend to share common roles in cellular processes and genes of unrelated sequence but similar function cluster tightly together. Similar tendency was observed in both yeast data and human data. Gene Regulation By searching for common DNA sequences at the promoter regions of genes within the same cluster, regulatory motifs specific to each gene cluster are identified. Cancer Prediction Normal vs. Tumor Tissue Classification Drug Treatment Evaluation
46 Summary We have developed advanced approaches for gene expression data analysis that work more effectively than traditional analysis approaches. This research area is exciting and challenging, and many interesting research issues remain.
More informationA Comparative Study of Feature Selection and Classification Methods for Gene Expression Data
A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data Thesis by Heba Abusamra In Partial Fulfillment of the Requirements For the Degree of Master of Science King
More informationMachine Learning. HMM applications in computational biology
10-601 Machine Learning HMM applications in computational biology Central dogma DNA CCTGAGCCAACTATTGATGAA transcription mrna CCUGAGCCAACUAUUGAUGAA translation Protein PEPTIDE 2 Biological data is rapidly
More informationClassification and Learning Using Genetic Algorithms
Sanghamitra Bandyopadhyay Sankar K. Pal Classification and Learning Using Genetic Algorithms Applications in Bioinformatics and Web Intelligence With 87 Figures and 43 Tables 4y Spri rineer 1 Introduction
More informationAnalysis of Microarray Data
Analysis of Microarray Data Lecture 3: Visualization and Functional Analysis George Bell, Ph.D. Senior Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Review
More informationComputational Approaches to Analysis of DNA Microarray Data
2006 IMI and Schattauer GmbH 91 Computational pproaches to nalysis of DN Microarray Data J. Quackenbush Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department
More informationThis article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 6, NO. 1, JANUARY-MARCH 2009 1 Fuzzy-Adaptive-Subspace-Iteration-Based Two-Way Clustering of Microarray Data Jahangheer Shaik and
More informationUncovering differentially expressed pathways with protein interaction and gene expression data
The Second International Symposium on Optimization and Systems Biology (OSB 08) Lijiang, China, October 31 November 3, 2008 Copyright 2008 ORSC & APORC, pp. 74 82 Uncovering differentially expressed pathways
More informationComputational Biology I
Computational Biology I Microarray data acquisition Gene clustering Practical Microarray Data Acquisition H. Yang From Sample to Target cdna Sample Centrifugation (Buffer) Cell pellets lyse cells (TRIzol)
More informationWhole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist
Whole Transcriptome Analysis of Illumina RNA- Seq Data Ryan Peters Field Application Specialist Partek GS in your NGS Pipeline Your Start-to-Finish Solution for Analysis of Next Generation Sequencing Data
More informationBasic principles of NMR-based metabolomics
Basic principles of NMR-based metabolomics Professor Dan Stærk Bioanalytical Chemistry and Metabolomics research group Natural Products and Peptides research section Department of Drug Design and Pharmacology
More informationGene expression: Microarray data analysis. Copyright notice. Outline: microarray data analysis. Schedule
Gene expression: Microarray data analysis Copyright notice Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics by Jonathan Pevsner (ISBN -47-4-8). Copyright
More informationSuberoylanilide Hydroxamic Acid Treatment Reveals. Crosstalks among Proteome, Ubiquitylome and Acetylome
Suberoylanilide Hydroxamic Acid Treatment Reveals Crosstalks among Proteome, Ubiquitylome and Acetylome in Non-Small Cell Lung Cancer A549 Cell Line Quan Wu 1, Zhongyi Cheng 2, Jun Zhu 3, Weiqing Xu 1,
More informationTool for the identification of differentially expressed genes using a user-defined threshold
Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 8-18-2006 Tool for the identification of differentially expressed genes using a user-defined threshold Renikko
More informationMachine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University
Machine learning applications in genomics: practical issues & challenges Yuzhen Ye School of Informatics and Computing, Indiana University Reference Machine learning applications in genetics and genomics
More informationIntroduction to Bioinformatics and Gene Expression Technologies
Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a
More informationIntroduction to Bioinformatics and Gene Expression Technologies
Vocabulary Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 Gene: Genetics: Genome: Genomics: hereditary
More informationNetwork System Inference
Network System Inference Francis J. Doyle III University of California, Santa Barbara Douglas Lauffenburger Massachusetts Institute of Technology WTEC Systems Biology Final Workshop March 11, 2005 What
More informationCOS 597c: Topics in Computational Molecular Biology. DNA arrays. Background
COS 597c: Topics in Computational Molecular Biology Lecture 19a: December 1, 1999 Lecturer: Robert Phillips Scribe: Robert Osada DNA arrays Before exploring the details of DNA chips, let s take a step
More informationBioInformatics and Computational Molecular Biology. Course Website
BioInformatics and Computational Molecular Biology Course Website http://bioinformatics.uchc.edu What is Bioinformatics Bioinformatics upgrades the information content of biological measurements. Discovery
More informationComputational methods in bioinformatics: Lecture 1
Computational methods in bioinformatics: Lecture 1 Graham J.L. Kemp 2 November 2015 What is biology? Ecosystem Rain forest, desert, fresh water lake, digestive tract of an animal Community All species
More informationMeasuring gene expression (Microarrays) Ulf Leser
Measuring gene expression (Microarrays) Ulf Leser This Lecture Gene expression Microarrays Idea Technologies Problems Quality control Normalization Analysis next week! 2 http://learn.genetics.utah.edu/content/molecules/transcribe/
More informationCS4220: Knowledge Discovery Methods for Bioinformatics Unit 4: Batch Effects. Wong Limsoon
: Knowledge Discovery Methods for Bioinformatics Unit 4: Batch Effects Wong Limsoon 2 Plan Batch effects Visualization Normalization PC1 removal Batch effect-resistant feature selection Batch effect-resistant
More informationDatabase Searching and BLAST Dannie Durand
Computational Genomics and Molecular Biology, Fall 2013 1 Database Searching and BLAST Dannie Durand Tuesday, October 8th Review: Karlin-Altschul Statistics Recall that a Maximal Segment Pair (MSP) is
More information