- Philip Wilson
1 Knowledge Discovery
2 Our goal: from data, to information, to knowledge, to understanding (wisdom)
3 Why do we need Knowledge Discovery? Data explosion: web usage, automated data collection tools, and mature database technology have produced too much data and too little knowledge. Humans are not able to sift through the data effectively, so computational approaches to data analysis are required for the continually increasing volume of accumulated data.
4 Potential Applications: market analysis and customer relationship management; risk analysis and management; fraud detection; text mining (newsgroups, email, documents); web mining of logs and data streams for customization, advertising, and marketing; biology and medicine (many types of high-throughput data for diagnostics, and for predictive and personalized medicine).
5 Link to image reference
6 Link to image reference
7
8 Even Better: Consult the Domain Expert(s)
9 The Process: Guided Discovery (PBL) of Knowledge Discovery. Learn through examples and practice. The same general approach may be applied to many different problem domains; select appropriate methods to customize the approach. No one right answer!
10 Running Example of KD: Gene Expression Data. Why is it a good example? Biotechnology advances created a huge influx of data. Biologists were not equipped to analyze the data, and computational scientists didn't understand the biology, so the KDD process was sorely needed. It has advanced significantly over the last 10 years.
11 Papers: data preprocessing and transformation (Quackenbush); the need for standards (MAGE-ML); mining large datasets for patterns (Molecular Classification of Cancer, Golub et al.).
12 A Typical Scenario: A biologist designs and runs an experiment and delivers samples (along with $$) to the Functional Genomics lab for high-throughput gene expression analysis. A couple of weeks later the biologist picks up a CD with multiple files containing the raw data and some preprocessed data. Not knowing how to analyze the data, the biologist calls in your help. Where do we start? Understand the domain and the problems.
13 High-Throughput Systems for Studying Global Gene Expression Are Complex. We need to learn about and consider: the biology behind the experiments and the interpretation of the experiments; how the data is acquired (biotechnology); and the data issues.
14 Biology Basics: The Flow of Information. A gene is expressed in 2 steps: DNA is transcribed into RNA (mRNA), and RNA is translated into protein.
15 Genotype to Phenotype: Individual cells in an organism have the same genes (DNA), the genotype, but not all genes are active (expressed) in each cell. It is the expression of thousands of genes and their products (RNA, proteins), functioning in a complicated and orchestrated way, that makes a specific cell what it is: the phenotype.
16 Gene Expression Depends on Context: The subsets of genes that are expressed (RNA/protein) differ among cells, tissues, organs, and conditions, and the subset expressed confers unique properties on the cell (e.g., neuron, liver, muscle).
17 Differential Gene Expression: The level of expression of genes also differs with the cellular context, i.e., the amount of a given RNA will vary. We can think of gene expression (in higher organisms) as having both an on/off switch and a volume control.
18 What Biologists Want to Know: Specific Patterns of Gene Expression. Tissue/cell-type specific: e.g., skin cell vs. brain cell, or keratinocyte vs. melanocyte. Developmental stage: e.g., embryonic skin cell vs. adult skin cell. Disease state: e.g., normal skin cell vs. skin tumor cell. Environment specific (drugs, toxins): e.g., untreated vs. treated skin cell.
19 But also, the more difficult problem: Gene Networks. Genes and their products are related through their roles in metabolic pathways and cell signalling networks.
20 Metabolic Pathway (from the KEGG database)
21 Cell Signalling Networks: dortmund.mpg.de/departments/dep1/signaltransduktion/image3.gif
22 What can we learn by studying global patterns of gene expression? Individual gene expression patterns; classifications for diagnosis and prediction; groups of genes; a molecular taxonomy of disease; gene networks/pathways, i.e., reconstruction of metabolic and regulatory pathways.
23 Now that we have some understanding of the domain and goals, what about the data? How are the data generated? What is the data type? What is the data quality? Do the data need cleaning and preprocessing?
24 Knowledge Discovery Process Consult the Domain Expert(s)
25 GeneChip Oligonucleotide Array: high-throughput gene expression analysis
26 Recall that DNA and RNA are composed of strings of nucleotides, and a gene of interest has a specific nucleotide sequence. DNA and RNA sequences can form bonds with complementary bases on another string; this is called base pairing. When we do this experimentally we call it hybridization, and we can detect it by labeling one of the strings (aka strands).
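The base-pairing idea can be sketched in a few lines. This is a minimal illustration, assuming DNA strands written 5' to 3' and counting only perfect complementarity; real hybridization tolerates some mismatches and depends on temperature and salt. The function names and sequences are hypothetical, chosen only for the example.

```python
# Watson-Crick complements for DNA bases (A pairs with T, C pairs with G).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Paired strands run antiparallel, so the complementary strand is
    read in the opposite direction.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridizes_perfectly(probe: str, target: str) -> bool:
    """True if the probe's complement occurs exactly within the target strand.

    Simplification: only exact complementarity counts as binding here.
    """
    return reverse_complement(probe) in target.upper()


target = "TGTGATGGTGGGGAATGGGTCAG"  # arbitrary example sequence
probe = reverse_complement("GGGAATGGG")
print(probe)                                # CCCATTCCC
print(hybridizes_perfectly(probe, target))  # True
```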
27 GeneChip Expression Analysis: Hybridization and Staining (figure: array; hybridized array; cRNA target; streptavidin-phycoerythrin conjugate). Courtesy of M. Hessner, CAAGED Workshop
28 How do Affymetrix microarrays work? Several probes are picked to interrogate each gene; the idea is to get multiple measurements. Each probe is a 25-mer oligonucleotide that binds to a gene. The collection of probes designed to hybridize to the same gene is called a probe set, and there may be tens of thousands of probe sets on a given chip. Probe sets have identifiers called Affymetrix IDs, which look like 10329_g_at. On any GeneChip, some probe sets are dedicated to quality control; these begin with AFFX_. Take-home message: you have to learn a lot of terminology.
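As a small illustration of the probe/probe-set vocabulary, the sketch below groups individual probe measurements under their probe set ID and separates quality-control probe sets by name prefix. The records are hypothetical; only `10329_g_at`, `AFFX_BioB_at`, and the `AFFX_` prefix come from the slides. Real chips may spell the control prefix differently (for example with a hyphen), so the prefix is left as a parameter.

```python
from collections import defaultdict

# Hypothetical (probe_set_id, probe_index, intensity) records.
records = [
    ("10329_g_at", 0, 812.0),
    ("10329_g_at", 1, 790.5),
    ("10329_g_at", 2, 845.2),
    ("AFFX_BioB_at", 0, 150.3),
    ("AFFX_BioB_at", 1, 148.9),
]


def group_into_probe_sets(rows):
    """Collect individual probe intensities under their probe set ID."""
    probe_sets = defaultdict(list)
    for probe_set_id, _index, intensity in rows:
        probe_sets[probe_set_id].append(intensity)
    return dict(probe_sets)


def split_controls(probe_sets, prefix="AFFX_"):
    """Separate quality-control probe sets (by ID prefix) from gene probe sets."""
    controls = {k: v for k, v in probe_sets.items() if k.startswith(prefix)}
    genes = {k: v for k, v in probe_sets.items() if not k.startswith(prefix)}
    return genes, controls


genes, controls = split_controls(group_into_probe_sets(records))
print(sorted(genes))     # ['10329_g_at']
print(sorted(controls))  # ['AFFX_BioB_at']
```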
29 Affymetrix Chips (figure: …,000 probes; perfect match (PM) and mismatch (MM) probes; average difference values). Courtesy of J. Glasner, CAAGED Workshop
30 Affymetrix Analysis: A high-resolution image of the scanned microarray generates a DAT file. Since the probes are laid out in a grid, and each probe position is determined by its X-Y coordinates, one can compute the PM and MM probe intensities from the pixelated image. The CDF (Chip Description File) library file contains the X-Y layout of every probe.
31 Affymetrix Data Flow (figure): hybridized GeneChip → scan chip → DAT file (plus EXP file) → process image in GCOS, using the CDF file → CEL file → MAS5 analysis in GCOS → CHP file, with TXT and RPT files. GCOS = GeneChip Operating Software. Affymetrix, http://
32 Affymetrix File Types
DAT file: raw (TIFF) optical image of the hybridized chip.
CDF file (Chip Description File): provided by Affymetrix; describes the layout of the chip.
CEL file: processed DAT file (intensity/position values); see http:// AffxFileFormats/cel.html.
CHP file: contains summarized gene expression scores after the probe cells are analyzed; its format is (P = present, A = absent):
  Gene           Avg. Diff   Presence
  AFFX_CreX_at   48          A
  AFFX_BioB_at   149         P
TXT file: probe set expression values with annotation (the CHP file in text format).
RPT file: generated by Affymetrix software; a report of QC information.
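The step from an X-Y layout to PM and MM intensities can be sketched as below. This assumes a tiny hypothetical intensity grid and a hypothetical CDF-like layout dictionary; real CDF and CEL files have their own formats and are normally read with dedicated tools. The summary shown is a simplified average difference, the mean of PM − MM over a probe set's probe pairs, without any of the outlier handling a production algorithm would apply.

```python
# Hypothetical scanned-intensity grid, indexed as grid[y][x].
grid = [
    [900.0, 120.0, 640.0, 200.0],
    [850.0, 100.0, 700.0, 180.0],
]

# Hypothetical CDF-like layout: probe set -> list of probe pairs,
# each pair given as ((pm_x, pm_y), (mm_x, mm_y)).
layout = {
    "gene_A_at": [((0, 0), (1, 0)), ((0, 1), (1, 1))],
    "gene_B_at": [((2, 0), (3, 0)), ((2, 1), (3, 1))],
}


def probe_pairs(grid, pairs):
    """Look up (PM, MM) intensities for each probe pair from its X-Y positions."""
    return [(grid[pm_y][pm_x], grid[mm_y][mm_x])
            for (pm_x, pm_y), (mm_x, mm_y) in pairs]


def average_difference(pairs):
    """Mean of PM - MM over all probe pairs (simplified: no outlier removal)."""
    diffs = [pm - mm for pm, mm in pairs]
    return sum(diffs) / len(diffs)


for name, pairs in layout.items():
    print(name, average_difference(probe_pairs(grid, pairs)))
# gene_A_at 765.0
# gene_B_at 480.0
```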
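Once expression scores are exported as text, a first pass often just reads the table and keeps the probe sets called present. The sketch below assumes a hypothetical tab-delimited export with the three columns shown on the slide (the first two rows are the slide's; the third is invented), and it is not a description of the real CHP or TXT layout.

```python
import csv
import io

# Hypothetical tab-delimited export mirroring the slide's columns.
CHP_TEXT = (
    "Gene\tAvg Diff\tPresence\n"
    "AFFX_CreX_at\t48\tA\n"
    "AFFX_BioB_at\t149\tP\n"
    "10329_g_at\t520\tP\n"
)


def read_calls(text):
    """Parse rows into (gene, average difference, presence call) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(row["Gene"], float(row["Avg Diff"]), row["Presence"]) for row in reader]


def present_genes(calls, control_prefix="AFFX_"):
    """Keep non-control probe sets whose detection call is 'P' (present)."""
    return [gene for gene, _avg, call in calls
            if call == "P" and not gene.startswith(control_prefix)]


calls = read_calls(CHP_TEXT)
print(present_genes(calls))  # ['10329_g_at']
```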
33 Knowledge Discovery Process Consult the Domain Expert(s)
34 Data Quality: Most data mining techniques can tolerate some level of imperfection in the data, but improving data quality can improve the quality of analyses. Main issues: noise, outliers, missing values, duplicate data, inconsistent data.
35 There Are Many Problems Facing Expression Analysis on the Biotech Side: standardization and quality control in the experiments (which affect data quality at many levels); cost.
36 Problems in Reproducibility of Experimental Data: There is a lot of variation in arrays, since there are more than 100 experimental steps. Sources of variation: biological variability in each RNA extract; each labeling reaction is different; each slide is a separate hybridization; spots on the slide are variable across slides (and within slides when double-spotted); each color is scanned separately. Need replicates and statistics!
37 Outcome: noisy data. Data preprocessing is necessary: normalization, scaling. Heavy reliance on statistics today.
38 What do the spots (intensity measurements) represent? Fluorescence intensity is a measure of the relative abundance of individual mRNAs (expressed genes) in given samples, e.g., experimental relative to control. But gene expression experiments are run on multiple samples. Why? We are trying to understand a dynamic process, and each sample only represents a snapshot. So we compare among samples (different arrays) and across a time course of related samples.
39 How can we use the data? For microarrays we can only really depend on between-sample fold change, not on absolute values or within-sample comparisons (> fold change, in general). Take-home message: we have to be careful when comparing between arrays, from experiment to experiment.
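A few of the listed issues can be checked mechanically before any mining. This minimal sketch uses hypothetical readings and sample IDs. For outliers it uses the modified z-score based on the median absolute deviation (with the commonly quoted 3.5 cutoff) rather than a plain z-score, because a single extreme value inflates the ordinary standard deviation enough to hide itself in a small sample.

```python
import statistics

# Hypothetical readings for one gene across samples; None marks a missing value.
readings = [10.2, 9.8, None, 10.5, 10.1, 55.0, 9.9, 10.0]
sample_ids = ["s1", "s2", "s3", "s2"]  # hypothetical IDs with one duplicate


def missing_count(values):
    """Number of missing (None) entries."""
    return sum(1 for v in values if v is None)


def duplicates(ids):
    """Identifiers that occur more than once, in first-seen order."""
    seen, dupes = set(), []
    for i in ids:
        if i in seen and i not in dupes:
            dupes.append(i)
        seen.add(i)
    return dupes


def mad_outliers(values, threshold=3.5):
    """Values whose modified z-score, 0.6745 * (x - median) / MAD, exceeds threshold."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    mad = statistics.median([abs(v - med) for v in present])
    if mad == 0:
        return []  # no spread around the median; the score is undefined
    return [v for v in present if abs(0.6745 * (v - med) / mad) > threshold]


print(missing_count(readings))  # 1
print(duplicates(sample_ids))   # ['s2']
print(mad_outliers(readings))   # [55.0]
```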
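Replicates make the variation measurable. A simple per-gene summary across replicate arrays is the mean, the sample standard deviation, and the coefficient of variation (SD divided by mean). The values below are hypothetical, and this is only a sketch of the idea, not a recommended quality threshold.

```python
import statistics

# Hypothetical intensities for the same genes on three replicate arrays.
replicates = {
    "gene_A": [820.0, 790.0, 850.0],
    "gene_B": [120.0, 210.0, 60.0],
}


def replicate_summary(values):
    """Mean, sample standard deviation, and coefficient of variation (SD / mean)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean, sd, sd / mean


for gene, values in replicates.items():
    mean, sd, cv = replicate_summary(values)
    print(f"{gene}: mean={mean:.1f}, sd={sd:.1f}, CV={cv:.2f}")
```

Here gene_B's much larger CV flags it as the less reproducible measurement.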
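Between-sample fold change is usually computed on a log2 scale, so that doubling and halving are symmetric (+1 and −1). The sketch below uses hypothetical normalized intensities for two arrays. The fold-change cutoff is a parameter; the value 2.0 is an illustrative choice, not the threshold from the slide.

```python
import math

# Hypothetical normalized intensities for the same probe sets on two arrays.
control = {"gene_A": 200.0, "gene_B": 400.0, "gene_C": 150.0}
treated = {"gene_A": 800.0, "gene_B": 380.0, "gene_C": 50.0}


def log2_fold_changes(reference, sample):
    """log2(sample / reference) for genes measured on both arrays."""
    return {g: math.log2(sample[g] / reference[g]) for g in reference if g in sample}


def changed_genes(log_fc, min_fold=2.0):
    """Genes changed by at least min_fold in either direction (illustrative cutoff)."""
    cutoff = math.log2(min_fold)
    return sorted(g for g, v in log_fc.items() if abs(v) >= cutoff)


lfc = log2_fold_changes(control, treated)
print(changed_genes(lfc))  # ['gene_A', 'gene_C']
```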
40 Preprocessing: gene filtering (control genes, uninformative genes); normalization and scaling, which allow comparisons across arrays (scaling also controls the dynamic range); transformation (a logarithmic transformation for improved statistical properties).
41 Normalization (figure: Cy5 signal (log2) and Cy3 signal (log2) axes)
42 Take-home message: It is important to remember that once preprocessing, normalization, and transformation of the data have occurred, all downstream mining will be affected.
43 Data Representation: flat file; vector data; sparse matrix (text) data; sequence data (e.g., web or genomic); time series; image data; spatio-temporal data.
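The three preprocessing steps can be chained for a single array, as in the minimal sketch below. It assumes a dictionary from probe set ID to raw signal, drops control probe sets by a hypothetical `AFFX_` prefix, applies a simple global scaling so the mean signal equals an arbitrary target, floors small or negative signals, and then takes log2. Filtering of uninformative genes (which needs several arrays) is not shown, and real pipelines use more careful scaling such as trimmed means.

```python
import math
import statistics


def preprocess(array, target=500.0, floor=1.0, control_prefix="AFFX_"):
    """Filter controls, globally scale, floor, and log2-transform one array.

    array maps probe set ID -> raw signal (e.g. an average difference value).
    """
    # Gene filtering: drop quality-control probe sets by name prefix.
    genes = {k: v for k, v in array.items() if not k.startswith(control_prefix)}
    # Global scaling: make this array's mean signal equal to `target`.
    factor = target / statistics.mean(list(genes.values()))
    # Floor before the log so zero or negative signals stay defined.
    return {k: math.log2(max(v * factor, floor)) for k, v in genes.items()}


# Two hypothetical arrays that differ only by overall brightness.
array_1 = {"AFFX_BioB_at": 149.0, "gene_A": 250.0, "gene_B": 750.0}
array_2 = {"AFFX_BioB_at": 90.0, "gene_A": 125.0, "gene_B": 375.0}
print(preprocess(array_1) == preprocess(array_2))  # True
```

After scaling, the two arrays become directly comparable, which is the point of the normalization and scaling step.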
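For two-color arrays, a common way to look at the Cy5 versus Cy3 signals is through the log ratio M = log2(Cy5/Cy3) and the average log intensity A. The sketch below shows one of the simplest normalizations, global median centering of M, on hypothetical spot intensities. It is not necessarily the method behind the slide's figure, and intensity-dependent methods (e.g. loess on M versus A) are often preferred in practice.

```python
import math
import statistics

# Hypothetical background-corrected (Cy5, Cy3) intensities for five spots.
spots = [(1200.0, 1000.0), (300.0, 260.0), (5000.0, 4100.0), (80.0, 70.0), (900.0, 400.0)]


def ma_values(spots):
    """M = log2(Cy5 / Cy3), A = mean of log2(Cy5) and log2(Cy3), per spot."""
    return [(math.log2(r / g), (math.log2(r) + math.log2(g)) / 2) for r, g in spots]


def median_center(spots):
    """Global normalization: shift all log ratios so their median is zero."""
    m = [mv for mv, _a in ma_values(spots)]
    shift = statistics.median(m)
    return [mv - shift for mv in m]


normalized = median_center(spots)
print([round(v, 3) for v in normalized])
```

Median centering assumes most genes are not differentially expressed, so the typical log ratio should be zero.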
44 Three levels of microarray gene expression data processing Brazma et al., Nature Genetics, 29: , 2001
45 Outcomes of Microarray Analysis: large, complex data sets of high dimensionality. Example of a routine study: 50,000 genes from 20 samples, approx. 1-2 × 10^6 pieces of data. Challenges for bioinformatics: annotation, storage, retrieval, and sharing of data; extracting information from the data.
46 Knowledge Discovery Process Consult the Domain Expert(s)
47 State of Microarray Data: Wide availability of the technology has given rise to a large number of distributed databases. Data are scattered among many independent sites (accessible via the Internet) or not publicly available at all. Need for standardization!
48 MGED Group and Standardization Issues: The Microarray Gene Expression Database (MGED) Group is taking on the challenge of standardization through four major projects.
49 MGED Projects: MIAME, the formulation of the minimum information about a microarray experiment required to interpret and verify the results. MAGE, the establishment of a data exchange format (MAGE-ML) and object model (MAGE-OM) for microarray experiments.
50 MGED Projects: Ontologies, the development of ontologies for microarray experiment description and, in particular, for biological material (biomaterial) annotation. Normalization, the development of recommendations regarding experimental controls and data normalization methods.
51 MAGE-ML: the XML representation of the MAGE-OM. MAGE-ML is specified by its DTD (document type definition), which contains the rules or declarations: which tags can be used and what the tags contain.
52 MAGE-OM (http:// mage-om.html): mapping of the microarray experimental workflow to the object model (OM)
53 DTD: http:// dtd
54 MAGE-STK: a software toolkit that defines an API to MAGE-OM in Java, Perl, and C++. Used to export data to MAGE-ML, to store data in a relational database, and to input data to analysis tools. Reader: MAGE-ML documents into objects. Writer: objects into MAGE-ML.
55 Knowledge Discovery Process Consult the Domain Expert(s)
56 Data Mining Techniques: exploratory data analysis; descriptive modeling; predictive modeling; pattern discovery; others.
57 Exploratory Data Analysis: interactive and visual; gives insight and a feel for the data in a broad sense. Provide summaries, e.g., max/min, mean/median, variance. Visualization: histograms, scatterplots. Useful for data validation or verification. Simple exploratory data analysis is invaluable: always get a cursory view of the data before applying data mining algorithms.
58 Pattern Discovery: Discover interesting local patterns in the data rather than characterizing the data globally. Market basket data: e.g., discover that if customers buy wine and bread, they buy cheese with a 0.9 probability. Such patterns are known as association rules.
59 Descriptive Modeling: Build a model for the underlying process, and simulate the data if needed. Cluster analysis finds natural groups in the data; Bayesian networks find dependency models among variables.
60 Predictive Modeling: Predict a variable Y, given a p-dimensional vector X. Classification: Y is categorical. Regression: Y is real-valued. Much like function approximation: learning the relationship between Y and X. Statistics and machine learning have many algorithms for predictive modeling. The emphasis is often on predictive accuracy rather than on understanding the model itself.
61 Mining of Expression Data: Recall that a gene expression pattern derived from a single microarray is simply a snapshot (one experimental sample vs. a reference). We usually want to understand a process, or changes in expression over a collection of samples: the gene expression profile.
62 Working with Gene Expression Data: Hypothesis-driven approaches are typically model oriented, using descriptive statistics that rely on prior knowledge and good design. Discovery-based approaches have few, if any, a priori hypotheses; they are data driven and algorithm oriented, using statistical algorithms and machine learning with heuristic techniques.
63 Testing Hypotheses: based on prior biological knowledge. The simplest approach looks for individual differentially expressed genes using fold changes, scatterplots, and statistical measures.
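The summaries listed above take only a few lines with Python's standard `statistics` module. This sketch uses hypothetical log2 expression values and prints a crude text histogram as a stand-in for a plotted one.

```python
import statistics
from collections import Counter

# Hypothetical log2 expression values from one array.
values = [6.1, 6.8, 7.2, 7.9, 8.0, 8.4, 9.7, 12.5]


def summarize(values):
    """Basic summaries: range, center, and spread."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance
    }


def text_histogram(values, bin_width):
    """Print one row per non-empty bin and return the bin counts."""
    counts = Counter(int(v // bin_width) for v in values)
    for b in sorted(counts):
        lo, hi = b * bin_width, (b + 1) * bin_width
        print(f"[{lo:>4}, {hi:>4}) {'#' * counts[b]}")
    return counts


print(summarize(values))
counts = text_histogram(values, bin_width=2)
```

Even this crude view shows the isolated value 12.5 sitting far from the rest, the kind of thing worth checking before mining.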
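An association rule such as {wine, bread} → {cheese} is scored by its support (how often the items occur together) and its confidence (an estimate of P(cheese | wine, bread)). The baskets below are hypothetical, so the confidence comes out as 0.75 rather than the slide's 0.9.

```python
# Hypothetical market-basket transactions.
baskets = [
    {"wine", "bread", "cheese"},
    {"wine", "bread", "cheese", "olives"},
    {"wine", "bread"},
    {"bread", "milk"},
    {"wine", "bread", "cheese"},
]


def count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions)


def support(itemset, transactions):
    """Fraction of transactions containing the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """count(A and C together) / count(A): an estimate of P(C | A)."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


print(support({"wine", "bread"}, baskets))                 # 0.8
print(confidence({"wine", "bread"}, {"cheese"}, baskets))  # 0.75
```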
64 Scatterplot
65 Some simple statistics: If we are looking at samples that seem to belong to two groups or conditions, a t-test compares the means of the two groups while accounting for the standard error of the difference of the means. ANOVA extends the analysis to more than two groups.
66 But gene chips allow us to measure thousands of genes... across multiple samples.
67 Goal of Analysis of the Expression Matrix: Statistical methods are applied to: 1. group similar genes together (=> groups of functionally similar genes); 2. group similar cell samples together; 3. extract representative genes in each group.
68 Typical approach: Look for patterns. Compare rows to find evidence for co-regulation of genes; compare columns to find evidence for relatedness among samples. 1) Choose a measure of similarity (distance) among the objects being compared; each row or column is considered a vector in space. 2) Then group together objects (genes or samples) with similar properties. This is a multidimensional analysis.
69 An experiment: 12 genes, with expression values at 0, 2, 4, 6, 8, and 10 hours.
70 Table 4.2 of Campbell/Heyer: columns are Name, 0 hrs, 2 hrs, 4 hrs, 6 hrs, 8 hrs, 10 hrs; rows are genes C, D, E, F, G, H, I, J, K, L, M, N.
71 Take logs of the values for genes C through N, then compare.
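Both statistics can be written directly from their definitions. This sketch computes Welch's t statistic (difference of means over its standard error) and the one-way ANOVA F statistic on hypothetical log expression values; it stops at the statistics and does not compute p-values, which need the t and F distributions (SciPy's `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` are one option).

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic: difference of means divided by its standard error."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand = statistics.mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log2 expression of one gene in two conditions.
normal = [7.0, 7.2, 6.9, 7.1]
tumor = [8.1, 8.4, 7.9, 8.2]
print(round(welch_t(normal, tumor), 2))
print(round(anova_f(normal, tumor), 2))
```

With two equal-sized groups, F equals the square of t, which is a handy sanity check.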
72 How Similar Are Two Rows? How similar are the expressions of two genes? First we'll normalize each row: calculate the mean and standard deviation for each gene, then normalize each value by subtracting the mean and dividing by the standard deviation.
73 How Similar Are Two Rows? Calculate the Pearson correlation between pairs of rows. Correlation quantifies the extent to which the expression patterns of two genes go up or down together, regardless of their magnitudes. It is calculated by taking the dot product of the two normalized vectors and dividing by the number of values (when the standard deviation is computed with n).
> (pc '( )   ; row G
      '( ))  ; row L
1.0
> (pc '( )   ; row G
      '( ))  ; row D
74 Some other pairs (Table 4.2: genes C through N, 0 to 10 hrs)
> (pc '( )   ; row D
      '( ))  ; row M
> (pc '( )   ; row G
      '( ))  ; row H
75 Pearson Correlation: pc(G, L) = 1: identically expressed genes. pc(G, D) = .897: similarly expressed genes. pc(D, M) = −.926: reciprocally expressed. pc(G, H) = −.909: also reciprocally expressed.
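The normalize-then-dot-product recipe can be written out as below. This is a Python re-expression of the idea behind the slide's `pc`, not its actual implementation (which isn't shown), and the rows are hypothetical rather than the Table 4.2 values. Using the population standard deviation (dividing by n) is what makes "dot product divided by n" equal the Pearson correlation.

```python
import math


def z_normalize(row):
    """Subtract the row mean and divide by the population standard deviation."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / n)
    return [(x - mean) / sd for x in row]


def pc(x, y):
    """Pearson correlation: dot product of the z-normalized rows, divided by n."""
    zx, zy = z_normalize(x), z_normalize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical log expression rows over six time points.
g = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
l = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # same shape as g, different magnitude
m = [2.5, 2.0, 1.5, 1.0, 0.5, 0.0]  # mirror image of g
print(round(pc(g, l), 3))  # 1.0   (identically expressed)
print(round(pc(g, m), 3))  # -1.0  (reciprocally expressed)
```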
76 Descriptive and Predictive Modeling: clustering; feature extraction/selection; classification (discrimination analysis).
77 Analytic Approaches: Clustering is the identification of associations between data points and the organization of data into groups. In unsupervised clustering, genes are clustered by similarity/correlation, or other criteria based on the X values; no useful external information about the Y variable (the response) is used, so it doesn't reveal groups of genes of special interest for tissue discrimination. Supervised methods group variables (genes) under the control of information about both the X and Y variables: supervised algorithms try to find gene clusters whose average expression profile has great potential for explaining the response Y, i.e., for tissue discrimination.
78 Unsupervised Clustering Algorithms: hierarchical, k-means, self-organizing maps, others.
79 Gene Expression Matrix & Hierarchical Clustering (figure: genes × samples). Eisen et al. content/full/95/25/14863
80 Theory: Hierarchical clustering works by sequentially joining the two nearest clusters, then the next two closest clusters, and so on, joining the nearest clusters first and the farthest clusters last. Initially, each individual data point is its own cluster.
81 Hierarchical Clustering Algorithm: Given a set of N items to be clustered and an N×N distance (or similarity) matrix:
1. Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters be the same as the distances (similarities) between the items they contain.
2. Find the closest (most similar) pair of clusters and merge them into a single cluster. You now have one cluster fewer.
3. Compute distances (similarities) between the new cluster and each of the old clusters.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
82 Hierarchical Clustering in Action
83 Variations of the Hierarchical Algorithm: Step 3 (computing distances between the new cluster and each of the old clusters) can be done in several different ways: single linkage, average linkage, and complete linkage. In single linkage, the distance between two clusters is the shortest distance from any member of one cluster to any member of the other cluster. In average linkage, the distance between two clusters is the average distance from each member of one cluster to each member of the other cluster. In complete linkage, it is the maximum distance from any member of the first cluster to any member of the second cluster.
84 Variations of the Hierarchical Algorithm: the Self-Organizing Tree Algorithm (SOTA), an unsupervised neural network with a binary tree topology that combines SOM and hierarchical clustering. Its runtime is approximately linear, so it is faster than the standard hierarchical method. It uses a divisive (top-down) method, in contrast to the bottom-up method of standard hierarchical clustering.
85 Advantages: Hierarchical clustering produces a visual representation that is convenient for humans to analyze. Unlike k-means and SOM, it does not require an a priori number of clusters.
86 Why cluster analysis may not be the answer: Clustering methods typically require user inputs, for example a distance measure. Clustering methods differ in the way the number of clusters is specified. Clustering methods are often sensitive to the initialization condition (starting guess). Local vs. global sampling of the clustering space.
87 Cluster Analysis Challenges: noise in the data itself; large data sets (most of the techniques currently used were not developed for multidimensional data). What about networks? A limitation of cluster analysis: similarity in expression pattern suggests co-regulation but doesn't reveal cause-effect relationships.
88 Feature Selection & Classification: First, identify features (genes) that discriminate between classes; then use those features for classification. This is a machine learning approach (supervised analysis): a new sample is assigned to a previously specified class, based on the sample's features and a trained classifier.
89 Classic Example: Classification of AML vs. ALL, comparing 2 acute leukemias: acute myeloid leukemia (AML) and acute lymphoid leukemia (ALL). Biological/clinical problems: previously there was no single reliable test to distinguish them, and they differ greatly in clinical course and response to treatments. Golub et al., Science Oct :
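The four steps and the three linkage rules above fit in one small, deliberately naive sketch. It recomputes every cluster-to-cluster distance from the item distances on each pass (so it scales poorly) and uses hypothetical one-dimensional points; for real data a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"`, `"complete"`, or `"average"` would be the usual choice.

```python
from itertools import combinations


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Linkage rules applied to the list of member-to-member distances.
LINKAGES = {
    "single": min,                            # shortest distance
    "complete": max,                          # longest distance
    "average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
}


def hierarchical(items, linkage="average", dist=euclidean):
    """Naive agglomerative clustering; returns the merge history.

    Each entry is (cluster_a, cluster_b, distance); clusters are tuples of item indices.
    """
    link = LINKAGES[linkage]
    d = {(i, j): dist(items[i], items[j]) for i, j in combinations(range(len(items)), 2)}
    clusters = [(i,) for i in range(len(items))]  # step 1: one item per cluster
    merges = []
    while len(clusters) > 1:  # step 4: repeat until a single cluster remains
        best = None
        for a, b in combinations(clusters, 2):  # steps 2-3: find the closest pair
            cd = link([d[min(i, j), max(i, j)] for i in a for j in b])
            if best is None or cd < best[2]:
                best = (a, b, cd)
        a, b, cd = best
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        merges.append((a, b, cd))
    return merges


points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]  # hypothetical data
for method in LINKAGES:
    print(method, [m[2] for m in hierarchical(points, method)])
```

The merge order is the same here for all three rules; only the heights at which clusters join differ, which is what changes the shape of the resulting dendrogram.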
90 Study Design Golub et al., Science Oct :
91
92 The prediction of a new sample is based on 'weighted votes' of a set of informative genes
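A sketch of the weighted-voting idea follows, written from our reading of the scheme described by Golub et al. For each gene, a weight a_g = (μ1 − μ2) / (σ1 + σ2) measures how well it separates the classes, and b_g = (μ1 + μ2) / 2 is the midpoint between the class means. A new sample's gene g casts the vote a_g(x_g − b_g), positive for class 1 and negative for class 2, and the prediction strength is PS = (V_win − V_lose) / (V_win + V_lose). The training data, the simple "top k by |a_g|" gene selection, and the function names are hypothetical simplifications; the 0.3 cutoff for "uncertain" follows the study's results slide.

```python
import statistics


def train(class1, class2):
    """Per-gene (a_g, b_g) from two lists of training samples (lists of gene values)."""
    params = []
    for g in range(len(class1[0])):
        x1 = [sample[g] for sample in class1]
        x2 = [sample[g] for sample in class2]
        mu1, mu2 = statistics.mean(x1), statistics.mean(x2)
        a = (mu1 - mu2) / (statistics.stdev(x1) + statistics.stdev(x2))  # signal-to-noise
        params.append((a, (mu1 + mu2) / 2))  # (weight, decision midpoint)
    return params


def top_genes(params, k):
    """Indices of the k genes with the largest |a_g| (most informative)."""
    return sorted(range(len(params)), key=lambda g: abs(params[g][0]), reverse=True)[:k]


def predict(params, genes, sample, labels=("ALL", "AML"), min_ps=0.3):
    """Weighted vote over the chosen genes; returns (label or 'uncertain', PS)."""
    votes = [params[g][0] * (sample[g] - params[g][1]) for g in genes]
    v1 = sum(v for v in votes if v > 0)   # total vote for the first class
    v2 = -sum(v for v in votes if v < 0)  # total vote for the second class
    win, lose = max(v1, v2), min(v1, v2)
    ps = (win - lose) / (win + lose) if win + lose > 0 else 0.0
    label = labels[0] if v1 > v2 else labels[1]
    return (label if ps >= min_ps else "uncertain"), ps


# Hypothetical training data: 3 samples per class, 3 genes each.
all_train = [[9.0, 5.0, 7.0], [8.6, 5.2, 7.3], [9.2, 4.9, 6.8]]
aml_train = [[5.1, 8.8, 7.1], [4.8, 9.1, 6.9], [5.3, 8.7, 7.2]]
params = train(all_train, aml_train)
genes = top_genes(params, k=2)
print(predict(params, genes, [8.9, 5.1, 7.0]))     # ('ALL', 1.0)
print(predict(params, genes, [7.3, 7.2, 7.0])[0])  # uncertain
```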
93 Results of the study: 1) Clustering of microarray data using tumors of known type found 1100 of 6817 genes correlated with the class distinction. 2) Formation of a class predictor: the 50 most informative genes were used as a training set for classification of unknown tumors. Golub et al., Science Oct :
94 Results: How can the validity of class predictors be tested? Cross-validation tests: the 50-gene predictor assigned 36 of the 38 samples as either AML or ALL and the remaining two as uncertain (PS < 0.3). All 36 predictions agreed with the patients' clinical diagnosis. Independent test: the 50-gene predictor was applied to an independent collection of 34 leukemia samples. The predictor assigned 29 of the 34 samples, and the accuracy was 100%. Prediction strength: median PS = 0.77 in cross-validation and 0.73 in the independent test (Fig. 3A).
95 Results: Class discovery. If the AML/ALL distinction were not already known, could it have been discovered simply on the basis of gene expression?
96 Results: Two-cluster analysis. (1) Cluster tumors by gene expression: a two-cluster SOM was applied to automatically group the 38 initial leukemia samples into two classes on the basis of the expression pattern of all 6817 genes.
97 Results: Determine whether the putative classes produced are meaningful. The clusters were first evaluated by comparing them to the known AML/ALL classes (Fig. 4A). Class A1 contained mostly ALL (24 of 25 samples) and class A2 contained mostly AML (10 of 13 samples). The SOM was thus quite effective at automatically discovering the two types of leukemia.
98 Results: How could one evaluate such putative clusters if the "right" answer were not already known? Class discovery could be tested by class prediction: if putative classes reflect true structure, then a class predictor based on these classes should perform well.
99
100
More informationIntroduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics
Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics abedi777@ymail.com Outlines Technology Basic concepts Data analysis Printed Microarrays In Situ-Synthesized
More informationMicroarray. Key components Array Probes Detection system. Normalisation. Data-analysis - ratio generation
Microarray Key components Array Probes Detection system Normalisation Data-analysis - ratio generation MICROARRAY Measures Gene Expression Global - Genome wide scale Why Measure Gene Expression? What information
More informationIntroduction to gene expression microarray data analysis
Introduction to gene expression microarray data analysis Outline Brief introduction: Technology and data. Statistical challenges in data analysis. Preprocessing data normalization and transformation. Useful
More informationAPPLICATION OF COMMITTEE k-nn CLASSIFIERS FOR GENE EXPRESSION PROFILE CLASSIFICATION. A Thesis. Presented to
APPLICATION OF COMMITTEE k-nn CLASSIFIERS FOR GENE EXPRESSION PROFILE CLASSIFICATION A Thesis Presented to The Graduate Faculty of The University of Akron In Partial Fulfillment of the Requirements for
More informationImage Analysis. Based on Information from Terry Speed s Group, UC Berkeley. Lecture 3 Pre-Processing of Affymetrix Arrays. Affymetrix Terminology
Image Analysis Lecture 3 Pre-Processing of Affymetrix Arrays Stat 697K, CS 691K, Microbio 690K 2 Affymetrix Terminology Probe: an oligonucleotide of 25 base-pairs ( 25-mer ). Based on Information from
More informationMeasuring and Understanding Gene Expression
Measuring and Understanding Gene Expression Dr. Lars Eijssen Dept. Of Bioinformatics BiGCaT Sciences programme 2014 Why are genes interesting? TRANSCRIPTION Genome Genomics Transcriptome Transcriptomics
More informationLecture: Genetic Basis of Complex Phenotypes Advanced Topics in Computa8onal Genomics
Lecture: Genetic Basis of Complex Phenotypes 02-715 Advanced Topics in Computa8onal Genomics Genome Polymorphisms A Human Genealogy TCGAGGTATTAAC The ancestral chromosome From SNPS TCGAGGTATTAAC TCTAGGTATTAAC
More informationCOS 597c: Topics in Computational Molecular Biology. DNA arrays. Background
COS 597c: Topics in Computational Molecular Biology Lecture 19a: December 1, 1999 Lecturer: Robert Phillips Scribe: Robert Osada DNA arrays Before exploring the details of DNA chips, let s take a step
More informationSupervised Learning from Micro-Array Data: Datamining with Care
November 18, 2002 Stanford Statistics 1 Supervised Learning from Micro-Array Data: Datamining with Care Trevor Hastie Stanford University November 18, 2002 joint work with Robert Tibshirani, Balasubramanian
More informationBackground Correction and Normalization. Lecture 3 Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy
Background Correction and Normalization Lecture 3 Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy Feature Level Data Outline Affymetrix GeneChip arrays Two
More informationAdvanced Statistical Methods: Beyond Linear Regression
Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 1. Case Study Data Sets Mathematics Educators Workshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi
More informationGene Expression Technology
Gene Expression Technology Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Gene expression Gene expression is the process by which information from a gene
More informationGene expression analysis: Introduction to microarrays
Gene expression analysis: Introduction to microarrays Adam Ameur The Linnaeus Centre for Bioinformatics, Uppsala University February 15, 2006 Overview Introduction Part I: How a microarray experiment is
More informationDNA Microarrays and Clustering of Gene Expression Data
DNA Microarrays and Clustering of Gene Expression Data Martha L. Bulyk mlbulyk@receptor.med.harvard.edu Biophysics 205 Spring term 2008 Traditional Method: Northern Blot RNA population on filter (gel);
More informationDNA Microarray Technology
CHAPTER 1 DNA Microarray Technology All living organisms are composed of cells. As a functional unit, each cell can make copies of itself, and this process depends on a proper replication of the genetic
More informationCodeLink Human Whole Genome Bioarray
CodeLink Human Whole Genome Bioarray 55,000 human gene targets on a single bioarray The CodeLink Human Whole Genome Bioarray comprises one of the most comprehensive coverages of the human genome, as it
More informationNext Genera*on Sequencing II: Personal Genomics. Jim Noonan Department of Gene*cs
Next Genera*on Sequencing II: Personal Genomics Jim Noonan Department of Gene*cs Personal genome sequencing Iden*fying the gene*c basis of phenotypic diversity among humans Gene*c risk factors for disease
More informationThis place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.
G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic
More informationIntroduction to Bioinformatics and Gene Expression Technology
Vocabulary Introduction to Bioinformatics and Gene Expression Technology Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 1.1 Gene: Genetics: Genome: Genomics: hereditary DNA
More informationDNA Chip Technology Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center
DNA Chip Technology Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center Why DNA Chips? Functional genomics: get information about genes that is unavailable from sequence
More informationOur view on cdna chip analysis from engineering informatics standpoint
Our view on cdna chip analysis from engineering informatics standpoint Chonghun Han, Sungwoo Kwon Intelligent Process System Lab Department of Chemical Engineering Pohang University of Science and Technology
More informationIntroduction to Bioinformatics: Chapter 11: Measuring Expression of Genome Information
HELSINKI UNIVERSITY OF TECHNOLOGY LABORATORY OF COMPUTER AND INFORMATION SCIENCE Introduction to Bioinformatics: Chapter 11: Measuring Expression of Genome Information Jarkko Salojärvi Lecture slides by
More informationIntroduction to Bioinformatics. Fabian Hoti 6.10.
Introduction to Bioinformatics Fabian Hoti 6.10. Analysis of Microarray Data Introduction Different types of microarrays Experiment Design Data Normalization Feature selection/extraction Clustering Introduction
More informationRecent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques)
Microarrays and Transcript Profiling Gene expression patterns are traditionally studied using Northern blots (DNA-RNA hybridization assays). This approach involves separation of total or polya + RNA on
More informationSeven Keys to Successful Microarray Data Analysis
Seven Keys to Successful Microarray Data Analysis Experiment Design Platform Selection Data Management System Access Differential Expression Biological Significance Data Publication Type of experiment
More informationExploration and Analysis of DNA Microarray Data
Exploration and Analysis of DNA Microarray Data Dhammika Amaratunga Senior Research Fellow in Nonclinical Biostatistics Johnson & Johnson Pharmaceutical Research & Development Javier Cabrera Associate
More informationHumboldt Universität zu Berlin. Grundlagen der Bioinformatik SS Microarrays. Lecture
Humboldt Universität zu Berlin Microarrays Grundlagen der Bioinformatik SS 2017 Lecture 6 09.06.2017 Agenda 1.mRNA: Genomic background 2.Overview: Microarray 3.Data-analysis: Quality control & normalization
More informationFeature Selection of Gene Expression Data for Cancer Classification: A Review
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 50 (2015 ) 52 57 2nd International Symposium on Big Data and Cloud Computing (ISBCC 15) Feature Selection of Gene Expression
More informationSta$s$cs for Genomics ( )
Sta$s$cs for Genomics (140.688) Instructor: Jeff Leek Website: http://www.biostat.jhsph.edu/~jleek/teaching/2011/genomics/ Class Times: MW, 10:30AM-11:50AM + R Lab TBA Grading: 20% Reading Assignments,
More informationBiology 644: Bioinformatics
Measure of the linear correlation (dependence) between two variables X and Y Takes a value between +1 and 1 inclusive 1 = total positive correlation 0 = no correlation 1 = total negative correlation. When
More informationUpstream/Downstream Relation Detection of Signaling Molecules using Microarray Data
Vol 1 no 1 2005 Pages 1 5 Upstream/Downstream Relation Detection of Signaling Molecules using Microarray Data Ozgun Babur 1 1 Center for Bioinformatics, Computer Engineering Department, Bilkent University,
More informationIntegrative Genomics 1a. Introduction
2016 Course Outline Integrative Genomics 1a. Introduction ggibson.gt@gmail.com http://www.cig.gatech.edu 1a. Experimental Design and Hypothesis Testing (GG) 1b. Normalization (GG) 2a. RNASeq (MI) 2b. Clustering
More informationLecture 2: Population Structure Advanced Topics in Computa8onal Genomics
Lecture 2: Population Structure 02-715 Advanced Topics in Computa8onal Genomics 1 What is population structure? Popula8on Structure A set of individuals characterized by some measure of gene8c dis8nc8on
More informationMeasuring gene expression
Measuring gene expression Grundlagen der Bioinformatik SS2018 https://www.youtube.com/watch?v=v8gh404a3gg Agenda Organization Gene expression Background Technologies FISH Nanostring Microarrays RNA-seq
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics If the 19 th century was the century of chemistry and 20 th century was the century of physic, the 21 st century promises to be the century of biology...professor Dr. Satoru
More informationLecture #1. Introduction to microarray technology
Lecture #1 Introduction to microarray technology Outline General purpose Microarray assay concept Basic microarray experimental process cdna/two channel arrays Oligonucleotide arrays Exon arrays Comparing
More informationFirst steps in signal-processing level models of genetic networks: identifying response pathways and clusters of coexpressed genes
First steps in signal-processing level models of genetic networks: identifying response pathways and clusters of coexpressed genes Olga Troyanskaya lecture for cheme537/cs554 some slides borrowed from
More information10.1 The Central Dogma of Biology and gene expression
126 Grundlagen der Bioinformatik, SS 09, D. Huson (this part by K. Nieselt) July 6, 2009 10 Microarrays (script by K. Nieselt) There are many articles and books on this topic. These lectures are based
More informationDecoding Chromatin States with Epigenome Data Advanced Topics in Computa8onal Genomics
Decoding Chromatin States with Epigenome Data 02-715 Advanced Topics in Computa8onal Genomics HMMs for Decoding Chromatin States Epigene8c modifica8ons of the genome have been associated with Establishing
More informationIntroduc)on to Pathway and Network Analysis
Introduc)on to Pathway and Network Analysis Alison Motsinger-Reif, PhD Associate Professor Bioinforma)cs Research Center Department of Sta)s)cs North Carolina State University Pathway and Network Analysis
More informationDeakin Research Online
Deakin Research Online This is the published version: Church, Philip, Goscinski, Andrzej, Wong, Adam and Lefevre, Christophe 2011, Simplifying gene expression microarray comparative analysis., in BIOCOM
More informationPopula'on Structure Computa.onal Genomics Seyoung Kim
Popula'on Structure 02-710 Computa.onal Genomics Seyoung Kim What is Popula'on Structure? Popula.on Structure A set of individuals characterized by some measure of gene.c dis.nc.on A popula.on is usually
More informationMicroarrays The technology
Microarrays The technology Goal Goal: To measure the amount of a specific (known) DNA molecule in parallel. In parallel : do this for thousands or millions of molecules simultaneously. Main components
More information3.1.4 DNA Microarray Technology
3.1.4 DNA Microarray Technology Scientists have discovered that one of the differences between healthy and cancer is which genes are turned on in each. Scientists can compare the gene expression patterns
More informationRelease Notes. JMP Genomics. Version 3.1
JMP Genomics Version 3.1 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP. A Business Unit of SAS SAS Campus Drive
More informationRNAseq / ChipSeq / Methylseq and personalized genomics
RNAseq / ChipSeq / Methylseq and personalized genomics 7711 Lecture Subhajyo) De, PhD Division of Biomedical Informa)cs and Personalized Biomedicine, Department of Medicine University of Colorado School
More informationFunctional genomics + Data mining
Functional genomics + Data mining BIO337 Systems Biology / Bioinformatics Spring 2014 Edward Marcotte, Univ of Texas at Austin Edward Marcotte/Univ of Texas/BIO337/Spring 2014 Functional genomics + Data
More informationAFFYMETRIX c Technology and Preprocessing Methods
Analysis of Genomic and Proteomic Data AFFYMETRIX c Technology and Preprocessing Methods bhaibeka@ulb.ac.be Université Libre de Bruxelles Institut Jules Bordet Table of Contents AFFYMETRIX c Technology
More informationCanadian Bioinforma3cs Workshops
Canadian Bioinforma3cs Workshops www.bioinforma3cs.ca Module #: Title of Module 2 1 Module 3 Expression and Differen3al Expression (lecture) Obi Griffith & Malachi Griffith www.obigriffith.org ogriffit@genome.wustl.edu
More informationBioinformatics for Biologists
Bioinformatics for Biologists Microarray Data Analysis: Lecture 2. Fran Lewitter, Ph.D. Director Bioinformatics and Research Computing Whitehead Institute Outline Introduction Working with microarray data
More informationMoc/Bio and Nano/Micro Lee and Stowell
Moc/Bio and Nano/Micro Lee and Stowell Moc/Bio-Lecture GeneChips Reading material http://www.gene-chips.com/ http://trueforce.com/lab_automation/dna_microa rrays_industry.htm http://www.affymetrix.com/technology/index.affx
More informationDNA Microarrays Introduction Part 2. Todd Lowe BME/BIO 210 April 11, 2007
DNA Microarrays Introduction Part 2 Todd Lowe BME/BIO 210 April 11, 2007 Reading Assigned For Friday, please read two papers and be prepared to discuss in detail: Comprehensive Identification of Cell Cycle-related
More informationA Microarray Analysis Teaching Module. for Hamilton College. July 2008 Megan Cole Post-doctoral Associate Whitehead Institute, MIT
A Microarray Analysis Teaching Module for Hamilton College July 2008 Megan Cole Post-doctoral Associate Whitehead Institute, MIT Lecture Topics I. Uses of microarrays developed in 1987 a. To measure gene
More informationRNA Seq: Methods and Applica6ons. Prat Thiru
RNA Seq: Methods and Applica6ons Prat Thiru 1 Outline Intro to RNA Seq Biological Ques6ons Comparison with Other Methods RNA Seq Protocol RNA Seq Applica6ons Annota6on Quan6fica6on Other Applica6ons Expression
More informationA Prac'cal Guide to NCBI BLAST
A Prac'cal Guide to NCBI BLAST Leonardo Mariño-Ramírez NCBI, NIH Bethesda, USA June 2018 1 NCBI Search Services and Tools Entrez integrated literature and molecular databases Viewers BLink protein similarities
More information