Our goal: from data, to information, to knowledge, to understanding (wisdom)


1 Knowledge Discovery

2 Our goal: from data, to information, to knowledge, to understanding (wisdom)

3 Why do we need Knowledge Discovery? Data explosion: web usage, automated data collection tools, mature database technology. Too much data and too little knowledge. Humans are not able to sift through the data effectively. Computational approaches to data analysis are required for the continually increasing, accumulated data.

4 Potential Applications. Market analysis, customer relationship management. Risk analysis and management. Fraud detection. Text mining: newsgroups, documents. Web mining of logs and data streams for customization, advertising, marketing. Biology and medicine: many types of high-throughput data for diagnostics, predictive and personalized medicine.


8 Even Better: Consult the Domain Expert(s)

9 The Process: Guided Discovery, PBL, Knowledge Discovery. Learn through examples and practice. The same general approach may be applied to many different problem domains. Select appropriate methods to customize the approach. No one right answer!

10 Running Example of KD: Gene Expression Data. Why a good example? Biotechnology advances created a huge influx of data. Biologists were not equipped to analyze the data. Computational scientists didn't understand the biology. A KDD process was sorely needed. The field has advanced significantly over the last 10 years.

11 Papers. Data preprocessing and transformation: Quackenbush. Need for standards: MAGE-ML. Mining large datasets for patterns: Molecular Classification of Cancer, Golub et al.

12 A Typical Scenario. A biologist designs and runs an experiment and delivers samples (along with $$) to the Functional Genomics lab for high-throughput gene expression analysis. A couple of weeks later the biologist picks up a CD with multiple files containing the raw data and some preprocessed data. Not knowing how to analyze the data, the biologist calls in your help. Where do we start? Understand the domain and the problems.

13 High-Throughput Systems for Studying Global Gene Expression are Complex. Need to learn about and consider: the biology behind the experiments and the interpretation of the experiments; how the data is acquired (biotechnology); the data issues.

14 Biology Basics: The Flow of Information. A gene is expressed in 2 steps: DNA is transcribed into RNA (mRNA); RNA is translated into protein.

15 Genotype to Phenotype. Individual cells in an organism have the same genes (DNA): the genotype. But not all genes are active (expressed) in each cell. It is the expression of thousands of genes and their products (RNA, proteins), functioning in a complicated and orchestrated way, that makes a specific cell what it is: the phenotype.

16 Gene Expression Depends on Context. The subsets of genes that are expressed (RNA/protein) will differ among cells, tissues, organs, and conditions. The subset expressed confers unique properties on the cell. Examples: neuron, liver, muscle.

17 Differential Gene Expression. The level of expression of genes also differs with the cellular context, i.e. the amount of a given RNA will vary. We can think of gene expression (in higher organisms) as having both an on/off switch and a volume control.

18 What Biologists Want to Know: Specific Patterns of Gene Expression. Tissue/cell type specific, e.g. skin cell vs. brain cell, or keratinocyte vs. melanocyte. Developmental stage, e.g. embryonic skin cell vs. adult skin cell. Disease state, e.g. normal skin cell vs. skin tumor cell. Environment specific (drugs, toxins), e.g. skin cell untreated vs. treated.

19 But also, the more difficult problem: Gene Networks. Genes and their products are related through their roles in metabolic pathways and cell signalling networks.

20 Metabolic Pathway (from the KEGG Database)

21 Cell Signalling Networks dortmund.mpg.de/departments/dep1/signaltransduktion/image3.gif

22 What can we learn by studying global patterns of gene expression? Individual gene expression patterns. Classifications: for diagnosis, prediction. Groups of genes: molecular taxonomy of disease. Gene networks/pathways: reconstruction of metabolic and regulatory pathways.

23 Now that we have some understanding of the domain and goals, what about the data? How are the data generated? Data type? Data quality? Need for data cleaning and preprocessing?

24 Knowledge Discovery Process Consult the Domain Expert(s)

25 GeneChip Oligonucleotide Array: high-throughput gene expression analysis

26 Recall that DNA and RNA are composed of strings of nucleotides. A gene of interest will have a specific nucleotide sequence. DNA and RNA sequences can form bonds with complementary bases on another string; this is called base pairing. When we do this experimentally we call it hybridization, and we can detect it by labeling one of the strings (aka strands).
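
The base-pairing rule on this slide can be sketched in a few lines. This is a minimal illustration, not how any array software works: the sequences are hypothetical, only perfect Watson-Crick matches are considered, and the names `reverse_complement` and `can_hybridize` are invented for the example.

```python
# Watson-Crick pairing for DNA; an RNA strand would use U in place of T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that base-pairs with `seq`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def can_hybridize(probe: str, target: str) -> bool:
    """True if `probe` is perfectly complementary to a stretch of `target`."""
    return reverse_complement(probe) in target.upper()


target = "ATGGCGTACGTTAGC"             # hypothetical transcript fragment
probe = reverse_complement("GCGTACG")  # probe designed against an internal stretch
print(probe, can_hybridize(probe, target))
```

Real hybridization tolerates some mismatches and depends on temperature and salt conditions, which this exact-match check ignores.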

27 GeneChip Expression Analysis: Hybridization and Staining. Figure labels: Array; Hybridized Array; cRNA Target; Streptavidin-phycoerythrin conjugate. Courtesy of M. Hessner, CAAGED Workshop

28 How do Affymetrix microarrays work? Probes are picked to interrogate a gene; the idea is to get multiple measurements. Each probe is a 25-mer oligonucleotide that binds to a gene. The collection of probes designed to hybridize to the same gene is called a probe set. There may be tens of thousands of these probe sets on a given chip. Probe sets have identification names called Affymetrix IDs, which look like 10329_g_at, etc. On any GeneChip, some probe sets are dedicated to quality control; these begin with AFFX_. Take-home message: you have to learn a lot of terminology.

29 Affymetrix Chips: ,000 Probes. Perfect Match and Mismatch. Average Difference Values. Courtesy of J. Glasner, CAAGED Workshop

30 Affymetrix Analysis. A high-resolution image of the scanned microarray generates a DAT file. Since the probes are laid out in a grid, and each probe position is determined in terms of its X-Y coordinates, one can compute the PM and MM probe intensities from the pixelated image. The CDF (chip description file) library file contains the X-Y layout of every probe.

31 Affymetrix Data Flow (diagram). Labels: Hybridized GeneChip; Scan Chip; DAT file; EXP file; Process Image (GCOS); CEL file; CDF file; MAS5 (GCOS); CHP file; TXT file; RPT file. GCOS: GeneChip Operating Software (Affymetrix).

32 Affymetrix File Types. DAT file: raw (TIFF) optical image of the hybridized chip. CDF file (Chip Description File): provided by Affy, describes the layout of the chip. CEL file: processed DAT file (intensity/position values), http:// AffxFileFormats/cel.html. CHP file: contains summarized gene expression scores after probe cells are analyzed; the format is Gene, Avg. Diff, Presence, e.g. "AFFX_CreX_at 48 A" and "AFFX_BioB_at 149 P". TXT file: probe set expression values with annotation (CHP file in text format). RPT file: generated by Affy software, report of QC info.

33 Knowledge Discovery Process Consult the Domain Expert(s)

34 Data Quality. Most data mining techniques can tolerate some level of imperfection in the data, but improving data quality can improve the quality of analyses. Main issues: noise, outliers, missing values, duplicate data, inconsistent data.

35 There are Many Problems Facing Expression Analysis on the Biotech Side: standardization and quality control in the experiments (affects data quality at many levels); cost.

36 Problem in reproducibility of experimental data. Lots of variation in arrays: more than 100 experimental steps. Sources of variation: biological variability in each RNA extract; each labeling reaction is different; each slide is a separate hybridization; spots on the slide are variable across slides (and within slides when double spotted); each color is scanned separately. Need replicates and statistics!

37 Outcome: noisy data. Data preprocessing is necessary: normalization, scaling. Heavy reliance on statistics today.

38 What do the spots (intensity measurements) represent? Fluorescence intensity is a measure of the relative abundance of individual mRNAs (expressed genes) in given samples, e.g. experimental relative to control. But gene expression experiments are run on multiple samples. Why? We are trying to understand a dynamic process, and each sample only represents a snapshot. Compare among samples (different arrays). Compare across a time course of related samples.

39 How can we use the data? For microarrays we can only really depend on between-sample fold change, not on absolute values or within-sample comparisons (> fold change, in general). Take-home message: be careful when comparing between arrays and from experiment to experiment.
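
As a rough illustration of between-sample fold change, the sketch below compares one gene's intensity in an experimental sample against a control. The intensity values are hypothetical, and reporting the ratio on a log2 scale is an assumption of this example (it makes up- and down-regulation symmetric around zero).

```python
import math


def log2_fold_change(experimental: float, control: float) -> float:
    """log2 of the experimental/control intensity ratio for one gene."""
    return math.log2(experimental / control)


# Hypothetical intensities for one gene on two arrays.
print(log2_fold_change(800.0, 200.0))  # 4-fold up in the experimental sample
print(log2_fold_change(200.0, 800.0))  # 4-fold down gives the same magnitude, opposite sign
```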

40 Pre-processing. Gene filtering: control genes, uninformative genes. Normalization and scaling: allows comparisons across arrays; scaling to control dynamic range. Transformation: logarithmic transformation for improved statistical properties.
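
The three steps on this slide could be chained as in the sketch below. It is one simple possibility under stated assumptions, not the procedure of any particular package: control probe sets are recognised by an "AFFX" name prefix, scaling brings every array to the same mean intensity (`target_mean` is an arbitrary choice), and values are floored before taking log2. The function name and the sample data are hypothetical.

```python
import math


def preprocess(arrays, target_mean=100.0, floor=1.0):
    """arrays: {sample name: {probe set name: raw intensity}}."""
    result = {}
    for sample, values in arrays.items():
        # 1) Gene filtering: drop quality-control probe sets.
        kept = {p: v for p, v in values.items() if not p.startswith("AFFX")}
        # 2) Scaling: bring every array to the same mean intensity.
        factor = target_mean / (sum(kept.values()) / len(kept))
        # 3) Transformation: log2, flooring small values so the log is defined.
        result[sample] = {
            p: math.log2(max(v * factor, floor)) for p, v in kept.items()
        }
    return result


raw = {  # hypothetical arrays; array2 is simply twice as bright overall
    "array1": {"AFFX_BioB_at": 149.0, "gene_a": 50.0, "gene_b": 150.0},
    "array2": {"AFFX_BioB_at": 300.0, "gene_a": 100.0, "gene_b": 300.0},
}
clean = preprocess(raw)
print(clean["array1"] == clean["array2"])  # the brightness difference is scaled away
```

Because every later step works on these transformed values, a different choice of filter, scaling target, or floor changes all downstream results.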

41 Normalization (scatterplot; axes: Cy5 signal (log2), Cy3 signal (log2))

42 Take-home message: once preprocessing, normalization, and transformation of the data have occurred, all downstream mining will be affected.

43 Data Representation: flat file; vector data; sparse matrix (text) data; sequence data (e.g. web or genomic); time series; image data; spatio-temporal.

44 Three levels of microarray gene expression data processing Brazma et al., Nature Genetics, 29: , 2001

45 Outcomes of Microarray Analysis. Large, complex data sets of high dimensionality. Example of a routine study: 50,000 genes from 20 samples, approx. 1-2 x 10^6 pieces of data. Challenges for bioinformatics: annotation, storage, retrieval, and sharing of data; information from the data.

46 Knowledge Discovery Process Consult the Domain Expert(s)

47 State of Microarray Data. Wide availability of the technology has given rise to a large number of distributed databases. Data are scattered among many independent sites (accessible via the Internet) or not publicly available at all. Need for standardization!

48 MGED Group and Standardization Issues. Microarray Gene Expression Database (MGED) Group: MGED is taking on the challenge of standardization. Four major projects.

49 MGED Projects. MIAME: the formulation of the minimum information about a microarray experiment required to interpret and verify the results. MAGE: the establishment of a data exchange format (MAGE-ML) and object model (MAGE-OM) for microarray experiments.

50 MGED Projects. Ontologies: the development of ontologies for microarray experiment description and biological material (biomaterial) annotation in particular. Normalization: the development of recommendations regarding experimental controls and data normalization methods.

51 MAGE-ML: the XML representation of the MAGE-OM. The DTD (document type definition) is what is specified in MAGE-ML: rules or declarations; what tags can be used; what tags contain.

52 MAGE-OM http:// mage om.html Mapping of microarray experimental workflow to the OM

53 DTD http:// dtd

54 MAGE-STK: software toolkit. Defines an API to MAGE-OM in Java, Perl, C++. Used to export data to MAGE-ML, to store data in a relational database, and to input data to analysis tools. Reader: MAGE-ML docs into objects. Writer: objects into MAGE-ML.

55 Knowledge Discovery Process Consult the Domain Expert(s)

56 Data Mining Techniques: exploratory data analysis; descriptive modeling; predictive modeling; pattern discovery; others.

57 Exploratory Data Analysis. Interactive and visual. Insight and feel for the data in a broad sense. Provide summaries, e.g. max/min, mean/median, variance, etc. Visualization: histograms, scatterplots. Useful for data validation or verification. Simple exploratory data analysis is invaluable. Always get a cursory view of the data before applying data mining algorithms.
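
A cursory first look of the kind this slide recommends might be as simple as the following sketch, which uses only the Python standard library. The data values and the helper names `summarize` and `bin_counts` are hypothetical.

```python
import statistics


def summarize(values):
    """Basic summaries for one sample or one gene."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
    }


def bin_counts(values, bins):
    """Counts per equal-width bin, i.e. the bar heights of a histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum would land one past the last bin; clamp it back in.
        index = min(int((v - lo) / width), bins - 1) if width > 0 else 0
        counts[index] += 1
    return counts


intensities = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical log2 intensities
print(summarize(intensities))
for count in bin_counts(intensities, bins=4):
    print("#" * count)  # crude text histogram
```

Even this much is enough to spot an outlier such as the value 10.0, which sits alone in the last bin.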

58 Pattern Discovery. Discover interesting local patterns in data rather than characterizing data globally. Market basket data: discover that if customers buy wine and bread, they buy cheese with a 0.9 probability. Known as association rules.
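
The 0.9 in the wine-and-bread example is the confidence of the rule {wine, bread} => {cheese}. The sketch below shows how that number falls out of basket counts; the baskets are invented so that the rule holds in 9 of the 10 baskets containing wine and bread.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = (
    [{"wine", "bread", "cheese"}] * 9
    + [{"wine", "bread"}]
    + [{"milk"}] * 2
)
print(confidence(baskets, {"wine", "bread"}, {"cheese"}))  # approximately 0.9
```

Rule-mining algorithms usually also require a minimum support, so that rules seen in only a handful of baskets are not reported.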

59 Descriptive Modeling. Build a model for the underlying process. Simulate the data if needed. Cluster analysis to find natural groups in the data. Bayesian network to find dependency models among variables.

60 Predictive Modeling. Predict a variable Y, given a p-dimensional vector X. Classification: Y is categorical. Regression: Y is real valued. Much like function approximation: learning the relationship between Y and X. Statistics and machine learning have many algorithms for predictive modeling. Emphasis is often on predictive accuracy rather than understanding the model itself.

61 Mining of Expression Data. Recall that a gene expression pattern derived from a single microarray is simply a snapshot (one experimental sample vs. reference). We usually want to understand a process or changes in expression over a collection of samples: a gene expression profile.

62 Working with Gene Expression Data. Hypothesis-driven approaches: typically model oriented; descriptive statistics relying on prior knowledge and good design. Discovery-based approaches: few, if any, a priori hypotheses; data driven and algorithm oriented; statistical algorithms; machine learning using heuristic techniques.

63 Testing Hypotheses. Based on prior biological knowledge. Simplest: look for individual differentially expressed genes using fold changes, scatterplots, statistical measures.

64 Scatterplot

65 Some simple statistics. If we are looking at samples that seem to belong to two groups or conditions, the t-test compares the means of the two groups while accounting for the standard error of the difference of the means. ANOVA extends the analysis to more than two groups.
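
To make "accounting for the standard error of the difference of the means" concrete, here is a sketch of the t statistic for one gene measured in two groups. It assumes the unequal-variance (Welch) form; the slide does not say which variant is intended, and the expression values are hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """t = (mean_a - mean_b) / standard error of the difference of the means."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (mean_a - mean_b) / standard_error


# Hypothetical log2 expression of one gene in two conditions.
treated = [8.0, 9.0, 10.0]
untreated = [5.0, 6.0, 7.0]
print(welch_t(treated, untreated))
```

Turning the statistic into a p-value needs the t distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind`) covers that step. With thousands of genes tested at once, the resulting p-values also need a multiple-testing correction.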

66 But gene chips allow us to measure thousands of genes... across multiple samples.

67 Goal of Analysis of Expression Matrix. Some statistical methods are applied to: 1. Group similar genes together => groups of functionally similar genes. 2. Group similar cell samples together. 3. Extract representative genes in each group.

68 Typical approach: look for patterns. Compare rows to find evidence for co-regulation of genes; compare columns to find evidence for relatedness among samples. 1) Choose a measure of similarity (distance) among the objects being compared; each row or column is considered a vector in space. 2) Then group together objects (genes or samples) with similar properties. This is a multidimensional analysis.

69 An experiment: 12 genes; expression values at 0, 2, 4, 6, 8 and 10 hours.

70 Table 4.2 of Campbell/Heyer. Columns: Name, 0 hrs, 2 hrs, 4 hrs, 6 hrs, 8 hrs, 10 hrs. Rows: genes C, D, E, F, G, H, I, J, K, L, M, N.

71 Take logs. Rows: genes C, D, E, F, G, H, I, J, K, L, M, N. Compare.

72 How Similar are two Rows? How similar are the expressions of two genes? First we'll normalize each row: calculate the mean and standard deviation for each gene, then normalize each value by subtracting the mean and dividing by the standard deviation.
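
The row normalization described here is the usual z-score. A minimal sketch follows; the choice of the population standard deviation (dividing by n rather than n - 1) is an assumption of this example, and the row values are hypothetical.

```python
import statistics


def normalize_row(row):
    """Subtract the row mean and divide by the row standard deviation."""
    mean = statistics.mean(row)
    sd = statistics.pstdev(row)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("a constant row cannot be normalized")
    return [(value - mean) / sd for value in row]


row = [2.0, 4.0, 6.0]  # hypothetical log expression values for one gene
print(normalize_row(row))
```

After normalization every gene has mean 0 and standard deviation 1, so two rows can be compared on the shape of their profiles rather than on their magnitudes.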

73 How Similar are two Rows? Calculate the Pearson correlation between pairs of rows. Correlation quantifies the extent to which the expression patterns of two genes go up or down together, regardless of their magnitudes. Calculated from the dot product of the two normalized vectors (divided by the number of values). > (pc '( ) ; row G '( )) ; row L 1.0 > (pc '( ) ; row G '( )) ; row D
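
The `pc` calls on this slide have lost their data in transcription, so the sketch below uses invented rows to show the calculation itself: normalize both rows, take the dot product, and divide by the number of values. The function name `pearson` and the three rows are hypothetical and are not the Campbell/Heyer table.

```python
import math


def pearson(x, y):
    """Pearson correlation as the scaled dot product of two z-scored rows."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    z_x = [(a - mean_x) / sd_x for a in x]
    z_y = [(b - mean_y) / sd_y for b in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / n


# Hypothetical six-point time courses.
rising = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rising_bigger = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]  # same shape, larger magnitude
falling = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]               # reciprocal pattern

print(pearson(rising, rising_bigger))  # close to 1
print(pearson(rising, falling))        # close to -1
```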

74 Some other pairs. Columns: Name, 0 hrs, 2 hrs, 4 hrs, 6 hrs, 8 hrs, 10 hrs. Rows: genes C, D, E, F, G, H, I, J, K, L, M, N. > (pc '( ) ; row D '( )) ; row M > (pc '( ) ; row G '( )) ; row H

75 Pearson Correlation. pc(g,l) = 1: identically expressed genes. pc(g,d) = 0.897: similarly expressed genes. pc(d,m) = -0.926: reciprocally expressed. pc(g,h) = -0.909: also reciprocally expressed.

76 Descriptive and Predictive Modeling: clustering; feature extraction/selection; classification (discrimination analysis).

77 Analytic Approaches. Clustering: identification of associations between data points; organization of data into groups. Unsupervised clustering: genes are clustered by similarity/correlation, or other criteria based on X values; no useful external information about the Y variables (the response) is used; doesn't reveal groups of genes with special interest for tissue discrimination. Supervised methods: grouping of variables (genes), controlled by information about the X and Y variables; supervised algorithms try to find gene clusters whose average expression profile has great potential for explaining the response Y, i.e. for tissue discrimination.

78 Unsupervised Clustering Algorithms: hierarchical; K-means; self-organizing maps; others.

79 Gene Expression Matrix & Hierarchical Clustering (figure: rows are genes, columns are samples). Eisen et al. content/full/95/25/14863

80 Theory. Hierarchical clustering works by sequentially joining the two nearest clusters, then hierarchically joining the next two closest clusters, and so on, joining the nearest clusters first and the farthest clusters last. Initially each individual data point is its own cluster.

81 Hierarchical Clustering Algorithm. Given a set of N items to be clustered, and an N*N distance (or similarity) matrix: 1. Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters be the same as the distances (similarities) between the items they contain. 2. Find the closest (most similar) pair of clusters and merge them into a single cluster. You now have one cluster less. 3. Compute distances (similarities) between the new cluster and each of the old clusters. 4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
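
The four steps above can be sketched directly. This is a deliberately naive version for reading, not an efficient one: instead of updating a cluster-to-cluster matrix in step 3, it recomputes each cluster distance from the original item distances. The function name, the `linkage` parameter, and the example points are hypothetical.

```python
def hierarchical_clustering(dist, linkage=min):
    """Agglomerative clustering on a symmetric N x N distance matrix.

    Returns the merge history as (cluster_id_a, cluster_id_b, distance).
    Items are clusters 0..N-1; each merge creates the next unused id.
    """
    # Step 1: every item starts as its own cluster.
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    next_id = len(dist)
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters.
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Step 3 (implicit): cluster distance from item distances.
                d = linkage([dist[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1  # Step 4: repeat until one cluster remains.
    return merges


points = [0.0, 1.0, 5.0, 6.0]  # hypothetical 1-D data
dist = [[abs(p - q) for q in points] for p in points]
print(hierarchical_clustering(dist))
```

The merge history is what a dendrogram draws: each entry is one join, and the distance is the height of that join. Passing `linkage=max` instead of the default `min` switches from single to complete linkage.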

82 Hierarchical in action

83 Variations of Hierarchical Algorithm. Step 3 (computing distances between the new cluster and each of the old clusters) can be done in several different ways: single linkage, average linkage, and complete linkage. In single linkage, the distance between clusters is the shortest distance from any member of one cluster to any member of the other cluster. In average linkage, the distance between two clusters is the average distance between any member of one cluster and any member of the other cluster. In complete linkage, it is the maximum distance from any member of the first cluster to any member of the second cluster.
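
The three linkage rules differ only in how they reduce the set of member-to-member distances. A small sketch, using hypothetical 1-D clusters and absolute difference as the distance:

```python
def pairwise(cluster_a, cluster_b, distance):
    """All member-to-member distances between two clusters."""
    return [distance(a, b) for a in cluster_a for b in cluster_b]


def single_linkage(cluster_a, cluster_b, distance):
    return min(pairwise(cluster_a, cluster_b, distance))


def average_linkage(cluster_a, cluster_b, distance):
    distances = pairwise(cluster_a, cluster_b, distance)
    return sum(distances) / len(distances)


def complete_linkage(cluster_a, cluster_b, distance):
    return max(pairwise(cluster_a, cluster_b, distance))


def absolute_difference(a, b):
    return abs(a - b)


left, right = [1.0, 2.0], [5.0, 9.0]  # hypothetical clusters
for rule in (single_linkage, average_linkage, complete_linkage):
    print(rule.__name__, rule(left, right, absolute_difference))
```

For the same pair of clusters, single linkage never gives a larger value than average linkage, which never gives a larger value than complete linkage, so the choice changes which clusters look "closest" and therefore the shape of the tree.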

84 Variations of Hierarchical Algorithm: Self-Organizing Tree Algorithm. Unsupervised neural network with a binary tree topology. Combination of SOM and hierarchical clustering. Runtime is approximately linear: faster than the normal hierarchical method. Uses a divisive method, in contrast to the bottom-up method of hierarchical clustering.

85 Advantages. Hierarchical clustering results in a visual representation that is convenient for humans to analyze. Unlike k-means and SOM, it does not require an a priori cluster number.

86 Why cluster analysis may not be the answer. Clustering methods typically require user inputs, for example the distance measure. Clustering methods differ in the way the number of clusters is specified. Clustering methods are often sensitive to the initialization condition (starting guess). Local vs. global sampling of clustering space.

87 Cluster Analysis Challenges. Noise in the data itself. Large data sets: most of the techniques currently used were not developed for multidimensional data. What about networks? Limitation of cluster analysis: similarity in expression pattern suggests co-regulation but doesn't reveal cause-effect relationships.

88 Feature Selection & Classification. First, identify features (genes) that discriminate between classes. Then use the features for classification: machine learning approach; supervised analysis; assignment of a new sample to a previously specified class, based on sample features and a trained classifier.

89 Classic Example: Classification of AML vs. ALL. Comparing 2 acute leukemias: acute myeloid leukemia (AML) and acute lymphoid leukemia (ALL). Biological/clinical problems: previously, no single reliable test to distinguish them; they differ greatly in clinical course and response to treatments. Golub et al., Science Oct :

90 Study Design Golub et al., Science Oct :

92 The prediction of a new sample is based on 'weighted votes' of a set of informative genes
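
A sketch of how such weighted voting can work is given below. It follows the scheme as commonly described for Golub et al.: each informative gene gets a weight (difference of class means over the sum of class standard deviations) and a decision boundary (midpoint of the class means), each gene votes with weight times distance from its boundary, and the prediction strength PS is the margin of victory. Treat the exact formulas as an assumption of this sketch rather than a transcription of the paper; the 0.3 threshold is the one quoted on the results slide, and the function names and data are hypothetical.

```python
import statistics


def train(class1, class2):
    """Per-gene (weight, boundary) from two lists of training samples."""
    params = []
    for g in range(len(class1[0])):
        x1 = [sample[g] for sample in class1]
        x2 = [sample[g] for sample in class2]
        m1, m2 = statistics.mean(x1), statistics.mean(x2)
        weight = (m1 - m2) / (statistics.stdev(x1) + statistics.stdev(x2))
        params.append((weight, (m1 + m2) / 2))
    return params


def predict(params, sample, threshold=0.3):
    """Return (label, prediction strength); weak calls become 'uncertain'."""
    votes = [w * (x - b) for (w, b), x in zip(params, sample)]
    v1 = sum(v for v in votes if v > 0)    # total vote for class 1
    v2 = -sum(v for v in votes if v < 0)   # total vote for class 2
    if v1 + v2 == 0:
        return "uncertain", 0.0
    ps = abs(v1 - v2) / (v1 + v2)
    label = "class1" if v1 > v2 else "class2"
    return (label if ps >= threshold else "uncertain"), ps


# Hypothetical expression of two informative genes in three samples per class.
class1 = [[10.0, 2.0], [12.0, 3.0], [11.0, 2.5]]
class2 = [[4.0, 8.0], [5.0, 9.0], [6.0, 8.5]]
params = train(class1, class2)
print(predict(params, [11.5, 3.0]))  # both genes vote for class 1
print(predict(params, [8.5, 5.8]))   # the genes disagree, so PS is low
```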

93 Results of the study. 1) Clustering of microarray data using tumors of known type found 1100 of 6817 genes correlated with class distinction. 2) Formation of a class predictor = 50 most informative genes, used as a training set; classification of unknown tumors. Golub et al., Science Oct :

94 Results. How to test the validity of class predictors? Cross-validation tests: the 50-gene predictor assigned 36 of the 38 samples as either AML or ALL and the remaining two as uncertain (PS < 0.3). All 36 predictions agreed with the patients' clinical diagnosis. Independent test: the 50-gene predictor was applied to an independent collection of 34 leukemia samples. The predictor assigned 29 of the 34 samples, and the accuracy was 100%. Prediction strength: median PS = 0.77 in cross-validation and 0.73 in the independent test (Fig. 3A).
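
Cross-validation can take several forms; the sketch below assumes the leave-one-out variant, where each sample is held out in turn and predicted by a model built from the rest. To stay short it uses a nearest-centroid classifier as a stand-in for the 50-gene predictor, and the samples are invented. All names are hypothetical.

```python
def fit_centroids(samples, labels):
    """'Training': one mean expression vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(centroids, x):
    """Assign x to the class with the nearest centroid."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: squared_distance(centroids[y]))


def leave_one_out_accuracy(samples, labels, fit, predict):
    """Hold out each sample once; train on the rest; score the held-out call."""
    hits = 0
    for i in range(len(samples)):
        model = fit(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict(model, samples[i]) == labels[i]
    return hits / len(samples)


# Hypothetical two-gene profiles.
samples = [[1.0, 9.0], [1.2, 8.5], [0.8, 9.5], [5.0, 2.0], [5.2, 2.5], [4.8, 1.5]]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
print(leave_one_out_accuracy(samples, labels, fit_centroids, predict_centroid))
```

For the estimate to be honest, any gene selection must be repeated inside each fold using only the training samples, otherwise the held-out sample has already influenced the model.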

95 Results: Class discovery. If the AML-ALL distinction were not already known, could it have been discovered simply on the basis of gene expression?

96 Results: Two-cluster analysis. (1) Cluster tumors by gene expression: a two-cluster SOM was applied to automatically group the 38 initial leukemia samples into two classes on the basis of the expression pattern of all 6817 genes.

97 Results. Determine whether the putative classes produced are meaningful. The clusters were first evaluated by comparing them to the known AML-ALL classes (Fig. 4A). Class A1 contained mostly ALL (24 of 25 samples) and class A2 contained mostly AML (10 of 13 samples). The SOM was thus quite effective at automatically discovering the two types of leukemia.

98 Results. How could one evaluate such putative clusters if the "right" answer were not already known? Class discovery could be tested by class prediction: if putative classes reflect true structure, then a class predictor based on these classes should perform well.


Bioinformatics. Microarrays: designing chips, clustering methods. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Microarrays: designing chips, clustering methods Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Course Syllabus Jan 7 Jan 14 Jan 21 Jan 28 Feb 4 Feb 11 Feb 18 Feb 25 Sequence

More information

Microarray Technique. Some background. M. Nath

Microarray Technique. Some background. M. Nath Microarray Technique Some background M. Nath Outline Introduction Spotting Array Technique GeneChip Technique Data analysis Applications Conclusion Now Blind Guess? Functional Pathway Microarray Technique

More information

A very brief introduc0on to bioinforma0cs. Mikhail Spivakov, PhD European Bioinforma0cs Ins0tute

A very brief introduc0on to bioinforma0cs. Mikhail Spivakov, PhD European Bioinforma0cs Ins0tute A very brief introduc0on to bioinforma0cs Mikhail Spivakov, PhD European Bioinforma0cs Ins0tute What bioinforma0cs does? Cataloguing Mining Modelling For lab biologists to look at favourite genes etc.

More information

Introduction to Microarray Analysis

Introduction to Microarray Analysis Introduction to Microarray Analysis Methods Course: Gene Expression Data Analysis -Day One Rainer Spang Microarrays Highly parallel measurement devices for gene expression levels 1. How does the microarray

More information

Downstream analysis of ChIP- seq data

Downstream analysis of ChIP- seq data Downstream analysis of ChIP- seq data Shamith Samarajiwa Integra/ve Systems Biomedicine Group MRC Cancer Unit University of Cambridge CRUK Bioinforma/cs Summer School July 2015 ChIP- seq workflow overview

More information

Expression summarization

Expression summarization Expression Quantification: Affy Affymetrix Genechip is an oligonucleotide array consisting of a several perfect match (PM) and their corresponding mismatch (MM) probes that interrogate for a single gene.

More information

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS

More information

Introduction to Microarray Data Analysis and Gene Networks. Alvis Brazma European Bioinformatics Institute

Introduction to Microarray Data Analysis and Gene Networks. Alvis Brazma European Bioinformatics Institute Introduction to Microarray Data Analysis and Gene Networks Alvis Brazma European Bioinformatics Institute A brief outline of this course What is gene expression, why it s important Microarrays and how

More information

Analysis of Microarray Data

Analysis of Microarray Data Analysis of Microarray Data Lecture 1: Experimental Design and Data Normalization George Bell, Ph.D. Senior Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Introduction

More information

FDA and the Regula/on of Next Genera/on Sequencing

FDA and the Regula/on of Next Genera/on Sequencing FDA and the Regula/on of Next Genera/on Sequencing David Litwack, Ph.D. Personalized Medicine Staff Office of In Vitro Diagnos@cs and Radiological Health, FDA In Vitro Diagnos/cs in the Age of Precision

More information

Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics

Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics abedi777@ymail.com Outlines Technology Basic concepts Data analysis Printed Microarrays In Situ-Synthesized

More information

Microarray. Key components Array Probes Detection system. Normalisation. Data-analysis - ratio generation

Microarray. Key components Array Probes Detection system. Normalisation. Data-analysis - ratio generation Microarray Key components Array Probes Detection system Normalisation Data-analysis - ratio generation MICROARRAY Measures Gene Expression Global - Genome wide scale Why Measure Gene Expression? What information

More information

Introduction to gene expression microarray data analysis

Introduction to gene expression microarray data analysis Introduction to gene expression microarray data analysis Outline Brief introduction: Technology and data. Statistical challenges in data analysis. Preprocessing data normalization and transformation. Useful

More information

APPLICATION OF COMMITTEE k-nn CLASSIFIERS FOR GENE EXPRESSION PROFILE CLASSIFICATION. A Thesis. Presented to

APPLICATION OF COMMITTEE k-nn CLASSIFIERS FOR GENE EXPRESSION PROFILE CLASSIFICATION. A Thesis. Presented to APPLICATION OF COMMITTEE k-nn CLASSIFIERS FOR GENE EXPRESSION PROFILE CLASSIFICATION A Thesis Presented to The Graduate Faculty of The University of Akron In Partial Fulfillment of the Requirements for

More information

Image Analysis. Based on Information from Terry Speed s Group, UC Berkeley. Lecture 3 Pre-Processing of Affymetrix Arrays. Affymetrix Terminology

Image Analysis. Based on Information from Terry Speed s Group, UC Berkeley. Lecture 3 Pre-Processing of Affymetrix Arrays. Affymetrix Terminology Image Analysis Lecture 3 Pre-Processing of Affymetrix Arrays Stat 697K, CS 691K, Microbio 690K 2 Affymetrix Terminology Probe: an oligonucleotide of 25 base-pairs ( 25-mer ). Based on Information from

More information

Measuring and Understanding Gene Expression

Measuring and Understanding Gene Expression Measuring and Understanding Gene Expression Dr. Lars Eijssen Dept. Of Bioinformatics BiGCaT Sciences programme 2014 Why are genes interesting? TRANSCRIPTION Genome Genomics Transcriptome Transcriptomics

More information

Lecture: Genetic Basis of Complex Phenotypes Advanced Topics in Computa8onal Genomics

Lecture: Genetic Basis of Complex Phenotypes Advanced Topics in Computa8onal Genomics Lecture: Genetic Basis of Complex Phenotypes 02-715 Advanced Topics in Computa8onal Genomics Genome Polymorphisms A Human Genealogy TCGAGGTATTAAC The ancestral chromosome From SNPS TCGAGGTATTAAC TCTAGGTATTAAC

More information

COS 597c: Topics in Computational Molecular Biology. DNA arrays. Background

COS 597c: Topics in Computational Molecular Biology. DNA arrays. Background COS 597c: Topics in Computational Molecular Biology Lecture 19a: December 1, 1999 Lecturer: Robert Phillips Scribe: Robert Osada DNA arrays Before exploring the details of DNA chips, let s take a step

More information

Supervised Learning from Micro-Array Data: Datamining with Care

Supervised Learning from Micro-Array Data: Datamining with Care November 18, 2002 Stanford Statistics 1 Supervised Learning from Micro-Array Data: Datamining with Care Trevor Hastie Stanford University November 18, 2002 joint work with Robert Tibshirani, Balasubramanian

More information

Background Correction and Normalization. Lecture 3 Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy

Background Correction and Normalization. Lecture 3 Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy Background Correction and Normalization Lecture 3 Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy Feature Level Data Outline Affymetrix GeneChip arrays Two

More information

Advanced Statistical Methods: Beyond Linear Regression

Advanced Statistical Methods: Beyond Linear Regression Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 1. Case Study Data Sets Mathematics Educators Workshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi

More information

Gene Expression Technology

Gene Expression Technology Gene Expression Technology Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Gene expression Gene expression is the process by which information from a gene

More information

Gene expression analysis: Introduction to microarrays

Gene expression analysis: Introduction to microarrays Gene expression analysis: Introduction to microarrays Adam Ameur The Linnaeus Centre for Bioinformatics, Uppsala University February 15, 2006 Overview Introduction Part I: How a microarray experiment is

More information

DNA Microarrays and Clustering of Gene Expression Data

DNA Microarrays and Clustering of Gene Expression Data DNA Microarrays and Clustering of Gene Expression Data Martha L. Bulyk mlbulyk@receptor.med.harvard.edu Biophysics 205 Spring term 2008 Traditional Method: Northern Blot RNA population on filter (gel);

More information

DNA Microarray Technology

DNA Microarray Technology CHAPTER 1 DNA Microarray Technology All living organisms are composed of cells. As a functional unit, each cell can make copies of itself, and this process depends on a proper replication of the genetic

More information

CodeLink Human Whole Genome Bioarray

CodeLink Human Whole Genome Bioarray CodeLink Human Whole Genome Bioarray 55,000 human gene targets on a single bioarray The CodeLink Human Whole Genome Bioarray comprises one of the most comprehensive coverages of the human genome, as it

More information

Next Genera*on Sequencing II: Personal Genomics. Jim Noonan Department of Gene*cs

Next Genera*on Sequencing II: Personal Genomics. Jim Noonan Department of Gene*cs Next Genera*on Sequencing II: Personal Genomics Jim Noonan Department of Gene*cs Personal genome sequencing Iden*fying the gene*c basis of phenotypic diversity among humans Gene*c risk factors for disease

More information

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology. G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic

More information

Introduction to Bioinformatics and Gene Expression Technology

Introduction to Bioinformatics and Gene Expression Technology Vocabulary Introduction to Bioinformatics and Gene Expression Technology Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 1.1 Gene: Genetics: Genome: Genomics: hereditary DNA

More information

DNA Chip Technology Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center

DNA Chip Technology Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center DNA Chip Technology Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center Why DNA Chips? Functional genomics: get information about genes that is unavailable from sequence

More information

Our view on cdna chip analysis from engineering informatics standpoint

Our view on cdna chip analysis from engineering informatics standpoint Our view on cdna chip analysis from engineering informatics standpoint Chonghun Han, Sungwoo Kwon Intelligent Process System Lab Department of Chemical Engineering Pohang University of Science and Technology

More information

Introduction to Bioinformatics: Chapter 11: Measuring Expression of Genome Information

Introduction to Bioinformatics: Chapter 11: Measuring Expression of Genome Information HELSINKI UNIVERSITY OF TECHNOLOGY LABORATORY OF COMPUTER AND INFORMATION SCIENCE Introduction to Bioinformatics: Chapter 11: Measuring Expression of Genome Information Jarkko Salojärvi Lecture slides by

More information

Introduction to Bioinformatics. Fabian Hoti 6.10.

Introduction to Bioinformatics. Fabian Hoti 6.10. Introduction to Bioinformatics Fabian Hoti 6.10. Analysis of Microarray Data Introduction Different types of microarrays Experiment Design Data Normalization Feature selection/extraction Clustering Introduction

More information

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques)

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques) Microarrays and Transcript Profiling Gene expression patterns are traditionally studied using Northern blots (DNA-RNA hybridization assays). This approach involves separation of total or polya + RNA on

More information

Seven Keys to Successful Microarray Data Analysis

Seven Keys to Successful Microarray Data Analysis Seven Keys to Successful Microarray Data Analysis Experiment Design Platform Selection Data Management System Access Differential Expression Biological Significance Data Publication Type of experiment

More information

Exploration and Analysis of DNA Microarray Data

Exploration and Analysis of DNA Microarray Data Exploration and Analysis of DNA Microarray Data Dhammika Amaratunga Senior Research Fellow in Nonclinical Biostatistics Johnson & Johnson Pharmaceutical Research & Development Javier Cabrera Associate

More information

Humboldt Universität zu Berlin. Grundlagen der Bioinformatik SS Microarrays. Lecture

Humboldt Universität zu Berlin. Grundlagen der Bioinformatik SS Microarrays. Lecture Humboldt Universität zu Berlin Microarrays Grundlagen der Bioinformatik SS 2017 Lecture 6 09.06.2017 Agenda 1.mRNA: Genomic background 2.Overview: Microarray 3.Data-analysis: Quality control & normalization

More information

Feature Selection of Gene Expression Data for Cancer Classification: A Review

Feature Selection of Gene Expression Data for Cancer Classification: A Review Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 50 (2015 ) 52 57 2nd International Symposium on Big Data and Cloud Computing (ISBCC 15) Feature Selection of Gene Expression

More information

Sta$s$cs for Genomics ( )

Sta$s$cs for Genomics ( ) Sta$s$cs for Genomics (140.688) Instructor: Jeff Leek Website: http://www.biostat.jhsph.edu/~jleek/teaching/2011/genomics/ Class Times: MW, 10:30AM-11:50AM + R Lab TBA Grading: 20% Reading Assignments,

More information

Biology 644: Bioinformatics

Biology 644: Bioinformatics Measure of the linear correlation (dependence) between two variables X and Y Takes a value between +1 and 1 inclusive 1 = total positive correlation 0 = no correlation 1 = total negative correlation. When

More information

Upstream/Downstream Relation Detection of Signaling Molecules using Microarray Data

Upstream/Downstream Relation Detection of Signaling Molecules using Microarray Data Vol 1 no 1 2005 Pages 1 5 Upstream/Downstream Relation Detection of Signaling Molecules using Microarray Data Ozgun Babur 1 1 Center for Bioinformatics, Computer Engineering Department, Bilkent University,

More information

Integrative Genomics 1a. Introduction

Integrative Genomics 1a. Introduction 2016 Course Outline Integrative Genomics 1a. Introduction ggibson.gt@gmail.com http://www.cig.gatech.edu 1a. Experimental Design and Hypothesis Testing (GG) 1b. Normalization (GG) 2a. RNASeq (MI) 2b. Clustering

More information

Lecture 2: Population Structure Advanced Topics in Computa8onal Genomics

Lecture 2: Population Structure Advanced Topics in Computa8onal Genomics Lecture 2: Population Structure 02-715 Advanced Topics in Computa8onal Genomics 1 What is population structure? Popula8on Structure A set of individuals characterized by some measure of gene8c dis8nc8on

More information

Measuring gene expression

Measuring gene expression Measuring gene expression Grundlagen der Bioinformatik SS2018 https://www.youtube.com/watch?v=v8gh404a3gg Agenda Organization Gene expression Background Technologies FISH Nanostring Microarrays RNA-seq

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics If the 19 th century was the century of chemistry and 20 th century was the century of physic, the 21 st century promises to be the century of biology...professor Dr. Satoru

More information

Lecture #1. Introduction to microarray technology

Lecture #1. Introduction to microarray technology Lecture #1 Introduction to microarray technology Outline General purpose Microarray assay concept Basic microarray experimental process cdna/two channel arrays Oligonucleotide arrays Exon arrays Comparing

More information

First steps in signal-processing level models of genetic networks: identifying response pathways and clusters of coexpressed genes

First steps in signal-processing level models of genetic networks: identifying response pathways and clusters of coexpressed genes First steps in signal-processing level models of genetic networks: identifying response pathways and clusters of coexpressed genes Olga Troyanskaya lecture for cheme537/cs554 some slides borrowed from

More information

10.1 The Central Dogma of Biology and gene expression

10.1 The Central Dogma of Biology and gene expression 126 Grundlagen der Bioinformatik, SS 09, D. Huson (this part by K. Nieselt) July 6, 2009 10 Microarrays (script by K. Nieselt) There are many articles and books on this topic. These lectures are based

More information

Decoding Chromatin States with Epigenome Data Advanced Topics in Computa8onal Genomics

Decoding Chromatin States with Epigenome Data Advanced Topics in Computa8onal Genomics Decoding Chromatin States with Epigenome Data 02-715 Advanced Topics in Computa8onal Genomics HMMs for Decoding Chromatin States Epigene8c modifica8ons of the genome have been associated with Establishing

More information

Introduc)on to Pathway and Network Analysis

Introduc)on to Pathway and Network Analysis Introduc)on to Pathway and Network Analysis Alison Motsinger-Reif, PhD Associate Professor Bioinforma)cs Research Center Department of Sta)s)cs North Carolina State University Pathway and Network Analysis

More information

Deakin Research Online

Deakin Research Online Deakin Research Online This is the published version: Church, Philip, Goscinski, Andrzej, Wong, Adam and Lefevre, Christophe 2011, Simplifying gene expression microarray comparative analysis., in BIOCOM

More information

Popula'on Structure Computa.onal Genomics Seyoung Kim

Popula'on Structure Computa.onal Genomics Seyoung Kim Popula'on Structure 02-710 Computa.onal Genomics Seyoung Kim What is Popula'on Structure? Popula.on Structure A set of individuals characterized by some measure of gene.c dis.nc.on A popula.on is usually

More information

Microarrays The technology

Microarrays The technology Microarrays The technology Goal Goal: To measure the amount of a specific (known) DNA molecule in parallel. In parallel : do this for thousands or millions of molecules simultaneously. Main components

More information

3.1.4 DNA Microarray Technology

3.1.4 DNA Microarray Technology 3.1.4 DNA Microarray Technology Scientists have discovered that one of the differences between healthy and cancer is which genes are turned on in each. Scientists can compare the gene expression patterns

More information

Release Notes. JMP Genomics. Version 3.1

Release Notes. JMP Genomics. Version 3.1 JMP Genomics Version 3.1 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP. A Business Unit of SAS SAS Campus Drive

More information

RNAseq / ChipSeq / Methylseq and personalized genomics

RNAseq / ChipSeq / Methylseq and personalized genomics RNAseq / ChipSeq / Methylseq and personalized genomics 7711 Lecture Subhajyo) De, PhD Division of Biomedical Informa)cs and Personalized Biomedicine, Department of Medicine University of Colorado School

More information

Functional genomics + Data mining

Functional genomics + Data mining Functional genomics + Data mining BIO337 Systems Biology / Bioinformatics Spring 2014 Edward Marcotte, Univ of Texas at Austin Edward Marcotte/Univ of Texas/BIO337/Spring 2014 Functional genomics + Data

More information

AFFYMETRIX c Technology and Preprocessing Methods

AFFYMETRIX c Technology and Preprocessing Methods Analysis of Genomic and Proteomic Data AFFYMETRIX c Technology and Preprocessing Methods bhaibeka@ulb.ac.be Université Libre de Bruxelles Institut Jules Bordet Table of Contents AFFYMETRIX c Technology

More information

Canadian Bioinforma3cs Workshops

Canadian Bioinforma3cs Workshops Canadian Bioinforma3cs Workshops www.bioinforma3cs.ca Module #: Title of Module 2 1 Module 3 Expression and Differen3al Expression (lecture) Obi Griffith & Malachi Griffith www.obigriffith.org ogriffit@genome.wustl.edu

More information

Bioinformatics for Biologists

Bioinformatics for Biologists Bioinformatics for Biologists Microarray Data Analysis: Lecture 2. Fran Lewitter, Ph.D. Director Bioinformatics and Research Computing Whitehead Institute Outline Introduction Working with microarray data

More information

Moc/Bio and Nano/Micro Lee and Stowell

Moc/Bio and Nano/Micro Lee and Stowell Moc/Bio and Nano/Micro Lee and Stowell Moc/Bio-Lecture GeneChips Reading material http://www.gene-chips.com/ http://trueforce.com/lab_automation/dna_microa rrays_industry.htm http://www.affymetrix.com/technology/index.affx

More information

DNA Microarrays Introduction Part 2. Todd Lowe BME/BIO 210 April 11, 2007

DNA Microarrays Introduction Part 2. Todd Lowe BME/BIO 210 April 11, 2007 DNA Microarrays Introduction Part 2 Todd Lowe BME/BIO 210 April 11, 2007 Reading Assigned For Friday, please read two papers and be prepared to discuss in detail: Comprehensive Identification of Cell Cycle-related

More information

A Microarray Analysis Teaching Module. for Hamilton College. July 2008 Megan Cole Post-doctoral Associate Whitehead Institute, MIT

A Microarray Analysis Teaching Module. for Hamilton College. July 2008 Megan Cole Post-doctoral Associate Whitehead Institute, MIT A Microarray Analysis Teaching Module for Hamilton College July 2008 Megan Cole Post-doctoral Associate Whitehead Institute, MIT Lecture Topics I. Uses of microarrays developed in 1987 a. To measure gene

More information

RNA Seq: Methods and Applica6ons. Prat Thiru

RNA Seq: Methods and Applica6ons. Prat Thiru RNA Seq: Methods and Applica6ons Prat Thiru 1 Outline Intro to RNA Seq Biological Ques6ons Comparison with Other Methods RNA Seq Protocol RNA Seq Applica6ons Annota6on Quan6fica6on Other Applica6ons Expression

More information

A Prac'cal Guide to NCBI BLAST

A Prac'cal Guide to NCBI BLAST A Prac'cal Guide to NCBI BLAST Leonardo Mariño-Ramírez NCBI, NIH Bethesda, USA June 2018 1 NCBI Search Services and Tools Entrez integrated literature and molecular databases Viewers BLink protein similarities

More information