Lecture 11 Microarrays and Expression Data


1 Introduction to Bioinformatics for Medical Research
Gideon Greenspan
Lecture 11: Microarrays and Expression Data

2 Genetic Expression Data
- Microarray experiments
- Applications
- Expression Data
- Clustering
- EPClust tool
- GEO database
- Data accuracy issues

3 DNA Microarrays
- First introduced in 1987
  - Still undergoing much development
- Small piece of glass
  - Thousands/millions of cells
- Each cell contains a DNA probe
  - Millions of copies bound to glass
- Cells capture their reverse complement
  - Ordinary DNA/RNA base-pair hybridization
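
The capture rule on this slide can be illustrated with a minimal Python sketch. The function names are illustrative, not from the lecture, and the sketch assumes exact DNA base pairing (A with T, G with C) on A/C/G/T sequences; real hybridization also tolerates some mismatches.

```python
# Watson-Crick pairing for DNA; an RNA target would pair A with U instead of T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence made of A/C/G/T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def probe_captures(probe, target):
    """True if the probe's reverse complement occurs somewhere in the target."""
    return reverse_complement(probe) in target.upper()
```

For example, `reverse_complement("ATGC")` gives `"GCAT"`, so a probe `GCAT` would capture any target containing `ATGC`.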

4 Experimental Protocol
- Identify RNA/DNA sequences of interest
- Design probes that are sequence-specific
- Extract molecules from cell environment
- Label molecules with fluorescent dye
- Pour solution onto microarray
- Then wash off excess molecules
- Shine laser light onto array
- Scan for presence of fluorescent dye

5 Comparative Studies
- Many factors affect the binding process
  - Relative dye intensity between cells is a poor measure of relative DNA/RNA quantity
- Solution: compare against a control
  - For each cell, compare DNA/RNA quantity between the two samples
- Experiment and control use different dyes
  - Red and green (yellow together)
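
A minimal sketch of the per-cell comparison. It assumes the red channel holds the experiment and the green channel the control (the slide does not say which dye is which), and it uses a log2 ratio as the comparison measure, which is a common convention rather than something the lecture specifies. The `floor` parameter is a hypothetical guard against zero intensities.

```python
import math


def log_ratio(red, green, floor=1.0):
    """Return log2(red / green) for one cell.

    Intensities below `floor` are clamped so that the ratio and its
    logarithm are always defined.
    """
    return math.log2(max(red, floor) / max(green, floor))
```

Under these assumptions a value near 0 corresponds to a yellow cell (similar quantity in both samples), positive values to redder cells and negative values to greener ones.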

7 Microarray Applications
- Obtain DNA sequences
  - Array contains all possible short words
- Identify single nucleotide polymorphisms
  - Array contains reverse complement of both possible sequences of region
- Test for genetic expression
  - Array contains reverse complement of mRNA sequences for all of the organism's genes
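
The first two array designs can be sketched in a few lines of Python. This is an illustration under simple assumptions: words are drawn from the DNA alphabet A/C/G/T, a SNP is a single substituted base at a known 0-based position, and the function names are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def all_words(k):
    """Every possible DNA word of length k (4**k words in total)."""
    return ["".join(letters) for letters in product("ACGT", repeat=k)]


def snp_probes(region, position, alleles):
    """One probe per allele: the reverse complement of the region
    with that allele substituted at `position`."""
    return {
        allele: reverse_complement(region[:position] + allele + region[position + 1:])
        for allele in alleles
    }
```

The number of words grows as 4 to the power k, which is why such arrays are limited to short words.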

8 Expression Data Applications
- Identify genes whose function is related
  - Similar expression in group in many cases
- Find genes expressed in specific tissues
  - Different expression in different cells
- Find genes affected by environment
  - Different expression under different conditions
- Distinguish different forms of a disease
  - Different expression in different patients

9 Microarray Images
[Figure: an original microarray image and its summary. Labels: one tissue or condition; one gene or mRNA.]

10 Expression Data Format
[Table of expression levels; numeric values not preserved.]
Conditions (columns): normal, hot, cold
Genes / mRNAs (rows): uch, gut, fip, msh, vma, meu, git, sec7b, apn, wos

11 Clustering
- Hierarchical clustering: generate a tree
  - Each gene is a leaf on the tree
  - Distances reflect similarity of expression
  - Internal nodes represent functional groups
  - Similar approach to phylogenetic trees
- k-means clustering: generate k groups
  - Number k is chosen in advance
  - Each group represents similar expression
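
The k-means idea can be sketched as follows. This is a minimal illustration, not the procedure of any particular tool: it assumes squared Euclidean distance between expression profiles and, to stay deterministic, seeds the centroids with the first k profiles (real tools usually seed randomly or more carefully).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(rows):
    """Column-wise mean of a non-empty list of profiles."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def k_means(profiles, k, iterations=100):
    """Assign each profile to one of k groups; returns one group index per profile."""
    # Assumption of this sketch: the first k profiles seed the centroids.
    centroids = [list(p) for p in profiles[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # no profile changed group, so the result is stable
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:  # an empty group keeps its previous centroid
                centroids[c] = mean_profile(members)
    return assignment
```

With four profiles such as `[1, 1, 1]`, `[5, 5, 5]`, `[1.1, 0.9, 1]` and `[5.2, 4.9, 5]` and k = 2, the first and third profiles end up in one group and the second and fourth in the other.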

12 Hierarchical Clustering Example
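
As a rough illustration of how such a tree can be built, here is a minimal agglomerative sketch. It assumes Euclidean distance and average linkage, neither of which is stated in the lecture, and it returns the tree as nested tuples of gene names. The gene names in the usage note are placeholders.

```python
def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(members_a, members_b, profiles):
    """Mean distance over all pairs of genes drawn from the two clusters."""
    pairs = [(i, j) for i in members_a for j in members_b]
    return sum(euclidean(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)


def hierarchical_cluster(names, profiles):
    """Repeatedly merge the two closest clusters until one tree remains."""
    # Each cluster is (subtree, indices of the genes it contains).
    clusters = [(name, [i]) for i, name in enumerate(names)]
    while len(clusters) > 1:
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: average_linkage(clusters[ij[0]][1], clusters[ij[1]][1], profiles),
        )
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters[0][0]
```

Calling `hierarchical_cluster(["geneA", "geneB", "geneC"], [[0, 0], [0, 1], [10, 10]])` first joins `geneA` and `geneB`, whose profiles are closest, and only then attaches `geneC`.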

13 Expression Correlation
- Causes of similar expression between genes
  - One gene controls the other in a pathway
  - Both genes are controlled by another
  - Both genes relate to same time in cell cycle
  - Both genes have similar function
- Clusters can help identify regulatory motifs
  - Search for motifs in upstream promoter regions of all the genes in a cluster
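
The motif search in the last point can be sketched very simply. This illustration assumes a motif is an exact word of fixed length k that occurs in the upstream region of every gene in the cluster; real motif finders allow mismatches and use statistical scoring. The function names are hypothetical.

```python
from collections import Counter


def word_support(upstream_regions, k):
    """For each k-letter word, count how many regions contain it at least once."""
    counts = Counter()
    for region in upstream_regions:
        # A set, so that a word repeated within one region is counted once.
        words = {region[i:i + k] for i in range(len(region) - k + 1)}
        counts.update(words)
    return counts


def candidate_motifs(upstream_regions, k):
    """Words of length k present in every upstream region, sorted alphabetically."""
    counts = word_support(upstream_regions, k)
    return sorted(w for w, n in counts.items() if n == len(upstream_regions))
```

For three made-up regions `AATGACTC`, `GGTGACAA` and `TGACGTTT`, the only 4-letter word shared by all of them is `TGAC`.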

14 EPClust Input (1)
- Expression data matrix
- Extra annotation for gene rows
- Method of tabulation
- Name for further analysis

15 EPClust Input (2)
- Method of measuring distance between gene rows
- Cluster hierarchically
- Number k of means
- Cluster into k means
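
To make the choice of distance method concrete, here are two widely used measures between gene rows, sketched in plain Python. Whether EPClust offers exactly these two is an assumption; the point is only that different measures can disagree about which genes are close.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance: sensitive to the absolute expression level."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient; undefined for a constant row."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_distance(x, y):
    """1 - Pearson correlation: compares the shape of two profiles, not their scale."""
    return 1.0 - pearson(x, y)
```

Two rows such as `[1, 2, 3]` and `[10, 20, 30]` are far apart by Euclidean distance but have a correlation distance of about 0, because they rise and fall together.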

16 GEO: Gene Expression Omnibus
- NCBI database for gene expression data
  - Founded at end of 2000
- Platform: set of gene probes (rows)
- Series: set of platform samples (matrix)
- Sample: expression levels under one condition (column)
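
The relationship between the three record types can be pictured with a small data model. This is a conceptual sketch only: the class and field names are hypothetical and do not reflect GEO's actual file formats or any API.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    probes: list  # gene probe identifiers, one per matrix row


@dataclass
class Sample:
    condition: str  # the single condition this sample was measured under
    levels: dict    # probe identifier -> expression level


@dataclass
class Series:
    platform: Platform
    samples: list   # Sample objects, one per matrix column

    def matrix(self):
        """Rows follow platform.probes, columns follow samples.

        A probe missing from a sample is reported as None.
        """
        return [
            [sample.levels.get(probe) for sample in self.samples]
            for probe in self.platform.probes
        ]
```

Each `Sample` contributes one column, and the `Platform` fixes the row order shared by every sample in the `Series`.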

17 Querying GEO
- Browse records
- Search for entries containing a gene
- Search for experiments
- Search with Entrez

18 Probe Selection
- Probe on DNA chip is shorter than target
  - Choice of which section to hybridize
- Select a region which is unstructured
  - RNA folding, DNA stem-and-loop
- Choose region which is target-specific
  - Avoid cross-hybridization with other DNA
- Avoid regions containing variation
  - Minimize presence of SNP sites
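
Two of these criteria, target specificity and avoiding variation, can be sketched as a simple window filter. This is a simplification under stated assumptions: cross-hybridization is approximated by an exact match elsewhere, SNP sites are given as 0-based positions, and secondary structure is not checked at all. The function name and parameters are hypothetical.

```python
def candidate_probe_regions(target, length, snp_positions, other_sequences):
    """Yield (start, region) for windows of the target that contain no known
    SNP site and do not occur verbatim in any other sequence."""
    snps = set(snp_positions)
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        # Avoid regions containing variation.
        if any(pos in snps for pos in range(start, start + length)):
            continue
        # Avoid regions that another sequence could also bind to.
        if any(window in other for other in other_sequences):
            continue
        yield start, window
```

The probe placed on the chip would then be the reverse complement of a surviving region.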

19 Sources of Inaccuracy
- Some sequences bind better than others
  - Cross-hybridization, A-T versus G-C
- Scanning of microarray images
  - Scratches, smears, cell spillage
- Effects of experimental conditions
  - Point in cell cycle, temperature, density
- Tons of data, much less information!
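
One reason some sequences bind better than others is base composition: a G-C pair forms three hydrogen bonds and an A-T pair two, so GC-rich probes tend to bind more strongly. A minimal sketch for measuring this, with a hypothetical function name and assuming a non-empty sequence:

```python
def gc_fraction(seq):
    """Fraction of bases in a non-empty sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Comparing this fraction across probes gives a rough indication of which cells may report systematically higher or lower intensity for the same quantity of target.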

20 Other Resources
- List of gene expression web resources
- Another list with literature references
- Cancer Genome Anatomy Project
- Stanford Microarray Database