Computational Biology I



1 Computational Biology I: Microarray data acquisition; Gene clustering; Practical

2 Microarray Data Acquisition H. Yang


3 From Sample to Target cDNA
Sample → centrifugation (buffer) → cell pellets → lyse cells (TRIzol), releasing mRNA, rRNA, tRNA, proteins, cell membrane, DNA and others → isolation of total RNA (isopropanol): mRNA, rRNA, tRNA → purification of mRNA (GFX): mRNA (mRNA 1, mRNA 2, …) → RT (random hexamer primer): cDNA (cDNA 1, cDNA 2, …) → labeling (fluorescent dye/radioactive material) → hybridization on chip

4 Target cDNA Binds with Probe cDNA
Probe: spotted/synthesized cDNA/oligos (probe DNA), denatured; applied by spotting or synthesis.
Target: labeled cDNA generated from the sample.
Hybridization of … µl on a chip for … h at … °C

5 Spots on a Microarray
Chip area 2 cm × 2 cm with 20,000 spots: (2 cm × 2 cm) / 20,000 spots = 141 µm × 141 µm per spot.
cDNA arrays/long oligo arrays: 2-3 replicate spots per gene; negative control spots for evaluation of cross-hybridization.
Affymetrix chips: 1 mismatch for each perfect match; spots with different sequences for each gene.
Figure labels: perfect match, mismatch, background.
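The spot-size arithmetic on this slide can be checked with a short sketch. The function name `spot_side_um` is hypothetical, and the calculation assumes the chip area is divided evenly into square cells, one per spot.

```python
import math

def spot_side_um(chip_side_cm: float, n_spots: int) -> float:
    """Side length (in µm) of the square cell available to each spot."""
    chip_side_um = chip_side_cm * 10_000          # 1 cm = 10,000 µm
    area_per_spot = chip_side_um ** 2 / n_spots   # µm² per spot
    return math.sqrt(area_per_spot)

side = spot_side_um(2.0, 20_000)
print(round(side))  # 141
```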

6 What is the Corrected Intensity?
For a gene spot (perfect match): corrected intensity x = detected intensity − background intensity − non-specific binding
For a negative control spot (mismatch): non-specific binding = detected intensity − background intensity


7 What is the Corrected Intensity?
For genes: $x = \frac{1}{N_p}\sum I_p - \frac{1}{N_b}\sum I_b - I_n$
For negative controls: $I_n = \frac{1}{N_p}\sum I_p - \frac{1}{N_b}\sum I_b$
$I_p$: measured intensity of a pixel in the probe area; $N_p$: no. of pixels in the probe area
$I_b$: measured intensity of a pixel in the background area; $N_b$: no. of pixels in the background area
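As a minimal sketch of the two corrections on this slide, the mean background pixel intensity is subtracted from the mean probe pixel intensity, and for gene spots the non-specific binding estimate is subtracted as well. The function names and pixel values below are hypothetical illustrations, not data from the course.

```python
from statistics import mean

def background_corrected(probe_pixels, background_pixels):
    """Mean probe-area intensity minus mean background-area intensity."""
    return mean(probe_pixels) - mean(background_pixels)

def corrected_gene_intensity(probe_pixels, background_pixels, nonspecific):
    """Gene spot: background-corrected intensity minus non-specific binding."""
    return background_corrected(probe_pixels, background_pixels) - nonspecific

# Hypothetical pixel values for one negative control spot and one gene spot.
control = background_corrected([130, 150, 140], [100, 110, 90])
gene = corrected_gene_intensity([500, 520, 480], [100, 100, 100], control)
print(control, gene)
```

Here the negative control spot supplies the non-specific binding estimate that is then removed from the gene spot.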

8 Plastic/Nylon Membrane Microarray
Medium 1 and Medium 2 are each processed on a separate array: RNA isolation → mRNA purification → reverse transcription → radioactive labeling → hybridization → wash → scanning, giving intensities x and y.

9 Glass cDNA Microarray
Sample 1 (mutant, labeling with Cy5) and Sample 2 (wildtype, labeling with Cy3): RNA isolation → mRNA purification → reverse transcription → labeling → hybridization → wash → scanning, giving intensities x and y.

10 Presentation of Microarray Data
[Figure: scatter plot of y against x]

11 Better Presentation of Microarray Data
Data table (genes × arrays):
Gene     x     y     w     z
Gene 1   x_1   y_1   w_1   z_1
Gene 2   x_2   y_2   w_2   z_2
Gene i   x_i   y_i   w_i   z_i
Gene N   x_N   y_N   w_N   z_N
M-A plot: log ratio $M = \log(y/x)$ against log mean intensity $A = \log\sqrt{xy}$
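The M and A coordinates can be computed per gene as in the sketch below. The slide does not state the base of the logarithm; base 2 is an assumption here, and `ma_values` is a hypothetical helper name.

```python
import math

def ma_values(x: float, y: float):
    """Return (M, A) for one gene: M = log2(y/x), A = log2(sqrt(x*y))."""
    m = math.log2(y / x)
    a = 0.5 * math.log2(x * y)   # log of the geometric mean of x and y
    return m, a

m, a = ma_values(100.0, 400.0)
print(m)  # 2.0
```

With base 2, M = 2 means y is four times x; genes with equal intensities on both arrays fall on M = 0.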

12 Gene Clustering H. Yang

13 Clustering Methods
Hierarchical clustering: pairwise comparison; cluster tree
Partitional clustering: self-organizing maps (SOM); several distinguishing clusters

14 Hierarchical Clustering
Comparison of two genes (gene groups) with increasing distance or decreasing similarity
[Figure: genes plotted on x and y axes, annotated with distance and similarity]

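15 Hierarchical Clustering
Pairwise approach on ratios/log ratios of Gene i and Gene j across arrays (samples)
Distance: $d(r_i, r_j) = \lVert r_i - r_j \rVert$
Similarity: $s(r_i, r_j) = \frac{\sum_n (r_{in} - \bar{r}_i)(r_{jn} - \bar{r}_j)}{\sqrt{\sum_n (r_{in} - \bar{r}_i)^2 \sum_n (r_{jn} - \bar{r}_j)^2}}$
with $r_i = (y_{i1}/x_i, y_{i2}/x_i, \ldots)$ and $\bar{r}_i = \frac{1}{N}\sum_{n=1}^{N} r_{in}$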
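A minimal sketch of the two pairwise measures follows. It assumes that the distance is the Euclidean norm of the difference of two ratio vectors and that the similarity is the Pearson correlation coefficient; the function names are hypothetical.

```python
import math

def distance(ri, rj):
    """Euclidean distance between two expression-ratio vectors (assumed norm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ri, rj)))

def similarity(ri, rj):
    """Pearson correlation between two expression-ratio vectors."""
    mi = sum(ri) / len(ri)
    mj = sum(rj) / len(rj)
    num = sum((a - mi) * (b - mj) for a, b in zip(ri, rj))
    den = math.sqrt(sum((a - mi) ** 2 for a in ri) *
                    sum((b - mj) ** 2 for b in rj))
    return num / den

print(distance([0.0, 0.0], [3.0, 4.0]))             # 5.0
print(similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```

Two genes whose ratios rise and fall together have a similarity near 1 even when their distance is large, which is why the two measures can give different cluster trees.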

16 Hierarchical Clustering
Complete linkage; single linkage; average linkage
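The three linkage rules differ only in how the pairwise distances between members of two clusters are combined. The sketch below uses the standard definitions (largest, smallest and mean pairwise distance); `linkage_distance` and the one-dimensional example values are hypothetical.

```python
def linkage_distance(cluster_a, cluster_b, dist, method="average"):
    """Distance between two clusters under a given linkage rule."""
    pairwise = [dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "complete":   # farthest pair of members
        return max(pairwise)
    if method == "single":     # closest pair of members
        return min(pairwise)
    if method == "average":    # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")

def abs_diff(p, q):
    """Distance between two scalar expression values."""
    return abs(p - q)

a, b = [1.0, 2.0], [4.0, 7.0]
print(linkage_distance(a, b, abs_diff, "complete"))  # 6.0
print(linkage_distance(a, b, abs_diff, "single"))    # 2.0
print(linkage_distance(a, b, abs_diff, "average"))   # 4.0
```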

17 Hierarchical Clustering Using the Expression Ratio of 20% to 5% O2
Expression at 20% compared to 5% O2 is: down-regulated, not altered, or up-regulated
[Figure: cluster tree; labels Exp 1, Day]

18 Partitional Clustering
Self-organizing maps are employed.
[Figures: an expression vector with three elements; an expression vector with only two elements, with genes and clusters plotted on x and y axes]

19 Self-Organizing Maps (SOM)
Iterative training
[Figure: genes and clusters plotted on x and y axes]

20 Cluster Determination using SOM
Iterative approach: $f_{i+1}(h) = f_i(h) + \tau_{ij}\,(x_j - f_i(h))$
$f_i(h)$, $f_{i+1}(h)$: positions of cluster h at two consecutive steps
$\tau_{ij}$: learning rate, a function of $\alpha(i)$ and a distance term $d$
$x_j$: expression of gene j
$h = 1, \ldots, M$ (number of clusters); $j = 1, \ldots, N$ (number of genes)
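One step of the iterative update can be sketched as below. As a simplifying assumption, the learning rate is passed in as a constant `tau`, whereas on the slide it varies with the iteration and with a distance term; `som_update` is a hypothetical helper name.

```python
def som_update(position, gene, tau):
    """Move a cluster position toward a gene's expression vector.

    Implements f_{i+1}(h) = f_i(h) + tau * (x_j - f_i(h)) element-wise.
    """
    return [f + tau * (x - f) for f, x in zip(position, gene)]

position = [0.0, 0.0]   # initial position of one cluster (hypothetical)
gene = [1.0, 2.0]       # expression vector of one gene (hypothetical)
for _ in range(3):
    position = som_update(position, gene, tau=0.5)
print(position)  # [0.875, 1.75]
```

Each step closes a fraction `tau` of the remaining gap, so the cluster position approaches the gene without overshooting as long as `tau` stays between 0 and 1.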

21 Other Clustering Methods
K-means; KNN (K-nearest neighbors); Principal Component Analysis; Neural Network; Fuzzy clustering

22 Practical H. Yang

23 Two Sets of Microarray Data
1. Five T-cell culture samples on 5 separate microarrays
   Nylon membrane array with radioactive labeling
   Data already corrected and normalized
   3000 spots on chip with 1250 genes in duplicate spots and 11 housekeeping genes plus >400 negative controls
2. One microarray with 2 samples from C. acetobutylicum fermentation
   cDNA glass microarray with labeling dyes Cy3 & Cy5
   Raw data without subtracting background and non-specific binding
   4000 spots on chip with 1200 genes in >triplicate spots and 120 spots as negative controls

24 1st Microarray Data Set
Given: normalized ratios, with 631 genes left
To perform: hierarchical clustering; SOM clustering with 6 clusters

25 Hierarchical Clustering with 1st Microarray Data Set
Software: Cluster (Brown's Lab at Stanford University); see website
Cluster: Load File → Hierarchical Clustering → Average Linkage Clustering
TreeView: Load File

26 SOM Clustering with 1st Microarray Data Set
Software: Cluster2 (Whitehead Institute, Center for Genome Research); see website
Cluster2: File → Open; Data analysis → Find Classes; SOM rows: 3; SOM cols: 2; Run
Data View → View Clusters; Compute View

27 2nd Microarray Data Set
Given: raw data
To perform:
   Correction of measured intensities by subtracting background and non-specific binding
   Prefiltering
   Normalization (global log mean)
   Identification of differentially expressed genes (with 2.5 fold change)
   Plot original data and normalized data in A-M plot

28 Processing of 2nd Microarray Data Set
Correction of gene spot intensities and negative control intensities by subtracting the corresponding background intensities
1. Corrected* gene spot intensity = gene spot intensity − background intensity
2. Corrected negative control intensity = negative control intensity − background intensity
* Needs to be further corrected by removal of non-specific binding

29 Processing of 2nd Microarray Data Set
Prefiltering
1. Calculate mean and standard deviation (Std) of 116 negative control intensities
2. Further corrected gene spot intensity = corrected gene spot intensity − corrected negative control mean intensity
3. Final corrected gene intensity: $x = \begin{cases} x & \text{for } x \ge 2\,\mathrm{Std} \\ 2\,\mathrm{Std} & \text{for } x < 2\,\mathrm{Std} \end{cases}$
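The three prefiltering steps can be sketched as follows. The slide does not say whether the sample or the population standard deviation is meant; the sample form (`statistics.stdev`) is an assumption, and the function name and intensity values are hypothetical.

```python
from statistics import mean, stdev

def prefilter(gene_intensities, control_intensities):
    """Subtract the negative-control mean, then floor values at 2 * Std."""
    ctrl_mean = mean(control_intensities)
    floor = 2 * stdev(control_intensities)   # assumption: sample Std
    corrected = [g - ctrl_mean for g in gene_intensities]
    return [x if x >= floor else floor for x in corrected]

# Hypothetical background-corrected intensities.
controls = [8, 10, 12]      # mean 10, sample Std 2, so the floor is 4
genes = [100, 12, 5]
filtered = prefilter(genes, controls)
print(filtered)
```

Flooring at 2 Std keeps weak or negative intensities from producing extreme or undefined log ratios later on.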

30 Processing of 2nd Microarray Data Set
Normalization
1. Use the global log mean to calculate the mean intensities of the Cy3 and Cy5 dyes
2. Use the mean intensity ratio (Cy3/Cy5) to correct the Cy5 intensities
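One possible reading of these two steps is sketched below: the "global log mean" of a channel is taken to be the exponential of the mean log intensity (the geometric mean), and every Cy5 value is multiplied by the Cy3/Cy5 ratio of those means. This interpretation, the function names and the example values are assumptions.

```python
import math

def global_log_mean(values):
    """Mean intensity computed on the log scale (geometric mean)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize_cy5(cy3, cy5):
    """Scale Cy5 so that both channels share the same global log mean."""
    factor = global_log_mean(cy3) / global_log_mean(cy5)
    return [v * factor for v in cy5]

cy3 = [100.0, 400.0]   # hypothetical corrected intensities
cy5 = [50.0, 200.0]
cy5_norm = normalize_cy5(cy3, cy5)
print(cy5_norm)
```

After this scaling the mean log ratio of Cy5 to Cy3 over all genes is zero, which centers the cloud of points on M = 0 in an M-A plot.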

31 Processing of 2nd Microarray Data Set
Identification of differentially expressed genes
1. Logarithmic intensity ratio (Cy5/Cy3)
2. Identify genes with 2.5 fold change
3. Identify up-regulated genes
4. Identify down-regulated genes
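The identification steps can be sketched with a threshold on the log ratio. Base 2 logarithms, an inclusive threshold, and the convention that "up" means higher in the Cy5 channel than in the Cy3 channel are assumptions; `classify` is a hypothetical helper name.

```python
import math

def classify(cy3: float, cy5: float, fold: float = 2.5) -> str:
    """Label a gene by its log2(Cy5/Cy3) ratio against a fold-change cutoff."""
    m = math.log2(cy5 / cy3)
    threshold = math.log2(fold)
    if m >= threshold:
        return "up"
    if m <= -threshold:
        return "down"
    return "unchanged"

print(classify(100.0, 300.0))  # up
print(classify(100.0, 30.0))   # down
print(classify(100.0, 150.0))  # unchanged
```

Working on the log scale makes the cutoff symmetric: a 2.5-fold increase and a 2.5-fold decrease lie at the same distance from zero.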

32 Processing of 2nd Microarray Data Set
Plot data in log(x)-log(y) and M-A diagrams
1. x = Cy3 and y = Cy5
2. Plot log(x) vs log(y)
3. Plot M = log(y/x) vs A = log(xy)/2