SIMS2003. Instructors: Rus Yukhananov, Alex Loguinov (BWH, Harvard Medical School). Introduction to Microarray Technology.
Lecture 1: I. EXPERIMENTAL DETAILS; II. ARRAY CONSTRUCTION; III. IMAGE ANALYSIS. Lecture 2: IV. DATA ANALYSIS; V. DATA INTEGRATION.
Microarrays are about gene expression. Why bother with gene expression? All information about a living being is coded in DNA as a set of genes. Each gene contains structural information about the protein sequence and regulatory information about protein expression. The intermediate step between gene and protein is mRNA, and a microarray measures the concentration of mRNA.
By measuring RNA we obtain the expression profile of a cell, which defines the cell's properties and functions. Problem: RNA levels and protein levels are not always directly correlated. No mRNA means no protein, but the relation is neither simple nor universal. Functional genomics fills the gap between gene expression and organism function. The meaning of life is hidden in gene expression values, but it is not easy to get it out.
DNA → mRNA → PROTEIN. DNA array: universal, easy to measure, scalable. Protein array: each protein is unique; not easy to measure, not scalable.
Figure: microarray publications per year, 1995 to 2002 (PubMed analysis using the keyword "microarray"; total and review articles; data label 1493).
Complementary hybridization is the basis of RNA measurement (base pairs A-T, G-C, T-A, C-G). Northern blot/Southern blot: put RNA on a support (membrane, glass) and measure the concentration in an unknown sample. RNase protection assay (figure: NMDAR1, lanes bl, 2, 5, 20, 50, B, L). RT-PCR: quantitative measurement by amplification.
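A minimal sketch of the pairing rule above, assuming plain DNA strings and exact matching only (real hybridization tolerates some mismatches depending on temperature and salt, which this does not model). The names `reverse_complement` and `hybridizes` are illustrative, not taken from any library.

```python
# Watson-Crick pairing rules from the slide: A-T, G-C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq`, read in the same 5'->3' direction."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def hybridizes(probe: str, target: str) -> bool:
    """True if `target` contains a perfect complementary match to `probe`.

    Simplification: exact matches only; mismatch tolerance is not modeled.
    """
    return reverse_complement(probe) in target.upper()


if __name__ == "__main__":
    probe = "ATGCGT"
    print(reverse_complement(probe))        # ACGCAT
    print(hybridizes(probe, "GGACGCATTT"))  # True
```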
Dot blot is the closest relative of the microarray. In a dot blot, total RNA is the target and the probe is a labeled fragment used to measure the concentration in an unknown sample. In a reverse northern blot, the probe becomes the target. Differences from a microarray: spot size 1-3 mm vs 0.05-0.1 mm, and target and probe are switched. Result: a revolution. 1. Miniaturization 2. Multiple probes 3. Sensitivity 4. Excitement!!!
Selection of cDNA
- Selection of sequences that represent the gene of interest.
- Finding sequences, usually in the EST database.
- Problems: sequencing errors, alternative splicing, chimeric sequences, contamination.
Figure labels: selected DNA, target DNA, sequencing, arraying.
Array fabrication: cDNA clones or oligonucleotides (probes) are printed onto the microarray at about 0.1 nl per spot; the target is then applied to the slide (25 × 75 mm).
Source: Affymetrix website
I. ARRAY CONSTRUCTION
- WHOLE GENOME ARRAY: E. coli, yeast, HIV
- PATHWAY ARRAY: apoptotic, toxicology array
- DISCOVERY ARRAY: known and novel genes
DISCOVERY ARRAY: gene selection from a specific tissue library, a subtractive library, and genes of interest.
DrugAbuse Array (ver. 2)
- Brain cDNA library: ~1,100 genes
- Subtractive library: ~900 genes (following chronic opioid administration)
Ver. 3: Ver. 2 + NIA15K set + kidney cDNA library
- NIA15K set (~15,000 genes): mouse embryonic libraries
- Embryonic kidney libraries (~2,200 genes)
DrugAbuse array: about 20,000 spots.
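A quick arithmetic check of the slide figures, assuming one spot per gene (the slides do not say whether clones are printed in replicate or how many control spots are included). The library sizes in the ver. 3 description sum to about 19,200, in line with the ~20,000 spots quoted, and at ~0.1 nl per spot the whole array uses roughly 2 µl of printing solution.

```python
# Approximate library sizes quoted on the DrugAbuse Array slide.
VER2 = {"brain cDNA library": 1100, "subtractive library": 900}
VER3_ADDED = {"NIA15K set": 15000, "embryonic kidney libraries": 2200}

NL_PER_SPOT = 0.1   # printing volume from the array fabrication slide
SPOTS = 20000       # "about 20,000 spots"; assumes one spot per gene


def total_genes(*libraries: dict) -> int:
    """Sum the gene counts of several libraries."""
    return sum(sum(lib.values()) for lib in libraries)


ver2 = total_genes(VER2)                 # 2,000 genes
ver3 = total_genes(VER2, VER3_ADDED)     # 19,200 genes
volume_ul = SPOTS * NL_PER_SPOT / 1000   # nl -> µl, about 2 µl per array

print(ver2, ver3, round(volume_ul, 3))
```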
DrugAbuse Array: gene class distribution
- Regulatory: 44%
- Metabolic: 35%
- Structural: 21%
DrugAbuse Array: regulatory gene distribution
- Cell division & differentiation: 30%
- Intracellular messengers: 20%
- Membrane associated: 15%
- Transcription factors: 45%
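EXPERIMENTAL PROTOCOL
A. Synthesis of cDNA: an oligo(dT) primer (TTTTTTT) anneals to the poly(A) tail (AAAAAA) of the mRNA; then the second DNA strand is synthesized.
B. Labeling cDNA: single channel or multiple channel (figure: wavelength scale 350, 480, 570 and 680 nm, with Cy3 and Cy5 marked).
C. Hybridization.
D. Scanning.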
Two-color experiment: control and treatment samples → RNA extraction → reverse transcription and labeling (red dye Cy5, green dye Cy3) → hybridization.
II. Image Analysis. Scanning: laser 1 and laser 2 provide excitation, and the emitted light is recorded. Presentation: overlay images. Image analysis.
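Scanner details: the laser excites the dye, which emits photons; the PMT converts photons to electrons and amplifies the signal; the A/D converter digitizes it. Processing stages: excitation, amplification, filtering.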
A computer (digital) image is a two-dimensional array of pixel intensities $z = f(x, y)$, where $(x, y)$ is the pixel location. An 8-bit image has $2^8 = 256$ intensity levels; a 16-bit image has $2^{16} = 65{,}536$ gray levels (values 0 to 65,535). An image can be considered a realization of a stochastic process, i.e. a sample from a whole class of possible images, in which each pixel has a probability distribution $f(z)$. This representation is used in image inference: reconstructing the characteristics of the true, unobservable image $X$ from the observed image $Z$ with statistical methods (MMS, maximum likelihood (ML), Bayesian estimation, etc.).
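A small illustration of the bit-depth point, treating an image as a nested list `z[y][x]` and assuming that values outside the representable range saturate at 0 and $2^{bits} - 1$, as a scanner's A/D converter clips them. `levels` and `quantize` are hypothetical helper names.

```python
def levels(bits: int) -> int:
    """Number of distinct gray levels in a `bits`-bit image."""
    return 2 ** bits


def quantize(image, bits):
    """Round a 2D list of intensities z = f(x, y) to integers in [0, 2**bits - 1].

    Values outside the range are clipped (saturated), mimicking an A/D converter.
    """
    top = levels(bits) - 1
    return [[min(max(round(z), 0), top) for z in row] for row in image]


raw = [[12.4, 300.0], [70000.0, -3.0]]   # z[y][x]
print(levels(8), levels(16))   # 256 65536
print(quantize(raw, 8))        # [[12, 255], [255, 0]]
print(quantize(raw, 16))       # [[12, 300], [65535, 0]]
```

Bright spots that exceed the 8-bit range all collapse to 255, which is one reason 16-bit scans are preferred for quantification.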
Image structure
Steps in Image Processing
I. Gridding: locate the spots.
II. Segmentation: classify each pixel as either signal or background.
III. Measurement: for each spot on the array, calculate the signal intensity (mean, median, mode), the background, and quality measures.
Assumption: signal intensity ~ mRNA level.
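One simple way to carry out segmentation and measurement once gridding has located a spot center is a fixed-circle rule: pixels inside a radius count as signal, pixels in a surrounding ring count as local background. This rule is an assumption for illustration (adaptive and histogram-based segmentation are also used), and the radii are hypothetical parameters.

```python
import math
from statistics import mean, median


def segment_spot(image, cx, cy, r_signal, r_bg_inner, r_bg_outer):
    """Fixed-circle segmentation around a gridded spot center (cx, cy).

    Pixels within r_signal are signal; pixels whose distance lies in
    [r_bg_inner, r_bg_outer] are local background; all others are ignored.
    """
    signal, background = [], []
    for y, row in enumerate(image):
        for x, z in enumerate(row):
            d = math.hypot(x - cx, y - cy)
            if d <= r_signal:
                signal.append(z)
            elif r_bg_inner <= d <= r_bg_outer:
                background.append(z)
    return signal, background


def measure(signal, background):
    """Per-spot measurements: signal mean/median and local background median."""
    return {
        "signal_mean": mean(signal),
        "signal_median": median(signal),
        "background_median": median(background),
    }


# Synthetic 7x7 spot: bright disk (100) on a flat background (10).
img = [[100 if math.hypot(x - 3, y - 3) <= 1.5 else 10 for x in range(7)]
       for y in range(7)]
sig, bg = segment_spot(img, cx=3, cy=3, r_signal=1.5, r_bg_inner=2.5, r_bg_outer=3.5)
print(len(sig), measure(sig, bg))
```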
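Local background: mixed density distribution $F(z) = p\,f_s(z) + (1 - p)\,f_b(z)$, where $p$ is the proportion of signal pixels, $f_s$ the signal density and $f_b$ the local background density. (Figure panels: local background; signal + background.)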
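The mixture $F(z)$ can be fitted with the expectation-maximization (EM) algorithm. The sketch below assumes both $f_s$ and $f_b$ are Gaussian, which the slide does not state; real pixel distributions are often skewed, which is one motivation for the transformations discussed later.

```python
import math
from statistics import mean, pstdev


def normal_pdf(z, mu, sd):
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_mixture(pixels, iters=200, min_sd=1e-3):
    """EM fit of F(z) = p*f_s(z) + (1 - p)*f_b(z) with Gaussian components.

    Returns (p, mu_s, sd_s, mu_b, sd_b), where p is the signal proportion.
    """
    n = len(pixels)
    xs = sorted(pixels)
    # Start: lower half of the intensities ~ background, upper half ~ signal.
    mu_b, mu_s = mean(xs[: n // 2]), mean(xs[n // 2:])
    sd_b = sd_s = max(pstdev(xs), min_sd)
    p = 0.5
    for _ in range(iters):
        # E-step: posterior probability that each pixel belongs to the signal.
        w = []
        for z in pixels:
            s = p * normal_pdf(z, mu_s, sd_s)
            b = (1 - p) * normal_pdf(z, mu_b, sd_b)
            if s + b > 0:
                w.append(s / (s + b))
            else:  # both densities underflowed: assign to the nearer mean
                w.append(float(abs(z - mu_s) < abs(z - mu_b)))
        # M-step: re-estimate the mixing proportion and both components.
        ws = max(sum(w), 1e-12)
        wb = max(n - sum(w), 1e-12)
        p = ws / n
        mu_s = sum(wi * z for wi, z in zip(w, pixels)) / ws
        mu_b = sum((1 - wi) * z for wi, z in zip(w, pixels)) / wb
        var_s = sum(wi * (z - mu_s) ** 2 for wi, z in zip(w, pixels)) / ws
        var_b = sum((1 - wi) * (z - mu_b) ** 2 for wi, z in zip(w, pixels)) / wb
        sd_s = max(math.sqrt(var_s), min_sd)
        sd_b = max(math.sqrt(var_b), min_sd)
    return p, mu_s, sd_s, mu_b, sd_b


# Synthetic spot: 30 background pixels near 10, 20 signal pixels near 100.
pixels = [8, 9, 10, 11, 12] * 6 + [95, 98, 100, 102, 105] * 4
p, mu_s, sd_s, mu_b, sd_b = fit_mixture(pixels)
print(round(p, 2), round(mu_s, 1), round(mu_b, 1))
```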
Steps in Image Processing, 3. Data extraction
- Spot intensities: mean of pixel intensities; median of pixel intensities; M-estimators for the location of the pixel intensity distribution.
- Background values: local; constant (global); none.
- Quality information: pixel distribution, background, signal.
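A sketch comparing the three location estimates on one spot's pixels. The Huber M-estimator shown here, with tuning constant `k = 1.345` and a fixed MAD-based scale, is one common M-estimator chosen for illustration; the slides do not specify which M-estimator is meant.

```python
from statistics import mean, median


def huber_location(xs, k=1.345, tol=1e-6, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted averaging.

    Scale is held fixed at the normalized MAD (1.4826 * MAD). Pixels within
    k scale units of the current estimate get weight 1; farther ones are
    down-weighted by k / |r|, which limits the pull of outlying pixels.
    """
    mu = median(xs)
    scale = 1.4826 * median([abs(x - mu) for x in xs])
    if scale == 0:
        return mu
    for _ in range(max_iter):
        weights = []
        for x in xs:
            r = abs(x - mu) / scale
            weights.append(1.0 if r <= k else k / r)
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# One spot with a single saturated or dust pixel.
pixels = [100, 102, 98, 101, 99, 100, 4000]
print(mean(pixels), median(pixels), round(huber_location(pixels), 1))
```

The mean is dragged above 600 by the single bright pixel, while the median and the Huber estimate both stay near 100.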
Quality measurements
- Spot: signal/background ratio; variation in pixel intensities; shape (circularity).
- Array: correlation between spot intensities; percentage of bad spots; distribution of spot signal area.
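Possible implementations of several of these quality measures, assuming the spot area and perimeter (in pixels) come from the segmentation step. Circularity uses the common definition $4\pi A / P^2$, which equals 1 for a perfect circle; the exact formulas used in the course are not given, so treat these as illustrative.

```python
import math
from statistics import mean, pstdev


def signal_to_background(signal, background):
    """Spot signal/background ratio using pixel means."""
    return mean(signal) / mean(background)


def coefficient_of_variation(signal):
    """Variation in pixel intensities relative to the spot mean."""
    return pstdev(signal) / mean(signal)


def circularity(area, perimeter):
    """4*pi*A / P**2: 1.0 for a perfect circle, smaller for irregular spots."""
    return 4 * math.pi * area / perimeter ** 2


def pearson(xs, ys):
    """Correlation between two sets of spot intensities (e.g. replicate spots)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def percent_bad(flags):
    """Percentage of spots flagged as bad (flags is a list of booleans)."""
    return 100 * sum(flags) / len(flags)


r = 5.0
print(round(circularity(math.pi * r ** 2, 2 * math.pi * r), 3))  # 1.0
print(round(circularity(4.0, 8.0), 3))                           # 0.785 (square)
print(signal_to_background([200, 220, 180], [20, 20, 20]))       # 10.0
print(percent_bad([True, False, False, False]))                  # 25.0
```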
Figure: pixel distribution for spot C3_5 (histogram of number of observations per bin, upper boundaries x <= boundary, with expected normal curve).
Figure: pixel distribution for spot B3_6 (same layout; intensities from about -100 to 1600).
Figure: pixel distribution for spot C3_7 (same layout).
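These histograms compare observed pixel counts with the counts expected under a normal distribution. A sketch of that comparison, assuming a normal fitted by the sample mean and standard deviation and bins defined by upper boundaries ($x \le$ boundary); pixels above the last boundary are not counted.

```python
import math
from statistics import mean, pstdev


def normal_cdf(x, mu, sd):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))


def histogram_vs_normal(pixels, boundaries):
    """Observed vs normal-expected counts for bins (previous boundary, boundary].

    The normal is fitted by the sample mean and standard deviation (sd must be > 0).
    Returns a list of (upper_boundary, observed, expected) tuples.
    """
    mu, sd = mean(pixels), pstdev(pixels)
    n = len(pixels)
    rows, lower = [], -math.inf
    for upper in sorted(boundaries):
        observed = sum(1 for z in pixels if lower < z <= upper)
        expected = n * (normal_cdf(upper, mu, sd) - normal_cdf(lower, mu, sd))
        rows.append((upper, observed, expected))
        lower = upper
    return rows


for upper, obs, expected in histogram_vs_normal([1, 2, 2, 3, 3, 3, 4, 4, 5], [1, 2, 3, 4, 5]):
    print(f"x <= {upper}: observed {obs}, expected {expected:.2f}")
```

Large gaps between observed and expected counts (for example a long right tail) indicate a non-normal pixel distribution.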
IIB. Image Transformation
- BACKGROUND CORRECTION: subtract the local background from each pixel.
- BACKGROUND SUBTRACTION: subtract the local background from the spot mean or median.
- POWER FAMILY OF TRANSFORMATIONS: $X_t = X^p$ for $p \neq 0$, and $X_t = \ln X$ for $p = 0$. These transformations are monotonic and differentiable, and can adjust nonlinearity and asymmetry.
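A sketch contrasting background correction (per pixel, before transformation) with background subtraction (on the spot summary), combined with the power family. The spot summary here is the mean, and `floor` is a hypothetical parameter that keeps values positive so that $X^p$ and $\ln X$ are defined; the slides do not say how negative differences are handled. For $p = 1$ the two approaches agree; for nonlinear $p$ they generally do not.

```python
import math
from statistics import mean


def power_transform(x, p):
    """Power family: x**p for p != 0 and ln(x) for p == 0 (x must be > 0)."""
    return math.log(x) if p == 0 else x ** p


def background_corrected(pixels, bg, p, floor=1.0):
    """Correction: subtract bg from each pixel, transform, then average."""
    return mean([power_transform(max(z - bg, floor), p) for z in pixels])


def background_subtracted(pixels, bg, p, floor=1.0):
    """Subtraction: average the raw pixels, subtract bg, then transform."""
    return power_transform(max(mean(pixels) - bg, floor), p)


pixels, bg = [110, 130, 150], 10
for p in (1, 0.5, 0):
    print(p, background_corrected(pixels, bg, p), background_subtracted(pixels, bg, p))
```

Because $X^{1/2}$ and $\ln X$ are concave, averaging after the transform gives a slightly smaller value than transforming the average.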
Figure: pixel distribution and background correction (panels C3-4, C3-4-bgr, C3-3, C3-3-bgr).
Figure panels: background subtracted; power 3/4; power 1/2; power 1/4; logarithmic; logarithmic + background subtraction.
Figure: power family of transformations and background correction (panels: normal, background correction, logarithmic, power 0.25, power 0.5, power 0.75).
Figure panels: no background correction; background correction; background subtraction; power 0.25.
Data diagnostic: non-transformed image. Control labeled with Cy3-dUTP; stress labeled with Cy5-dUTP.
Data diagnostic: background subtraction. Control labeled with Cy3-dUTP; stress labeled with Cy5-dUTP.
Data diagnostic: power 1/2 image transformation. Control labeled with Cy3-dUTP; stress labeled with Cy5-dUTP.
Data diagnostic: background (mean + 2s) correction. Control labeled with Cy3-dUTP; stress labeled with Cy5-dUTP.
Image analysis algorithm:
- optimize scanning parameters
- spot quantification
- background determination
- background subtraction or correction (threshold µ + 1.5s to µ + 2s)
DATA ANALYSIS:
- apply regression analysis
- detect differentially expressed genes
- pattern recognition and functional analysis
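A sketch of the $\mu + k s$ rule, assuming $\mu$ and $s$ are the mean and standard deviation of the local background pixels and that a spot pixel is kept as signal only if it exceeds the threshold. The slides give only the threshold (k between 1.5 and 2), so the filtering rule itself is an assumption.

```python
from statistics import mean, stdev


def background_threshold(bg_pixels, k=2.0):
    """Threshold mu + k*s from the local background pixels (k = 1.5 to 2 on the slides)."""
    return mean(bg_pixels) + k * stdev(bg_pixels)


def signal_pixels(spot_pixels, bg_pixels, k=2.0):
    """Keep only spot pixels above the background threshold (assumed rule)."""
    t = background_threshold(bg_pixels, k)
    return [z for z in spot_pixels if z > t]


bg = [8, 10, 12]                      # mean 10, sample sd 2
spot = [9, 13.5, 20, 30]
print(background_threshold(bg, 2.0), signal_pixels(spot, bg, 2.0))   # 14.0 [20, 30]
print(background_threshold(bg, 1.5), signal_pixels(spot, bg, 1.5))   # 13.0 [13.5, 20, 30]
```

A lower k keeps more dim pixels as signal at the cost of admitting more background noise.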
Normalization: why? We need to remove systematic error, both within a slide (within-slide normalization) and between slides (between-slide normalization).
Based on data from P.Brown et
Normalization: how? Question: what kind of normalization should be applied?
- No normalization
- Global normalization with nonlinearity (lowess) correction
- Normalization by regression (normalization based on log-transformed ratios)
- Normalization using non-changed genes
- Housekeeping genes
- Block-plate (print-tip) normalization
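A simplified sketch of two options from this list applied to log ratios $M = \log_2(\mathrm{Cy5}/\mathrm{Cy3})$: global normalization by median centering, and print-tip normalization that centers each block separately. Median centering is used here as a stand-in, assuming most genes are unchanged; lowess normalization would instead subtract an intensity-dependent curve rather than a single constant. The block labels are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def log_ratios(cy5, cy3):
    """M = log2(Cy5 / Cy3) for each spot."""
    return [math.log2(r / g) for r, g in zip(cy5, cy3)]


def global_normalize(m):
    """Shift all log ratios so that their median is 0."""
    center = median(m)
    return [x - center for x in m]


def print_tip_normalize(m, blocks):
    """Median-center log ratios separately within each print-tip block."""
    groups = defaultdict(list)
    for x, b in zip(m, blocks):
        groups[b].append(x)
    centers = {b: median(v) for b, v in groups.items()}
    return [x - centers[b] for x, b in zip(m, blocks)]


m = log_ratios([200, 400, 800, 100], [100, 100, 100, 100])  # [1.0, 2.0, 3.0, 0.0]
print(global_normalize(m))                                   # [-0.5, 0.5, 1.5, -1.5]
print(print_tip_normalize(m, ["A", "B", "A", "B"]))          # [-1.0, 1.0, 1.0, -1.0]
```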
Spinal cord injury
PKC knockout mouse
Next lecture: DATA ANALYSIS
- STATISTICAL ANALYSIS: the array as a list of genes
- PATTERN RECOGNITION: expression profile classification
- FUNCTIONAL ANALYSIS: content analysis, gene network construction