This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2006, The Johns Hopkins University and Rafael A. Irizarry. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided "AS IS"; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.
BIOINFORMATICS AND COMPUTATIONAL BIOLOGY SOLUTIONS USING R AND BIOCONDUCTOR
Biostatistics 140.688
Rafael A. Irizarry

Class Information
http://www.biostat.jhsph.edu/~ririzarr/teaching/688
Data mostly from microarrays
Download and install R 2.2.0
Download and install Bioconductor 1.7
Monday we talk about general methods
Wednesday we introduce a problem and analyze related data using R
Please bring a laptop on Wednesday
If taking the class for a grade, the final project requires data analysis; otherwise, a literature review
Class shaped as we go along

Introduction to Genome Biology and Microarray Technology
Lecture 1
Credit for some of today's materials: Terry Speed, Sandrine Dudoit, Victor Jongeneel, Giovanni Parmigiani
Today
1. Basics of Transcription
2. Basics of Hybridization Theory
3. How Microarrays Work

Cells and the genome
Each cell contains a complete copy of an organism's genome, or blueprint for all cellular structures and activities.
The genome is distributed along chromosomes, which are made of compressed and entwined DNA.
Cells are of many different types (e.g. blood, skin, nerve cells), but all can be traced back to a single cell, the fertilized egg.
Why are cells different?
Cells and the genome
A (protein-coding) gene is a segment of chromosomal DNA that directs the synthesis of a protein
An intermediate step is the gene being transcribed or expressed
Most microarray experiments measure gene expression
Gene expression experiments measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein levels directly is also possible, but is currently harder.

DNA
A deoxyribonucleic acid or DNA molecule is a double-stranded polymer composed of four basic molecular units called nucleotides.
Each nucleotide comprises a phosphate group, a deoxyribose sugar, and one of four nitrogen bases: adenine (A), guanine (G), cytosine (C), and thymine (T).
The two chains are held together by hydrogen bonds between nitrogen bases.
Base-pairing occurs according to the following rule: G pairs with C, and A pairs with T.
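The base-pairing rule (G with C, A with T) can be illustrated with a small sketch. This is only a toy model: the function names and the example sequence are made up for illustration, and strand direction (5' to 3') is ignored, so the two strands are compared position by position.

```python
# Toy model of the DNA base-pairing rule: G pairs with C, A pairs with T.
# Names and sequences here are illustrative assumptions, not from the slides.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def is_base_paired(strand_a, strand_b):
    """True if every position of strand_a pairs with the same position of strand_b."""
    if len(strand_a) != len(strand_b):
        return False
    return all(PAIRS[a] == b for a, b in zip(strand_a.upper(), strand_b.upper()))


if __name__ == "__main__":
    print(complement("GATTACA"))                 # CTAATGT
    print(is_base_paired("GATTACA", "CTAATGT"))  # True
    print(is_base_paired("GATTACA", "GATTACA"))  # False
```

Because each base has exactly one partner, knowing one strand is enough to reconstruct the other, which is what makes hybridization-based measurement possible.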
Transcription
From DNA to mRNA
[Figure: RNA polymerase transcribing a DNA strand into mRNA]

Reverse transcription
Clone cDNA strands, complementary to the mRNA
[Figure: reverse transcriptase producing cDNA from mRNA]

What are we measuring?
We call what we want to measure the target
The amount of RNA transcripts
    Expression arrays
    RT-PCR
The existence or abundance of a DNA sequence
    SNP chips, tiling arrays
    Yeast mutant representation with TAG arrays
Notice all of them are nucleic acid molecules uniquely defined by a sequence of bases
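As a rough sketch of the two steps above, transcription and reverse transcription can be written as string operations. The assumptions here are mine: the input to `transcribe` is the coding (sense) strand written 5' to 3', so the mRNA has the same sequence with U in place of T, and the cDNA is reported as the reverse complement of the mRNA. The function names and the example sequence are hypothetical.

```python
# Toy model of transcription and reverse transcription as string operations.
# Assumption: sequences are written 5' to 3'; names and data are illustrative.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def transcribe(coding_strand):
    """mRNA has the same sequence as the DNA coding strand, with U replacing T."""
    return coding_strand.upper().replace("T", "U")


def reverse_transcribe(mrna):
    """cDNA is complementary to the mRNA (returned as its reverse complement)."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTTA")
    print(mrna)                      # AUGGCCUUA
    print(reverse_transcribe(mrna))  # TAAGGCCAT
```

Either molecule is fully described by its sequence of bases, which is the property the slide on targets relies on.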
Nucleic acid hybridization

Microarrays: the game plan
Use hybridization to measure abundance of target molecule
Fix probes to a solid support and create features
Hybridize labeled target to probes and wash to get rid of non-hybridized material
Use labels to measure feature intensity

Hybridization
[Figure: target (RNA) sequence]
Hybridization
[Figure: labeled target hybridizing to features (probes)]

Technology Overview
Various platforms:
    Probes can be sequenced or cloned
    Features can be high-density or circles in a grid
    One or two samples hybridized to array
[Figure: Sequenced (High density)]
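The game plan (fix probes, hybridize labeled target, wash, measure feature intensity) can be mimicked with a deliberately simplified sketch. The assumptions are mine: a target binds a probe only if it is the exact reverse complement of the probe, every bound target contributes one unit of intensity, and everything is written in the DNA alphabet. Real hybridization is noisier (partial matches, background signal), and the probe names and sequences below are invented for illustration.

```python
from collections import Counter

# Toy hybridization model: a labeled target binds a probe only when it is the
# exact reverse complement of that probe. All names and data are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridize(probes, labeled_targets):
    """Return a feature intensity per probe: the count of targets that bind it.

    Targets matching no probe are simply not counted, which plays the role
    of the wash step.
    """
    target_counts = Counter(target.upper() for target in labeled_targets)
    return {name: target_counts[reverse_complement(seq)]
            for name, seq in probes.items()}


if __name__ == "__main__":
    probes = {"feature_1": "AAGCT", "feature_2": "GGATC"}
    targets = ["AGCTT", "AGCTT", "GATCC", "TTTTT"]
    print(hybridize(probes, targets))  # {'feature_1': 2, 'feature_2': 1}
```

In this sketch the intensity of a feature is proportional to the abundance of its target, which is the idealized relationship the measurement depends on.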
[Figure: Spotted]

Before Labeling
[Figure: Sample 1 and Sample 2; Array 1 and Array 2]

Before Hybridization: One Channel
[Figure: Sample 1 and Sample 2; Array 1 and Array 2]
After Hybridization
[Figure: Array 1 and Array 2]

Scanner Image
[Figure: Array 1 and Array 2]

Quantification
Array 1: 4 2 0 3
Array 2: 0 4 0 3
Microarray Image

Before Labeling: Two Channel
[Figure: Sample 1 and Sample 2; Array 1]

Before Hybridization
[Figure: Sample 1 and Sample 2; Array 1]
After Hybridization
[Figure: Array 1]

Scanner Image
[Figure: Array 1]

Quantification
Array 1: 4,0 2,4 0,0 3,3
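The toy quantification numbers on the slides can be held in simple data structures to compare the two designs. This sketch rests on my reading of the slides: in the one-channel example the first four values belong to Array 1 (Sample 1) and the last four to Array 2 (Sample 2), and in the two-channel example each pair is (Sample 1, Sample 2) for one feature, listed in the same feature order. The variable and function names are illustrative.

```python
# Toy quantification values copied from the slides; the assignment of values
# to arrays and the feature ordering are assumptions, as are all names.
one_channel = {
    "array_1": [4, 2, 0, 3],  # Sample 1 hybridized alone
    "array_2": [0, 4, 0, 3],  # Sample 2 hybridized alone
}
two_channel = [(4, 0), (2, 4), (0, 0), (3, 3)]  # (Sample 1, Sample 2) per feature


def pair_up(sample_1, sample_2):
    """Combine two one-channel arrays into per-feature (Sample 1, Sample 2) pairs."""
    return list(zip(sample_1, sample_2))


def differences(pairs):
    """Per-feature difference, Sample 2 minus Sample 1."""
    return [s2 - s1 for s1, s2 in pairs]


if __name__ == "__main__":
    pairs = pair_up(one_channel["array_1"], one_channel["array_2"])
    print(pairs == two_channel)      # True
    print(differences(two_channel))  # [-4, 2, 0, 0]
```

Under these assumptions both designs carry the same per-feature information: two one-channel arrays give one value per sample, while a single two-channel array gives both values for each feature.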
Microarray Image

More on Spotted Arrays
Pins collect cDNA from wells
384 well plate: contains cDNA probes
cDNA clones: spotted in duplicate
Glass slide: array of bound cDNA probes
4x4 blocks = 16 print-tip groups
[Figure labels: Print-tip group 1, Print-tip group 6]
Image analysis
With the images in place, we have data for the first time
First step is image analysis: determine which pixels are part of features and which are not
We leave this to the company engineers, although some academics have attacked the problem

Feature Level Data
Image analysis software produces feature level data
This is where we start
First step is to get hold of the files with this data and parse them
Currently most files are CEL (Affymetrix), XYS (NimbleGen), and GPR (two-color platforms read with a GenePix scanner). But others exist!
We also need to match each feature with a target molecule of interest. This is sometimes done in another file.

What we can learn
Deal with background noise
Normalize across arrays
The probe effect
Find differentially expressed genes
The multiple comparison problem
Experimental design
Clustering and classification
Time series experiments
Annotation
Using gene information
New applications: SNP chips, tiling arrays, etc.