Predicting Microarray Signals by Physical Modeling
Josh Deutsch
University of California, Santa Cruz



Collaborators
Shoudan Liang, NASA Ames
Onuttom Narayan, UCSC

Outline
Background
Gene transcription and regulation
What are microarrays
Example application: Cancer diagnosis
The problem: how to improve sensitivity?
Microarrays in detail
Physical constraints on how well they can work
Different approaches to signal extraction
Physical model of the system
Different types of binding
Experimental comparison
Comparison with other approaches
Conclusions

Gene transcription
[Figure: transcription factors and RNA polymerase on double-stranded DNA (ATCATGTACGTA paired with TAGTACATGCAT); the pre-mRNA (AUCAUGUACGUA plus junk); the spliceosome; the final mRNA.]
DNA is read by RNA polymerase, producing pre-mRNA, which is spliced to get rid of junk, leaving messenger RNA (mRNA).

Gene transcription
[Figure: ribosome, mRNA, and protein. (Miller, Hamkalo, and Thomas)]
The ribosome translates every three bases (one codon) of the mRNA into one amino acid, e.g. CAU → histidine.
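As a small illustration of the three-bases-per-amino-acid reading, the sketch below translates the mRNA fragment shown on the transcription slide. The codon table is deliberately partial (only the five codons needed here, taken from the standard genetic code), the function name `translate` is an illustrative choice, and start and stop codons are ignored.

```python
# Partial codon table from the standard genetic code (only what this example needs).
CODON_TABLE = {
    "AUC": "Ile",
    "AUG": "Met",
    "UAC": "Tyr",
    "GUA": "Val",
    "CAU": "His",
}

def translate(mrna):
    """Read an mRNA string three bases at a time and return the amino acids."""
    usable = len(mrna) - len(mrna) % 3          # ignore an incomplete trailing codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[c] for c in codons]

print(translate("AUCAUGUACGUA"))  # fragment from the transcription slide
print(translate("CAU"))           # the example given on this slide
```

A full translator would need all 64 codons and would begin at a start codon; this sketch only shows the reading frame.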

Gene regulation
[Figure: DNA with a promoter site (TATAAAA), sequence-specific transcription factors, transcription factors, and RNA polymerase.]
Many genes and cell types use the same proteins to regulate transcription. A specific combination of these is probably needed to turn on a gene.

Genetic networks
The state of the cell can be understood from the way different proteins regulate each other and respond to external inputs.

What are microarrays?
[Figure: probes for Gene 1, Gene 2, Gene 3, and Gene 4.]
Fabricate a chip by synthesizing 25-base oligomers attached to a substrate.

Linkage to substrate
E. Southern, K. Mir, M. Shchepinov, Nature Genetics (1999).
The oligomer probes are attached to the surface via polymeric linker molecules. Their density is about 1 probe every 40 square angstroms.

Sample preparation
Extract mRNA from cell samples.
[Figure: mRNA converted by reverse transcriptase into cDNA.]

Hybridization to array
Add fluorescent tags to targets, then hybridize to probes.
[Figures not recovered.]

Linkage to substrate
E. Southern, K. Mir, M. Shchepinov, Nature Genetics (1999).
In many experiments, target molecules can be longer or shorter than probe molecules. When they're longer, secondary structure formation can also occur.

A real microarray
[Figure: microarray image. From Liming Shi, gene-chips.com]

Affymetrix gene-chip
K.A. Baggerly (2002)
This is a 640 × 640 cell U95Av2 chip. Note the vertical bands, which are evidence of misalignment with the scanner.

Application: Cancer diagnosis
Microarrays have been used to differentially diagnose different kinds of cancer, for example:
  Diffuse large B-cell Lymphoma
  Ovarian Cancer
  Leukemia
  Breast Cancer
  Small Round Blue Cell Tumors
Often it is hard to distinguish different sub-types of cancer by other means. These distinctions are very important because they often determine the course of treatment.

Small Round Blue Cell Tumors
Recent work by Khan et al. has used microarrays to diagnose four types of pediatric tumors, Small Round Blue Cell Tumors (SRBCT):
  Neuroblastoma
  Ewing family tumors
  Rhabdomyosarcoma
  Non-Hodgkin lymphoma
Using 63 samples for training and 20 for testing, Khan et al. were able to correctly predict all data using only 96 genes from a pool of many thousands.

Prediction using minimal gene set
[Figure: the samples, labeled by index, grouped into the classes EWS, BL, NB, and RMS.]
Results for the prediction of samples for the different kinds of SRBCT, using GESSES (J.M. Deutsch, Bioinformatics (2003)).

High sensitivity needed
Many genes critical to the function of a cell are only present in minute concentrations, for example in the initial stages of a regulatory cascade.
With many orders of magnitude of concentrations present for different kinds of mRNA, this pushes the limits of microarray technology.
[Figure: input and output level versus index, shown once on a linear scale and once on a logarithmic scale.]
Data from 1532 Series Affymetrix spike-in human gene experiments.

Affymetrix Latin Square Experiments
Concentration (pM) of each group of transcripts (columns A to N) in each GeneChip experiment (rows 1 to 14):

Exp     A     B     C     D     E     F     G     H     I     J     K     L     M     N
  1     0  0.25   0.5     1     2     4     8    16    32    64   128   256   512  1024
  2  0.25   0.5     1     2     4     8    16    32    64   128   256   512  1024     0
  3   0.5     1     2     4     8    16    32    64   128   256   512  1024     0  0.25
  4     1     2     4     8    16    32    64   128   256   512  1024     0  0.25   0.5
  5     2     4     8    16    32    64   128   256   512  1024     0  0.25   0.5     1
  6     4     8    16    32    64   128   256   512  1024     0  0.25   0.5     1     2
  7     8    16    32    64   128   256   512  1024     0  0.25   0.5     1     2     4
  8    16    32    64   128   256   512  1024     0  0.25   0.5     1     2     4     8
  9    32    64   128   256   512  1024     0  0.25   0.5     1     2     4     8    16
 10    64   128   256   512  1024     0  0.25   0.5     1     2     4     8    16    32
 11   128   256   512  1024     0  0.25   0.5     1     2     4     8    16    32    64
 12   256   512  1024     0  0.25   0.5     1     2     4     8    16    32    64   128
 13   512  1024     0  0.25   0.5     1     2     4     8    16    32    64   128   256
 14  1024     0  0.25   0.5     1     2     4     8    16    32    64   128   256   512
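The design above is a cyclic Latin square: each experiment shifts the list of concentrations by one group. A minimal sketch that regenerates it and checks the Latin-square property is given below; the names `latin_square`, `CONCENTRATIONS`, and `GROUPS` are illustrative only.

```python
# Concentrations (pM) used in the design, in cyclic order.
CONCENTRATIONS = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
GROUPS = "ABCDEFGHIJKLMN"

def latin_square(levels):
    """Row r is the list of levels cyclically shifted left by r positions."""
    n = len(levels)
    return [[levels[(r + c) % n] for c in range(n)] for r in range(n)]

design = latin_square(CONCENTRATIONS)

# Latin-square property: every level occurs once per row and once per column.
for r in range(len(CONCENTRATIONS)):
    assert sorted(design[r]) == sorted(CONCENTRATIONS)
    assert sorted(row[r] for row in design) == sorted(CONCENTRATIONS)

# Experiment 4, transcript group L (row index 3, column index 11).
print(design[3][GROUPS.index("L")])  # prints 0
```

Because every transcript group sees every concentration exactly once, each probe yields a full response curve across the 14 experiments.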

Is this variation all noise?
This variation is reproducible:
[Figure: input and two output curves, level versus index, on a logarithmic scale.]
It is likely due to the physicochemical differences between target and probe molecules.
The output measures the amount of binding. In the case of these Affymetrix arrays, the hybridization is between cRNA targets in solution and DNA probes.
Clearly not all target molecules are binding to the probes.

Why not design it to be less variable?
It is hard to find optimal conditions for a microarray to work. An important parameter is the melting temperature for DNA/RNA hybridization of a probe and its complementary target molecule.
At temperatures << melting temperature, the time scale for relaxation becomes very long. The system becomes too sticky and irreversible. (Oligomers are 25 base pairs.)
At temperatures >> melting temperature, nothing hybridizes.
Therefore it must operate just under a typical melting temperature.
The melting temperature depends on the probe's chemical sequence.
Because of fluctuations in the melting temperature, probes with lower-than-average melting temperatures will show less hybridization.
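The last point can be made concrete with a textbook two-state picture, which is not the model developed later in this talk. The sketch below assumes a duplex is either fully formed or fully melted, ignores strand concentration, and uses hypothetical enthalpy and entropy values chosen only to give melting temperatures a few kelvin apart.

```python
import math

R = 0.0019872  # gas constant in kcal/(mol K)

def fraction_hybridized(temp_k, dH, dS):
    """Two-state duplex fraction at temperature temp_k (kelvin).

    dH (kcal/mol) and dS (kcal/(mol K)) are HYPOTHETICAL duplex-formation values.
    """
    dG = dH - temp_k * dS                      # free energy of hybridization
    return 1.0 / (1.0 + math.exp(dG / (R * temp_k)))

def melting_temperature(dH, dS):
    """Temperature at which dG = 0, i.e. half of the duplexes are formed."""
    return dH / dS

probe_a = (-180.0, -0.500)   # melts at 360 K
probe_b = (-180.0, -0.510)   # melts about 7 K lower
operating_temp = 350.0       # just under both melting temperatures

for name, (dH, dS) in (("A", probe_a), ("B", probe_b)):
    print(name,
          round(melting_temperature(dH, dS), 1),
          round(fraction_hybridized(operating_temp, dH, dS), 3))
```

At the shared operating temperature, the probe with the lower melting temperature shows the smaller hybridized fraction, which is the sequence-dependent variation described on the slide.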

Affymetrix's approach
mRNA sequence: ...attctcaggatactgcccgttactgtctatggtcggcttcgaatgatactctcgtatatcgatcggcttatacgcgattatacgc...
Perfect Match probe: TACTGTCTATGGTCGGCTTCGAATG
Mismatch probe: TACTGTCTATGGACGGCTTCGAATG
[Figure: probe intensities for the perfect match and mismatch probes.]
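The two probes above differ only at the 13th base (T in the perfect match, A in the mismatch), which is consistent with replacing the central base by its complement. The sketch below applies that rule; the rule is inferred from this one example, and the function name `mismatch_probe` is illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(perfect_match):
    """Return the probe with its middle base replaced by the complementary base."""
    mid = len(perfect_match) // 2      # index 12 for a 25-mer, i.e. the 13th base
    swapped = COMPLEMENT[perfect_match[mid]]
    return perfect_match[:mid] + swapped + perfect_match[mid + 1:]

pm = "TACTGTCTATGGTCGGCTTCGAATG"
mm = mismatch_probe(pm)
print(mm)  # prints TACTGTCTATGGACGGCTTCGAATG
```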

Affymetrix uses statistics
Include probes that have one base altered so that they do not hybridize as readily to the target sequence.
Make many probes (typically 16) per mRNA, to average over this variation as follows:
  Affymetrix MAS5.0 uses a "one-step Tukey's biweight estimate".
  It subtracts the mismatch signal from the perfect match in cases where the perfect match is larger.
  Do statistics on these numbers to try to obtain a good estimate for the input target concentration.
This approach ignores the fact that these variations are largely reproducible and depend on the sequence of the probes.
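A one-step Tukey biweight is a robust weighted mean: values far from the median, measured in units of the median absolute deviation, get little or no weight. The sketch below is a simplified illustration of that idea and not Affymetrix's implementation. The tuning constants `c = 5` and `eps = 1e-4` are assumptions, the intensities are made up, and pairs where the mismatch exceeds the perfect match are simply dropped here.

```python
import math
import statistics

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers."""
    m = statistics.median(values)
    mad = statistics.median([abs(v - m) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)          # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def probe_set_signal(pm, mm):
    """Biweight of log2(PM - MM), using only pairs where PM exceeds MM."""
    logs = [math.log2(p - q) for p, q in zip(pm, mm) if p > q]
    return 2 ** tukey_biweight(logs)

# HYPOTHETICAL intensities for one probe set (six probe pairs).
pm = [1200, 950, 1100, 4000, 1050, 300]
mm = [400, 300, 350, 500, 380, 450]
print(probe_set_signal(pm, mm))
```

In this made-up probe set the fourth pair is an outlier and receives zero weight, so the estimate stays close to the bulk of the PM minus MM differences.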

Using sequence information
Zhang, Miles, and Aldape (Nature Biotechnology 2003) invented a model to try to predict hybridization intensities using sequence information.
The model they use postulates an "energy" of binding that depends on position along the sequence. One expects the ends to be less tightly bound than the middle. There are two kinds of energy:
  Non-specific term that gives the baseline binding when the matching target is not present.
  Specific term in which the measured signal depends linearly on the input target concentration.
Minimizing over (16*2 + 24*2 + 3) = 83 parameters, they obtain results far better than the MAS5.0 approach of Affymetrix.
But does this model capture the physics?

Important physical effects
[Figure: target-target binding and nonspecific binding.]
Nonspecific binding and target-target binding are important effects to consider.

Evidence for target-target binding
[Figure: output versus input for several probes.]
Probes asymptote at different maximum values, often correlating with their sequences being similar.

Evidence for nonspecific binding
[Figure: output versus input at small input values.]
When the input target concentration is zero or small, there is still a substantial signal from the corresponding probes.

Partial zippering
Because the binding energy is strongly dependent on base pair composition near the melting temperature, we expect that molecules are only partially bound at any one time.
For a segment bound from monomer m to monomer n, the free energy of the nearest neighbor model is:
$$\Delta G_{mn} = \sum_{i=m}^{n-1} \epsilon(i, i+1) + \epsilon_{\text{initiation}}$$
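A minimal sketch of this segment free energy follows. The stacking energies in `EPSILON` and the value of `EPS_INITIATION` are hypothetical placeholders chosen only so that the code runs; in the talk these quantities are fitted parameters.

```python
BASES = "ACGT"

# HYPOTHETICAL stacking energies (kcal/mol), for illustration only: each adjacent
# pair of bases gets -1.0, plus an extra -0.5 for every G or C it contains.
EPSILON = {(a, b): -1.0 - 0.5 * ((a in "GC") + (b in "GC"))
           for a in BASES for b in BASES}
EPS_INITIATION = 3.0  # hypothetical initiation penalty (kcal/mol)

def segment_free_energy(probe, m, n):
    """Delta G_mn for the stretch bound from monomer m to monomer n (1-based, m < n)."""
    # Monomer i is probe[i - 1], so the step (i, i + 1) is (probe[i - 1], probe[i]).
    stacking = sum(EPSILON[(probe[i - 1], probe[i])] for i in range(m, n))
    return stacking + EPS_INITIATION

probe = "TACTGTCTATGGTCGGCTTCGAATG"
print(segment_free_energy(probe, 1, len(probe)))  # fully zippered
print(segment_free_energy(probe, 10, 16))         # partially zippered in the middle
```

The dictionary has 16 entries, one per ordered pair of bases, matching the 16 nearest-neighbor energies per binding type counted later in the talk.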

Partition function
The partition function for a target molecule hybridized to a probe, $Z = \exp(-\beta \Delta G)$, gives the total free energy for binding. It sums over all possible zippered states starting at m and ending at n. For a probe with a total of N monomers:
$$Z = \sum_{m<n=2}^{N} e^{-\beta \Delta G_{mn}} = \sum_{m<n=2}^{N} e^{-\beta\left(\sum_{i=m}^{n-1} \epsilon(i, i+1) + \epsilon_{\text{initiation}}\right)}$$
The first expression sums over all begin and end points ($1 \le m < n \le N$); the second uses the nearest neighbor model.
This depends strongly on the temperature and the probe sequence. It can be efficiently computed in O(N) operations using a recursion relation for each probe considered.
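The talk does not spell out its recursion, so the sketch below shows one that achieves O(N). Writing $w_i = e^{-\beta \epsilon(i, i+1)}$ and letting $S_n$ be the sum over all segments ending at monomer n, one has $S_n = w_{n-1}(1 + S_{n-1})$ with $S_1 = 0$, and $Z = e^{-\beta \epsilon_{\text{initiation}}} \sum_{n=2}^{N} S_n$. The energies and the 45 °C temperature are hypothetical, and a direct double sum is included as a cross-check.

```python
import math

BASES = "ACGT"
# HYPOTHETICAL nearest-neighbor energies (kcal/mol), for illustration only.
EPSILON = {(a, b): -1.0 - 0.5 * ((a in "GC") + (b in "GC"))
           for a in BASES for b in BASES}
EPS_INITIATION = 3.0
BETA = 1.0 / (0.0019872 * 318.15)  # 1/(RT) at an assumed 45 C, in mol/kcal

def partition_function_direct(probe):
    """Reference: explicit sum over every begin point m and end point n."""
    N = len(probe)
    z = 0.0
    for m in range(1, N):
        energy = EPS_INITIATION
        for n in range(m + 1, N + 1):
            energy += EPSILON[(probe[n - 2], probe[n - 1])]  # add step (n-1, n)
            z += math.exp(-BETA * energy)
    return z

def partition_function_recursive(probe):
    """O(N) recursion: S_n = w_{n-1} * (1 + S_{n-1}) sums all segments ending at n."""
    s, total = 0.0, 0.0
    for n in range(2, len(probe) + 1):
        w = math.exp(-BETA * EPSILON[(probe[n - 2], probe[n - 1])])
        s = w * (1.0 + s)
        total += s
    return math.exp(-BETA * EPS_INITIATION) * total

probe = "TACTGTCTATGGTCGGCTTCGAATG"
z_direct = partition_function_direct(probe)
z_fast = partition_function_recursive(probe)
print(z_direct, z_fast, -math.log(z_fast) / BETA)  # last value: total Delta G
```

The recursion works because every segment ending at n is either the single step (n-1, n) or a segment ending at n-1 extended by that step.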

Fraction bound
Z (or $\Delta G$) and the chemical potential give the affinity for binding. That is, the fraction f of bound probes is
$$f = \frac{1}{1 + e^{\beta(\Delta G - \mu)}}$$
[Figure labels: $\Delta G$, Probe, Solution.]
$\mu = \text{const} + \log(\text{concentration})/\beta$ is determined by the requirement that $\mu_{\text{solution}} = \mu_{\text{probes}}$.
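Substituting the expression for the chemical potential into f gives $f = c/(c + K)$ with $K = e^{\beta(\Delta G - \text{const})}$, a saturating curve rather than a linear one. The sketch below checks that algebra numerically. The value of `delta_g`, the zero default for `const`, and the 45 °C temperature are hypothetical, and the concentration must be positive because of the logarithm.

```python
import math

BETA = 1.0 / (0.0019872 * 318.15)  # 1/(RT) at an assumed 45 C, in mol/kcal

def chemical_potential(concentration, const=0.0):
    """mu = const + log(concentration) / beta; const is a hypothetical reference value."""
    return const + math.log(concentration) / BETA

def fraction_bound(delta_g, concentration, const=0.0):
    """f = 1 / (1 + exp(beta * (delta_g - mu)))."""
    mu = chemical_potential(concentration, const)
    return 1.0 / (1.0 + math.exp(BETA * (delta_g - mu)))

def saturating_form(delta_g, concentration, const=0.0):
    """The same quantity rewritten as c / (c + K), with K = exp(beta * (delta_g - const))."""
    k = math.exp(BETA * (delta_g - const))
    return concentration / (concentration + k)

delta_g = 3.0  # HYPOTHETICAL binding free energy (kcal/mol)
for c in (0.25, 16.0, 1024.0):
    print(c, fraction_bound(delta_g, c), saturating_form(delta_g, c))
```

The half-saturation point is at c = K, so probes with different $\Delta G$ saturate at different input concentrations.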

Nonspecific free energy
A probe can still bind, though more weakly, to target sequences that have mismatches. The above method of calculation still applies.
However, we don't know the concentrations of the target solution because Affymetrix has added uncharacterized concentrations of human RNA as a background.
If the binding is weak, we can replace Z with an annealed average over different sequences. This leads to an effective nearest neighbor model for this non-specific binding:
$$\Delta G^{NS}_{mn} = \sum_{i=m}^{n-1} \epsilon^{NS}(i, i+1) + \epsilon^{NS}_{\text{initiation}}$$
Here the $\epsilon^{NS}$ values have to be empirically determined.

Target-target binding
For similar reasons, we don't know the concentration of targets. Therefore we replace those interactions with an effective annealed model:
$$\Delta G^{TT}_{mn} = \sum_{i=m}^{n-1} \epsilon^{TT}(i, i+1) + \epsilon^{TT}_{\text{initiation}}$$

The full model
Given an initial set of concentrations for different target molecules, our model will predict the observed intensities of hybridization of targets to probe molecules. The parameters in the model are:
- Energies in the nearest neighbor model $\epsilon(b_1, b_2)$, e.g. $\epsilon(G, T)$. There are 16 possibilities in each of 3 sets of terms: specific, non-specific, and target-target.
- 3 initiation factors.
- Small additive background constant.
- Number of probe molecules.
- Proportionality factor between binding and light intensity.
A total of 54 parameters, to fit 2464 data points from the Latin Square experiments.
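The count of 54 can be checked by enumerating the parameters. The sketch below assumes that each of the three sets of terms has its own full table of 16 stacking energies and one initiation factor; that reading is an assumption, although it does reproduce the stated total. All names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
TERM_SETS = ("specific", "non-specific", "target-target")

# One stacking energy per ordered base pair and per set of terms: 3 * 16
stacking = [(s, a, b) for s in TERM_SETS for a, b in product(BASES, repeat=2)]

# One initiation factor per set of terms (assumed): 3
initiation = [(s, "initiation") for s in TERM_SETS]

# The three remaining scalar parameters listed on the slide
scalars = ["background", "n_probe_molecules", "intensity_scale"]

n_params = len(stacking) + len(initiation) + len(scalars)
```

With 2464 data points, this leaves roughly 45 data points per fitted parameter.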

Parameter fitting
We minimize the fitness function
$$I = \sum_{\text{all data}} \left( \log(\text{predicted conc.}) - \log(\text{observed conc.}) \right)^2$$
with respect to all parameters. This is hard because there are many minima in parameter space. It can be solved using simulated annealing Monte Carlo and parallel tempering.
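As a minimal sketch, the fitness function is a sum of squared log differences. The function name `fitness` and the use of the natural logarithm are assumptions; the slide does not state the base of the logarithm or any normalization.

```python
import math
from typing import Sequence

def fitness(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Sum over all data of (log(predicted) - log(observed))**2.

    Natural log is assumed; all values must be strictly positive.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((math.log(p) - math.log(o)) ** 2
               for p, o in zip(predicted, observed))

perfect = fitness([1.0, 10.0, 100.0], [1.0, 10.0, 100.0])  # identical inputs
off_by_e = fitness([math.e], [1.0])                        # one point, off by a factor e
```

Working with logarithms means that a prediction off by a given factor is penalized equally at low and high concentrations, which matters when the data span several orders of magnitude.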

Parallel tempering
[Figure: N copies of the system, labeled 1, 2, 3, 4, ..., N-3, N-2, N-1, N, arranged along a temperature axis T.]
- Simulate N copies of a system at temperatures $T_1, T_2, \ldots, T_N$.
- Exchange neighboring system configurations depending on the relative energies of the systems $\Delta E$ and the temperature difference $\Delta\beta$, with probability
$$p = \begin{cases} \exp(\Delta\beta\,\Delta E) & \text{for } \Delta\beta\,\Delta E < 0, \\ 1 & \text{otherwise.} \end{cases}$$
- It is very efficient at finding low energy states of systems with many degrees of freedom.
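The exchange step can be illustrated with a small, self-contained Python sketch. It is not the fitting code from the talk: the one-dimensional `energy` function is a hypothetical stand-in for the 54-parameter fitness landscape, and the temperature ladder, step size, and sweep count are arbitrary choices. The swap rule takes the differences in inverse temperature and in energy between the two neighboring replicas.

```python
import math
import random

def energy(x: float) -> float:
    """Toy landscape with many local minima (hypothetical stand-in)."""
    return 0.1 * x * x + math.sin(5.0 * x) + 1.0

def swap_probability(beta_a: float, beta_b: float, e_a: float, e_b: float) -> float:
    """p = exp(dbeta * dE) if dbeta * dE < 0, otherwise 1."""
    arg = (beta_a - beta_b) * (e_a - e_b)
    return math.exp(arg) if arg < 0 else 1.0

def parallel_tempering(temps, n_sweeps=2000, step=0.5, seed=0):
    rng = random.Random(seed)
    betas = [1.0 / t for t in temps]
    xs = [rng.uniform(-5.0, 5.0) for _ in temps]
    es = [energy(x) for x in xs]
    best_x, best_e = min(zip(xs, es), key=lambda pair: pair[1])
    for _ in range(n_sweeps):
        # Ordinary Metropolis update inside each replica
        for k, beta in enumerate(betas):
            trial = xs[k] + rng.uniform(-step, step)
            e_trial = energy(trial)
            if e_trial <= es[k] or rng.random() < math.exp(-beta * (e_trial - es[k])):
                xs[k], es[k] = trial, e_trial
                if e_trial < best_e:
                    best_x, best_e = trial, e_trial
        # Attempt to exchange configurations between neighboring temperatures
        for k in range(len(temps) - 1):
            if rng.random() < swap_probability(betas[k], betas[k + 1], es[k], es[k + 1]):
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
                es[k], es[k + 1] = es[k + 1], es[k]
    return best_x, best_e

best_x, best_e = parallel_tempering([0.1, 0.3, 1.0, 3.0])
```

Hot replicas cross barriers easily and cold replicas refine a minimum; the swaps let a configuration found at high temperature migrate down to the cold end. A swap that moves the lower-energy configuration to the colder replica is always accepted.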

Results
$I_{\min} = 0.188$
[Figure: two panels plotting level (logarithmic axis, 0.1 to 100000) against index (0 to 180), each with the series real, pred, and input.]

Leave out one
Predict first transcript (16 probes) training on all the other data.
[Figure: level (logarithmic axis, 100 to 100000) against index (0 to 16), with series labeled real, pred, and tex.]

Partial binding
- The ends of the hybridized targets tend to be frayed.
- The fraction of the time a nucleotide is unbound depends on the sequence.
[Figure: fraction unbound (0 to 1) against position (up to 20).]
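Fraying at the ends can be illustrated with a simple zipper-style calculation. The sketch below is built on assumptions that the slides do not spell out: the bound part of the duplex is taken to be one contiguous segment from m to n, its weight is a Boltzmann factor of a nearest-neighbor energy, and the average is taken only over hybridized states. The function name, the placeholder energies, and the unit thermal energy `rt` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

def fraction_unbound(seq: str, eps: Dict[Tuple[str, str], float],
                     eps_init: float, rt: float = 1.0) -> List[float]:
    """Per-position probability of being unbound, averaged over all
    contiguous bound segments [m, n] with weight exp(-G_mn / rt)."""
    length = len(seq)
    stack = [eps[(seq[i], seq[i + 1])] for i in range(length - 1)]
    z = 0.0                      # partition sum over bound segments
    bound = [0.0] * length       # weight of segments covering each position
    for m in range(length):
        g = eps_init
        for n in range(m, length):
            if n > m:
                g += stack[n - 1]   # add the stack between n-1 and n
            w = math.exp(-g / rt)
            z += w
            for k in range(m, n + 1):
                bound[k] += w
    return [1.0 - b / z for b in bound]

# Placeholder energies (not fitted values): every stack contributes -1.0
bases = "ACGT"
toy_eps = {(a, b): -1.0 for a in bases for b in bases}
frac = fraction_unbound("ACGTACGTAC", toy_eps, eps_init=0.0)
```

Even with identical stacking energies the end positions come out more often unbound than the middle, because fewer segments cover them; sequence-dependent energies would then modulate that profile.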

Conclusions
The output of Affymetrix gene chips can be understood in terms of the physicochemical properties of equilibrium statistical mechanics. The processes involve:
- Partially zippered RNA targets and DNA probes.
- Non-specific binding of other target molecules.
- Binding of target molecules to each other.
- Taking into account the nonlinearity (saturation) of binding.
This understanding leads to a model that predicts mRNA levels well.
It is hoped that this model will be turned into software that will benefit researchers wanting a more accurate determination of mRNA levels in their experiments.