Predicting Microarray Signals by Physical Modeling Josh Deutsch University of California Santa Cruz Predicting Microarray Signals by Physical Modeling p.1/39
Collaborators Shoudan Liang NASA Ames Onuttom Narayan UCSC Predicting Microarray Signals by Physical Modeling p.2/39
Outline Background Gene transcription and regulation What are microarrays Example application: Cancer diagnosis The problem: how to improve sensitivity? Microarrays in detail Physical constraints on how well they can work Different approaches to signal extraction Physical model of the system Different types of binding Experimental comparison Comparison with other approaches Conclusions Predicting Microarray Signals by Physical Modeling p.3/39
Gene transcription Transcription factors DNA RNA polymerase A T C A T G T A C G T A T A G T A C A T G C A T DNA messenger RNA (mrna) DNA is read by RNA polymerase pre-mrna junk A U C A U G U A C G U A spliceosome mrna Predicting Microarray Signals by Physical Modeling p.4/39
Gene transcription DNA A T C A T G T A C G T A T A G T A C A T G C A T DNA messenger RNA (mrna) Transcription factors RNA polymerase DNA is read by RNA polymerase pre-mrna junk A U C A U G U A C G U A Producing pre-mrna spliceosome mrna Predicting Microarray Signals by Physical Modeling p.4/39
Gene transcription DNA A T C A T G T A C G T A T A G T A C A T G C A T DNA messenger RNA (mrna) Transcription factors RNA polymerase DNA is read by RNA polymerase pre-mrna junk A U C A U G U A C G U A Producing pre-mrna spliceosome Which is spliced to get rid of junk mrna Predicting Microarray Signals by Physical Modeling p.4/39
Gene transcription protein mrna Ribosome The ribosome translates every three base pairs into one amino acid e.g. CAU Histidine. Predicting Microarray Signals by Physical Modeling p.5/39
Gene transcription protein mrna Ribosome The ribosome translates every three base pairs into one amino acid e.g. CAU Histidine. (Miller, Hamkalo, and Thomas) Predicting Microarray Signals by Physical Modeling p.5/39
Gene regulation promoter site TATAAAA DNA Sequence specific transcription factors Transcription factors RNA polymerase Many genes and cell types use the same proteins to regulate transcription. A specific combination of these is probably needed to turn on a gene. Predicting Microarray Signals by Physical Modeling p.6/39
Genetic networks The state of the cell can be understood from the way different proteins regulate each other and respond to external inputs. Predicting Microarray Signals by Physical Modeling p.7/39
What are microarrays? Gene 1 Gene 2 Gene 3 Gene 4 Fabricate a chip by synthesizing 25 base pair oligomers attached to a substrate. Predicting Microarray Signals by Physical Modeling p.8/39
What are microarrays? Gene 1 Gene 2 Gene 3 Gene 4 Fabricate a chip by synthesizing 25 base pair oligomers attached to a substrate. Predicting Microarray Signals by Physical Modeling p.8/39
Linkage to substrate E. Southern, K. Mir, M. Shchepinov Nature Genetics (1999). The oligomer probes are attached to the surface via polymeric linker molecules. Their density is about 1 probe every 40 square angstroms. Predicting Microarray Signals by Physical Modeling p.9/39
Sample preparation Extract mrna from cell samples mrna Predicting Microarray Signals by Physical Modeling p.10/39
Sample preparation Extract mrna from cell samples mrna mrna reverse transcriptase cdna Predicting Microarray Signals by Physical Modeling p.10/39
Hybridization to array Add fluorescent tags to targets: Predicting Microarray Signals by Physical Modeling p.11/39
Hybridization to array Add fluorescent tags to targets: Then hybridize to probes: Predicting Microarray Signals by Physical Modeling p.11/39
Linkage to substrate E. Southern, K. Mir, M. Shchepinov Nature Genetics (1999). In many experiments, target molecules can be longer or shorter than probe molecules. When they re longer, secondary structure formation can also occur. Predicting Microarray Signals by Physical Modeling p.12/39
A real microarray From Liming Shi /em gene-chips.com Predicting Microarray Signals by Physical Modeling p.13/39
Affymetrix gene-chip K.A. Baggerly (2002) This is a 640 640 cell U95Av2 chip. Note the vertical bands which are evidence of misalignment with the scanner. Predicting Microarray Signals by Physical Modeling p.14/39
Application: Cancer diagnosis Microarrays have been used to differentially diagnose different kinds of cancer, for example Diffuse large B-cell Lymphoma Ovarian Cancer Leukemia Breast Cancer Small Round Blue Cell Tumors Predicting Microarray Signals by Physical Modeling p.15/39
Application: Cancer diagnosis Microarrays have been used to differentially diagnose different kinds of cancer, for example Diffuse large B-cell Lymphoma Ovarian Cancer Leukemia Breast Cancer Small Round Blue Cell Tumors Often it is hard to distinguish different sub-types of cancer by other means. These distinctions are very important because they often determine the course of treatment. Predicting Microarray Signals by Physical Modeling p.15/39
Small Round Blue Cell Tumors Recent work by Khan et al has used microarrays to diagnose four types of pediatric tumors, Small Round Blue Cell Tumors (SRBCT): neuroblastoma Ewing family tumors Rhabdomyosarcoma non-hodgkin lymphoma Using 63 samples for training and 20 for testing, Khan et al were able to correctly predict all data using only 96 genes from a pool of many thousands. Predicting Microarray Signals by Physical Modeling p.16/39
Prediction using minimal gene set EWS BL NB RMS 57 62 45 60 55 58 59 54 53 49 51 44 47 56 48 43 61 50 52 46 33 32 31 42 39 38 37 41 40 36 34 35 9 3 6 2 10 4 5 7 11 8 0 15 1 16 14 18 17 19 12 22 21 13 20 26 30 28 27 25 29 23 24 Results for the prediction of samples for for different kinds of SRBCT. Using GESSES (J.M. Deutsch Bioinformatics (2003)). Predicting Microarray Signals by Physical Modeling p.17/39
High sensitivity needed Many genes critical to the function of a cell are only present in minute concentrations, for example in the initial stages of a regulatory cascade. With many orders of magnitude of concentrations present for different kinds of mrna, this pushes the limits of microarray technology. Predicting Microarray Signals by Physical Modeling p.18/39
High sensitivity needed Many genes critical to the function of a cell are only present in minute concentrations, for example in the initial stages of a regulatory cascade. With many orders of magnitude of concentrations present for different kinds of mrna, this pushes the limits of microarray technology. 25000 20000 input output 15000 level 10000 5000 0 0 50 100 150 200 250 index Data from 1532 Series Affymetrix spike-in human gene experiments. Predicting Microarray Signals by Physical Modeling p.18/39
High sensitivity needed Many genes critical to the function of a cell are only present in minute concentrations, for example in the initial stages of a regulatory cascade. With many orders of magnitude of concentrations present for different kinds of mrna, this pushes the limits of microarray technology. 100000 10000 input output 1000 level 100 10 1 0.1 0 50 100 150 200 250 index Data from 1532 Series Affymetrix spike-in human gene experiments. Predicting Microarray Signals by Physical Modeling p.18/39
Affymetrix Latin Square Experiments Groups of Transcripts pm Concentration A B C D E F G H I J K L M N 1 0 0.25 0.5 1 2 4 8 16 32 64 128 256 512 1024 2 0.25 0.5 1 2 4 8 16 32 64 128 256 512 1024 0 3 0.5 1 2 4 8 16 32 64 128 256 512 1024 0 0.25 GeneChip Experiment 4 1 2 4 8 16 32 64 128 256 512 1024 0 0.25 0.5 5 2 4 8 16 32 64 128 256 512 1024 0 0.25 0.5 1 6 4 8 16 32 64 128 256 512 1024 0 0.25 0.5 1 2 7 8 16 32 64 128 256 512 1024 0 0.25 0.5 1 2 4 8 16 32 64 128 256 512 1024 0 0.25 0.5 1 2 4 8 9 32 64 128 256 512 1024 0 0.25 0.5 1 2 4 8 16 10 64 128 256 512 1024 0 0.25 0.5 1 2 4 8 16 32 11 128 256 512 1024 0 0.25 0.5 1 2 4 8 16 32 64 12 256 512 1024 0 0.25 0.5 1 2 4 8 16 32 64 128 13 512 1024 0 0.25 0.5 1 2 4 8 16 32 64 128 256 14 1024 0 0.25 0.5 1 2 4 8 16 32 64 128 256 512 Predicting Microarray Signals by Physical Modeling p.19/39
Is this variation all noise? This variation is reproducible: Predicting Microarray Signals by Physical Modeling p.20/39
Is this variation all noise? This variation is reproducible: 100000 10000 input output output 1000 level 100 10 1 0.1 0 50 100 150 200 250 index Predicting Microarray Signals by Physical Modeling p.20/39
Is this variation all noise? This variation is reproducible: 100000 10000 input output output 1000 level 100 10 1 0.1 0 50 100 150 200 250 index It is likely due to the physicochemical differences between target and probe molecules. The output measures the amount of binding. In the case of these Affymetrix arrays, the hybridization is between crna targets in solution and DNA probes. Clearly not all target molecules are binding to the probes. Predicting Microarray Signals by Physical Modeling p.20/39
Why not design it to be less variable? It is hard finding optimal conditions for a microarray to work. An important parameter is the melting temperature for DNA/RNA hybridization of a probe and complementary target molecule. At temperatures << melting temperature, the time scale for relaxation becomes very long. The system becomes too sticky and irreversible. (Oligomers are 25 base pairs). Predicting Microarray Signals by Physical Modeling p.21/39
Why not design it to be less variable? It is hard finding optimal conditions for a microarray to work. An important parameter is the melting temperature for DNA/RNA hybridization of a probe and complementary target molecule. At temperatures << melting temperature, the time scale for relaxation becomes very long. The system becomes too sticky and irreversible. (Oligomers are 25 base pairs). At temperature >> melting temperature, nothing hybridizes. Predicting Microarray Signals by Physical Modeling p.21/39
Why not design it to be less variable? It is hard finding optimal conditions for a microarray to work. An important parameter is the melting temperature for DNA/RNA hybridization of a probe and complementary target molecule. At temperatures << melting temperature, the time scale for relaxation becomes very long. The system becomes too sticky and irreversible. (Oligomers are 25 base pairs). At temperature >> melting temperature, nothing hybridizes. Therefore it must operate just under a typical melting temperature Predicting Microarray Signals by Physical Modeling p.21/39
Why not design it to be less variable? It is hard finding optimal conditions for a microarray to work. An important parameter is the melting temperature for DNA/RNA hybridization of a probe and complementary target molecule. At temperatures << melting temperature, the time scale for relaxation becomes very long. The system becomes too sticky and irreversible. (Oligomers are 25 base pairs). At temperature >> melting temperature, nothing hybridizes. Therefore it must operate just under a typical melting temperature The melting temperature depends on the probe s chemical sequence. Predicting Microarray Signals by Physical Modeling p.21/39
Why not design it to be less variable? It is hard finding optimal conditions for a microarray to work. An important parameter is the melting temperature for DNA/RNA hybridization of a probe and complementary target molecule. At temperatures << melting temperature, the time scale for relaxation becomes very long. The system becomes too sticky and irreversible. (Oligomers are 25 base pairs). At temperature >> melting temperature, nothing hybridizes. Therefore it must operate just under a typical melting temperature The melting temperature depends on the probe s chemical sequence. Because of fluctuation in the melting temperature, probes with lower than average melting curves will show less hybridization. Predicting Microarray Signals by Physical Modeling p.21/39
Affymetrix s approach mrna sequence:...attctcaggatactgcccgttactgtctatggtcggcttcgaatgatactctcgtatatcgatcggcttatacgcgattatacgc... Probe intensities Perfect Match Mismatch TACTGTCTATGGTCGGCTTCGAATG TACTGTCTATGGACGGCTTCGAATG Predicting Microarray Signals by Physical Modeling p.22/39
Affymetrix uses statistics Include probes that have one base altered so that it does not hybridize as readily to the target sequence. Predicting Microarray Signals by Physical Modeling p.23/39
Affymetrix uses statistics Include probes that have one base altered so that it does not hybridize as readily to the target sequence. Make many probes (typically 16) per mrna, to average out over this variation as follow: Predicting Microarray Signals by Physical Modeling p.23/39
Affymetrix uses statistics Include probes that have one base altered so that it does not hybridize as readily to the target sequence. Make many probes (typically 16) per mrna, to average out over this variation as follow: Affymetrix MAS5.0 uses "One step Tukey s biweight estimate". Predicting Microarray Signals by Physical Modeling p.23/39
Affymetrix uses statistics Include probes that have one base altered so that it does not hybridize as readily to the target sequence. Make many probes (typically 16) per mrna, to average out over this variation as follow: Affymetrix MAS5.0 uses "One step Tukey s biweight estimate". It subtracts off the mismatch signal from the perfect match in cases where the perfect match is larger. Predicting Microarray Signals by Physical Modeling p.23/39
Affymetrix uses statistics Include probes that have one base altered so that it does not hybridize as readily to the target sequence. Make many probes (typically 16) per mrna, to average out over this variation as follow: Affymetrix MAS5.0 uses "One step Tukey s biweight estimate". It subtracts off the mismatch signal from the perfect match in cases where the perfect match is larger. Do statistics on these numbers to try to obtain a good estimate for the input target concentration. Predicting Microarray Signals by Physical Modeling p.23/39
Affymetrix uses statistics Include probes that have one base altered so that it does not hybridize as readily to the target sequence. Make many probes (typically 16) per mrna, to average out over this variation as follow: Affymetrix MAS5.0 uses "One step Tukey s biweight estimate". It subtracts off the mismatch signal from the perfect match in cases where the perfect match is larger. Do statistics on these numbers to try to obtain a good estimate for the input target concentration. This approach ignores the fact that these variations are largely reproducible and depend on the sequence of the probes. Predicting Microarray Signals by Physical Modeling p.23/39
Using sequence information Zhang, Miles, and Aldape (Nature Biotechnology 2003) invented a model to try to predict hybridization intensities using sequence information. The model they use postulates an "energy" of binding that depends on position along the sequence. One expects that the ends to be less tightly bound than the middle. There are two kinds of energy: Predicting Microarray Signals by Physical Modeling p.24/39
Using sequence information Zhang, Miles, and Aldape (Nature Biotechnology 2003) invented a model to try to predict hybridization intensities using sequence information. The model they use postulates an "energy" of binding that depends on position along the sequence. One expects that the ends to be less tightly bound than the middle. There are two kinds of energy: Non-specific term that gives the base line binding when the matching target is not present. Predicting Microarray Signals by Physical Modeling p.24/39
Using sequence information Zhang, Miles, and Aldape (Nature Biotechnology 2003) invented a model to try to predict hybridization intensities using sequence information. The model they use postulates an "energy" of binding that depends on position along the sequence. One expects that the ends to be less tightly bound than the middle. There are two kinds of energy: Non-specific term that gives the base line binding when the matching target is not present. Specific term that has a linear dependence of input target concentration on the measured signal. Predicting Microarray Signals by Physical Modeling p.24/39
Using sequence information Zhang, Miles, and Aldape (Nature Biotechnology 2003) invented a model to try to predict hybridization intensities using sequence information. The model they use postulates an "energy" of binding that depends on position along the sequence. One expects that the ends to be less tightly bound than the middle. There are two kinds of energy: Non-specific term that gives the base line binding when the matching target is not present. Specific term that has a linear dependence of input target concentration on the measured signal. Minimizing over (16*2 + 24*2 + 3) parameters, they obtain results far better than the MAS5.0 approach of Affymetrix. Predicting Microarray Signals by Physical Modeling p.24/39
Using sequence information Zhang, Miles, and Aldape (Nature Biotechnology 2003) invented a model to try to predict hybridization intensities using sequence information. The model they use postulates an "energy" of binding that depends on position along the sequence. One expects that the ends to be less tightly bound than the middle. There are two kinds of energy: Non-specific term that gives the base line binding when the matching target is not present. Specific term that has a linear dependence of input target concentration on the measured signal. Minimizing over (16*2 + 24*2 + 3) parameters, they obtain results far better than the MAS5.0 approach of Affymetrix. But does this model capture the physics? Predicting Microarray Signals by Physical Modeling p.24/39
Important physical effects Predicting Microarray Signals by Physical Modeling p.25/39
Important physical effects Nonspecific Binding Predicting Microarray Signals by Physical Modeling p.25/39
Important physical effects Target Target Binding Nonspecific Binding Predicting Microarray Signals by Physical Modeling p.25/39
Important physical effects Target Target Binding Nonspecific Binding Nonspecific binding and target-target binding are important effects to consider. Predicting Microarray Signals by Physical Modeling p.25/39
Evidence for target target binding 20000 18000 16000 14000 output 12000 10000 8000 6000 4000 2000 0 0 200 400 600 800 1000 1200 input Probes asymptote at different maximum values often correlating with their sequences being similar. Predicting Microarray Signals by Physical Modeling p.26/39
Evidence for nonspecific binding 350 300 250 output 200 150 100 50 0 0.2 0.4 0.6 0.8 1 input When the input target concentration is zero or small, there is still a substantial signal from the corresponding probes. Predicting Microarray Signals by Physical Modeling p.27/39
Partial zippering Because the binding energy is strongly dependent on base pair composition near the melting temperature, we expect that molecules are only partially bound at any one time. Predicting Microarray Signals by Physical Modeling p.28/39
Partial zippering Because the binding energy is strongly dependent on base pair composition near the melting temperature, we expect that molecules are only partially bound at any one time. For a segment bound from monomer n to monomer m, the free energy of the nearest neighbor model is: G mn = n 1 i=m ɛ(i, i + 1) + ɛ initiation Predicting Microarray Signals by Physical Modeling p.28/39
Partition function The partition function for a target molecule hybridized to a probe, Z = exp( β G) gives the total free energy for binding. It sums over all possible zippered states starting at m and ending at m. For a probe with a total of N monomers: Predicting Microarray Signals by Physical Modeling p.29/39
Partition function The partition function for a target molecule hybridized to a probe, Z = exp( β G) gives the total free energy for binding. It sums over all possible zippered states starting at m and ending at m. For a probe with a total of N monomers: Z = N e β G mn = N e β(p n 1 i=m ɛ(i,i+1)+ɛ initiation) m<n=2 m<n=2 sum over all begin and end points Predicting Microarray Signals by Physical Modeling p.29/39
Partition function The partition function for a target molecule hybridized to a probe, Z = exp( β G) gives the total free energy for binding. It sums over all possible zippered states starting at m and ending at m. For a probe with a total of N monomers: Z = N e β G mn = N e β(p n 1 i=m ɛ(i,i+1)+ɛ initiation) m<n=2 m<n=2 using the nearest neighbor model Predicting Microarray Signals by Physical Modeling p.29/39
Partition function The partition function for a target molecule hybridized to a probe, Z = exp( β G) gives the total free energy for binding. It sums over all possible zippered states starting at m and ending at m. For a probe with a total of N monomers: Z = N e β G mn = N e β(p n 1 i=m ɛ(i,i+1)+ɛ initiation) m<n=2 m<n=2 Predicting Microarray Signals by Physical Modeling p.29/39
Partition function The partition function for a target molecule hybridized to a probe, Z = exp( β G) gives the total free energy for binding. It sums over all possible zippered states starting at m and ending at m. For a probe with a total of N monomers: Z = N e β G mn = N e β(p n 1 i=m ɛ(i,i+1)+ɛ initiation) m<n=2 m<n=2 This depends strongly on the the temperature and the probe sequence. It can be efficiently computed in O(N) operations using a recursion relation for each probe considered. Predicting Microarray Signals by Physical Modeling p.29/39
Fraction bound Z (or G) and the chemical potential give the affinity for binding. That is the fraction f of bound probes is f = 1 G 1 + e β( G µ) Probe Solution µ = const + log(concentration)/β is determined by the requirement that µ solution = µ probes Predicting Microarray Signals by Physical Modeling p.30/39
Nonspecific free energy A probe can still bind, though more weakly to target sequences that have mismatches. The above method of calculation still applies. Predicting Microarray Signals by Physical Modeling p.31/39
Nonspecific free energy A probe can still bind, though more weakly to target sequences that have mismatches. The above method of calculation still applies. However we don t know the concentrations of the target solution because Affymetrix has added uncharacterized concentrations of human RNA as a background. Predicting Microarray Signals by Physical Modeling p.31/39
Nonspecific free energy A probe can still bind, though more weakly to target sequences that have mismatches. The above method of calculation still applies. However we don t know the concentrations of the target solution because Affymetrix has added uncharacterized concentrations of human RNA as a background. If the binding is weak, when can replace Z with an annealed average over different sequences. This leads to an effective nearest neighbor model for this non-specific binding G NS mn = n 1 i=m ɛ NS (i, i + 1) + ɛ NS initiation Here the ɛ NS s have to be empirically determined. Predicting Microarray Signals by Physical Modeling p.31/39
Target-target binding For similar reasons we don t know the concentration of targets. Therefore we replace those interactions with an effective annealed model. G T T mn = n 1 i=m ɛ T T (i, i + 1) + ɛ T T initiation Predicting Microarray Signals by Physical Modeling p.32/39
The full model Given an initial set of concentrations for different target molecules, our model will predict the observed intensities of hybridization of targets to probe molecules. The parameters in the model are: Energies in the nearest neighbor model ɛ(b 1, b 2 ) e.g. ɛ(g, T ). There are 16 possibilities and 3 sets of terms. specific, non-specific, and target-target. Predicting Microarray Signals by Physical Modeling p.33/39
The full model Given an initial set of concentrations for different target molecules, our model will predict the observed intensities of hybridization of targets to probe molecules. The parameters in the model are: Energies in the nearest neighbor model ɛ(b 1, b 2 ) e.g. ɛ(g, T ). There are 16 possibilities and 3 sets of terms. specific, non-specific, and target-target. 3 Initiation factors. Predicting Microarray Signals by Physical Modeling p.33/39
The full model Given an initial set of concentrations for different target molecules, our model will predict the observed intensities of hybridization of targets to probe molecules. The parameters in the model are: Energies in the nearest neighbor model ɛ(b 1, b 2 ) e.g. ɛ(g, T ). There are 16 possibilities and 3 sets of terms. specific, non-specific, and target-target. 3 Initiation factors. Predicting Microarray Signals by Physical Modeling p.33/39
The full model Given an initial set of concentrations for different target molecules, our model will predict the observed intensities of hybridization of targets to probe molecules. The parameters in the model are: Energies in the nearest neighbor model ɛ(b 1, b 2 ) e.g. ɛ(g, T ). There are 16 possibilities and 3 sets of terms. specific, non-specific, and target-target. 3 Initiation factors. Small additive background constant, Predicting Microarray Signals by Physical Modeling p.33/39
The full model Given an initial set of concentrations for different target molecules, our model will predict the observed intensities of hybridization of targets to probe molecules. The parameters in the model are: Energies in the nearest neighbor model ɛ(b 1, b 2 ) e.g. ɛ(g, T ). There are 16 possibilities and 3 sets of terms. specific, non-specific, and target-target. 3 Initiation factors. Small additive background constant, Number of probe molecules, Predicting Microarray Signals by Physical Modeling p.33/39
The full model Given an initial set of concentrations for different target molecules, our model will predict the observed intensities of hybridization of targets to probe molecules. The parameters in the model are: Energies in the nearest neighbor model ɛ(b 1, b 2 ) e.g. ɛ(g, T ). There are 16 possibilities and 3 sets of terms. specific, non-specific, and target-target. 3 Initiation factors. Small additive background constant, Number of probe molecules, Proportionality factor between binding and light intensity. Predicting Microarray Signals by Physical Modeling p.33/39
The full model Given an initial set of concentrations for different target molecules, our model will predict the observed intensities of hybridization of targets to probe molecules. The parameters in the model are: Energies in the nearest neighbor model ɛ(b 1, b 2 ) e.g. ɛ(g, T ). There are 16 possibilities and 3 sets of terms. specific, non-specific, and target-target. 3 Initiation factors. Small additive background constant, Number of probe molecules, Proportionality factor between binding and light intensity. A total of 54 parameters, to fit 2464 data points from the Latin Square experiments. Predicting Microarray Signals by Physical Modeling p.33/39
Parameter fitting We minimize the fitness function: I = (log(predicted conc.) log(observed conc.)) 2 all data with respect to all parameters. This is hard because there are many minima in parameter space. This can be solved using simulating annealing monte carlo, and parallel tempering. Predicting Microarray Signals by Physical Modeling p.34/39
Parallel tempering 1 2 3 4 N 3 N 2 N 1 N T Simulate N copies of a system at temperatures T 1, T 2,..., T N. Predicting Microarray Signals by Physical Modeling p.35/39
Parallel tempering 1 2 3 4 N 3 N 2 N 1 N T Simulate N copies of a system at temperatures T 1, T 2,..., T N. Exchange neighboring system configurations depending on the relative energies of the systems E and the temperature difference β with probability p = { exp( β E) for β E < 0, 1 otherwise. Predicting Microarray Signals by Physical Modeling p.35/39
Parallel tempering 1 2 3 4 N 3 N 2 N 1 N T Simulate N copies of a system at temperatures T 1, T 2,..., T N. Exchange neighboring system configurations depending on the relative energies of the systems E and the temperature difference β with probability p = { exp( β E) for β E < 0, 1 otherwise. It is very efficient at finding low energy states of systems with many degrees of freedom Predicting Microarray Signals by Physical Modeling p.35/39
Results I min = 0.188 level level 100000 10000 1000 100 10 1 0.1 100000 10000 1000 100 10 1 0.1 0 20 40 60 80 100 120 140 160 180 index real pred input 0 20 40 60 80 100 120 140 160 180 index real pred input Predicting Microarray Signals by Physical Modeling p.36/39
Leave out one Predict first transcript (16 probes) training on all the other data. level 100000 10000 1000 real pred tex 100 0 2 4 6 8 10 12 14 16 index Predicting Microarray Signals by Physical Modeling p.37/39
Partial binding The ends of the hybridized targets tend to be frayed. Predicting Microarray Signals by Physical Modeling p.38/39
Partial binding The ends of the hybridized targets tend to be frayed. The fraction of the time a nucleotide is unbound depends on the sequence. Predicting Microarray Signals by Physical Modeling p.38/39
Partial binding The ends of the hybridized targets tend to be frayed. The fraction of the time a nucleotide is unbound depends on the sequence. 1 0.8 fraction unbound 0.6 0.4 0.2 0 5 10 15 20 position Predicting Microarray Signals by Physical Modeling p.38/39
Conclusions The output of Affymetrix gene chips can be understood in terms of the physicochemical properties of equilibrium statistical mechanics. The processes involve Predicting Microarray Signals by Physical Modeling p.39/39
Conclusions The output of Affymetrix gene chips can be understood in terms of the physicochemical properties of equilibrium statistical mechanics. The processes involve Partially zippered RNA targets and DNA probes. Predicting Microarray Signals by Physical Modeling p.39/39
Conclusions The output of Affymetrix gene chips can be understood in terms of the physicochemical properties of equilibrium statistical mechanics. The processes involve Partially zippered RNA targets and DNA probes. Non-specific binding of other target molecules. Predicting Microarray Signals by Physical Modeling p.39/39
Conclusions The output of Affymetrix gene chips can be understood in terms of the physicochemical properties of equilibrium statistical mechanics. The processes involve Partially zippered RNA targets and DNA probes. Non-specific binding of other target molecules. Binding of target molecules to each other. Predicting Microarray Signals by Physical Modeling p.39/39
Conclusions The output of Affymetrix gene chips can be understood in terms of the physicochemical properties of equilibrium statistical mechanics. The processes involve Partially zippered RNA targets and DNA probes. Non-specific binding of other target molecules. Binding of target molecules to each other. Taking into account the nonlinearity (saturation) of binding. Predicting Microarray Signals by Physical Modeling p.39/39
Conclusions The output of Affymetrix gene chips can be understood in terms of the physicochemical properties of equilibrium statistical mechanics. The processes involve Partially zippered RNA targets and DNA probes. Non-specific binding of other target molecules. Binding of target molecules to each other. Taking into account the nonlinearity (saturation) of binding. This understanding leads a model that predicts mrna levels well. Predicting Microarray Signals by Physical Modeling p.39/39
Conclusions The output of Affymetrix gene chips can be understood in terms of the physicochemical properties of equilibrium statistical mechanics. The processes involve Partially zippered RNA targets and DNA probes. Non-specific binding of other target molecules. Binding of target molecules to each other. Taking into account the nonlinearity (saturation) of binding. This understanding leads a model that predicts mrna levels well. It is hoped that this model will be turned into software that will benefit researchers wanting a more accurate determination of mrna levels in their experiments. Predicting Microarray Signals by Physical Modeling p.39/39