Bayesian Decomposition
Michael Ochs
Making Proteins
A Closer Look at Translation
Post-Translational Modification, RNA Splicing, miRNA
Identifying Pathways
[Figure: pathway diagram with inputs 1, 2, 3 and components A, B, C, D, labeled "Bioinformatics"; image source: www.promega.com]
Goal of Analysis
Take measurements of thousands of genes, some of which are responding to stimuli of interest, and find the correct set of basis vectors that link to pathways; then identify the pathways.
BD: Matrix Decomposition
Data (genes 1 to N × conditions 1 to M) = Distribution of Patterns (genes 1 to N × patterns 1 to k) × Patterns of Behavior (patterns 1 to k × conditions 1 to M)
The behavior of one gene can be explained as a mixture of patterns with different behaviors.
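A minimal sketch of the decomposition, written in plain Python so it runs without extra packages. The names `A` (distribution of patterns, genes × patterns), `P` (patterns of behavior, patterns × conditions) and `reconstruct` are illustrative choices, and the numbers are made up. Each reconstructed gene row is a non-negative weighted sum of the pattern rows, which is what "a mixture of patterns" means here.

```python
from typing import List

Matrix = List[List[float]]


def reconstruct(A: Matrix, P: Matrix) -> Matrix:
    """Return the genes x conditions product A x P.

    A: genes x patterns (distribution of patterns, one row per gene)
    P: patterns x conditions (patterns of behavior, one row per pattern)
    """
    k = len(P)
    if any(len(row) != k for row in A):
        raise ValueError("A must have one column per pattern in P")
    m = len(P[0])
    return [
        [sum(a_row[p] * P[p][c] for p in range(k)) for c in range(m)]
        for a_row in A
    ]


# Two hypothetical patterns over four conditions.
P = [
    [1.0, 1.0, 0.0, 0.0],  # pattern 1: active in conditions 1-2
    [0.0, 0.5, 1.0, 1.0],  # pattern 2: active in conditions 2-4
]
# Gene 1 follows pattern 1 only; gene 2 is a mixture of both patterns.
A = [
    [2.0, 0.0],
    [1.0, 2.0],
]
D = reconstruct(A, P)
print(D)  # [[2.0, 2.0, 0.0, 0.0], [1.0, 2.0, 2.0, 2.0]]
```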
Patterns as Basis Vectors
BD with Knowledge of Classes
Data (genes 1 to N × conditions 1 to M) = Distribution of Patterns (genes 1 to N × patterns 1 to k) × Patterns of Behavior (patterns 1 to k × conditions 1 to M)
Known classes are encoded by fixing entries of the Patterns of Behavior matrix to 0 in the conditions where a class-specific pattern should not appear.
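One way to picture the class constraint, assuming a boolean mask with one row per pattern and one column per condition. The names `allowed`, `apply_mask` and `propose_is_valid` are hypothetical: entries marked False are held at zero, and any sampler move that would change them is refused.

```python
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def apply_mask(P: Matrix, allowed: Mask) -> Matrix:
    """Force P[p][c] to 0 wherever pattern p is not allowed in condition c."""
    return [
        [value if ok else 0.0 for value, ok in zip(row, mask_row)]
        for row, mask_row in zip(P, allowed)
    ]


def propose_is_valid(allowed: Mask, p: int, c: int) -> bool:
    """A sampler move that would change P[p][c] is permitted only where the mask allows it."""
    return allowed[p][c]


# Hypothetical example: 2 class-specific patterns over 4 conditions,
# class 1 = conditions 1-2, class 2 = conditions 3-4.
allowed = [
    [True, True, False, False],
    [False, False, True, True],
]
P = [
    [0.9, 0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7, 0.9],
]
print(apply_mask(P, allowed))  # [[0.9, 0.8, 0.0, 0.0], [0.0, 0.0, 0.7, 0.9]]
print(propose_is_valid(allowed, 0, 3))  # False
```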
BD Structure
Atomic domains allow encoding of biological information.
Markov chain Monte Carlo (MCMC) is used to explore possible sets of distributions and patterns.
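To make the MCMC step concrete, here is a toy random-walk Metropolis sampler over non-negative A and P. It is not BD's sampler. It does not implement atomic domains: an exponential prior on every entry stands in for the atomic prior. The uncertainties `S`, the step size, the prior rate and all function names are assumptions for illustration. A real analysis would collect many samples after burn-in to estimate the mean and standard deviation of each entry, rather than keep only the last state as this sketch does.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def chi2(D: Matrix, S: Matrix, A: Matrix, P: Matrix) -> float:
    """Uncertainty-weighted squared residuals between D and the product A x P."""
    total = 0.0
    for i, row in enumerate(D):
        for c, d in enumerate(row):
            model = sum(A[i][p] * P[p][c] for p in range(len(P)))
            total += ((d - model) / S[i][c]) ** 2
    return total


def log_posterior(D: Matrix, S: Matrix, A: Matrix, P: Matrix, rate: float) -> float:
    """Gaussian log-likelihood plus an exponential(rate) log-prior on every entry, up to constants."""
    prior = -rate * (sum(map(sum, A)) + sum(map(sum, P)))
    return -0.5 * chi2(D, S, A, P) + prior


def metropolis(D: Matrix, S: Matrix, k: int, n_steps: int = 5000,
               step: float = 0.1, rate: float = 1.0,
               seed: int = 0) -> Tuple[Matrix, Matrix, float]:
    """Random-walk Metropolis over non-negative A (genes x k) and P (k x conditions)."""
    rng = random.Random(seed)
    n, m = len(D), len(D[0])
    A = [[rng.random() for _ in range(k)] for _ in range(n)]
    P = [[rng.random() for _ in range(m)] for _ in range(k)]
    current = log_posterior(D, S, A, P, rate)
    for _ in range(n_steps):
        # Choose a single entry of A or P and propose a symmetric Gaussian move.
        M = A if rng.random() < 0.5 else P
        r, c = rng.randrange(len(M)), rng.randrange(len(M[0]))
        old = M[r][c]
        new = old + rng.gauss(0.0, step)
        if new < 0.0:
            continue  # prior density is zero for negative values, so reject
        M[r][c] = new
        proposed = log_posterior(D, S, A, P, rate)
        # Accept with probability min(1, exp(proposed - current)).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            current = proposed
        else:
            M[r][c] = old  # rejected: restore the previous value
    return A, P, current


D = [[2.0, 2.0, 0.0, 0.0], [1.0, 2.0, 2.0, 2.0]]
S = [[0.1] * 4 for _ in D]
A_fit, P_fit, lp = metropolis(D, S, k=2)
print("chi2 of final sample:", round(chi2(D, S, A_fit, P_fit), 2))
```

Rejecting negative proposals outright keeps every state non-negative without changing the target distribution, because the proposal is symmetric and the prior is zero below zero.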
Project Normal Data
Download data from the CAMDA site
Adjust for background measurement
Take ratios
Calculate the mean and SDOM (standard deviation of the mean) for each ratio
Eliminate M3T and M4T data
Eliminate 24 points with only 1 data point
Data points per ratio: 99% have 4, 1% have 3, 0.1% have 2
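The mean and SDOM step can be sketched as follows. This assumes background-subtracted two-channel intensities and a dictionary of replicate ratios per spot. `background_ratio`, `summarize_ratios` and the spot IDs are hypothetical. SDOM is taken as the sample standard deviation divided by √n, and spots with a single measurement are dropped because no spread can be estimated from one value.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple


def background_ratio(sample: float, sample_bg: float,
                     reference: float, reference_bg: float) -> float:
    """Ratio of background-subtracted sample and reference intensities."""
    return (sample - sample_bg) / (reference - reference_bg)


def summarize_ratios(replicates: Dict[str, List[float]],
                     min_points: int = 2) -> Dict[str, Tuple[float, float]]:
    """Map each spot to (mean ratio, SDOM), dropping spots with too few replicates."""
    summary = {}
    for spot, ratios in replicates.items():
        n = len(ratios)
        if n < min_points:
            continue  # a single value gives no estimate of spread
        sdom = stdev(ratios) / math.sqrt(n)  # standard deviation of the mean
        summary[spot] = (mean(ratios), sdom)
    return summary


replicates = {
    "spot_a": [background_ratio(1100, 100, 600, 100), 1.8, 2.2, 2.0],
    "spot_b": [0.9],  # only one data point: eliminated
}
print(summarize_ratios(replicates))
```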
Filtering of Genes
Eliminated all ESTs
Annotated remaining genes from Gene Ontology on UniGene name
Annotated all genes on clone ID: 24% changed UniGene cluster
948 clones had GO process information
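A small sketch of this bookkeeping, assuming each clone record carries a hypothetical `is_est` flag and that old and new UniGene assignments are available as dictionaries mapping clone IDs to cluster IDs. All IDs below are made up.

```python
from typing import Dict, List


def drop_ests(records: List[dict]) -> List[dict]:
    """Keep only records not flagged as ESTs."""
    return [r for r in records if not r["is_est"]]


def fraction_changed(old: Dict[str, str], new: Dict[str, str]) -> float:
    """Share of clones, present in both maps, whose UniGene cluster differs."""
    shared = old.keys() & new.keys()
    if not shared:
        return 0.0
    changed = sum(1 for clone in shared if old[clone] != new[clone])
    return changed / len(shared)


records = [
    {"clone_id": "c1", "is_est": False},
    {"clone_id": "c2", "is_est": True},
    {"clone_id": "c3", "is_est": False},
]
old = {"c1": "cluster_1", "c3": "cluster_3", "c4": "cluster_4", "c5": "cluster_5"}
new = {"c1": "cluster_1", "c3": "cluster_7", "c4": "cluster_4", "c5": "cluster_5"}
print([r["clone_id"] for r in drop_ests(records)])  # ['c1', 'c3']
print(fraction_changed(old, new))  # 0.25
```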
Updating Annotations: ASAP http://bioinformatics.fccc.edu/
Bayesian Decomposition
Encoded 3 known patterns: kidney (6 conditions), liver (6 conditions), testis (4 conditions)
Allowed 1-3 additional patterns to account for behavior unrelated to tissue-specific expression
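The class encoding can be sketched as a mask builder. The sketch assumes that the 16 conditions are ordered kidney (6), liver (6), testis (4). The slides do not state the order, and `class_mask` is a hypothetical helper. The extra patterns are unconstrained, so one or two extra patterns give the four- and five-pattern models that follow.

```python
from typing import List


def class_mask(group_sizes: List[int], n_extra: int) -> List[List[bool]]:
    """Build a patterns x conditions mask.

    One constrained pattern per class, allowed only in that class's conditions,
    followed by n_extra unconstrained patterns allowed in every condition.
    Conditions are assumed to be ordered class by class.
    """
    n_conditions = sum(group_sizes)
    mask = []
    start = 0
    for size in group_sizes:
        mask.append([start <= c < start + size for c in range(n_conditions)])
        start += size
    for _ in range(n_extra):
        mask.append([True] * n_conditions)
    return mask


# Kidney (6), liver (6), testis (4); 1 to 3 extra patterns were allowed.
for n_extra in (1, 2, 3):
    mask = class_mask([6, 6, 4], n_extra)
    print(len(mask), "patterns x", len(mask[0]), "conditions")
# 4 patterns x 16 conditions, then 5, then 6
```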
Fitting the Data
Four Patterns
[Figure: pattern amplitudes (0 to 0.3) across conditions 1-16 for the Kidney, Liver, Testis, and Background patterns]
Five Patterns
[Figure: pattern amplitudes (0 to 0.3) across conditions 1-16 for the Kidney, Liver, Testis, Background 1, and Background 2 patterns]
Four vs Five Patterns
Gene Ontology
Identify genes only in one pattern; see if the pattern is enhanced in GO
Identify genes in a pattern (3σ above zero in the distribution); look at GO assignments
Identify genes lacking in a pattern: eliminate background (genes > 70%), then look for genes not in the pattern (3σ)
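The selection rules can be sketched from per-gene posterior means and standard deviations of the distribution matrix. Several details are assumptions: reading "genes > 70%" as "more than 70% of a gene's total amplitude falls in background patterns", the function names, and the toy values. In the toy values, patterns 0 to 3 stand for kidney, liver, testis, and background.

```python
from typing import List, Set

Matrix = List[List[float]]


def in_pattern(a_mean: Matrix, a_sd: Matrix, p: int, n_sigma: float = 3.0) -> Set[int]:
    """Genes whose amplitude in pattern p lies more than n_sigma SDs above zero."""
    return {g for g in range(len(a_mean)) if a_mean[g][p] > n_sigma * a_sd[g][p]}


def only_in(a_mean: Matrix, a_sd: Matrix, p: int, n_sigma: float = 3.0) -> Set[int]:
    """Genes in pattern p and in no other pattern."""
    members = in_pattern(a_mean, a_sd, p, n_sigma)
    for q in range(len(a_mean[0])):
        if q != p:
            members -= in_pattern(a_mean, a_sd, q, n_sigma)
    return members


def background_genes(a_mean: Matrix, background: List[int], cutoff: float = 0.7) -> Set[int]:
    """Genes with more than `cutoff` of their total amplitude in background patterns."""
    out = set()
    for g, row in enumerate(a_mean):
        total = sum(row)
        if total > 0 and sum(row[q] for q in background) / total > cutoff:
            out.add(g)
    return out


def absent_from(a_mean: Matrix, a_sd: Matrix, p: int, background: List[int],
                n_sigma: float = 3.0, cutoff: float = 0.7) -> Set[int]:
    """Non-background genes whose amplitude in pattern p is not n_sigma above zero."""
    candidates = set(range(len(a_mean))) - background_genes(a_mean, background, cutoff)
    return candidates - in_pattern(a_mean, a_sd, p, n_sigma)


a_mean = [
    [5.0, 0.1, 0.0, 0.2],  # gene 0: kidney only
    [4.0, 3.0, 0.1, 0.5],  # gene 1: kidney and liver
    [0.1, 0.1, 0.1, 6.0],  # gene 2: mostly background
    [0.0, 0.0, 2.0, 0.5],  # gene 3: testis
]
a_sd = [[0.5] * 4 for _ in a_mean]
print(only_in(a_mean, a_sd, 0))          # {0}
print(absent_from(a_mean, a_sd, 1, [3]))  # {0, 3}
```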
Genes Only in Kidney by GO
neurotransmitter transport*; chloride transport*; receptor mediated endocytosis; enzyme linked receptor protein signaling pathway*; transmembrane receptor protein tyrosine kinase signaling pathway*; vitamin/cofactor transport*; vitamin B12 transport; inorganic anion transport*; anion transport*; neuropeptide signaling pathway; endocytosis*
From old annotations: sodium transport; vesicle-mediated transport; amino acid transport; folate transport; homophilic cell adhesion; cell-cell adhesion; monovalent inorganic cation transport; metal ion transport
* > 10x enhancement
Genes Only in Liver by GO
antigen processing; antigen processing, endogenous antigen via MHC class I; cellular defense response; response to drug; drug susceptibility/resistance*; cell-cell adhesion*; homophilic cell adhesion*; response to abiotic stimulus; response to chemical substance; response to pest/pathogen/parasite; protein targeting
From old annotations: small molecule transport; histogenesis and organogenesis; embryogenesis and morphogenesis; lipid metabolism
* > 10x enhancement
Genes Only in Testis by GO
DNA recombination; meiotic recombination; reproduction*; gametogenesis*; spermatogenesis*; regulation of transcription from Pol II promoter; microtubule-based movement; microtubule-based process; development*
From old annotations: nuclear organization and biogenesis; chromosome organization and biogenesis; cell organization and biosynthesis
* > 10x enhancement
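Fold enhancement of a GO term is sketched below. The sketch uses the usual definition, which is assumed here rather than taken from the slides: the fraction of selected genes annotated with the term, divided by the fraction of all genes in the annotation table with that term. The gene names and annotations are hypothetical.

```python
from typing import Dict, Set


def fold_enrichment(selected: Set[str], annotations: Dict[str, Set[str]], term: str) -> float:
    """(share of selected genes with `term`) / (share of all genes in `annotations` with `term`)."""
    universe = set(annotations)
    chosen = selected & universe
    with_term = {g for g in universe if term in annotations[g]}
    if not chosen or not with_term:
        raise ValueError("need at least one selected gene and one gene with the term")
    hits = len(chosen & with_term)
    # Integer cross-multiplication keeps the ratio exact until the final division.
    return (hits * len(universe)) / (len(chosen) * len(with_term))


def enriched_terms(selected: Set[str], annotations: Dict[str, Set[str]],
                   threshold: float = 10.0) -> Set[str]:
    """GO terms carried by the selected genes whose fold enhancement exceeds `threshold`."""
    terms = set().union(*(annotations[g] for g in selected if g in annotations))
    return {t for t in terms if fold_enrichment(selected, annotations, t) > threshold}


annotations: Dict[str, Set[str]] = {f"g{i}": set() for i in range(100)}
for i in range(5):
    annotations[f"g{i}"].add("spermatogenesis")
for i in range(50):
    annotations[f"g{i}"].add("protein folding")
selected = {"g0", "g1", "g2", "g3", "g10"}
print(fold_enrichment(selected, annotations, "spermatogenesis"))  # 16.0
print(enriched_terms(selected, annotations))  # {'spermatogenesis'}
```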
Kidney Genes, 3σ, > 2-fold
amino acid metabolism; inflammatory response; mitotic cell cycle; amine metabolism; anion transport; nitrogen metabolism; perception of abiotic stimulus; perception of light; cell-cell adhesion; homophilic cell adhesion; S phase of mitotic cell cycle; endocytosis; G-protein coupled receptor protein signaling pathway
Testis Genes, 3σ, > 4-fold
reproduction; gametogenesis; spermatogenesis; regulation of cell shape and cell size; mitotic cell cycle; microtubule-based movement; protein folding; S phase of mitotic cell cycle
Liver Genes, 3σ, > 3-fold
amino acid metabolism; response to drug; drug susceptibility/resistance; energy pathways; energy derivation by oxidation of organic compounds; main pathways of carbohydrate metabolism; catabolic carbohydrate metabolism; response to abiotic stimulus; response to chemical substance; sensory perception; morphogenesis; organogenesis; tricarboxylic acid cycle
Genes Absent in Patterns
Absent in Kidney: obsolete; monosaccharide metabolism; regulation of transcription from Pol II promoter; regulation of cell shape and cell size; biological_process unknown; reproduction; gametogenesis; spermatogenesis; microtubule-based process
Absent in Liver: reproduction; gametogenesis; spermatogenesis; cell differentiation; actin filament-based process; actin cytoskeleton organization and biogenesis; microtubule-based movement
Genes Absent in Background 1
biological_process unknown; obsolete; protein modification; protein targeting; actin filament-based process; actin cytoskeleton organization and biogenesis; endocytosis; regulation of transcription from Pol II promoter; reproduction; gametogenesis; spermatogenesis; mitotic cell cycle
Genes Present in Two Tissues
Kidney/Liver, not Testis: cell-cell adhesion; homophilic cell adhesion; defense response; immune response; amino acid metabolism; amine metabolism; perception of abiotic stimulus; perception of light
Kidney/Testis, not Liver: mitotic cell cycle
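Finding the genes present in exactly a given set of tissues is a set operation. Here is a minimal sketch with hypothetical gene IDs and a hypothetical helper, `present_only_in`. The same call with a single tissue gives the "only in one pattern" lists.

```python
from typing import Dict, Iterable, Set


def present_only_in(members: Dict[str, Set[str]], tissues: Iterable[str]) -> Set[str]:
    """Items present in every listed tissue and in none of the other tissues."""
    wanted = set(tissues)
    result = set.intersection(*(members[t] for t in wanted))
    for tissue, items in members.items():
        if tissue not in wanted:
            result -= items
    return result


members = {
    "kidney": {"g1", "g2", "g3", "g5"},
    "liver": {"g1", "g2", "g4"},
    "testis": {"g2", "g3", "g6"},
}
print(present_only_in(members, ["kidney", "liver"]))   # {'g1'}
print(present_only_in(members, ["kidney", "testis"]))  # {'g3'}
print(present_only_in(members, ["kidney"]))            # {'g5'}
```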
Acknowledgements
This work: Tom Moloshok, DJ Datta (Cambridge), Andrew Kossenkov, Bill Speier (JHU)
Colleagues: J. Robert Beck, Frank Manion
Programming: Jeffrey Grant, Elizabeth Goralczyk, Luke Somers
Others: G. Parmigiani (JHU), T. Brown (Columbia), E. Korotkov (RAS)