How to deal with the microarray results. Britt Gabrielsson, PhD, RCEM, Div. of Metabolism and Cardiovascular Research, Department of Medicine, The Sahlgrenska Academy at Göteborg University
...and then we will perform microarray analysis. Project design; DNA microarray technology; data analysis; results/verification; biological/functional relevance; follow-up studies
Publications - present to future trends PubMed DNA array + DNA microarray + oligonucleotide array + oligonucleotide microarray
Publications - present to future trends PubMed previous search terms AND cancer, clinical or yeast
Common microarray workflow: planning, pilot, data analysis, verification, revised planning, extended study, data analysis, story selection, follow-up studies
Advantages of a pilot study: estimate experimental variability; refine the experimental design; optimize the selection of time points or doses; provides preliminary data for project funding. Use triplicate biological replicates per experimental group in the pilot. The pilot can be added to when the study is extended.
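Estimating experimental variability from a pilot can be illustrated with the per-gene coefficient of variation across the biological triplicates. This is only a minimal sketch: the gene names and intensity values are hypothetical, and the coefficient of variation is one possible summary, not necessarily the one used in the original study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return stdev(values) / m


# Hypothetical pilot data: signal intensities from three biological
# replicates in one experimental group (illustrative values only).
pilot = {
    "geneA": [100.0, 110.0, 90.0],
    "geneB": [50.0, 150.0, 100.0],
}

cv_per_gene = {gene: coefficient_of_variation(v) for gene, v in pilot.items()}

# List genes from least to most variable.
for gene, cv in sorted(cv_per_gene.items(), key=lambda kv: kv[1]):
    print(f"{gene}: CV = {cv:.2f}")
```

Genes with a high coefficient of variation in the pilot are the ones most likely to need more replicates in the extended study.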
From list of genes to biological context. Once again: if you have favourite genes that are your main interest, use an alternative method. There will only be a few genes whose function you already know. If you don't get the genes you love, love the genes you get!
The list of genes: mental vertigo
Reduced list of genes: manageable
The reductionist's point of view: from the list of regulated genes to future projects. Identify genes with a commonality (e.g. pathway, biological process, chromosomal region, upstream regulation), then verify and extend the data mining.
From list of genes to biological context. Aim: to define possible future lines of investigation. 1st screen, to get an overview of the data: add your own keywords to the auto-annotations (e.g. apoptosis, lipid metabolism); use overview databases. 2nd screen, for more detailed information: use more specialized databases.
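Adding your own keywords to auto-annotations can be sketched as a simple text match. The keyword vocabulary, trigger terms, gene names and annotation strings below are all hypothetical and only illustrate the idea; a real screen would use the annotations exported with the array data.

```python
def add_keywords(annotation, keyword_terms):
    """Return the sorted keywords whose trigger terms occur in the annotation."""
    text = annotation.lower()
    return sorted(
        keyword
        for keyword, terms in keyword_terms.items()
        if any(term in text for term in terms)
    )


# Hypothetical keyword vocabulary; the trigger terms are illustrative.
KEYWORD_TERMS = {
    "apoptosis": ["apoptosis", "programmed cell death", "caspase"],
    "lipid metabolism": ["lipid", "fatty acid", "cholesterol"],
}

# Hypothetical auto-annotations for a list of regulated genes.
auto_annotations = {
    "gene1": "fatty acid transport protein",
    "gene2": "caspase involved in programmed cell death",
    "gene3": "ribosomal protein",
}

keywords = {
    gene: add_keywords(text, KEYWORD_TERMS)
    for gene, text in auto_annotations.items()
}

for gene, tags in keywords.items():
    print(gene, tags)
```

Genes that receive the same keyword can then be grouped as candidates sharing a commonality.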
First a word about Gene Ontology. The Gene Ontology Consortium (www.geneontology.org) aims to describe the functions of gene products in a cell and provides a controlled vocabulary to describe these attributes. The three organizing principles of GO are molecular function, biological process and cellular component. A gene product has one or more molecular functions and is used in one or more biological processes; it might be associated with one or more cellular components.
Example: insulin-like growth factor 1 (IGF1). Expressed in most tissues, with liver as the main production site. Is secreted and has both paracrine and endocrine actions. Detected in circulation bound to one of many IGF-binding proteins. The main regulators are growth hormone and insulin. Acts via the IGF1-receptor dimer or a hybrid composed of IGF1- and insulin receptor monomers. Has growth-promoting and metabolic effects.
Example: IGF1 (insulin-like growth factor 1) defined by GO classification
Biological process: 1501 // skeletal development // traceable author statement /// 6260 // DNA replication // traceable author statement /// 6928 // cell motility // traceable author statement /// 7165 // signal transduction // traceable author statement /// 7265 // Ras prot
Cellular component: 5615 // extracellular space // inferred from electronic annotation
Molecular function: 5159 // insulin-like growth factor receptor binding // traceable author statement /// 5179 // hormone activity // traceable author statement /// 8083 // growth factor activity // inferred from electronic annotation /// 18445 // prothoracicotrophic hormone ---
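The annotation strings in the IGF1 example separate entries with "///" and the fields within an entry with "//". A minimal sketch of how such a string could be split into structured records follows. The function name is hypothetical, and padding the numeric identifier to the seven-digit "GO:" form is a convenience added here, not part of the string itself.

```python
def parse_go_field(field):
    """Split an 'id // term // evidence /// ...' string into a list of dicts."""
    entries = []
    for chunk in field.split("///"):
        parts = [p.strip() for p in chunk.split("//")]
        if not parts[0]:
            continue  # skip empty chunks
        # Pad so that truncated entries still yield three fields.
        parts += [None] * (3 - len(parts))
        go_id, term, evidence = parts[:3]
        entries.append(
            {
                "id": f"GO:{int(go_id):07d}",  # e.g. 5159 -> GO:0005159
                "term": term,
                "evidence": evidence,
            }
        )
    return entries


# Molecular function annotation for IGF1, complete entries only.
molecular_function = (
    "5159 // insulin-like growth factor receptor binding // "
    "traceable author statement /// "
    "5179 // hormone activity // traceable author statement /// "
    "8083 // growth factor activity // inferred from electronic annotation"
)

records = parse_go_field(molecular_function)
for record in records:
    print(record["id"], record["term"], "|", record["evidence"])
```

Keeping the evidence code alongside each term makes it easy to separate curated statements from annotations inferred electronically.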
1st screen databases. Aim: to get an overview of the main functions of the gene products and, if possible, to add your own keywords. NCBI: Entrez Gene (OMIM); Swiss-Prot; GeneCards
NCBI
The importance of knowing the different aliases. Example: AdipoQ = Adiponectin = APM1 = Acrp30 = Acdc
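Because the same gene appears under different names in different databases, it helps to map every alias to one symbol before comparing gene lists. A minimal sketch using the adiponectin aliases from the example follows; choosing "ADIPOQ" as the canonical key and the helper names are assumptions made for illustration.

```python
# Alias list taken from the adiponectin example; using "ADIPOQ" as the
# canonical key is an assumption made for this sketch.
ALIASES = {
    "ADIPOQ": ["AdipoQ", "Adiponectin", "APM1", "Acrp30", "Acdc"],
}


def build_lookup(aliases):
    """Map every alias (case-insensitive) to its canonical symbol."""
    lookup = {}
    for canonical, names in aliases.items():
        for name in [canonical, *names]:
            lookup[name.lower()] = canonical
    return lookup


def canonical_symbol(name, lookup):
    """Return the canonical symbol for a name, or None if it is unknown."""
    return lookup.get(name.strip().lower())


LOOKUP = build_lookup(ALIASES)

# Gene names as they might appear in different sources.
hits = ["Acrp30", "apm1", "ADIPONECTIN", "IGF1"]
resolved = {h: canonical_symbol(h, LOOKUP) for h in hits}
print(resolved)
```

Names that resolve to None are not in the alias table and need to be looked up by hand.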
GeneCards
ExPASy/Swiss-Prot
For non-reductionists - other resources. There are several web-based resources that can be used to group regulated genes according to GO classifications, i.e. biological process, molecular function and cellular component. The analysis tests whether the observed number of genes in a GO category differs from the expected number of genes. Examples of such web sites are FatiGO and GOTree Machine.
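The comparison of observed and expected gene counts can be sketched with a hypergeometric test. This is an illustration under stated assumptions: the function names and the numbers are hypothetical, and the exact statistics used by FatiGO or GOTree Machine may differ.

```python
from math import comb


def expected_count(n, K, N):
    """Expected number of annotated genes among n regulated genes."""
    return n * K / N


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are annotated with the GO category of interest."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 20 genes on the array, 5 of them annotated with
# the category, 4 regulated genes, 2 of which carry the annotation.
N, K, n, k = 20, 5, 4, 2

expected = expected_count(n, K, N)
p_value = hypergeom_upper_tail(k, n, K, N)
print(f"observed = {k}, expected = {expected:.1f}, p = {p_value:.4f}")
```

When many GO categories are tested at once, the resulting p-values should be corrected for multiple testing before any category is called over-represented.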
GO Tree Machine (GOTM)
Selecting possible lines of research. What was the question again? Identifying putative susceptibility genes: are there linkage studies to this chromosomal region? Does the ethical licence allow genomic studies? Identifying biomarkers for diagnostic purposes: limited to secreted proteins; are there assays available? Identifying disease mechanisms: how to differentiate pathology from the mechanism leading to disease? External influences such as funding or competition in the field.
The pragmatic view on selecting putative projects. General knowledge of the gene or genes. Novelty: knowledge of the gene in your field. What types of follow-up experiments can be performed (techniques in the lab, collaborators, sample availability)? Time/cost/resources.
2nd screen databases. Aim: to get more detailed knowledge of the gene products on the reduced list. NCBI: PubMed, OMIM, Gene Expression Omnibus; GeneCards; Applied Biosystems: Panther; GNF SymAtlas (tissue distribution). Nucleic Acids Research publishes an updated database issue each January.
NCBI PubMed and OMIM
NCBI OMIM
GENE FUNCTION: By RNase protection and Western blot analysis, Schaffler et al. (1999) showed that APM1 is expressed by differentiated adipocytes as a 33-kD protein that is also detectable in serum.
MOLECULAR GENETICS: In 253 nondiabetic Italian subjects, Filippi et al. (2004) found that the 276G-T SNP of the adiponectin gene was associated with higher body mass index (BMI) (p less than 0.01), plasma insulin (p less than 0.02), and homeostasis model assessment-estimated insulin resistance (HOMA-IR) (p less than 0.02).
ANIMAL MODEL: Maeda et al. (2002) generated mice deficient in adiponectin/acrp30 by targeted disruption. Homozygous mutant mice showed delayed clearance of free fatty acid in plasma, low levels of fatty acid transport protein-1 (FATP1; 600691) mRNA in muscle, high levels of TNF-alpha (191160) mRNA in adipose tissue, and high plasma TNF-alpha concentrations.
GeneCards - a gateway with several links to other sites. Chromosomal location (HGNC and/or Entrez Gene, NCBI genomic views according to UCSC and Ensembl); protein info (UniProt/Swiss-Prot, Ensembl); phenotype (Jackson lab - Mouse Genome Informatics); ontologies/pathways (Gene Ontology and KEGG); transcripts (NCBI, link to Applied Biosystems for assays); tissue distribution (Affymetrix-based, SAGE)
Applied Biosystems (www.pantherdb.org/)
Tissue distribution - GNF SymAtlas. Genomics Institute of the Novartis Research Foundation (http://symatlas.gnf.org/symatlas/)
Expression databases. Large-scale analysis of gene expression has led to a proliferation of databases for storing the vast quantities of expression data. Most are web-based and are compliant with MIAME* and the Gene Ontology Consortium. A few examples: Gene Expression Omnibus (GEO, NCBI); ArrayExpress (EMBL); Stanford Microarray Database (non-public); Expression Array Manager. *MIAME: Minimum Information About a Microarray Experiment
Gene expression omnibus
Gene expression omnibus Data download Cluster analysis
Follow-up studies. Verification of main findings: at transcript level (real-time PCR or Northern blot); at protein level (immunohistochemistry, Western blot, ELISA/RIA, FACS). Staining and characterization of cells (tissues). Cell culture/animal studies. RNAi/transgenic experiments.
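For verification at transcript level by real-time PCR, one widely used way to express the result is the 2^-ddCt (comparative Ct) calculation, sketched below. The slides do not state which quantification method was used, so this is an assumption; the Ct values are hypothetical, and the calculation presumes roughly 100% amplification efficiency for both the target and the reference gene.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression (treated vs control) by the 2^-ddCt calculation."""
    # Normalise the target gene to the reference gene within each group.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalised values between the groups.
    delta_delta = delta_treated - delta_control
    return 2 ** (-delta_delta)


# Hypothetical Ct values (illustrative only).
fold = fold_change_ddct(
    ct_target_treated=22.0,
    ct_ref_treated=18.0,
    ct_target_control=24.0,
    ct_ref_control=18.0,
)
print(f"fold change = {fold:.1f}")
```

The fold change obtained this way can be compared with the ratio reported by the microarray for the same gene and the same samples.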
Summary. The importance of asking a precise question, i.e. the project design limits the interpretation of the output data. Initial reduction of data to identify possible future lines of research. Use overview databases to annotate the regulated genes. In view of the present knowledge in your field and possible follow-up studies, select a few putative lines of research. Go back to further data mining and more detailed bioinformatics.
Final words about working in multidisciplinary collaborations. Who has the overall responsibility? Who performs specific parts of the project? How to report forward and to whom? How to report backward and to whom? Understanding each other/communication. Decision making. How do we publish?
Present to future trends. Diagnostics (cancer, identification of biomarkers). Functional studies, where the microarray data constitutes a minor part of the article. Cross-species comparisons and translational research: shared transcriptional profiles between species to identify conserved pathways and mechanisms (longevity).
Web-sites
Information:
NCBI (incl. Gene, OMIM, PubMed): http://www.ncbi.nlm.nih.gov/
ExPASy: http://www.expasy.org/
GeneCards: http://www.genecards.org/
TIGR: http://www.tigr.org/
Gene Ontology: http://www.geneontology.org/
Panther/Applied Biosystems: http://www.pantherdb.org/
Affymetrix: http://www.affymetrix.com/index.affx
GNF SymAtlas: http://symatlas.gnf.org/symatlas/
Nucleic Acids Research db 2006: http://nar.oxfordjournals.org/content/vol34/suppl_1/index.dtl
Data mining of gene expression data:
GO Tree Machine: http://genereg.ornl.gov/gotm/
FatiGO: http://www.fatigo.org
Databases for expression data:
GEO (NCBI): http://www.ncbi.nlm.nih.gov/geo
Stanford Microarray Database (SMD): http://genome-www5.stanford.edu/microarray/smd/
ArrayExpress: http://www.ebi.ac.uk/arrayexpress/index.html
Expression Array Manager: http://expression.microslu.washington.edu/expression/
NB: there are a number of microarray expression data sets linked to the publications.