Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison

General microarray data analysis workflow
  From raw data to biological significance
  Comparison statistics and correction for multiple testing
GeneSifter overview
Gene Expression in Huntington's Disease Peripheral Blood
  Identification of biological themes
  Platform comparison
Analysis Workflow
Raw data
  Data upload
Normalized, scaled data
  Comparison statistics, correction for multiple testing
Differentially expressed genes
  Up and down regulated, magnitude, clustering
Identify and partition expression patterns
  Annotation (UniGene, Entrez Gene, Gene Ontologies, etc.)
Gene Summaries
  Ontology report, pathway report, z-score
Biological themes (Pathways, molecular function, etc.)
microarraysuccess.com
Experiment Design

Experimental design determines what can be inferred from the data and how much confidence can be assigned to those inferences. Careful experimental design and the presence of biological replicates are essential to the successful use of microarrays.

Type of experiment
  Two groups
  Three or more groups
  Time series
  Dose response
  Multiple treatment
The type of experiment and the number of groups affect the statistical methods used to detect differential expression.

Replicates
  The more the better, but at least 3
  Biological better than technical
Rigorous statistical inferences cannot be made with a sample size of one. The more replicates, the stronger the inference.

Supporting material - Experimental Design and Other Issues in Microarray Studies - Kathleen Kerr - http://ra.microslu.washington.edu/presentation/documents/kerrnas.pdf
Differential Expression

The fundamental goal of microarray experiments is to identify genes that are differentially expressed in the conditions being studied. Comparison statistics help identify differentially expressed genes; cluster analysis identifies patterns of gene expression and segregates subsets of genes based on those patterns.

Statistical Significance
Fold change
  Fold change does not address the reproducibility of the observed difference and cannot be used to determine statistical significance.
Comparison statistics
  2 groups: t-test, Welch's t-test, Wilcoxon rank sum
  3 or more groups: ANOVA, Kruskal-Wallis
  Comparison tests require replicates and use the variability within the replicates to assign a confidence level as to whether the gene is differentially expressed.

Supporting material - Draghici S. (2002) Statistical intelligence: effective analysis of high-density microarray data. Drug Discov Today, 7(11 Suppl): S55-63.
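The contrast between fold change and a comparison statistic can be sketched in a few lines. The example below is illustrative only: the two genes and their log2 signal values are made up, and the helper names are hypothetical. Both genes show the same fold change, but Welch's t statistic separates the reproducible difference from the noisy one.

```python
import math
from statistics import mean, variance

def log2_difference(control, treated):
    """Mean difference of log2 signals (log2 fold change, treated vs control)."""
    return mean(treated) - mean(control)

def welch_t(control, treated):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(control), len(treated)
    v1 = variance(control) / n1   # squared standard error, group 1
    v2 = variance(treated) / n2   # squared standard error, group 2
    t = (mean(treated) - mean(control)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical log2 signals, three replicates per group.
tight_control, tight_treated = [8.0, 8.1, 7.9], [9.0, 9.1, 8.9]
noisy_control, noisy_treated = [6.0, 8.0, 10.0], [7.0, 9.0, 11.0]

fc_tight = log2_difference(tight_control, tight_treated)
fc_noisy = log2_difference(noisy_control, noisy_treated)
t_tight, df_tight = welch_t(tight_control, tight_treated)
t_noisy, df_noisy = welch_t(noisy_control, noisy_treated)

print(f"tight gene: log2 FC={fc_tight:.2f}, t={t_tight:.2f}, df={df_tight:.1f}")
print(f"noisy gene: log2 FC={fc_noisy:.2f}, t={t_noisy:.2f}, df={df_noisy:.1f}")
```

Converting t to a p-value requires the t distribution with the computed degrees of freedom, which a statistics package would normally supply.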
Differential Expression

Correction for multiple testing - methods for adjusting the p-value from a comparison test based on the number of tests performed. These adjustments help to reduce the number of false positives in an experiment.

FWER: Family Wise Error Rate corrections adjust the p-value so that it reflects the chance of at least 1 false positive being found in the list.
  Bonferroni, Holm, Westfall and Young maxT
FDR: False Discovery Rate corrections adjust the p-value so that it reflects the frequency of false positives in the list.
  Benjamini and Hochberg, SAM

The FWER is more conservative, but the FDR is usually acceptable for discovery experiments, i.e. where a small number of false positives is acceptable.

Dudoit, S., et al. (2003) Multiple hypothesis testing in microarray experiments. Statistical Science 18(1): 71-103.
Reiner, A., et al. (2003) Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19(3): 368-375.
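As a rough illustration of how these corrections differ, the sketch below implements the textbook Bonferroni, Holm and Benjamini and Hochberg adjustments in plain Python. It is not GeneSifter's implementation; the function names and the four p-values are hypothetical.

```python
def bonferroni(pvalues):
    """FWER: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def holm(pvalues):
    """FWER: step-down version of Bonferroni (never more conservative)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):          # rank 0 = smallest p-value
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted

def benjamini_hochberg(pvalues):
    """FDR: step-up adjustment p * m / rank, made monotone from the top."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj_bonferroni = bonferroni(raw)
adj_holm = holm(raw)
adj_bh = benjamini_hochberg(raw)
for name, values in [("Bonferroni", adj_bonferroni), ("Holm", adj_holm), ("BH", adj_bh)]:
    print(name, [round(v, 4) for v in values])
```

On the same input, the FDR-adjusted values are never larger than the FWER-adjusted ones, which is why more genes survive a given cutoff under Benjamini and Hochberg.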
GeneSifter Microarray Data Analysis

Accessibility
  Web-based
  Secure
Data management
  Data annotation (MIAME)
  Multiple upload tools: CodeLink, Affymetrix, Illumina, Agilent, GEO
Differential Expression - powerful, accessible tools for determining statistical significance
  R based statistics
  Bioconductor
  Comparison tests: t-test, Welch's t-test, Wilcoxon rank sum test, ANOVA
  Correction for multiple testing: Bonferroni, Holm, Westfall and Young maxT, Benjamini and Hochberg
  Unsupervised clustering: PAM, CLARA, hierarchical clustering
  Silhouettes
GeneSifter Microarray Data Analysis

Integrated tools for determining biological significance
  One-Click Gene Summary
  Ontology Report
  Pathway Report
  Search by ontology terms
  Search by KEGG terms or chromosome
The GeneSifter Data Center
Free resource
  Training
  Research
  Publishing
5 areas
  Cardiovascular
  Cancer
  Neuroscience
  Immunology
  Oral Biology
Access to:
  Data
  Analysis summary
  Tutorials
  WebEx
The GeneSifter Data Center www.genesifter.net/dc
GeneSifter - Analysis Examples
Data upload: CodeLink
2 groups (Huntington's blood vs healthy blood)
  Differential expression: fold change, quality, t-test, false discovery rate
3+ groups (time series, dose response, etc.)
  Differential expression: fold change, quality, ANOVA, false discovery rate
  Visualization: hierarchical clustering, PCA
  Partitioning: PAM, silhouettes
Biological significance
  Gene annotation
  Ontology report
  Pathway report
Background - Huntington's Disease
Huntington's Disease (HD)
  Autosomal dominant neurodegenerative disease
  Motor impairment
  Cognitive decline
  Various psychiatric symptoms
  Onset 30-50 years
Mutant Huntingtin protein (polyglutamine)
  Affects transcriptional regulation
  Transcription effects may occur outside of the CNS
Background - Data
Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease. Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, Rosas HD, Hersch SM, Hogarth P, Bouzou B, Jensen RV, Krainc D. Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):11023-8.

Collected peripheral blood samples
  14 controls
  12 symptomatic HD patients
  5 presymptomatic HD patients
Identified the 322 most differentially expressed genes (control vs. symptomatic HD) using the U133A array.
Used CodeLink 20K to confirm genes identified using the Affymetrix platform.
Focused on 12 genes that showed the most significant difference between control and HD.
Data available from GEO.
Pairwise Analysis
Human blood expression for Huntington's disease versus control, CodeLink
CodeLink Human 20K Bioarray
Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, Rosas HD, Hersch SM, Hogarth P, Bouzou B, Jensen RV, Krainc D. Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease. Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):11023-8.
Pairwise Analysis
  Select group 1: 14 normal
  Select group 2: 12 Huntington's

Pairwise Analysis
  Already normalized (median)
  t-test
  Quality filter: 0.75 (filters out genes with signal less than 0.75)
  Benjamini and Hochberg (FDR)
  Log transform data
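The quality filter and log transform settings can be pictured with a small sketch. Two things here are assumptions, not details from the slide: that a gene is dropped when its mean signal across samples falls below the threshold (GeneSifter's exact rule is not stated), and that filtering happens before the log2 transform. The gene names and signal values are hypothetical.

```python
import math
from statistics import mean

def quality_filter(signals_by_gene, threshold):
    """Keep genes whose mean signal reaches the threshold (assumed rule)."""
    return {gene: values
            for gene, values in signals_by_gene.items()
            if mean(values) >= threshold}

def log2_transform(signals_by_gene):
    """Log2-transform every signal value; inputs must be positive."""
    return {gene: [math.log2(v) for v in values]
            for gene, values in signals_by_gene.items()}

# Hypothetical median-normalized signals for three genes.
signals = {
    "GENE_A": [1.0, 2.0, 4.0],
    "GENE_B": [0.25, 0.5, 0.5],    # mean below 0.75, filtered out
    "GENE_C": [8.0, 8.0, 16.0],
}

kept = quality_filter(signals, threshold=0.75)
logged = log2_transform(kept)
print(logged)
```

The t-test and Benjamini and Hochberg correction would then run on the log-transformed values of the genes that pass the filter.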
Pairwise Analysis Gene List
One-Click Gene Summary
Biological Significance
Gene Annotation Sources

UniGene - organizes GenBank sequences into a non-redundant set of gene-oriented clusters. Gene titles are assigned to the clusters, and these titles are commonly used by researchers to refer to that particular gene.

LocusLink (Entrez Gene) - provides a single query interface to curated sequence and descriptive information, including function, about genes.

Gene Ontologies - the Gene Ontology Consortium provides controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products.

KEGG - the Kyoto Encyclopedia of Genes and Genomes provides information about both regulatory and metabolic pathways for genes.

Reference Sequences - the NCBI Reference Sequence project (RefSeq) provides reference sequences for both the mRNA and protein products of included genes.

GeneSifter maintains its own copies of these databases and updates them automatically.
Pairwise Analysis Gene List
Ontology Report
Ontology Report: z-score
  R = total number of genes meeting selection criteria
  N = total number of genes measured
  r = number of genes meeting selection criteria with the specified GO term
  n = total number of genes measured with the specified GO term

Reference: Scott W Doniger, Nathan Salomonis, Kam D Dahlquist, Karen Vranizan, Steven C Lawlor and Bruce R Conklin; MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data, Genome Biology 2003, 4:R7
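The formula itself did not survive on this slide. The sketch below assumes the z-score defined in the cited MAPPFinder paper: the observed count r minus its expected value n(R/N), divided by the standard deviation under the hypergeometric distribution. The function name and the example counts are hypothetical.

```python
import math

def ontology_z_score(r, n, R, N):
    """MAPPFinder-style z-score for one GO term.

    r: genes meeting the criteria that carry the GO term
    n: measured genes that carry the GO term
    R: all genes meeting the criteria
    N: all genes measured
    """
    expected = n * R / N
    # Hypergeometric variance, including the finite-population correction.
    var = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(var)

# Hypothetical counts: 1000 genes measured, 100 meet the criteria,
# 50 carry the term, and 15 of those 50 meet the criteria.
z_enriched = ontology_z_score(r=15, n=50, R=100, N=1000)
z_as_expected = ontology_z_score(r=5, n=50, R=100, N=1000)
print(f"enriched term: z = {z_enriched:.2f}")
print(f"term at expectation: z = {z_as_expected:.2f}")
```

A large positive z-score means the term appears in the gene list more often than chance would predict; a negative one means it is under-represented.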
Z-score Report
KEGG Report
Pairwise Analysis - Summary
Human blood expression for Huntington's disease versus control
  12 HD, 14 Control

~20,000 genes
  t-test, Benjamini and Hochberg (FDR)
5684 genes
  Pattern selection
2606 increased in HD
  Z-scores, biological processes: Protein biosynthesis (104), Ubiquitin cycle (123), RNA splicing (53)
  KEGG: Oxidative phosphorylation (35), Apoptosis (22)
3078 decreased in HD
  Biological processes: Neurogenesis (90), Cell adhesion (120), Sodium ion transport (29), G-protein coupled receptor signaling (114)
  KEGG: Neuroactive ligand-receptor interaction (56)
Pairwise Analysis
Human blood expression for Huntington's disease versus control, Affymetrix
U133A Human Genome Array
Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, Rosas HD, Hersch SM, Hogarth P, Bouzou B, Jensen RV, Krainc D. Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease. Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):11023-8.

Pairwise Analysis - Affymetrix
  Already normalized (median)
  t-test
  Quality filter: 50 (filters out genes with signal less than 50)
  Benjamini and Hochberg (FDR)
  Log transform data

Pairwise Analysis Gene List
Human blood expression for Huntington's disease versus control, Affymetrix
Gene Lists Common and Unique Genes
Platform comparison Biological themes Affymetrix
Platform comparison Biological themes CodeLink
Project Analysis - Clustering
Cluster by Samples All Genes CodeLink Affymetrix
Cluster by Samples? CodeLink Affymetrix
Cluster by Samples Y Chrom. Genes CodeLink Affymetrix
Platform Comparison - Summary
                        CodeLink    Affymetrix
Transcripts, total      19729       22283
Increased in HD         2606        1976
Overlap (LL genes)      41%         65%
Top BP Ontologies: Ubiquitin cycle, RNA splicing, Regulation of translation, Apoptosis
Clustering of samples

Platform Comparison - Summary
                        CodeLink           Affymetrix
Increased in HD         2606               1976
Decreased in HD         3078               986
Unique ontology         Oxidative Phos.    IL-6 Biosynthesis
MicroarraySuccess.com
Seven Keys to Successful Microarray Data Analysis

Experiment Design
  Type of experiment: two groups, time series, dose response, multiple treatments
  Replicates: the more the better, technical vs. biological
Platform Selection
  Platforms: cDNA, oligo, one color, two color
  Feature extraction software
  File formats
Data Management
  Databases
  Raw data: storing, retrieving
  Experiment annotation: samples, protocols
System Access
  Usability: intuitive, special training
  System access: single user desktop, single user server, web-based
  Sharing data: in the lab, collaboration
Differential Expression
  Normalization
  Differential expression: fold change, comparison statistics, FWER/FDR
  Pattern identification: clustering, visualization, partitioning
Biological Significance
  Gene annotation: UniGene, LocusLink, Gene Ontology, KEGG, OMIM
  Single genes: gene summaries
  Gene lists: ontology report, pathway report
Data Publication
  MIAME: what is it?, publication
  Public databases: GEO, ArrayExpress, SMD
  Using public data: meta analysis

Academic partner: University of Washington
Thank You
www.genesifter.net
Trial account, tutorials, sample data and Data Center
Eric Olson
eric@genesifter.net
206.283.4363