Using 2-way ANOVA to dissect the immune response to hookworm infection in mouse lung

2 Using 2-way ANOVA to dissect the immune response to hookworm infection in mouse lung
- General microarray data analysis workflow
  - From raw data to biological significance
  - Comparison statistics
  - Two-way ANOVA
- GeneSifter Overview
  - The Gene Expression Omnibus (GEO)
- Microarray analysis of gene expression following hookworm infection
  - Data overview
  - Dissection of the immune response using 2-way ANOVA

3 The Microarray Data Analysis Process
- Experimental Design: number of groups, factors, replicates
- Data management: data, sample annotation, gene annotation, databases
- Differential Expression: comparison statistics, correction for multiple testing, clustering
- Biological significance: individual genes, biological themes
- Platform Selection: one-color, two-color, platform comparisons
- System access: ease of use, accessibility
- Making data public and using public data: MIAME, journals, GEO, meta-analysis

5 Experiment Design
Type of experiment
- Two groups: normal vs. cancer; control vs. treated
- Three or more groups, single factor: time series; dose response; multiple treatments
- Four or more groups, multiple factors: time series with control and treated cells
The type of experiment and the number of groups and factors determine the statistical methods needed to detect differential expression.
Replicates
- The more the better, but at least 3
- Biological replicates are better than technical replicates
Rigorous statistical inferences cannot be made with a sample size of one. The more replicates, the stronger the inference.
Pavlidis P, Li Q, Noble WS. The effect of replication on gene expression microarray experiments. Bioinformatics Sep 1;19(13):
Experimental Design and Other Issues in Microarray Studies (Kathleen Kerr)

6 Differential Expression
The fundamental goal of microarray experiments is to identify genes that are differentially expressed in the conditions being studied. Comparison statistics can help identify differentially expressed genes, and cluster analysis can identify patterns of gene expression and segregate a subset of genes based on these patterns.
Statistical significance vs. fold change: fold change does not address the reproducibility of the observed difference and cannot be used to determine statistical significance.
Comparison statistics
- 2 groups: t-test, Welch's t-test, Wilcoxon rank sum test
- 3 or more groups, single factor: one-way ANOVA, Kruskal-Wallis test
- 4 or more groups, multiple factors: two-way ANOVA
Comparison tests require replicates and use the variability within the replicates to assign a confidence level as to whether the gene is differentially expressed.
Supporting material: Draghici S. (2002) Statistical intelligence: effective analysis of high-density microarray data. Drug Discov Today, 7(11 Suppl): S55-63.

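7 t-test for comparison of two groups
Calculate the t statistic:

$$t = \frac{\text{difference between groups}}{\text{difference within groups}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

where $\bar{x}_i$ is the mean of group $i$, $s_i$ its standard deviation ($s_i^2$ is the variance) and $n_i$ its sample size.
Determine the confidence level for t (the probability that a t this large could occur by chance) with $df = n_1 + n_2 - 2$. With equal group sizes this denominator equals the pooled-variance standard error of Student's t-test, which is why this df applies; Welch's t-test adjusts df when sizes or variances differ.
The larger the difference between the groups and the lower the variance, the bigger t will be and the lower p will be.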
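The formula above can be sketched in Python with only the standard library. This is a minimal illustration, not GeneSifter's implementation: it uses the slide's denominator with df = n1 + n2 - 2 and approximates the two-sided p-value by numerically integrating the Student t density with Simpson's rule. The function names and the example signals are hypothetical.

```python
import math
from statistics import mean, variance


def t_statistic(group1, group2):
    """Return (t, df) using the slide's formula and df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance is the sample variance s^2 (n - 1 denominator)
    se = math.sqrt(variance(group1) / n1 + variance(group2) / n2)
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (integral of the density from 0 to |t|), via Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


# Hypothetical signals, 4 replicates per group (df = 6)
exp_signal = [530.0, 610.0, 480.0, 560.0]
con_signal = [100.0, 120.0, 90.0, 110.0]
t, df = t_statistic(exp_signal, con_signal)
print(f"t = {t:.2f}, df = {df}, p = {two_sided_p(t, df):.2g}")
```

In practice a statistics library (for example R, which GeneSifter's statistics are based on) would supply the t distribution directly; the numerical integration is only there to keep the sketch self-contained.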

8 Differential Expression
2 groups, 4 replicates each; mean, standard deviation, fold change and p-value calculated
[Figure: fold change vs. p-value; mean signal (Exp vs. Con) for two genes]
- Gene 1: fold change = 5.3, p = 0.19
- Gene 2: fold change = 5.3, p = 0.03
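The point of this slide can be reproduced with made-up numbers: two hypothetical genes with identical group means (and therefore the same fold change of 5.3) but different replicate scatter get very different t statistics, and with the same df the larger t means the smaller p-value. This sketch assumes fold change is the ratio of group means on the raw signal scale; the signal values are invented and do not reproduce the p-values on the slide.

```python
import math
from statistics import mean, variance


def fold_change(exp, con):
    """Ratio of mean experimental signal to mean control signal."""
    return mean(exp) / mean(con)


def t_statistic(exp, con):
    """Two-sample t statistic with the unpooled denominator from the t-test slide."""
    se = math.sqrt(variance(exp) / len(exp) + variance(con) / len(con))
    return (mean(exp) - mean(con)) / se


# Hypothetical signals: 4 replicates per group, identical group means (530 vs. 100)
gene1 = {"exp": [100, 900, 200, 920], "con": [10, 200, 30, 160]}  # noisy replicates
gene2 = {"exp": [520, 540, 530, 530], "con": [95, 105, 100, 100]}  # tight replicates

for name, g in (("Gene 1", gene1), ("Gene 2", gene2)):
    fc = fold_change(g["exp"], g["con"])
    t = t_statistic(g["exp"], g["con"])
    print(f"{name}: fold change = {fc:.1f}, t = {t:.1f}")
```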

9 Analysis of Variance (ANOVA)
- Like the t-test, identifies genes with large differences between groups and small differences within groups
- For use with 3 or more groups
- One-way and two-way: one-way ANOVA examines the effects of one factor on gene expression; two-way ANOVA can examine the effects of two factors on gene expression as well as the interaction of the two factors
Pavlidis P. Using ANOVA for gene selection from microarray studies of the nervous system. Methods Dec;31(4):
Glantz S. Primer of Biostatistics. 5th Edition. McGraw-Hill.
Glantz S, Slinker B. Primer of Regression and Analysis of Variance. McGraw-Hill.
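For a single gene, one-way ANOVA compares the between-group mean square with the within-group mean square; like t, the resulting F grows when group means differ a lot relative to replicate scatter. A minimal sketch, assuming each group is given as a list of replicate signals (the example values are hypothetical). Converting F to a p-value uses the F distribution with k - 1 and N - k degrees of freedom and is left to a statistics library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate lists."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of replicates around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical: three time points, three replicates each
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```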

10 Two-way ANOVA
Data: Sex differences in salivary glands (CodeLink Ms 10K Bioarray)
Factors: Sex (M, F) and Gland (Parotid, Sublingual); groups Par F, Par M, Sub F, Sub M
[Figure: gene expression patterns illustrating a gland effect, a sex effect, gland and sex effects with no interaction, and an interaction]
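In a 2 x 2 design like this one, the patterns in the figure can be read off the four cell means. In this sketch with hypothetical means, the sex effect compares the average of the female cells with the average of the male cells, the gland effect does the same for parotid vs. sublingual, and the interaction asks whether the sex difference is the same in both glands. These are descriptive contrasts only; the two-way ANOVA is what tests them against replicate variability.

```python
def effects_2x2(par_f, par_m, sub_f, sub_m):
    """Main-effect and interaction contrasts from the four cell means."""
    sex_effect = (par_f + sub_f) / 2 - (par_m + sub_m) / 2
    gland_effect = (par_f + par_m) / 2 - (sub_f + sub_m) / 2
    # Interaction: does the female-male difference change between glands?
    interaction = (par_f - par_m) - (sub_f - sub_m)
    return sex_effect, gland_effect, interaction


# Hypothetical cell means (arbitrary signal units)
print(effects_2x2(200, 100, 200, 100))  # sex effect only
print(effects_2x2(100, 100, 300, 300))  # gland effect only
print(effects_2x2(300, 100, 200, 100))  # sex difference larger in parotid: non-zero interaction
```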

11 Two-way ANOVA compared to t-test
Data: Sex differences in salivary glands (CodeLink Ms 10K Bioarray); factors Sex (M, F) and Gland (Parotid, Sublingual)
[Figure: sex differences detected by two-way ANOVA vs. t-test]
Pavlidis P, Noble WS. Analysis of strain and regional variation in gene expression in mouse brain. Genome Biol. 2001;2(10):RESEARCH0042.

12 Analysis Workflow Examples
- 2 groups (apoE -/- aorta vs. wt aorta): t-test → BH (FDR) → up-regulated and down-regulated gene lists
- 5 groups, single factor (Drosophila innate immune response time series): one-way ANOVA → BH (FDR) → clustering → gene lists
- 12 groups, two factors (immune response to hookworms in mouse lung): two-way ANOVA → BH (FDR) → clustering → gene lists
Gene lists → individual genes of interest and biological themes (pathways, molecular functions, etc.)
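Each of these workflows applies the Benjamini and Hochberg (BH) false discovery rate correction to the per-gene p-values before building gene lists. A minimal sketch of BH-adjusted p-values (the step-up procedure, following the same definition as R's `p.adjust(method = "BH")`); the raw p-values and the 0.05 cutoff are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value: adjusted p = min over ranks >= this one of p * m / rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
adj = bh_adjust(raw)
selected = [i for i, q in enumerate(adj) if q < 0.05]  # genes kept at FDR 5%
print(adj, selected)
```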

14 GeneSifter Microarray Data Analysis
Accessibility: web-based, secure
Data management: data annotation (MIAME); multiple upload tools (CodeLink, Affymetrix, Illumina, Agilent, custom)
Differential expression: powerful, accessible tools for determining statistical significance
- R-based statistics (Bioconductor)
- Comparison tests: t-test, Welch's t-test, Wilcoxon rank sum test, one-way ANOVA, two-way ANOVA
- Correction for multiple testing: Bonferroni, Holm, Westfall and Young maxT, Benjamini and Hochberg
- Unsupervised clustering: PAM, CLARA, hierarchical clustering, silhouettes
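Bonferroni and Holm control the family-wise error rate rather than the false discovery rate. A sketch of both adjustments following their usual definitions (the same ones R's `p.adjust` uses): Bonferroni multiplies every p-value by the number of tests m, while Holm's step-down version multiplies the i-th smallest p-value by m - i + 1 and keeps the results monotone. The Westfall and Young maxT procedure needs permutations of the sample labels and is not shown; the p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 = smallest p-value
        running_max = max(running_max, min(1.0, pvalues[i] * (m - rank)))
        adjusted[i] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
print(bonferroni(raw), holm(raw))
```

Holm is never more conservative than Bonferroni, which is why it is usually preferred when family-wise control is wanted.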

15 GeneSifter Microarray Data Analysis
Integrated tools for determining biological significance
- One-Click Gene Summary
- Ontology Report
- Pathway Report
- Search by ontology terms
- Search by KEGG terms or chromosome

16 The GeneSifter Data Center
Free resource for training, research and publishing
6 areas: cardiovascular, cancer, endocrinology, neuroscience, immunology, oral biology
Access to: data, analysis summary, tutorials, WebEx

17 The GeneSifter Data Center

18 The Gene Expression Omnibus (GEO)
Gene expression data repository (mostly microarrays)
- Over 3000 data sets
- All array platforms represented
- Searchable by platform, species and experiment annotation
- Downloadable data
Using the Gene Expression Omnibus (

21 Project Analysis : Two-way ANOVA
Scott lab, Johns Hopkins University (Bloomberg School of Public Health)
- Affymetrix Mouse
- Wild-type and SCID mice
- Control and 5 time points after infection
- CEL files available (loaded and MAS5-processed in GeneSifter)
Loukas A, Prociv P. Immune Responses in Hookworm Infections. Clinical Microbiology Reviews, October 2001, Vol. 14, No. 4.

23 Project Analysis : Two-way ANOVA
Factor one: Strain (2 levels: SCID, WT)
Factor two: Time after infection (6 levels: control, 2, 3, 4, 8, 12 dpi)
[Figure: example gene expression patterns (WT vs. SCID over time) for a strain effect, a time effect and an interaction]
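For one gene, a two-way ANOVA with interaction splits the total variation into strain, time, strain x time interaction and residual (replicate) sums of squares. The sketch below assumes a balanced layout (the same number of replicates in every strain x time cell) and returns F statistics for the three effects; turning them into p-values needs the F distribution and is not shown. The nested-list data structure and the example values are hypothetical, not GeneSifter's input format.

```python
from statistics import mean


def two_way_anova(cells):
    """cells[a][b] = replicate values for level a of factor A and level b of factor B.

    Assumes a balanced design. Returns F statistics for A, B and the A x B interaction.
    """
    a_levels, b_levels = len(cells), len(cells[0])
    n = len(cells[0][0])  # replicates per cell
    grand = mean(x for row in cells for cell in row for x in cell)
    cell_mean = [[mean(cell) for cell in row] for row in cells]
    a_mean = [mean(row) for row in cell_mean]
    b_mean = [mean(cell_mean[a][b] for a in range(a_levels)) for b in range(b_levels)]

    ss_a = b_levels * n * sum((m - grand) ** 2 for m in a_mean)
    ss_b = a_levels * n * sum((m - grand) ** 2 for m in b_mean)
    # Interaction: what is left of each cell mean after removing both main effects
    ss_ab = n * sum(
        (cell_mean[a][b] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in range(a_levels) for b in range(b_levels)
    )
    # Residual: replicate scatter around each cell mean
    ss_error = sum(
        (x - cell_mean[a][b]) ** 2
        for a in range(a_levels) for b in range(b_levels) for x in cells[a][b]
    )
    df_a, df_b = a_levels - 1, b_levels - 1
    df_ab, df_error = df_a * df_b, a_levels * b_levels * (n - 1)
    ms_error = ss_error / df_error
    return {
        "A": (ss_a / df_a) / ms_error,
        "B": (ss_b / df_b) / ms_error,
        "AxB": (ss_ab / df_ab) / ms_error,
    }


# Hypothetical 2 x 2 example with 2 replicates per cell
cells = [
    [[1, 3], [3, 5]],
    [[5, 7], [11, 13]],
]
print(two_way_anova(cells))
```

For the hookworm design, factor A would be strain (2 levels), factor B time (6 levels) and n = 3, giving 1, 5 and 5 degrees of freedom for the strain, time and interaction effects and 24 for the residual.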

24 Project Analysis : Two-way ANOVA

25 Project Analysis : Two-way ANOVA
- Identify the factors
- Indicate the number of levels for each factor
- Identify the levels for each factor

26 Project Analysis : Two-way ANOVA
- Assign the levels for each factor to cells
- Include a fold-change cutoff if desired
- Select the effect to filter on first (you can switch later)
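Filtering on one effect plus a fold-change cutoff can be pictured as keeping genes whose adjusted p-value for the chosen effect is below a threshold and whose group means differ by at least the cutoff; switching effects is just a second call. This is a hypothetical sketch: the `GeneResult` record, the definition of `max_fold_change` (largest group mean divided by smallest) and the thresholds are assumptions, not GeneSifter's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical per-gene record: adjusted p-values per effect and group means."""
    gene: str
    adj_p: dict        # e.g. {"strain": 0.001, "time": 0.2, "interaction": 0.6}
    group_means: list  # mean signal for each strain x time group


def max_fold_change(means):
    """Assumed definition: largest group mean over smallest group mean."""
    return max(means) / min(means)


def filter_genes(results, effect, p_cutoff=0.05, fc_cutoff=1.0):
    """Keep genes significant for one effect that also meet the fold-change cutoff."""
    return [
        r.gene for r in results
        if r.adj_p[effect] < p_cutoff and max_fold_change(r.group_means) >= fc_cutoff
    ]


results = [
    GeneResult("g1", {"strain": 0.01, "time": 0.5, "interaction": 0.9}, [100, 250]),
    GeneResult("g2", {"strain": 0.01, "time": 0.001, "interaction": 0.9}, [100, 120]),
    GeneResult("g3", {"strain": 0.2, "time": 0.01, "interaction": 0.9}, [50, 200]),
]
print(filter_genes(results, "strain", fc_cutoff=2.0))  # filter on strain first
print(filter_genes(results, "time", fc_cutoff=2.0))    # then switch to time
```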

27 Two-way ANOVA : Strain Effects

28 Biological Significance
Gene annotation sources
- UniGene: organizes GenBank sequences into a non-redundant set of gene-oriented clusters. Gene titles are assigned to the clusters, and these titles are commonly used by researchers to refer to that particular gene.
- LocusLink (Entrez Gene): provides a single query interface to curated sequence and descriptive information, including function, about genes.
- Gene Ontologies: the Gene Ontology Consortium provides controlled vocabularies for describing the molecular function, biological process and cellular component of gene products, which can be used by databases such as Entrez Gene.
- KEGG: the Kyoto Encyclopedia of Genes and Genomes provides information about both regulatory and metabolic pathways for genes.
- Reference Sequences: the NCBI Reference Sequence project (RefSeq) provides reference sequences for both the mRNA and protein products of included genes.
GeneSifter maintains its own copies of these databases and updates them automatically.

29 One-Click Gene Summary

30 Two-way ANOVA : Strain Effects

31 Ontology Report

32 Ontology Report : z-score
- R = total number of genes meeting the selection criteria
- N = total number of genes measured
- r = number of genes meeting the selection criteria with the specified GO term
- n = total number of genes measured with the specified GO term
Reference: Doniger SW, Salomonis N, Dahlquist KD, Vranizan K, Lawlor SC, Conklin BR. MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biology 2003, 4:R7.
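The z-score equation itself is not in this transcription. The MAPPFinder paper cited on the slide standardizes the difference between the observed count r and the count expected if the R selected genes were a random draw from the N measured genes:

$$z = \frac{r - n\frac{R}{N}}{\sqrt{n\frac{R}{N}\left(1 - \frac{R}{N}\right)\left(1 - \frac{n - 1}{N - 1}\right)}}$$

The sketch below implements that expression, assuming GeneSifter follows the cited formula; the function name and the counts in the example are hypothetical.

```python
import math


def go_term_z_score(r, n, R, N):
    """MAPPFinder-style z-score for over-representation of a GO term among selected genes."""
    p = R / N            # fraction of all measured genes that were selected
    expected = n * p     # selected genes expected for this term under random selection
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# Hypothetical: 100 of 1000 genes selected; 5 of the 20 genes with this term were selected
print(round(go_term_z_score(r=5, n=20, R=100, N=1000), 2))
```

A positive z means the term is over-represented among the selected genes; terms are typically ranked by z, which is how a report like the z-score report can surface themes such as immune response.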

33 Z-score Report

34 KEGG Report

35 Two-way ANOVA : Strain Effects

36 Strain effects - Visualization
Visualization of 517 genes

37 Strain effects - Partitioning
Segregation of expression patterns using k-medoids clustering
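k-medoids clustering picks k actual genes (medoids) as cluster centers and assigns every other gene to its nearest medoid, which makes it less sensitive to outliers than k-means. GeneSifter lists PAM and CLARA; the sketch below is a simpler alternating k-medoids heuristic (assign, then re-pick each medoid as the member with the smallest total distance to its cluster), not the full PAM swap search. It assumes Euclidean distance on per-gene expression profiles and a naive deterministic start from the first k genes; both are illustrative choices, and the profiles are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign(profiles, medoids):
    """Label each profile with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: euclidean(p, profiles[medoids[c]]))
        for p in profiles
    ]


def k_medoids(profiles, k, max_iter=100):
    """Alternating k-medoids heuristic; returns (medoid indices, labels)."""
    medoids = list(range(k))  # naive start: the first k profiles
    for _ in range(max_iter):
        labels = assign(profiles, medoids)
        new_medoids = []
        for c in range(k):
            # Keep the old medoid if a cluster ever comes up empty
            members = [i for i, lab in enumerate(labels) if lab == c] or [medoids[c]]
            # New medoid: the member with the smallest total distance to its cluster
            new_medoids.append(
                min(members, key=lambda i: sum(euclidean(profiles[i], profiles[j]) for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(profiles, medoids)


# Hypothetical expression profiles (e.g. WT and SCID signal for six genes)
profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
medoids, labels = k_medoids(profiles, k=2)
print(medoids, labels)
```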

38 Strain effects - Partitioning
Silhouette widths are used to find the best number of clusters.
[Figure: mean silhouette width vs. number of clusters k]
Dudoit S, Fridlyand J. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol Jun 25;3(7):RESEARCH0036. Epub 2002 Jun 25.
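The silhouette width of a gene compares its average distance a to the other genes in its own cluster with its average distance b to the genes of the nearest other cluster: s = (b - a) / max(a, b). Values near 1 mean the gene sits well inside its cluster, and the k with the highest mean width is preferred. A minimal sketch assuming Euclidean distance, at least two clusters, and the usual convention that a gene alone in its cluster gets s = 0; the profiles and labelings are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_silhouette_width(profiles, labels):
    """Average silhouette width s = (b - a) / max(a, b) over all profiles."""
    widths = []
    for i, p in enumerate(profiles):
        own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # convention for a gene alone in its cluster
            continue
        # a: mean distance to the other members of the gene's own cluster
        a = sum(euclidean(p, profiles[j]) for j in own) / len(own)
        # b: mean distance to the members of the closest other cluster
        b = min(
            sum(euclidean(p, profiles[j]) for j, lab in enumerate(labels) if lab == c)
            / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
good = mean_silhouette_width(profiles, [0, 0, 0, 1, 1, 1])
mixed = mean_silhouette_width(profiles, [0, 1, 0, 1, 0, 1])
print(round(good, 3), round(mixed, 3))
```

In practice one would cluster the genes for several values of k and keep the k with the highest mean width.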

39 Strain : Cluster 1

40 Strain : Cluster 2

41 Two-way ANOVA : Time Effects

42 Two-way ANOVA : Time Effects

43 Time : Cluster 1

44 Time : Cluster 2

45 Two-way ANOVA : Interaction

46 Two-way ANOVA : Interaction

47 Interaction : Cluster 3

48 Interaction : Cluster 2

49 Two-way ANOVA : Summary
Immune response to hookworms in mouse lung: 12 groups (3 biological replicates), 2 factors (Strain and Time)
Two-way ANOVA on ~39,000 genes:
- Strain: 517 genes; Time: 1054 genes; biological process z-scores: Immune response (8), Chitin catabolism (4)
- Interaction: 56 genes; biological process z-scores: Transcription (4), Circadian rhythm (3)
Pattern selection: hierarchical clustering, PAM (interaction)

50 Strain effects, time effects and interaction

52 Resources
Monthly Webinar Series
- 5/23/06: Using 2-way ANOVA to dissect the immune response to hookworm infection in mouse lung
- Archived: The microarray data analysis process: from raw data to biological significance
- Archived: Microarray analysis of gene expression in androgen-independent prostate cancer
- Archived: Microarray analysis of gene expression in male germ cell tumors
- Archived: Microarray analysis of gene expression in Huntington's Disease peripheral blood: a platform comparison

53 Thank You
Trial account, tutorials, sample data and Data Center
Eric Olson