Compendium of Immune Signatures Identifies Conserved and Species-Specific Biology in Response to Inflammation

Similar documents
Biomarkers, Early Prediction of Vaccine Efficacy and Safety

Immunological Genome Project:

GeneQuery: A phenotype search tool based on gene co-expression clustering. Alexander Predeus 21-oct-2015

Supplementary Figures Supplementary Figure 1

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter

Supplementary Methods

Analyzing Gene Set Enrichment

Gene List Enrichment Analysis

Representative flow cytometry scatter plots of the sorting panels for each immune cell population measured by LC-MS/MS.

Mouse expression data were normalized using the robust multiarray algorithm (1) using

Integrative Genomics 3b. Systems Biology and Epigenetics

Lecture 8: Predicting and analyzing metagenomic composition from 16S survey data

Single-cell analysis reveals the continuum of human lympho-myeloid progenitor cells

Analysis of Microarray Data

About ATCC. Established partner to global researchers and scientists

April transmart v1.2 Case Study for PredicTox

Effects of Different Normalization Methods on Gene Set Enrichment Analysis (GSEA)

Seven Keys to Successful Microarray Data Analysis

Microarray Data Analysis in GeneSpring GX 11. Month ##, 200X

Strategies for Assessment of Immunotoxicology in Preclinical Drug Development

Validation & Assay Performance Summary

Pathway Analysis. Min Kim Bioinformatics Core Facility 2/28/2018

Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison. CodeLink compatible

Gene Expression Microarrays. For microarrays, purity of the RNA was further assessed by

Annotation and Function of Switch-like Genes in Health and Disease. A Thesis. Submitted to the Faculty. Drexel University. Adam M.

SUPPLEMENTARY INFORMATION

BABELOMICS: Microarray Data Analysis

TUTORIAL. Revised in Apr 2015

Nature Genetics: doi: /ng Supplementary Figure 1. H3K27ac HiChIP enriches enhancer promoter-associated chromatin contacts.

Human retinoic acid regulated CD161 + regulatory T cells support wound repair in intestinal mucosa

BL-7040: Oligonucleotide for Inflammatory Bowel Disease

Nature Immunology: doi: /ni Supplementary Figure 1. Data-processing pipeline.

Journal of Biomedical Informatics

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis

Cell Stem Cell, volume 9 Supplemental Information

Nature Immunology: doi: /ni Supplementary Figure 1

Exercise1 ArrayExpress Archive - High-throughput sequencing example

SUPPLEMENTARY INFORMATION

Stefano Monti. Workshop Format

Supplementary Fig. 1 related to Fig. 1 Clinical relevance of lncrna candidate

Александр Предеус. Институт Биоинформатики. Gene Set Analysis: почему интерпретировать глобальные генетические изменения труднее, чем кажется

TRAIL, Death Receptors, TRA-8, and Apoptosis

Analysis of a Tiling Regulation Study in Partek Genomics Suite 6.6

A CRITICAL REVIEW OF GENE MARKER SELECTION METHODS AND CELL COUNT INFERENCE TOOLS. Muying Wang

Agilent GeneSpring GX 10: Beyond. Pam Tangvoranuntakul Product Manager, GeneSpring October 1, 2008

The Glue Grant: Inflammation and the Host Response to Injury

Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs. Bertil Schmidt Christian Hundt

Understanding protein lists from proteomics studies. Bing Zhang Department of Biomedical Informatics Vanderbilt University

Supplementary Text. eqtl mapping in the Bay x Sha recombinant population.

Analysis of Microarray Data

Methods for network visualization and gene enrichment analysis February 22, Jeremy Miller Postdoctoral Fellow - Computational Data Analysis

Enabling more sophisticated analysis of proteomic profiles. Limsoon Wong (Joint work with Wilson Wen Bin Goh)

Automated High-dimensional

SUPPLEMENTARY INFORMATION

Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison

Novel cytokines in allergic disease

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

Package GeneExpressionSignature

Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX

ISCT Telegraft Column: Mesenchymal Stromal Cell (MSC) Product Characterization and Potency Assay Development

Immunogenicity Prediction Where are we?

Microarray Informatics

Methods for comparing multiple microbial communities. james robert white, October 1 st, 2007

Bioinformatics : Gene Expression Data Analysis

ClustalW algorithm. The alignment statistics are 96.9% for both pairwise identity and percent identical sites.

Supplemental Information. Inflammatory Ly6C high Monocytes Protect. against Candidiasis through IL-15-Driven. NK Cell/Neutrophil Activation

Chromatin signature identifies monoallelic gene expression across mammalian cell types

Gene List Enrichment Analysis - Statistics, Tools, Data Integration and Visualization

Supporting Information. Exploiting CD22 to Selectively Tolerize Autoantibody Producing B-cells in. Rheumatoid Arthritis

An Introduction to Transcriptome Overlap Measure (TROM)

Unbiased reconstruction of a mammalian transcriptional network mediating the differential response to pathogens

Improving coverage and consistency of MS-based proteomics. Limsoon Wong (Joint work with Wilson Wen Bin Goh)

MulCom: a Multiple Comparison statistical test for microarray data in Bioconductor.

Comparative Analysis using the Illumina DASL assay with FFPE tissue. Wendell Jones, PhD Vice President, Statistics and Bioinformatics

Signatures of malaria-associated pathology revealed by high-resolution wholeblood transcriptomics in a rodent model of malaria

Determining presence/absence threshold for your dataset

Nature Immunology: doi: /ni.3694

Supplemental Information The Sensing of Environmental Stimuli by Follicular Dendritic Cells Promotes Immunoglobulin A Generation in the Gut

Systems Biology and Systems Medicine

MFMS: Maximal Frequent Module Set mining from multiple human gene expression datasets

Supplementary Figure 1

TECH NOTE Pushing the Limit: A Complete Solution for Generating Stranded RNA Seq Libraries from Picogram Inputs of Total Mammalian RNA

Suberoylanilide Hydroxamic Acid Treatment Reveals. Crosstalks among Proteome, Ubiquitylome and Acetylome

Using Mechanistic Data to Predict and Interpret Clinical Outcomes in Probiotic Studies on Immunity.

Package disco. March 3, 2017

HSV-2 therapeutic vaccine program. Subunit vaccine candidates

Supplementary Figure 1 Validate the expression of mir-302b after bacterial infection by northern

Supplemental figure 1: Phenotype of IMC and MDSC after purification. A. Gating

Unbiased Reconstruction of a Mammalian Transcriptional Network Mediating Pathogen Responses

Human IFN-γ. Pre-Coated ELISA Kit

Supplementary Methods

ArrayExpress and Gene Expression Atlas: Mining Functional Genomics data

Gene expression connectivity mapping and its application to Cat-App

Companion Diagnostics in Autoimmune Disorders: Improving Outcomes Through Personalized Medicine

The first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks.

Supplemental Information. By Capturing Inflammatory Lipids Released. from Dying Cells, the Receptor CD14 Induces

Supplementary Figure 1. Details of the component modelling and steps in generating and validating a PTGS. The steps (boxes) and the flow of

DC enriched CD103. CD11b. CD11c. Spleen. DC enriched. CD11c. DC enriched. CD11c MLN

Robust Prediction of Expression Differences among Human Individuals Using Only Genotype Information

Transcription:

Immunity Supplemental Information Compendium of Immune Signatures Identifies Conserved and Species-Specific Biology in Response to Inflammation Jernej Godec, Yan Tan, Arthur Liberzon, Pablo Tamayo, Sanchita Bhattacharya, Atul J. Butte, Jill P. Mesirov, W. Nicholas Haining

Supplemental Figure Legends Figure S, related to Figure. Selection of biologically meaningful comparisons. Schematic of an example of comparisons that are biologically meaningful (black arrows) or difficult to interpret (red arrows) in one of the studies included in the ImmuneSigDB (GSE772). Figure S2, related to Figure. ImmuneSigDB largely contains unique gene sets not represented in previous immune signature collections. (A) Left, overlap in gene set membership within ImmuneSigDB. Heatmap indicates significance of overlap indicated by hypergeometric P-values. Highlighted lineages or biological process are found in each of the significantly overlapping clusters. Right, summary of hypergeometric P-value ranges of all pairwise overlaps on the left. (B) Overlap in gene set membership between ImmuneSigDB and gene signature collection developed by Li et al. Heatmap indicates significance of overlap indicated by FDR values (right). Highlighted are representative biological processes or cell lineages in each of the significantly overlapping clusters of gene sets/modules. Summary of FDR ranges of all pairwise overlaps are shown to the right. (C and D) Analysis as in (B) showing overlapping gene memberships between immune gene modules defined by Chaussabel et al. and ImmuneSigDB (C) and between the Li et al. and Chaussabel et al. collections (D). (E) Significance of enrichment of gene sets contained in ImmuneSigDB or modules in collections by Li et al or Chaussabel et al. Each of the three collections was used to analyze four publically available datasets of gene expression datasets from LPS vs. unstimluated DC, Tregs vs. conventional CD4 + T cells, memory B cells vs. naive B cells, and plasma B cells vs. naive B cells.

Figure S3, related to Figure 2. hematopoietic lineages are accurately clustered using ImmuneSigDB enrichments. (A) Unsupervised bi-clustering of ssgsea values using ImmuneSigDB in cell subsets represented in DMAP. Clustering was performed in gene sets with the top 0% highest mean absolute deviation. Figure S4, related to Figure 4. and mouse blood cells in gram positive sepsis undergo similar transcriptional response. (A and B) GSEA of gene sets of genes up-regulated in Gram positive human (GSE9960) or mouse (GSE9668, C57BL/6) in the opposite species mouse gene set enriched in the human dataset (A) and the human gene set enriched in mouse dataset (B). Mountain plots indicate cumulative enrichment, and ticks below the line correspond to the position of respective gene set genes up-regulated in sepsis versus control conditions. (C) Number of significantly enriched gene sets mouse (purple) or human (green) dataset. Statistical significance was calculated by the hypergeometric test. (D) Distribution of the number of gene sets including the shared genes in the leading edges of common enriched gene of mouse and human sepsis dataset enrichments. Statistical significance is calculated by the Spearman test. (E - H) Analysis was done as above using GSEA of gene sets of down-regulated in Gram negative bacteria-induced sepsis in human (GSE9960) or gram positive sepsis mouse (GSE9668, C57BL/6) in the opposite species mouse gene set enriched in the human dataset (left) and the human gene set enriched in mouse dataset (right). Figure S5, related to Figure 5. LEM analysis identifies commonly represented and co-regulated metagenes in immune biological processes that are unique, yet overlapping with Gene Ontology Terms. Heatmap representing the sparse matrix of 2

all leading edge genes (FDR<0.00) in GSEA analysis of human (A) and mouse sepsis (B) as in Figure 5. Gene sets were clustered using Pearson correlation and genes ordered based on their LEM membership, annotated above. (C and D) Overlap in the genes contained in LEMs defined for human (C) and mouse (D) sepsis described in Figure 5 and the predominant GO terms. Numbers of genes are indicated in the Venn diagrams, and the statistical significance of each overlap was assessed using hypergeometric test in the space of 20606 (C) and 583 (D) total genes, based on annotated unique genes or human orthologs. (E and F) Fraction of LEM genes that are contained in the predominant GO term for human (E) and mouse (F) studies. Figure S6, related to Figure 5. Illustration describing metagene overlap representation. (A) Venn diagrams representing absolute gene overlap of each human metagene with each mouse metagene. Numbers represent the number of unique and overlapping genes in each comparison. Statistical significance was assessed using Hypergeometric text in the space of all shared genes from the two datasets (n=2,634). The relative overlap of each human metagene with each mouse metagene is represented in Circos plot (B) and the significance of each overlap is represented in a heat map form (C). 3

Supplemental Table Legends Table S, related to Figure. Left, the list of scientific journals prioritized for selection of immunological studies included in the ImmuneSigDB. Right, Affymetrix gene expression microarray platforms that studies contained in ImmuneSigDB originate from. Table S2, related to Figure and S2. Comparison of features in ImmuneSigDB collection, Chaussabel et al. modules, and Li et al. gene set collection. Table S3, related to Figure and S2E. Top 20 gene sets from ImmuneSigDB, Li, and Chaussabel enriched in LPS-stimulated dendritic cells relative to unstimulated cells (GSE4000), in regulatory T cells (Treg) compared to conventional CD4 + T cells (Tconv) (GSE25087), enriched in peripheral blood plasma cells compared to naive B cells (GSE22886), and in peripheral blood memory B cells compared to naive B cells (GSE42724). Represented are the names and associated description, number of genes in a gene set, statistical significance (false discovery rate, FDR), as well as tissue source, species of origin, and PubMed ID (PMID) of the study where the gene set originates. Table S4, related to Figure 2. Top, ImmuneSigDB gene sets defining each of the clusters in Figure 2 (rows). Bottom, quantification and relative representation of genes across ImmuneSigDB gene set clusters defined in Figure 2. Table S5, related to Figure 3. Statistical significance of enrichment of human and mouse gene sets derived from studies profiling LPS-stimulated versus unstimulated myeloid cells, plasma cells versus naive B cells, regulatory (Treg) versus conventional 4

(Tconv) CD4+ T cells, and memory versus naive B cells; and enriched in a randomly chosen human study examining the same phenotype as described in Figure 3A, 3B, 3C, 3D, respectively. Table S6, related to Figure 4B. Names of ImmuneSigDB gene sets highly enriched in both mouse and human sepsis shown in Figure 4B. Table S7, related to Figures 4D, S4D, S4H. Lists of genes shown in Figure 4D, S4D, S4H. The shared leading edge genes are ranked based on their relative abundance in human and mouse studies. Listed are the rank in the respective species, the relative abundance, and the number of times a particular gene appears in the leading edge matrix of human and mouse GSEA. 5

Figure S GSE772; Amit et al., Science, 2009 Comparison Biologically Meaningful Biology of uncertain significance Duration of Dendritic Cell Stimulation 2 hours 8 hours 24 hours Control (Unstimulated) Stimulation Condition LPS (TLR4 agonist) CpG (TLR9 agonist)

Figure S2 B. B. Li et al.(n=346) C. Chaussabel et al. (n=260) ImmuneSigDB D. Chaussabel et al. (n=260) ImmuneSigDB (n=4872) B cell Myeloid T cell act., TLR T cell proliferation IFNα Naive T cell Cell division T cell act. ImmuneSigDB (n=4872) Li et al. (n=346) B cell Act. cells Treg Myeloid act. ImmuneSigDB Myeloid IFNα IFNα FDR 0-4 0-8 0-2 Effector CD8 + T cell Monocyte Interferon Signaling Activated Myeloid Cell Naive T cell Naive & Memory T cell FDR 0-4 0-8 0-2 FDR 0-4 0-8 0-2 Fraction of pairwise comparisons (%) Fraction of pairwise comparisons (%) Fraction of pairwise comparisons (%) Fraction of pairwise comparisons (%) 00 95 2.0.5.0 0.5 0.2 0. 0.0 00 95 2 0.5 0.2 0.08 0.04 0.00 00 99 0.25 0.20 0.5 0.0 0.05 0.00 FDR. 0-20 0-2 0-8 0-4 Significance of overlap (FDR) 0-20 0-20 0-4 0-8 0-2 80 45 0 3 2 0.4 0.2 0.0 0-6 0-2 0-2 0-2 0-8 0-8 0-8 0-4 Significance of overlap (FDR) 0-4 Significance of overlap (FDR) 0-4 Significance of overlap (FDR) E. Log0 (FDR q val) Log0 (FDR q val) Log0 (FDR q val) Log0 (FDR q val) 0 2 3 4 5 6 7 0 2 3 4 5 6 7 0 2 3 4 5 6 7 LPS vs. Unstim DC Treg vs. Tconv Memory vs. Naive B cells Plasma cell vs. Naive B cells 0 2 3 4 5 6 7 ImmuneSigDB Li et al. Chaussabel et al.

Figure S3 DMAP ssgsea enrichment score Erythroid cell Megakaryocyte row min Myeloid cell SC/Progenitor row max T cell B cell NK cell NKT cell

Figure S4 A Enrichment Score B Enrichment Score 0.5 0.4 0.3 0.2 0. 0.0 0.5 0.4 0.3 0.2 0. 0.0 Sepsis (GSE9960) FDR q-val < 0.00 Nom. p-val < 0.00 FDR q-val = 0.002 Nom. p-val < 0.00 (GSE9668) Sepsis vs. Ctrl Gene Set Control Sepsis vs. Ctrl Gene Set C Enriched gene sets (FDR < 0.00) 835 P =.33 x 0-52 50 Leading Edge Genes 779 64 79 80 D Fraction of Leading Edges Fraction of Leading Edges 40 30 20 0 0 40 30 20 0 CXCL ICAM ILB TNFAIP6 TLR2 IRAK3 CCR, ILA NFKB, NFKBIA, IL0 Gene Rank Spearman r = 0.8534 P < 0.000 Sepsis Control P =.8 x 0-22 0 Gene Rank E Enrichment Score F Enrichment Score 0.0-0.2-0.4 0.0-0.2-0.4 FDR q-val < 0.00 Nom. p-val < 0.00 Sepsis FDR q-val = 0.004 Nom. p-val = 0.009 (GSE9960) Sepsis vs. Ctrl Gene Set (GSE9668) Sepsis vs. Ctrl Gene Set 20606 Control 544 G Enriched gene sets (FDR < 0.00) 386 P = 5.22 x 0-4 83 Leading Edge Genes 20 202 795 247 P = 2.76 x 0-77 H Fraction of Leading Edges Fraction of Leading Edges 30 20 0 0 30 20 0 0 TCF7 LEF CCR7 IL7R Gene Rank Gene Rank Spearman r = 0.937 P < 0.000 Sepsis Control

Figure S5 A Genes Metagene 2 3 B Genes Metagene 2 3 C Gene Sets Gene Sets Signal-to-Noise Signal-to-Noise 0 0.9 0 4.8 E D Mitosis 05 65 Metagene 2 Phagocytic vesicle Metagene 3 Inflammatory response 88 59 33 37 7 3 6 P = 4.9x0-22 P = 2.02 x 0-3 P = 0.00037 F Fraction of LE genes in GO term (%) 20 5 0 5 0 2 3 Metagene Type I IFN signaling 62 53 Metagene 2 TRIF-mediated Metagene 3 TLR signaling Phagocytic vesicle 6 64 09 63 Fraction of LE genes in GO term (%) 20 5 0 5 6 8 9 P = 7.00 x 0-23 P = 3.8 x 0-7 P = 4.0 x 0-9 0 2 3 Metagene

Figure S6 (GSE9960) Metagene 2 Metagene 3 78 22 3 3 75 98 75 36 (GSE9668) Metagene 2 P = 0.0964 24 22 P = 0.0238 P = 0.208 6 8 95 P = 0.672 P = 0.2239 90 34 05 P = 2.79 x 0-23 Metagene 3 08 0 2 72 46 55 7 38 P = 0.006 P =.099 x 0-3 P = 0.0723 D Metagene 2 (20 genes) E Metagene 2 Metagene 3 Metagene 3 (39 genes) 46 (22 genes) Metagene 2 Metagene 3 (78 genes) Metagene 2 (24 genes) Metagene 3 (8 genes) Metagene Overlap (P-val) 0-38

Table S Journal Platform ID Name Organism Nature GPL26 430_2 Mus musculus Science GPL339 MOE430A Mus musculus Cell GPL832 430A_2 Mus musculus Nature Immunology GPL6246 MoGene-_0-st Mus musculus Immunity GPL8 MG_U74Av2 Mus musculus Journal of Experimental Medicine GPL570 HG-U33_Plus_2 Homo sapiens Journal of Clinical Investigation GPL96 HG-U33A Homo sapiens Cell Host and Microbe GPL57 HG-U33A_2 Homo sapiens Blood GPL6244 HuGene-_0-st Homo sapiens Proc Natl Acad Sci USA GPL8300 HG_U95Av2 Homo sapiens Current Opinion in Immunology GPL97 HG-U33B Homo sapiens Trends in Immunology GPL9 HG_U95A Homo sapiens Journal of Allergy and Clinical Immunology GPL392 HT_HG-U33A Homo sapiens Plos Pathogens Plos One Mucosal Immunology Arthritis and Rheumatism Seminars in Immunology Autoimmunity Journal of Immunology European Journal of Immunology Genes and Immunity Infection and Immunity Immunology and Cell Biology Vaccine Cytokine Journal of Clinical Immunology Immunology

Table S2 Parameter ImmuneSigDB Li Chaussabel Number of gene sets 4872 346 260 Fraction of gene sets annotated 00% 75% 5% Number of Studies 389 540 9 Number of Samples 6283 32766 40 Species 2 Cells/tissue types 3 2

Table S6 Rank Gene set GSE22886_NAIVE_BCELL_VS_NEUTROPHIL_DN 2 GSE2968_MONOCYTE_VS_PDC_UP 3 GSE6269_E_COLI_VS_STREP_PNEUMO_INF_PBMC_DN 4 GSE22886_NAIVE_CD4_TCELL_VS_MONOCYTE_DN 5 GSE3456_UNTREATED_VS_24H_NOD2_AND_TLR_TLR2_LIGAND_TREATED_MONOCYTE_DN 6 GSE3456_UNTREATED_VS_24H_TLR_TLR2_LIGAND_TREATED_MONOCYTE_DN 7 GSE6269_E_COLI_VS_STREP_AUREUS_INF_PBMC_DN 8 GSE22886_NAIVE_TCELL_VS_MONOCYTE_DN 9 GSE2968_MONOCYTE_VS_MDC_UP 0 GSE22886_NAIVE_CD8_TCELL_VS_MONOCYTE_DN