SUPPLEMENTARY INFORMATION

ARTICLE NUMBER: DOI: /NMICROBIOL

Species-function relationships shape ecological properties of the human gut microbiome

Sara Vieira-Silva 1,2*, Gwen Falony 1,2*, Youssef Darzi 1,2,3, Gipsi Lima-Mendez 1,2,3, Roberto Garcia Yunta 2,3, Shujiro Okuda 4, Doris Vandeputte 1,2,3, Mireia Valles-Colomer 1,2, Falk Hildebrand 2,3, Samuel Chaffron 1,2,3 and Jeroen Raes 1,2,3**

1 KU Leuven University of Leuven, Department of Microbiology and Immunology, Rega Institute, Leuven, Belgium
2 VIB, Center for the Biology of Disease, Leuven, Belgium
3 Microbiology Unit, Faculty of Sciences and Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
4 Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan

Correspondence and requests for materials should be addressed to Jeroen Raes (jeroen.raes@kuleuven.be).

This file includes:
Figures S1 to S7
Legends for Data S1 to S3
Legends for Table S1 to S11

Other supplementary material for this manuscript includes:
Data S1 and S2
Data S3 (Excel file containing Supplementary Table S1 to S11)

Supplementary figures and legends

Figure S1. Distributions of detection of saccharolytic (N=22), proteolytic (N=36), and lipolytic (N=6) GMMs across 562 gastrointestinal tract reference genomes (IMG/HMP v4.0).

Figure S2. Metabolic consistency between species belonging to the same genus. Dissimilarity in GMM profiles of pairs of congeneric species of the human gut microbiota, in the 9 genera with more than 5 sequenced species representatives.

Figure S3. Differences in median saccharolytic, proteolytic, and lipolytic GMM abundances between the different enterotypes (Bacteroides [B], Ruminococcaceae [R], and Prevotella [P] enterotypes; 277 MetaHIT samples).

Figure S4. Decomposition of functional shifts between the Bacteroides [B], Ruminococcaceae [R], and Prevotella [P] enterotypes into individual genus-level contributions. Each of the four plots displays the functions increased (functional shifts) between the first and the second enterotype. Two plots (R vs B and P vs B) were omitted because no functions were increased in that direction. Each significant shift is decomposed into taxa contributing to a reduction of the shift (Wilcoxon test statistic: W<0) and those driving the shift (W>0), in each enterotype. Interpretation example: the increased saccharolytic potential in the B- compared to the R- and P-enterotypes is mainly driven by the Bacteroides genus, with part of the community's saccharolytic potential being compensated by Prevotella in the P-enterotype.

Figure S5. Proteolytic:saccharolytic potential ratio variation between the different enterotypes (Bacteroides [B], Ruminococcaceae [R], and Prevotella [P] enterotypes; 277 MetaHIT samples).

Figure S6. Linear least squares fit between community average maximum growth rates and both functional redundancy and observed richness (277 MetaHIT samples).

Figure S7. Growth rate (median peak-to-trough ratio) variation between the different enterotypes (Bacteroides [B], Ruminococcaceae [R], and Prevotella [P] enterotypes; 277 MetaHIT samples). MWU test, FDR: * <0.05, ** <0.01, *** <0.001.

Figure S8. Expected functional dissimilarity (Y) was determined by linear least squares fitting of the relation between pairwise phylogenetic distance (X; calculated from the 16S rRNA gene phylogenetic tree) and pairwise GMM-based functional dissimilarity, restricted to intra-genus comparisons (194 IMG/HMP v4.0 reference genomes, 52 genera).
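The least squares fit described for Figure S8 can be illustrated with a minimal sketch. It assumes an ordinary least squares line with an intercept, fitted to paired values of phylogenetic distance (X) and functional dissimilarity (Y); the function names `fit_line` and `expected_dissimilarity` and the toy numbers are hypothetical and are not taken from the authors' pipeline.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expected_dissimilarity(distance, slope, intercept):
    """Expected functional dissimilarity (Y) at a given phylogenetic distance (X)."""
    return slope * distance + intercept


# Toy intra-genus pairs (hypothetical values, for illustration only).
phylo_distance = [0.0, 0.1, 0.2, 0.3]
functional_dissimilarity = [0.05, 0.15, 0.25, 0.35]
slope, intercept = fit_line(phylo_distance, functional_dissimilarity)
```

Comparing an observed dissimilarity with `expected_dissimilarity(...)` at the same phylogenetic distance would then indicate whether a pair is more or less functionally divergent than the fitted trend predicts.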

Figure S9. Metabolic consistency between strains belonging to the same species. Dissimilarity in GMM profiles of pairs of conspecific strains of the human gut microbiota, in the 12 genera with more than 5 sequenced strain representatives. Dark green bars only cover completed genome sequences, while light green bars also include draft genomes.

Supplementary Data S1. Gut-specific metabolic modules formatted for easy integration into bioinformatics pipelines. The current set includes 103 GMMs annotated using exclusively prokaryotic KEGG orthologs (KO). The syntax for GMM pathway variants follows KEGG syntax: alternative KOs are tab-separated (OR operation), while return- and comma-separated KOs are all required for process completeness (AND operation). Each GMM description includes input/output compounds ("cpd"), position ("pos") in the gut metabolic map, and two higher functional hierarchical levels ("hier"), grouping GMMs into 10 metabolic categories and 30 sub-categories.

Supplementary Data S2. Gut reference genomes phylogenetic tree. Maximum likelihood reconstruction (model GTR+8Γ+I; 100 rapid bootstraps) from the alignment of 16S rRNA gene sequences for the gut reference genomes (one randomly selected representative per species, N=260). Metadata concerning the reference genomes, including taxonomy, can be found in Supplementary Table S3.

Supplementary Data S3. Excel file containing Supplementary Table S1 to S11 and their legends.

Supplementary Table S1. Gut-specific metabolic module (GMM) abundances in gut reference genomes (N=532). Minimum detection threshold was set at 66% pathway coverage. Metadata concerning the reference genomes (IMG taxon_oid used as identifiers), including taxonomy, can be found in Supplementary Table S3.

Supplementary Table S2. Saccharolytic, proteolytic and lipolytic fermentation potential in gut reference genomes (N=532). For each category, genomic potential was defined as the sum of abundances of individual GMMs in the category ("carbohydrate degradation", "amino acid degradation", and "lipid degradation", respectively) divided by the number of GMMs in the category (N=22, N=36, and N=6, respectively). According to their position on the triplot representation of the three genomic potentials (cf. Figure 1), gut reference genomes are categorized as S, P, L or G.

Supplementary Table S3. Gut reference genomes (N=532) metadata. The metadata includes IMG identifiers (taxon_oid), the random selection of non-redundant genomes (one representative per species, N=260), taxonomic annotation, genome status, estimated maximum growth rates (MGR), and transcription factor (TF) and transporter (TR) counts.
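The module syntax of Data S1, the 66% detection threshold of Table S1, and the genomic potential of Table S2 can be sketched together as follows. This is an illustration under stated assumptions, not the authors' implementation: coverage is assumed to be the fraction of return-separated steps for which at least one tab-separated alternative has all of its comma-separated KOs present, and the function names and the KO identifiers in the toy module are placeholders chosen for the example.

```python
def parse_module(text):
    """Parse a module definition into steps -> alternatives -> required KOs.

    Lines are steps (AND), tab-separated fields are alternatives (OR),
    and comma-separated KOs within a field are all required (AND).
    """
    steps = []
    for line in text.splitlines():
        if not line.strip():
            continue
        alternatives = [
            frozenset(ko.strip() for ko in field.split(",") if ko.strip())
            for field in line.split("\t")
            if field.strip()
        ]
        steps.append(alternatives)
    return steps


def coverage(steps, genome_kos):
    """Fraction of steps with at least one fully present alternative (assumed definition)."""
    if not steps:
        return 0.0
    satisfied = sum(1 for alts in steps if any(alt <= genome_kos for alt in alts))
    return satisfied / len(steps)


def is_detected(steps, genome_kos, threshold=0.66):
    """Apply the minimum detection threshold on pathway coverage."""
    return coverage(steps, genome_kos) >= threshold


def category_potential(abundances, category_gmms):
    """Sum of GMM abundances in a category divided by the number of GMMs in it."""
    return sum(abundances.get(gmm, 0) for gmm in category_gmms) / len(category_gmms)


# Toy module with placeholder KO identifiers (not a real GMM).
toy_module = parse_module("K00001\tK00002\nK00003,K00004\nK00005")
toy_genome = {"K00002", "K00003", "K00004"}
toy_coverage = coverage(toy_module, toy_genome)  # two of three steps satisfied
```

In the toy case the first step is met through its second alternative, the second step through both required KOs, and the third step is missing, so the module sits just above the 66% threshold.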

Supplementary Table S4. Significant associations between gut-specific functions and the different phyla covering the gut reference genomes. An association between a phylum and a GMM is defined as a significant over-representation of the number of genera within the phylum encoding the GMM, in proportion to those not encoding it, compared to the proportion for the rest of the phyla (Fisher exact test). A genus was defined as encoding the GMM if any of its genomes encoded the GMM. A summary table of the results is also provided.

Supplementary Table S5. Gut-specific module description. Description includes hierarchical classification, position in the metabolic network, distribution in reference genomes, and centrality index.

Supplementary Table S6. Gut-specific metabolic module (GMM) abundances in the MetaHIT gut metagenomes (N=277). Minimum detection threshold was set at 66% pathway coverage. Metadata concerning the samples can be found in Supplementary Table S7.

Supplementary Table S7. Saccharolytic, proteolytic and lipolytic potential in the MetaHIT metagenomes (N=277). Sample enterotypes and gradients commonly used for community typing (Firmicutes:Bacteroides, Prevotella:Firmicutes, and Prevotella:Bacteroides ratios) are also provided.

Supplementary Table S8. Differences in saccharolytic, proteolytic and lipolytic potential in the MetaHIT metagenomes (N=277) between community types. A) Community types were defined by an enterotyping approach (categorical; MWU test), a gradient approach (continuous; Spearman correlation; Firmicutes:Bacteroides, Prevotella:Firmicutes, and Prevotella:Bacteroides ratios), or a combined approach (gradients within two enterotypes; e.g. Prevotella:Bacteroides ratio in B- and P-enterotype samples); B) Differences between community types in each GMM within the saccharolytic, proteolytic and lipolytic potential categories.

Supplementary Table S9. Stability and resilience indicators in the MetaHIT metagenomes (N=277). Stability and resilience indicators used were functional redundancy (FR), observed genus richness, predicted average maximum growth rate (amgt), and predicted average growth rate at the time of sampling (PTR).
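The phylum-GMM association test of Supplementary Table S4 can be sketched as a Fisher exact test on a 2x2 table of genus counts. The sketch below assumes a one-sided (over-representation) test, which the legend suggests but does not state explicitly; the function name and the toy counts are hypothetical.

```python
from math import comb


def fisher_overrepresentation(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: genera in the phylum encoding the GMM
    b: genera in the phylum not encoding it
    c: genera in the other phyla encoding the GMM
    d: genera in the other phyla not encoding it
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    in_phylum = a + b
    encoding = a + c
    total = a + b + c + d
    denominator = comb(total, in_phylum)
    upper = min(in_phylum, encoding)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    numerator = sum(
        comb(encoding, k) * comb(total - encoding, in_phylum - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator


# Toy counts (hypothetical): 3 of 4 genera in the phylum encode the GMM,
# against 1 of 4 genera in the remaining phyla.
p_value = fisher_overrepresentation(3, 1, 1, 3)
```

With many GMMs and phyla tested, a multiple-testing correction would normally be applied to the resulting p-values; the legend does not specify one for this table, so none is shown here.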

Supplementary Table S10. Differences in stability and resilience indicators in the MetaHIT metagenomes (N=277) between community types. Community types were defined by enterotyping (categorical; MWU test) and gradient (continuous; Spearman correlation) approaches (Firmicutes:Bacteroides, Prevotella:Firmicutes, and Prevotella:Bacteroides ratios). Stability and resilience indicators used were: observed genus richness, functional redundancy (FR), predicted average maximum growth rate (amgt), and predicted average growth rate at the time of sampling (PTR).

Supplementary Table S11. Overlap between GMMs. Maximum coverage obtained in the detection of other GMMs (subject GMMs) using one GMM as query. Only non-zero overlap results are reported in the table.
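One way to read the overlap computation of Supplementary Table S11 is sketched below. It assumes that all KOs of the query GMM are treated as present and that the coverage of each subject GMM is then the fraction of its steps satisfied by those KOs; how the reported "maximum" is taken over pathway variants is not specified in the legend, so this interpretation, the function names, and the module and KO labels are all hypothetical.

```python
def module_kos(steps):
    """All KOs appearing anywhere in a module (steps -> alternatives -> KO sets)."""
    return {ko for alternatives in steps for alt in alternatives for ko in alt}


def overlap(query_steps, subject_steps):
    """Coverage of the subject module when only the query module's KOs are present."""
    if not subject_steps:
        return 0.0
    present = module_kos(query_steps)
    satisfied = sum(
        1 for alternatives in subject_steps
        if any(alt <= present for alt in alternatives)
    )
    return satisfied / len(subject_steps)


def overlap_table(modules):
    """Non-zero overlaps for every ordered (query, subject) pair of distinct modules."""
    table = {}
    for query_id, query_steps in modules.items():
        for subject_id, subject_steps in modules.items():
            if query_id == subject_id:
                continue
            value = overlap(query_steps, subject_steps)
            if value > 0:
                table[(query_id, subject_id)] = value
    return table


# Toy modules with placeholder identifiers (not real GMMs or KOs).
toy_modules = {
    "A": [[{"K1"}], [{"K2"}]],
    "B": [[{"K2"}, {"K9"}], [{"K3"}]],
    "C": [[{"K7"}]],
}
toy_overlaps = overlap_table(toy_modules)
```

Module C shares no KOs with the others, so it does not appear in the result, mirroring the legend's statement that only non-zero overlaps are reported.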