Methodological impact on metagenomics analyses: the skin microbiome and beyond

Cyrille Jarrin 1, Patrick Robe 1, Daniel Auriol 1, David Villanova 1, Kuno Schweikert 2

1 Libragen, Canal Biotech, 3 rue des Satellites, F-31400 Toulouse, France
2 Induchem, Industriestrasse 8a, CH-8604 Volketswil, Switzerland

Introduction

The term metagenome was coined by Handelsman and coworkers in 1998 [1] and was defined as the collective genome of the microflora found in a defined environment. Metagenomics is a culture-independent molecular approach to analyzing environmental samples of cohabiting microbial populations, and it has given biologists access to the large, hard-to-isolate fraction of environmental microbial communities. The initial studies aiming to characterize the full extent of microbial diversity included DNA purification, fragmentation and cloning in an easily cultivable host. The recombinant cells were then grown to obtain sufficient DNA for the subsequent analyses. These analyses were designed to elucidate taxonomic composition (16S rRNA) or to identify functional properties (in particular using sometimes sophisticated methods for detecting enzymatic activities). With the advent of so-called next-generation sequencing (NGS) technologies, metagenome cloning is no longer necessary, since far smaller amounts of DNA are required. Although the term is nowadays often inaccurately restricted to the sequencing-based study of environmental samples [2], metagenomics has considerably increased knowledge of both the taxonomic and functional microbial diversity of natural ecosystems. Moreover, exploring the microbial communities associated with human body sites (gut, skin, mouth, vagina, etc.) helps to decipher the close relationships between human health and the resident microbiota (e.g. [3]).
Realizing the potential of metagenomics for discovering novel genes from as yet untapped microbial diversity, libragen has for 15 years carried out analyses of microbial communities [4] as well as research programs aimed at discovering new, high-performance biocatalysts and metabolic pathways that solve industrial problems [5-6]. The apparent practical simplicity of DNA extraction from natural samples using dedicated commercial kits, together with the proliferation of NGS facilities, makes it possible to define the genetic diversity of bacterial communities and to predict the associated gene functions. However, the procedures applied in publications are not standardized and are sometimes poorly described. As a consequence, results can hardly be compared, in particular when a defined environment such as the human body is considered. Many sources of technical bias that can significantly affect the results have been identified: experimental design, sampling, sample storage [7], insufficient purity of the extracted nucleic acids, inappropriate selection of the 16S variable region [8], poor choice of primer set [9], insufficient control of the libraries produced, inappropriate raw data processing [10] and deficient statistical analyses. The objective of libragen was to evaluate more precisely the impact of sample storage conditions, DNA extraction methods and primer set selection on the observed

taxonomic profiles of two selected environments, human gut and human skin. This should provide detailed and reliable methodological information for establishing experimental plans dedicated to the characterization of microbiomes.

Materials and Methods

Microbiota sampling and storage

Gut microbiota. Stool from a healthy adult volunteer was freshly collected. Three starting materials were considered for DNA extraction: 1) a fraction of the fresh stool, using different commercial kits (see DNA extraction section); 2) the same stool after freezing (-20 °C); 3) two fractions of the initial fresh stool treated with commercial storage kits. The first fraction was treated with the Omnigene Gut kit (DNA Genotek, Ottawa, Canada) and the second with the PSP Spin stool DNA plus kit (Stratec Molecular GmbH, Berlin, Germany). Both resulting media were kept at 4 °C until DNA extraction.

Skin microbiota. Non-invasive skin samples were collected from the forehead and forearm of healthy volunteers by swabbing with sterile 5 x 5 cm gauze pre-moistened with a sterile solution of 0.15 M NaCl and 0.1% Tween 20. Gauze samples were collected in a 50 ml plastic tube and frozen at -20 °C until DNA extraction.

DNA extraction

Gut microbiota. Four different commercial kits were used for DNA extraction following the manufacturers' recommendations: PowerLyzer PowerSoil DNA Isolation Kit (MO BIO Laboratories, Inc., Carlsbad, USA); QIAamp Fast DNA Stool Kit (Qiagen, Valencia, USA); PSP Spin stool DNA plus kit (Stratec Molecular GmbH, Berlin, Germany); FastDNA SPIN Kit for Feces (MP Biomedicals, Santa Ana, USA).

Skin microbiota. Each tube was filled completely with NaCl-Tween 20 solution and shaken horizontally for 15 min at 800 rpm. The NaCl-Tween 20 suspension was transferred to new tubes and the gauzes were spun dry (biological safety cabinet) to collect most of the suspension volume. Suspensions were then centrifuged at 8000 rpm for 30 min to obtain a cell pellet.
DNA extraction was performed using two methods based on mechanical lysis: 1) In-house method. Briefly, the cell pellet was resuspended in a hexadecyltrimethylammonium bromide (CTAB) extraction buffer (0.5 ml) and split into two equal parts for duplicate DNA extractions. An equal volume of phenol-chloroform-isoamyl alcohol (25:24:1) was added to each cell suspension. Cells were lysed for 30 s at 5.5 m/s using the FastPrep FP120 bead beating system (Bio-101, Vista, California). Samples were centrifuged for 10 min at 4 °C at […] rpm. The aqueous phase was collected and DNA was precipitated overnight at 4 °C with 2 volumes of 100% ethanol and 1/10 volume of 5 M NaCl. Purification was then performed using the Illustra GFX purification kit (GE HealthCare, Pittsburgh, USA) to obtain highly purified DNA. 2) Commercial method. The PowerLyzer PowerSoil DNA Isolation Kit (MO BIO Laboratories, Inc., Carlsbad, USA) was used according to the recommended procedure.

Sequencing

16S rRNA gene sequencing

Illumina technology. Sequencing was performed on the MiSeq device (Illumina, Inc., San Diego, CA, USA) through a 600-cycle paired-end run, targeting three 16S variable regions: the V1V2 region (amplicon length: 350 bp), the V3V4 region (16S-Mi341F forward primer 5'-CCTACGGGNGGCWGCAG-3' and 16S-Mi805R reverse primer 5'-GACTACHVGGGTATCTAATCC-3', producing amplicons of about 460 bp), and the V4V5 region (amplicon lengths: 425 bp for V4V5a and 470 bp for V4V5b).

PCR1 reactions were performed as follows: 4 µl of template DNA (20 ng) were mixed with 0.6 µl of each of the reverse and forward primers (10 µM), 6 µl of KAPA HiFi Fidelity Buffer (5X), 0.9 µl of KAPA dNTP Mix (10 mM each), 6.5 µl of distilled water (dH2O), and 0.6 µl of KAPA HiFi HotStart Taq (1 U/µl), for a total volume of 30 µl. Each amplification was duplicated, and duplicates were pooled after amplification. PCR1 cycling consisted of 95 °C for 3 min, then 27 cycles of 95 °C for 30 s, 59 °C for 30 s, and 72 °C for 30 s, followed by a final extension at 72 °C for 5 min, in an MJ Research PTC200 thermocycler. Negative controls were included in all steps to check for contamination. All duplicate pools were checked by gel electrophoresis, and amplicons were quantified by fluorometry.

Libraries ready for analysis were then produced following the Illumina guidelines for 16S metagenomics library preparation. Briefly, the PCR1 amplicons were purified and checked using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA). To enable the simultaneous analysis of multiple samples (multiplexing), Nextera XT indexes (Illumina) were added during PCR2 using between 15 and 30 ng of PCR1 amplicons. PCR2 cycling consisted of 94 °C for 1 min, then 12 cycles of 94 °C for 60 s, 65 °C for 60 s, and 72 °C for 60 s, followed by a final extension at 72 °C for 10 min. Indexed libraries were purified, quantified and checked using an Agilent 2100 Bioanalyzer.
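The V3V4 primers quoted above contain degenerate positions (N, W, H, V), so each primer is in practice a mixture of several oligonucleotides. As an illustrative sketch only (it is not part of the protocol described here, and the function name `expand_primer` is hypothetical), the standard IUPAC ambiguity codes can be expanded to list the sequences covered by each primer:

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as given in the text for the V3V4 region.
PRIMERS = {
    "16S-Mi341F": "CCTACGGGNGGCWGCAG",
    "16S-Mi805R": "GACTACHVGGGTATCTAATCC",
}

# Number of distinct oligonucleotides represented by each primer.
variant_counts = {name: len(expand_primer(seq)) for name, seq in PRIMERS.items()}

if __name__ == "__main__":
    for name, count in variant_counts.items():
        print(name, count)
```

The forward primer expands to 8 sequences (N x W = 4 x 2) and the reverse primer to 9 (H x V = 3 x 3), which gives a rough idea of how broadly each primer is designed to match.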
Validated indexed libraries were pooled to obtain an equimolar mixture. The run was performed on a MiSeq sequencer (Illumina) using the MiSeq Reagent Kit v3, 600 cycles (Illumina). This kit allows an output of 25 million paired-end reads of 300 bases, i.e. up to 15 gigabases (25 million read pairs x 2 reads x 300 bases). Library preparation and the MiSeq run were performed by libragen at the GeT-PlaGe platform (INRA, Auzeville, France). After the MiSeq run, raw sequence data were demultiplexed and quality-checked to remove all reads with ambiguous bases. Index and primer sequences were then trimmed, and the forward and reverse sequences were paired. The paired sequences were then processed with the QIIME pipeline [11] to remove chimeras and reads with PCR errors. Good-quality paired sequences were mapped to the RDP database (Release 11, update 3) for taxonomic assignment. Assigned sequences were finally clustered into operational taxonomic units (OTUs) at a 3% dissimilarity level.

Roche technology. Sequencing was carried out by pyrosequencing. It was performed by Beckman Coulter Genomics (Danvers, MA, USA) on the 454 GS FLX platform, targeting two 16S variable regions: the V1V3 region (16S-0027F forward primer 5'-AGAGTTTGATCCTGGCTCAG-3' and 16S-0533R reverse primer 5'-TTACCGCGGCTGCTGGCAC-3') producing about 520 bp

amplicons and the V4V6 region (16S-0515F forward primer 5'-TGYCAGCMGCCGCGGTA-3' and 16S-1061R reverse primer 5'-TCACGRCACGAGCTGACG-3') producing about 560 bp amplicons. To enable the simultaneous analysis of multiple samples (multiplexing), GS FLX Standard Multiplex Identifiers (MID, Roche Life Sciences, Indianapolis, IN, USA) were added as tails to each end of the primers. The PCRs were carried out as follows: 3 ng of each template DNA were mixed with 2.5 µl of each of the reverse and forward primers (10 µM), 1.25 units of Promega Taq polymerase (Promega Corporation, Madison, USA) and 7.9 µl of distilled water (dH2O). PCR cycling consisted of 95 °C for 3 min, then 25 cycles of 95 °C for 1 min, 58 °C for 30 s, and 72 °C for 1 min, followed by a final extension at 72 °C for 5 min, in an MJ Research PTC200 thermocycler (Ramsey, MN, USA). Negative controls were included in all steps to check for contamination. Bioinformatic processing of the raw sequence data was carried out by Beckman Coulter Genomics. It included demultiplexing of the sequenced samples, assembly of forward and reverse sequences (clustering step), Blastn analysis (i.e. searching nucleotide databases using nucleotide queries [12]) of both clustered and singleton reads against a curated copy of the 16S RDP database [13], and taxonomic assignment by comparing the taxonomy and scores of the 25 best hits for each blasted sequence (MEGAN4) [14].

Whole (meta)genome Sequencing (WGS)

Library preparation used 25 ng of metagenomic DNA. Metagenomic DNA was fragmented in a Covaris M220 instrument (Woburn, MA, USA) to an average size of approximately 250 bp, according to the protocol suggested by the supplier. Fragmented DNA was used to synthesize indexed sequencing libraries with the TruSeq Nano DNA Sample Prep Kit (Illumina, Inc., San Diego, CA, USA) according to the manufacturer's recommended protocol.
Cluster generation was performed on the cBot instrument using the TruSeq PE Cluster Kit v3 reagents (Illumina). Libraries were sequenced on an Illumina HiSeq 2000 using the TruSeq SBS Kit v3 reagents (Illumina) for paired-end sequencing with read lengths of 150 base pairs (300 cycles). High-throughput sequencing reads were quality filtered using the fastq_quality_filter program provided with the FASTX-Toolkit. Only reads with a quality score higher than 17 over at least 80% of the read length (i.e., a probability of correct base call close to 98%) were retained. Gene catalogs for each sample were created using the MOCAT pipeline [14]. Briefly, the pipeline performs quality control of the raw reads, removes human contamination by mapping to the reference human genome, assembles the reads, and predicts protein-coding genes on the assembled overlapping reads (contigs) and on scaftigs (contigs that were extended and linked using the paired-end information of the sequencing reads). Predicted proteins were compared to the non-redundant NCBI RefSeq database using BLAST [4]. Taxonomic analysis was based on the NCBI taxonomy; functional analysis was performed with MEGAN4 using the SEED classification [13,14-16]. Taxonomic analysis is performed by placing each sequence read onto a node of the NCBI taxonomy, based on gene content. For each read that matches the sequence of some gene, the program places the read onto the lowest common ancestor (LCA) node of those taxa in the taxonomy that are known to have that gene. This is called the LCA algorithm.
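Two of the processing steps described above lend themselves to a short illustration: the quality threshold (Q17 over at least 80% of the read) and the LCA placement of reads. The sketch below is a simplified stand-in, not the FASTX-Toolkit or MEGAN4 implementation. The toy taxonomy, the function names and the choice to treat the threshold as a minimum (score of at least 17) are assumptions made for the example.

```python
def phred_to_accuracy(q):
    """Probability that a base call is correct for a Phred score q."""
    return 1.0 - 10 ** (-q / 10.0)


def passes_filter(qualities, min_q=17, min_fraction=0.80):
    """Keep a read if at least `min_fraction` of its bases reach `min_q`.

    Assumption: the threshold is applied as a minimum (q >= min_q).
    """
    if not qualities:
        return False
    good = sum(1 for q in qualities if q >= min_q)
    return good / len(qualities) >= min_fraction


# Hypothetical, heavily simplified taxonomy (child -> parent); intermediate
# ranks are omitted for brevity.
PARENT = {
    "Staphylococcus aureus": "Staphylococcus",
    "Staphylococcus epidermidis": "Staphylococcus",
    "Staphylococcus": "Bacillales",
    "Bacillales": "Bacteria",
    "Propionibacterium": "Actinomycetales",
    "Actinomycetales": "Bacteria",
    "Bacteria": "root",
}


def lineage(taxon):
    """Path from the root of the toy taxonomy down to `taxon`."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa):
    """Deepest node shared by the lineages of all taxa hit by a read."""
    lineages = [lineage(t) for t in taxa]
    lca = None
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


if __name__ == "__main__":
    print(round(phred_to_accuracy(17), 3))
    print(lowest_common_ancestor(["Staphylococcus aureus", "Staphylococcus epidermidis"]))
    print(lowest_common_ancestor(["Staphylococcus aureus", "Propionibacterium"]))
```

A read whose gene matches only staphylococcal species is placed at the genus node, whereas a read matching genes from distant taxa moves up towards the root. This is why conserved genes yield less specific assignments.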

Results

Impact of the DNA extraction methods

DNA extraction was carried out as follows. Human gut microbiota study: in triplicate from a single frozen stool sample using four commercial kits. Skin microbiota study: in duplicate from a single sample using an in-house procedure and a commercial kit. Sequencing targeted the V3V4 region of the 16S rRNA gene (Illumina technology).

Gut microbiota

The first step of the experimental plan was to confirm that the kits selected for DNA extraction gave a stool microbial profile at the phylum level that was globally consistent with current knowledge, i.e. showing a majority of Firmicutes and Bacteroidetes [17]. The second step was to assess the reproducibility of the different DNA extraction methods, considering that good reproducibility between triplicates would also indicate good reproducibility of the sequencing process (reproducibility being evaluated on the basis of the sequencing results).

Figure 1. Relative abundance (%) of the most abundant phyla when considering 4 commercial kits. DNA extraction kit suppliers: Stratec Molecular GmbH (STRATEC), MP Biomedicals (MP), MO BIO Laboratories (MOBIO) and Qiagen (QIAGEN).

Figure 1 reports the relative abundances of the most abundant phyla obtained with the four selected DNA extraction kits. All the phylum profiles obtained agree with current knowledge of stool microbial composition; nevertheless, some variations depending on the DNA extraction kit can be observed. Indeed, the Verrucomicrobia phylum is only detected using the kits supplied by MP Biomedicals and Stratec Molecular GmbH, with relative abundances of 1.2% and 0.1%, respectively. The heat map representation (Figure 2) shows that the triplicates of a given method cluster together: the Pearson product-moment correlation values are between 0.91 and 1, a range indicating that the results overlap.
As a consequence, the average counts of the triplicates can be used for the subsequent analyses. There are nevertheless some differences between the extraction methods, since the profile obtained with one method cannot be superimposed on those obtained with the others.
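The comparison between replicates relies on the Pearson product-moment correlation of normalized genus counts. The following minimal sketch shows one way to compute it. The normalization used here (each count divided by the sample total) is an assumption, since the exact normalization is not specified, and the genus counts are made-up values for illustration only.

```python
from math import sqrt


def normalize(counts):
    """Convert raw genus counts to proportions (assumed normalization)."""
    total = sum(counts.values())
    return {genus: count / total for genus, count in counts.items()}


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def profile_correlation(counts_a, counts_b):
    """Correlate two samples over the union of their genera."""
    genera = sorted(set(counts_a) | set(counts_b))
    a, b = normalize(counts_a), normalize(counts_b)
    return pearson_r([a.get(g, 0.0) for g in genera],
                     [b.get(g, 0.0) for g in genera])


# Hypothetical read counts for two extraction replicates.
replicate_1 = {"Bacteroides": 500, "Faecalibacterium": 300, "Akkermansia": 200}
replicate_2 = {"Bacteroides": 1000, "Faecalibacterium": 600, "Akkermansia": 400}

if __name__ == "__main__":
    print(profile_correlation(replicate_1, replicate_2))
```

Because the counts are normalized first, two replicates sequenced at different depths but with the same proportions give r = 1. A high r indicates a linear relation between profiles, not identical abundances.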

Figure 2. Observed distances, based on the normalized genus counts, between samples from a single frozen stool treated with four different DNA extraction kits (triplicates). Dark blue squares highlight close samples; white squares indicate strong differences between samples. The notations -1, -2 and -3 mark the DNA extraction triplicates. DNA extraction kit suppliers: Qiagen (Qia), Stratec Molecular GmbH (Str), MO BIO Laboratories (Mo), MP Biomedicals (MP).

Figure 3 reports the correlation coefficients (r) between the different methods, based on the normalized genus counts. The methods involving the kits supplied by MO BIO Laboratories and MP Biomedicals gave an r value of […]. The other r values are lower than 0.9, so no linear correlation between those samples can be assumed.

Figure 3. Correlation studies of the normalized genus counts between the different extraction methods. Plots illustrate the linear relation between each pair of conditions, and the value of the correlation coefficient (r) indicates the strength of this relation. DNA extraction kit suppliers: Qiagen, Stratec Molecular GmbH (Stratec), MO BIO Laboratories (MoBio), MP Biomedicals (MP).

The Qiagen QIAamp Fast DNA Stool Kit was chosen as the reference to evaluate the efficiency of the other kits (Figure 4). Exactly the same genera were detected with all four extraction methods. However, differences were observed in the relative abundance of each genus. For example, with the extraction kits supplied by MO BIO Laboratories and MP Biomedicals, the Bacteroides genus appeared significantly less abundant than with the kits supplied by Qiagen and Stratec Molecular GmbH.

Figure 4. Fold change for each DNA extraction kit compared to the Qiagen QIAamp Fast DNA Stool Kit: (Mo)/(Qia), (MP)/(Qia) and (Stra)/(Qia). A fold change of +2 implies a doubling of the normalized read counts for the compared kit relative to the Qiagen kit. A fold change of -2 implies a 2-fold reduction. Significant differences are marked with a star (*). DNA extraction kit suppliers: Qiagen (Qia), Stratec Molecular GmbH (Stra), MO BIO Laboratories (Mo), MP Biomedicals (MP).

Skin microbiota

DNA was extracted from a human skin microbiota sample using either an in-house method or a commercial kit. Illumina sequencing was performed by targeting the V3V4 region of the 16S rRNA gene. The results reported in Figure 5 show the correlations obtained for two libraries produced with the in-house method (Lib-1 and Lib-2) and two libraries produced with the PowerLyzer PowerSoil DNA Isolation Kit from MO BIO Laboratories (MoBio-1 and MoBio-2). The duplicate extractions overlap (r > 0.99) when the results are considered at the genus level.
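Returning to the gut comparison, the signed fold-change convention stated in the Figure 4 caption (+2 for a doubling, -2 for a 2-fold reduction relative to the Qiagen kit) can be written down explicitly. This is a minimal sketch of that convention only. The function name is hypothetical, and the statistical test behind the significance stars is not described here, so it is not reproduced.

```python
def signed_fold_change(compared, reference):
    """Signed fold change of normalized read counts.

    Returns +k when `compared` is k times the reference and -k when it is
    k times smaller, so values between -1 and +1 never occur.
    """
    if compared <= 0 or reference <= 0:
        raise ValueError("both counts must be strictly positive")
    ratio = compared / reference
    return ratio if ratio >= 1 else -1.0 / ratio


if __name__ == "__main__":
    # Hypothetical normalized counts for one genus (compared kit, Qiagen kit).
    print(signed_fold_change(200, 100))
    print(signed_fold_change(50, 100))
```

Genera with a zero count in either kit need separate handling (for example a pseudocount), which this sketch deliberately leaves out by raising an error.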

Figure 5. Correlation studies of the normalized genus counts between the different extraction methods. Plots illustrate the linear relation between each pair of conditions, and the correlation coefficients (r) indicate the strength of this relation.

Despite the strong correlation between the observed profiles, the relative proportions obtained with one method differ from those obtained with the other (Figure 6). Compared with the profiles obtained with the in-house method, the kit provided by MO BIO Laboratories seems to increase the proportion of Propionibacterium and to reduce the proportion of Corynebacterium.

Figure 6. Relative abundances (%) of the most abundant genera. Conditions: DNA from a human skin microbiota sample; DNA extraction using either an in-house method (LIB) or a MO BIO Laboratories commercial kit (MOBIO).

Impact of sample storage conditions

Four different storage conditions of a stool sample were compared in order to evaluate the impact of storage on the observed microbiota composition. The microbiota profile of the fresh stool was taken as the reference. The DNA extraction kit provided by Qiagen was used.

Illumina sequencing was performed by targeting the V3V4 region of the 16S rRNA gene. When the bacterial community composition was considered at the genus level, r values higher than 0.96 were obtained; all the profiles were thus well correlated (Figure 7).

Figure 7. Correlation studies of the normalized genus counts between the following storage conditions: fresh, frozen, treated with the Omnigene Gut kit (Omnigut_Stab) and treated with the PSP Spin stool DNA plus kit (Stratec_Stab). Plots illustrate the linear relation between each pair of conditions, and the correlation coefficients (r) indicate the strength of this relation.

In accordance with the r values higher than 0.96, the distribution of the 15 most abundant genera is only slightly dispersed (Figure 8). It can thus be stated that the storage condition has only a slight impact on the observed microbial composition.

Figure 8. Box plot representation of the normalized read counts for the most abundant genera among the 4 storage conditions.

Impact of the targeted 16S rRNA gene region and of primer set selection

The impact of targeting different regions of the 16S rRNA gene, or of using different primer sets, on the observed composition of a given microbial community was studied as follows. Gut microbiota study: DNA was extracted from a frozen stool with the kit provided by Qiagen. Skin microbiota study: DNA was extracted using the in-house method. Sequencing was carried out using Illumina technology.

Gut microbiota

Considering the relative abundance of the most abundant genera (Figure 9), the same genera are observed with the four primer sets, but at different levels of relative abundance. The most abundant genera are representative of the stool microbiota of a healthy human adult. As far as the impact of the primer sets is concerned, the V1V2 and V3V4 regions gave very similar diversity patterns at the genus level. With the V4V5a primer set, the Akkermansia genus was very highly represented. The V4V5b primer set favors the representation of the Rhodobacter genus and limits that of the Ruminococcus genus. The apparent proximity between the V1V2 and V3V4 profiles is clearly confirmed by the Pearson product-moment correlation value of […] reported in Figure 10. The r values between the V4V5b profile and the V1V2 and V3V4 profiles are rather high, […] and […] respectively. The r values between the V4V5a profile and the V3V4, V1V2 and V4V5b profiles are rather low, […], […] and […] respectively.

Figure 9. Relative abundance (%) of the most abundant genera. DNA was extracted from a frozen stool. 16S rRNA gene regions were targeted using 4 primer sets: V1V2, V3V4, V4V5a and V4V5b. Amplicons were then sequenced using Illumina technology.

Figure 10. Correlation studies of the normalized genus counts between the different primer sets. Plots illustrate the linear relation between each pair of conditions, and the correlation coefficients (r) indicate the strength of this relation.

Skin microbiota

Figure 11 shows the relative abundances of the most abundant genera obtained with the four selected primer sets.

Figure 11. Relative abundance (%) of the most abundant genera. Conditions: DNA extracted from a human skin microbiota sample using an in-house method (LIB); targeted 16S rRNA gene regions: V1V2, V3V4, V4V5a, V4V5b; sequencing using Illumina technology.

The V3V4 and V4V5a primer sets gave very close profiles. In comparison with the V3V4, V4V5a and V4V5b profiles, the profile obtained with the V1V2 primer set shows a high proportion of the genus Propionibacterium and an extremely low representation of the genus Corynebacterium. The V4V5b primer set gave a profile closer to those obtained with the V3V4 and V4V5a primer sets than to the one obtained with the V1V2 primer set.

Impact of PCR

The possible bias introduced by PCR amplification was assessed by comparing the profiles produced from a selected skin microbiota sample either with PCR of the V4V6 and V1V3 regions of the 16S gene (sequenced using Roche technology) or without PCR. In the latter case, sequencing was performed using the Whole (meta)genome Sequencing (WGS) approach. DNA extraction was done with the in-house method. Figure 12 shows the different microbial profiles obtained using either amplicon-based sequencing (V1V3 and V4V6) or the WGS approach.

Figure 12. Relative abundance of the most abundant orders, according to the sequencing method. V1V3 and V4V6 refer to the targeted regions of the 16S gene sequenced by pyrosequencing.

The WGS approach produces relative abundances of the two major orders (Bacillales and Actinomycetales) similar to those obtained with the V4V6 amplicon-based approach. The WGS approach also provides specific additional information on the relative abundance of fungal organisms (Malasseziales and Ustilaginales, see also Table 1). The V1V3 primer set reveals a highly unbalanced distribution between the two major orders, Bacillales and Actinomycetales. In contrast, the V4V6 primer set displays a more balanced bacterial diversity. The orders reported in Table 1 show that some bacterial orders (e.g. Burkholderiales and Pasteurellales) need a PCR step to be detected. Detecting these low-abundance orders with WGS approaches would require a very large sequencing effort.
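The comparison between WGS and amplicon-based detection can be made explicit from the values reported in Table 1. The sketch below transcribes a subset of the table (WGS, V4V6, V1V3, in %) and lists the orders seen only with PCR or only with WGS. The data structure and function names are illustrative choices, and a value of 0.0 only means "below the rounding threshold of the table", not proven absence.

```python
# Subset of Table 1: order -> (WGS, V4V6, V1V3) relative abundance in %.
# None stands for "-" in the table (order not reported for that method).
TABLE_1 = {
    "Bacillales": (42.0, 31.8, 8.3),
    "Actinomycetales": (30.2, 28.4, 83.4),
    "Malasseziales": (13.8, None, None),
    "Ustilaginales": (7.5, None, None),
    "Lactobacillales": (0.3, 3.2, 0.7),
    "Clostridiales": (0.3, 8.2, 1.9),
    "Burkholderiales": (0.0, 7.5, 0.6),
    "Pasteurellales": (0.0, 3.2, 1.0),
    "Sphingomonadales": (0.0, 0.8, 0.0),
}


def amplicon_only(table):
    """Orders at 0.0% with WGS but above 0% with at least one primer set."""
    return sorted(
        order
        for order, (wgs, v4v6, v1v3) in table.items()
        if wgs == 0.0 and ((v4v6 or 0.0) > 0 or (v1v3 or 0.0) > 0)
    )


def wgs_only(table):
    """Orders detected with WGS and not reported for either primer set."""
    return sorted(
        order
        for order, (wgs, v4v6, v1v3) in table.items()
        if wgs > 0 and v4v6 is None and v1v3 is None
    )


if __name__ == "__main__":
    print("Amplicon only:", amplicon_only(TABLE_1))
    print("WGS only:", wgs_only(TABLE_1))
```

On this subset the fungal orders appear only with WGS, while the low-abundance bacterial orders appear only after amplification.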

ORDERS                    KINGDOM    WGS    V4V6   V1V3
BACILLALES                Bacteria   42.0   31.8    8.3
ACTINOMYCETALES           Bacteria   30.2   28.4   83.4
MALASSEZIALES             Fungi      13.8   -      -
USTILAGINALES             Fungi       7.5   -      -
TREMELLALES               Fungi       1.0   -      -
SCHIZOSACCHAROMYCETALES   Fungi       0.8   -      -
SACCHAROMYCETALES         Fungi       0.7   -      -
EUROTIALES                Fungi       0.3   -      -
LACTOBACILLALES           Bacteria    0.3    3.2    0.7
CLOSTRIDIALES             Bacteria    0.3    8.2    1.9
ANAEROLINEALES            Bacteria    0.0    1.6    0.2
BURKHOLDERIALES           Bacteria    0.0    7.5    0.6
NEISSERIALES              Bacteria    0.0    1.5    0.2
PASTEURELLALES            Bacteria    0.0    3.2    1.0
PSEUDOMONADALES           Bacteria    0.0    0.8    0.2
SPHINGOMONADALES          Bacteria    0.0    0.8    0.0
SPIROCHAETALES            Bacteria    0.0    1.0    0.2

Table 1. Relative abundance (%) of the most abundant orders, according to the sequencing method. V1V3 and V4V6 refer to the targeted regions of the pyrosequenced 16S rRNA gene.

Discussion

Sample storage and DNA extraction reproducibility

Sampling, sample storage before processing and DNA extraction are critical steps that can greatly affect the results. Processing fresh microbial samples is mostly a theoretical scenario, since microbial sampling and DNA processing are most often carried out at different places. Immediate sample freezing is a widely used approach, which is expected to stabilize the microbial communities. From a practical point of view, the need for a freezer or dry ice may be an issue. Stabilizing samples by homogenization in stabilizing solutions is a more recent, commercially available strategy claimed to maintain the integrity of the microbial community as well or even better. Although DNA extraction comes after sampling and sample storage in the processing sequence, the focus was first placed on DNA extraction methods, since a sine qua non condition for continuing the investigations was that the results obtained were globally consistent with the currently accepted composition of the gut or skin microbiota.
In both cases, the most abundant phyla (stool) and genera (stool and skin) obtained with the selected DNA extraction kits were those usually reported in the literature. The reliability and robustness of the DNA extraction methods were investigated on the gut microbiota. The Pearson correlation values ranged, for each set of triplicates, from 0.91 to 1.0; it was therefore concluded that, for each extraction kit, the results were overlapping. Nevertheless, when looking at more detailed taxonomic information, in particular for less abundant phyla or orders, some differences could be detected from one DNA extraction kit to another. The

kit whose triplicates agreed least closely was also the one with the lowest DNA yield. As for the impact of storage conditions on the microbial community profile, the three options were shown to have quite a low impact when compared to the fresh stool results.

Impact of methodological choices on revealed diversity

When the objective is to describe microbiome diversity, the DNA extraction method has to be chosen first. For human skin microbiota, both tested methods gave strongly correlated microbial profiles, although the relative proportions of some genera differed. In our experience, technical expertise and laboratory habits are relevant criteria for this choice. For the human gut microbiome, even though the methods gave very close global results, significantly different results were obtained for less abundant phyla or orders. Depending on the context (existing earlier studies, data about the presence of certain phyla, classes or genera, interest in the most abundant or in less abundant organisms), several DNA extraction methods should be considered.

The second choice concerns whether or not PCR is required, this requirement being linked to the sequencing technology. The strategy is case dependent: on the one hand, the WGS approach requires no PCR, so no amplification bias alters the observed population composition; on the other hand, WGS only gives access to the most dominant taxa and consequently gives an incomplete representation of the microbiome. In contrast, the PCR-based approach makes it possible to zoom in on specific taxa, which may be non-dominant. Two possible objectives can then be considered: to formally characterize a given microbiota, a WGS approach with a large allocation of sequencing depth should be preferred; to evaluate the dynamic modification of a given microbial community over time or after a treatment, the PCR-based approach should be preferred.
Finally, if the PCR-based approach is selected, the most appropriate primer set for the study must be chosen since, as shown above and as for the DNA extraction methods, some primer sets inflate the relative abundance of specific genera.

Conclusion

The objective of this work was to obtain detailed and reliable information for properly establishing experimental plans dedicated to the characterization of microbiomes. Based on two different microbiomes, and consequently on different microbial populations (phyla to orders), it clearly shows that several technical solutions exist for DNA extraction. The evaluated techniques are robust, but the populations they identify are not strictly identical, either qualitatively or quantitatively. In addition, several storage techniques preserve the integrity of the microbial populations from sampling to DNA extraction, so proximity between the sampling site and the analysis team is not mandatory. Once highly purified DNA is obtained, methodological choices (sequencing method, bioinformatics, etc.) must be made, and these will strongly affect the final results of the metagenomic analyses. Some knowledge of the environment of interest may guide these choices (provided that the available tools are sufficiently characterized to discriminate between them). For completely unknown environments, the least biased solution should be selected.

It is important to note that neither targeted sequencing nor WGS approaches are absolute quantitative methods. They make it possible to compare environments across space and time and, when quantitative data are required, to formulate hypotheses that have to be confirmed with specialized techniques, for example RT-PCR. Regarding the taxonomic analyses, the current weakness of taxonomic assignment is due to the short length of the reads: the 300 bp reads usually obtained do not give good resolution, and assignment beyond the genus level is unreliable. WGS can provide information at the species level, provided that the literature is sufficiently rich. It can also provide information about potential functions, with the risk of missing the least abundant species. To circumvent the short-read issue, the long-read Pacific Biosciences technology is a very promising technique that we are investigating for sequencing full-length amplified 16S rRNA genes.

Finally, the take-home message is that metagenomic analyses require appropriate experimental conditions from sampling to data processing, and libragen's 15 years of experience in the field is a clear advantage.

Bibliographic references

1. Handelsman, J. et al. (1998) Chem. Biol., 5(10):
2. Tringe, S.G. and Rubin, E.M. (2005) Nat. Rev. Genet., 6(11):
3. Surano, N.K. and Kasper, D.L. (2014) J. Clin. Invest., 124(10):
4. Manichanh, C. et al. (2006) Gut, 55(2):
5. Lefevre, F. et al. (2007) Biocat. Biotrans., 25(2-4):
6. Lefevre, F. et al. (2008) Res. Microbiol., 159(3):
7. Cardona, S. et al. (2012) BMC Microbiol., 12:
8. Guo, F. et al. (2013) PLoS One, 8(10): e
9. Starke, I.C. et al. (2014) Mol. Biol. Int., 2014:
10. Zhou, Q. et al. (2014) Sci. Rep., 4: e
11. Caporaso, J.G. et al. (2010) Nat. Methods, 7(5):
12. Altschul, S.F. et al. (1997) Nucleic Acids Res., 25(17):
13. Huson, D.H. et al. (2011) Genome Res., 21(9):
14. Overbeek, R. et al. (2005) Nucleic Acids Res., 33(17):
15. Kultima, J.R. et al. (2012) PLoS One, 7(10): e
16. Huson, D.H. et al. (2007) Genome Res., 17(3):
17. Durban, A. et al. (2011) Microb. Ecol., 61(1):