
QIIME Analysis

Contents

16S rRNA SEQUENCING DATA ANALYSIS TUTORIAL WITH QIIME
Report Overview
How to Obtain Microbiome Data
How to Setup QIIME
Essential files for QIIME
Sequence File (.fna)
Quality File (.qual)
Mapping File
Basic Statistics on Sequence Data
OTU Picking
Basic Statistics on OTU Table
OTU Heatmap
Data Analysis
Summarize Communities by Taxonomic Composition
Investigating Alpha Diversity
Identifying Differentially Abundant OTUs
Normalizing OTU Table
Beta-diversity and PCoA
Jackknifed Beta Diversity Analysis
Make Bootstrapped Tree
Comparing Categories
Conclusion
REFERENCES

Tables and Figures

Figure 1. FASTQ File Format
Figure 2. Mothur output for sequence summary
Figure 3. Summary for biom file
Figure 4. rep_set_tax_assignments.txt
Figure 5. Heatmap for HMP data
Figure 6. Pie plot of the degree of sharing of microbial taxa in 14 samples collected from 7 different points at a four-month interval in a hospital room
Figure 7. Area plot of the degree of sharing of microbial taxa in 14 samples collected from 7 different points at a four-month interval in a hospital room
Figure 8. Bar plot of the degree of sharing of microbial taxa in 14 samples collected from 7 different points at a four-month interval in a hospital room
Figure 9. Microbial composition of the microbial taxa in 14 collected samples
Figure 10. Rarefaction Plot for date_s
Figure 11. Rarefaction Plot for sample_type_s
Figure 12. Diff_otus.txt for Computer Mouse and Countertop
Figure 13. MA plot for differential abundance of Computer Mouse and Countertop
Figure 14. Dispersion Estimate Plot for differential abundance of Computer Mouse and Countertop
Figure 15. MA plot for Computer Mouse Samples
Figure 16. Dispersion Estimate Plot for Computer Mouse Samples
Figure 17. PCoA plot for the bacterial community collected in the hospital room. Communities were characterized by samples collected in February and April. Bray-Curtis is used as the distance metric.
Figure 18. PCoA plot for the bacterial community collected in the hospital room
Figure 19. 3D PCoA Plots for HMP samples
Figure 20. Distance Boxplot for Surface type
Figure 21. Distance Comparison among surface types
Figure 22. Jackknifed UPGMA clustering (using the weighted UniFrac metric) showing the similarity of bacterial communities based on 16S rRNA genes

16S rRNA SEQUENCING DATA ANALYSIS TUTORIAL WITH QIIME

Report Overview

The rapid progress of DNA sequencing techniques has changed metagenomics research and data analysis over the past few years. Sequencing of the 16S rRNA gene has become a relatively easy way to study microbial composition and diversity (Fierer et al., 2007). High-throughput bioinformatics analyses increasingly rely on pipeline frameworks to process sequence data and metadata. Popular bioinformatics pipelines in the literature are QIIME, mothur and UPARSE. In this study, QIIME (Quantitative Insights Into Microbial Ecology) (Caporaso et al., 2010), an open-source bioinformatics pipeline, is used to perform microbiome analysis from raw DNA sequencing data. QIIME is designed to create quality graphics and statistics from raw sequencing data generated on Illumina or other platforms. A typical QIIME analysis workflow consists of demultiplexing, quality filtering, clustering (OTU detection), chimera removal, taxonomic assignment, phylogenetic reconstruction, and diversity analyses and visualizations.

This document is organized as an introductory tutorial on how to analyze 16S sequencing data using current methods. Microbiome analysis starts from a set of basic questions about the data. The following questions are covered in this tutorial:

1. Proportionally, what microbes are found in each sample community?
2. How many species are in each sample?
3. Are there species significantly more abundant in one set of samples than in another?
4. How much does diversity change between samples?
5. Do different sample groupings significantly differ in their microbial composition?

This document is structured around these questions, so that each section is primarily concerned with how to answer one particular question about the microbiome data.

How to Obtain Microbiome Data

The Sequence Read Archive (SRA) is a bioinformatics database that provides a public repository for DNA sequencing data obtained from next-generation sequencing (NGS) technology. Raw sequence data and metadata can be searched and downloaded for downstream analysis. Biotechnology companies such as 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics provide a line of products and services for sequencing, genotyping and gene expression. Illumina is one of the successful companies; its technology has reduced the cost of sequencing a human genome to a reasonable price. Since Illumina will eventually be used for sequencing in our project, the SRA database was searched for 16S rRNA data generated on Illumina systems, and the Hospital Microbiome Project data were obtained from the database. Every experiment in the SRA database has an accession code and metadata such as the study abstract, experiment attributes and the owner of the data. Raw sequence data for an experiment can be downloaded in FASTA and FASTQ format using its accession code.

The Hospital Microbiome Project (HMP) (Shogan et al., 2013) aims to collect microbial samples from surfaces, air, staff, and patients at the University of Chicago's new hospital pavilion. It involves 10 patient rooms, 2 nursing stations, staff, and water and air sampling, both daily and weekly over a year, in order to better understand the factors that influence bacterial population development in health care environments.

As a preliminary exploration, a small data set from HMP was analyzed. The data were collected from seven different points (countertop, computer mouse, station phone, chair armrest, corridor floor, hot tap water faucet and cold tap water faucet) in the same room (S10) at two different time points (27/02/2013 and 17/04/2013).

How to Setup QIIME

QIIME is a software package of Python wrapper scripts. It can be downloaded and used on a Linux system, and it can also be run in VirtualBox on the Windows operating system. I used QIIME (version not specified) in VirtualBox on Windows.

Essential files for QIIME

QIIME works with the FASTQ file format. A FASTQ file uses four lines per sequence:

Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description.
Line 2 is the raw sequence letters.
Line 3 begins with a '+' character and is optionally followed by the same sequence identifier as line 1.
Line 4 encodes the quality values for the sequence in line 2, and must contain the same number of symbols as there are letters in the sequence.

The FASTQ format therefore holds the sequence data together with its quality data. QIIME provides the convert_fastaqual_fastq.py script to convert a FASTQ file into a qual file with the quality scores and an fna file with the sequence data.

    convert_fastaqual_fastq.py -f seqs.fastq -c fastq_to_fastaqual

Figure 1. FASTQ File Format

Sequence File (.fna)

The sequence file holds the raw sequence data for each sequence. A typical sequence file in fna format is laid out as follows:

Line 1 begins with a '>' character and is followed by an Accession Run Code.
Line 2 is the raw sequence letters.

Quality File (.qual)

The quality file holds the quality scores for each sequence. A typical file in qual format is laid out as follows:

Line 1 begins with a '>' character and is followed by an Accession Run Code.
Line 2 is the quality scores.
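To make the relationship between the three file formats concrete, the following is a minimal Python sketch of the kind of split that convert_fastaqual_fastq.py performs. It is an illustration only, not QIIME's implementation: the function name `fastq_to_fasta_qual` is hypothetical, and the Phred offset of 33 is an assumption that should be checked against the encoding of the actual data.

```python
def fastq_to_fasta_qual(fastq_text, phred_offset=33):
    """Split FASTQ text into (.fna text, .qual text).

    phred_offset=33 is an assumption (Sanger / Illumina 1.8+ encoding).
    """
    # Ignore blank lines; every record must then span exactly four lines.
    lines = [line for line in fastq_text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("a FASTQ file uses four lines per sequence")

    fna_records, qual_records = [], []
    for start in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[start:start + 4]
        if not header.startswith("@"):
            raise ValueError("line 1 must begin with '@': %r" % header)
        if not separator.startswith("+"):
            raise ValueError("line 3 must begin with '+': %r" % separator)
        if len(quality) != len(sequence):
            raise ValueError("quality and sequence lengths differ: %r" % header)

        identifier = header[1:]
        # Each quality symbol encodes one integer score.
        scores = " ".join(str(ord(symbol) - phred_offset) for symbol in quality)
        fna_records.append(">%s\n%s\n" % (identifier, sequence))
        qual_records.append(">%s\n%s\n" % (identifier, scores))
    return "".join(fna_records), "".join(qual_records)


if __name__ == "__main__":
    example = "@read1 example\nACGT\n+\nII5!\n"
    fna_text, qual_text = fastq_to_fasta_qual(example)
    print(fna_text, end="")
    print(qual_text, end="")
```

The length check mirrors the rule that line 4 must contain the same number of symbols as line 2.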

Mapping File

QIIME requires a metadata mapping file for most analyses. The mapping file is generated by the user and contains all of the information, categorical or numeric, about the samples that is necessary to perform the data analysis. Excel or a text editor can be used to create the mapping file, which must be tab-delimited. The mapping file is important because it links each sample identifier with its metadata.

In a typical mapping file, each line refers to one sample. A line starts with a SampleID, followed by the BarcodeSequence used for the sample and the LinkerPrimerSequence used to amplify the sample, and ends with a Description column. The first column must be SampleID; a sample ID may contain alphanumeric characters and periods, but not underscores. The SampleID should match the sequence headers used in the FASTA files. Moreover, any metadata that relates to the samples, and any additional information about specific samples that may be useful when considering outliers, can be added as further columns. The last column must be Description.

In some circumstances, users may need to generate a mapping file that does not contain barcodes and/or primers. To generate such a mapping file, the BarcodeSequence and LinkerPrimerSequence fields can be left empty. To check whether the mapping file is in the right format, QIIME provides validate_mapping_file.py. This script tests for many problems in the mapping file and writes a _corrected.txt version of the mapping file to the output folder. If the BarcodeSequence and LinkerPrimerSequence fields are empty, barcode and primer testing needs to be disabled with the -p and -b parameters.

    validate_mapping_file.py -m <mapping_filepath> -o <outputpath> -p -b
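The rules above can be turned into a quick pre-check before running validate_mapping_file.py. The sketch below is a hypothetical helper, not part of QIIME, and it covers only the rules stated in this section. It assumes the QIIME 1 convention that the header line writes the first column as `#SampleID`; the example column `sample_type_s` is taken from the categories used later in this tutorial.

```python
import re

# Sample IDs: alphanumeric characters and periods only (no underscores).
SAMPLE_ID_PATTERN = re.compile(r"^[A-Za-z0-9.]+$")


def check_mapping_text(mapping_text):
    """Return a list of problems found in tab-delimited mapping file text."""
    rows = [line.split("\t") for line in mapping_text.splitlines() if line.strip()]
    if not rows:
        return ["mapping file is empty"]

    problems = []
    header = rows[0]
    if header[0] != "#SampleID":  # assumed QIIME 1 header spelling
        problems.append("first column must be #SampleID")
    if header[-1] != "Description":
        problems.append("last column must be Description")
    for required in ("BarcodeSequence", "LinkerPrimerSequence"):
        if required not in header:
            problems.append("missing column: " + required)

    seen = set()
    for row in rows[1:]:
        sample_id = row[0]
        if sample_id.startswith("#"):
            continue  # comment line
        if len(row) != len(header):
            problems.append("wrong number of fields for sample " + sample_id)
        if not SAMPLE_ID_PATTERN.match(sample_id):
            problems.append("invalid characters in SampleID " + sample_id)
        if sample_id in seen:
            problems.append("duplicate SampleID " + sample_id)
        seen.add(sample_id)
    return problems


if __name__ == "__main__":
    text = (
        "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tsample_type_s\tDescription\n"
        "S10.Mouse.Feb\t\t\tComputer Mouse\tmouse sampled in February\n"
    )
    print(check_mapping_text(text) or "no problems found")
```

Empty BarcodeSequence and LinkerPrimerSequence fields are accepted here, matching the case where validate_mapping_file.py is run with -p and -b.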

Basic Statistics on Sequence Data

QIIME provides the following script to count sequences and to calculate the mean and standard deviation of the sequence length:

    count_seqs.py -i <sequence_file.fna>

For our file, the script reported the total number of sequences, a mean sequence length of 151 and a standard deviation of 0. Mothur gives more detailed statistics such as the minimum, maximum, median and quartiles. Running the command summary.seqs(fasta=<sequence_file.fna>) displays the following screen and creates a summary output file.

Figure 2. Mothur output for sequence summary

OTU Picking

Picking OTUs is also called "clustering", because sequences that share some threshold of identity are clustered together into an OTU. There are three different methods for OTU picking:

De novo clustering
Closed-reference
Open-reference

Which method to choose depends on what is known a priori about the microbial community. If the community is well studied, the 16S databases contain many representatives and the closed-reference OTU picking strategy is suitable. The de novo method is suitable for discovering new species. The open-reference method combines the closed-reference and de novo methods and is the method recommended by the QIIME developers. It first clusters sequences against a database of 16S reference sequences called Greengenes, and then applies de novo clustering to the sequences that are not similar to the reference sequences.

Table 1. Which OTU picking strategy for which study?

Closed reference (pick_closed_reference_otus.py): human, mouse, gut, skin and oral microbiome studies.
De novo (pick_de_novo_otus.py): environmental, soil, water and other poorly characterized ("hazy") microbiomes.
Open reference (pick_open_reference_otus.py): any microbiome study. The QIIME developers suggest this method.

The following table compares the advantages and disadvantages of the OTU picking strategies.

Table 2. Advantages and Disadvantages of OTU Picking Strategies

Closed reference
  Advantages: Fast and parallelizable. Suitable for big datasets. Since it uses reference databases, it creates qualified taxonomies and trees.
  Disadvantages: Not possible to find new species.
De novo
  Advantages: Clusters all sequences.
  Disadvantages: Not parallelizable, so slow for big datasets.
Open reference
  Advantages: Clusters all sequences. Part of the work is parallelized. Faster than de novo.
  Disadvantages: The non-parallelizable part of the work is slow. It might take a very long time when many new species are found that are not in the reference databases.

The open-reference OTU picking strategy was used for our HMP data analysis, using QIIME's pick_open_reference_otus.py script. This script walks through many substeps in a single run: it (1) picks OTUs, (2) generates a representative sequence for each OTU, (3) assigns known taxonomy to those OTUs, (4) creates a phylogenetic tree, and (5) creates an OTU table.

    pick_open_reference_otus.py -i <sequence_file.fna> -r <97_otus.fasta> -o <outputpath> -s 0.1 -m <clustering algorithm> -p <parameter_file>

97_otus.fasta is the reference OTU file from Greengenes, the database of reference 16S sequences that is used to assign taxonomy. The 97_otus.fasta file is created by clustering all the sequences in the Greengenes database into 97% identity clusters. A representative sequence is chosen from each of those clusters and used to create the 97_tree and 97_taxonomy. Sequences in our data are compared with the representative sequences in 97_otus.fasta, and the taxonomy of the most similar sequence is assigned to our sequence.

The default clustering algorithm for pick_open_reference_otus.py is UCLUST. Because usearch is widely used for OTU picking, usearch was used as the clustering algorithm for our data. The parameter file was created by the user and contains the line pick_otus:enable_rev_strand_match True. This line is needed if most or all of the sequences fail to hit the reference during the prefiltering or closed-reference OTU picking steps, in which case the sequences may be in the reverse orientation with respect to the reference database. The line addresses this problem, but it doubles the amount of memory used in the workflow.

The script creates an index.html file, a navigation page with an informative table about the output files. The important outputs of the script are the following four files:

rep_set.tre: the phylogenetic tree describing the relationship of all of our sequences.
rep_set.fna: the list of representative sequences for each OTU.
otu_table_mc2_w_tax.biom: the final OTU results, including taxonomic assignments and per-sample abundances, stored in a biom file. mc2 refers to a minimum size of 2, meaning that each OTU requires at least 2 sequences. This is the file mostly used for deeper analysis.
final_otu_map_mc2.txt: the listing of which reads were clustered into which OTU.

Basic Statistics on OTU Table

QIIME provides the following command to summarize an OTU table:

    biom summarize-table -i <biom_file> -o <outputpath>

Figure 3 shows the summary for the biom file produced by OTU picking. If the sequences in the representative sequence file rep_set.fna are counted, the same number should be displayed.

The following script is used to assign taxonomy to each OTU representative sequence:

    assign_taxonomy.py -i <rep_set.fna> -o <taxonomyresults_outputpath>

It creates a rep_set_tax_assignments.txt file that contains an entry for each representative sequence, listing the taxonomy to the greatest depth allowed by the confidence threshold (80% by default; it can be changed with the -c option), and a column of confidence values for the deepest level of taxonomy shown.

Figure 3. Summary for biom file

Figure 4. rep_set_tax_assignments.txt

OTU Heatmap

The following script creates a pdf file with a visualization of the OTU table:

    make_otu_heatmap.py -i <biom file> -o <heatmap.pdf>

Each row corresponds to an OTU and each column corresponds to a sample. The higher the relative abundance of an OTU in a sample, the more intense the color at the corresponding position in the heatmap.

Figure 5. Heatmap for HMP data

Data Analysis

Summarize Communities by Taxonomic Composition

Looking at the relative abundances of taxa per sample in the OTU table shows which microbes are found in each sample community.

Question: Proportionally, what microbes are found in each sample community?
Scripts: summarize_taxa.py and plot_taxa_summary.py
Output: Plots showing relative abundance data per sample

The following script generates text files with relative abundance data per sample, giving a basic overview of the members of the community at all taxonomic ranks:

    summarize_taxa.py -i <biom file> -o <taxasummary_outputpath>

Specific taxonomic ranks can be selected with the -L parameter of the script (1 for kingdom, 2 for phylum, 3 for class, 4 for order, 5 for family, 6 for genus, 7 for species). The output text files can be passed to the plot_taxa_summary.py script to create plots with the following command:

    plot_taxa_summary.py -i <taxasummary_outputpath/otu_table_w_tax.txt> -l <taxonomic rank> -c pie,bar,area -o <taxscharts_outputpath>

The following pie plot shows the total relative abundance for all data.

Figure 6. Pie plot of the degree of sharing of microbial taxa in 14 samples collected from 7 different points at a four-month interval in a hospital room.

The following area and bar plots show the relative abundance of taxa for each sample.

Figure 7. Area plot of the degree of sharing of microbial taxa in 14 samples collected from 7 different points at a four-month interval in a hospital room.

Figure 8. Bar plot of the degree of sharing of microbial taxa in 14 samples collected from 7 different points at a four-month interval in a hospital room.

Figure 9 shows the microbial composition of each sample at the two time points at phylum level. From the plots, the taxa appear to change most between the two time points on the computer mouse, countertop and tap faucet handles. On the other hand, those samples show similar taxa proportions within the same time point. This might be because the same person used those locations at the first time point, and by the second time point a different person was using them, which changed the microbial abundance of taxa in the second set of samples.

Figure 9. Microbial composition of the microbial taxa in 14 collected samples (corridor floor, computer mouse, countertop, station phone, chair armrest, cold tap water faucet handle and hot tap water faucet handle, each sampled in February and April)

Investigating Alpha Diversity

Alpha diversity describes the diversity of species in a single sample or environment.

Question: How many species are in each sample?
Script:

    alpha_rarefaction.py -i <biom file> -o <alphadiversity_outputpath> -p <parameters.txt> -m <mapping file>

Output: Rarefaction plots.

This script performs several steps: (1) generate rarefied OTU tables; (2) compute alpha diversity metrics for each rarefied OTU table; (3) collate alpha diversity results; and (4) generate alpha rarefaction plots.

Measured alpha diversity increases with sequencing depth, so rarefaction plots are useful for comparing alpha diversity between two or more samples that may have unequal sequencing depth. The plot shows the alpha diversity value against the number of sequences included. To build rarefaction curves, each community is randomly subsampled without replacement at different intervals, and the average number of OTUs at each interval is plotted against the size of the subsample.

The parameter file is a text file that lists the alpha diversity metrics. observed_species, shannon and chao1 are commonly used alpha diversity metrics. observed_species is the number of OTUs identified per sample. Shannon diversity is a measure of entropy, and chao1 is a measure that predicts OTU richness at high sequencing depth. The following command creates a parameters.txt file:

    echo 'alpha_diversity:metrics observed_species,shannon,chao1' > parameters.txt

After running the script on our data, an html page with rarefaction plots was created.

Figure 10. Rarefaction Plot for date_s
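The three metrics and the subsampling step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than QIIME's own code: the function names are hypothetical, Shannon entropy is computed with base-2 logarithms, and chao1 uses the bias-corrected form S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen exactly once and twice.

```python
import math
import random


def observed_species(counts):
    """Number of OTUs with at least one sequence."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon entropy in bits (base-2 logarithm is an assumption)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed variant)."""
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed_species(counts) + singletons * (singletons - 1) / (2.0 * (doubletons + 1))


def rarefy(counts, depth, rng):
    """Randomly subsample `depth` sequences without replacement."""
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of sequences in the sample")
    rarefied = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied


def rarefaction_curve(counts, depths, repetitions=10, seed=0):
    """Average observed OTUs at each subsampling depth."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        values = [observed_species(rarefy(counts, depth, rng)) for _ in range(repetitions)]
        curve.append((depth, sum(values) / len(values)))
    return curve


if __name__ == "__main__":
    sample = [4, 2, 1, 1, 0]  # sequences per OTU in one hypothetical sample
    print(observed_species(sample), shannon(sample), chao1(sample))
    print(rarefaction_curve(sample, depths=[2, 4, 8]))
```

Each point of the curve averages several random subsamples, which is why the plots flatten smoothly as depth approaches the full sample size.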

Figure 11. Rarefaction Plot for sample_type_s

Identifying Differentially Abundant OTUs

Question: Are there species significantly more abundant in one set of samples than in another? Which microbes are significantly different between two sample groupings? Do specific groups of samples differ in their microbial composition?
Script:

    differential_abundance.py -i <biom file> -o <output.txt> -m <mapping file> -a DESeq2_nbinom -c <mapping category> -x <subcategory 1> -y <subcategory 2> -d

Output: A text file with a list of differentially abundant OTUs and their statistics, and an MA plot.

OTU differential abundance testing is used to identify OTUs that differ between two sample categories of the mapping file, denoted by -x and -y in the script. The method used to identify differentially abundant OTUs is given by -a. DESeq2_nbinom and metagenomeSeq_fitZIG are the differential abundance algorithms that can be used in QIIME (Paulson, Stine, Bravo, & Pop, 2013). The -d option creates an MA plot, which shows the relationship between intensity and difference for two groups of samples. The x-axis represents the average quantitated value across the two groups, and the y-axis shows the difference between them. The option also creates a dispersion estimate plot that visualizes the fitted dispersion versus mean relationship.

To see whether any OTUs are significantly more abundant in the countertop samples than in the computer mouse samples, countertop was passed as the -y option and computer mouse as the -x option. According to the output text file, members of Actinobacteria are significantly more abundant in the countertop samples.

Figure 12. Diff_otus.txt for Computer Mouse and Countertop
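The two axes of an MA plot, as described above, can be illustrated with a small Python sketch. It is not what DESeq2 computes internally (DESeq2 works on normalized counts and model-based fold-change estimates); it only shows the plain definition on log2-transformed mean counts. The function name `ma_values` and the pseudocount of 1, added to avoid taking the logarithm of zero, are assumptions.

```python
import math


def ma_values(mean_counts_x, mean_counts_y, pseudocount=1.0):
    """Return one (A, M) point per OTU.

    A is the average log2 abundance across the two groups (x-axis);
    M is the log2 difference, group y minus group x (y-axis).
    """
    points = []
    for x, y in zip(mean_counts_x, mean_counts_y):
        log_x = math.log2(x + pseudocount)
        log_y = math.log2(y + pseudocount)
        a_value = 0.5 * (log_x + log_y)
        m_value = log_y - log_x
        points.append((a_value, m_value))
    return points


if __name__ == "__main__":
    # Hypothetical mean counts per OTU in two sample groups.
    computer_mouse = [3.0, 7.0, 0.0]
    countertop = [15.0, 7.0, 31.0]
    for a_value, m_value in ma_values(computer_mouse, countertop):
        print("A = %.2f, M = %.2f" % (a_value, m_value))
```

With this sign convention, OTUs that are more abundant in the -y group (countertop in the analysis above) appear above the horizontal axis.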

Figure 13. MA plot for differential abundance of Computer Mouse and Countertop

Figure 14. Dispersion Estimate Plot for differential abundance of Computer Mouse and Countertop

The pie charts show visibly different taxonomic compositions for the computer mouse samples taken in February and in April. To test this, the differential abundance script was run on those samples; Figures 15 and 16 show the resulting MA plot and dispersion estimate plot.

Figure 15. MA plot for Computer Mouse Samples.

Figure 16. Dispersion Estimate Plot for Computer Mouse Samples

Normalizing OTU Table

When analyzing microbial data, uneven sequencing depth can lead to biased results. Having a different number of sequences for each sample causes inaccurate results in beta diversity analyses.

Question: How to prevent bias resulting from uneven sequencing depth?
Script:

    normalize_table.py -i <biom file> -a CSS -o <normalized biom file>

Output: Biom table with normalized counts. This table is used as the input biom file for the beta diversity script.

The -a option determines the normalization algorithm to apply to the input biom table. The default algorithm is CSS. CSS stands for cumulative sum scaling, a normalization that is an adaptive extension of the quantile normalization approach and is better suited to marker gene survey data. Raw counts are divided by the cumulative sum of counts up to a percentile determined using a data-driven approach (Paulson, J.N., Stine, O.C., Corrada Bravo, H., Pop, 2013). DESeq2 is another normalization algorithm option. DESeq2 outputs negative values for low-abundance OTUs as a result of its log transformation, and throws away low-depth samples (e.g. fewer than 1000 sequences/sample). This presents a problem when using the Bray-Curtis and UniFrac metrics, which are common metrics for calculating ecological distance. There is not a good solution yet, but CSS is the currently recommended normalization algorithm.

Beta-diversity and PCoA

In microbiome research it is important to analyze how different every sample is from all of the rest. It is equally important to know whether any grouping of samples is more similar in composition than the average. Beta diversity is a measure of diversity that describes how different the species composition of different samples is.

Question: How much does diversity change between samples?
Script: beta_diversity.py, principal_coordinates.py, make_2d_plots.py
Output: Distance matrix and Principal Coordinates plots

Mathematical and phylogenetic metrics can be used to measure the difference between two samples. Two commonly used metrics in microbiome studies are Bray-Curtis (bray_curtis) and unweighted UniFrac (unweighted_unifrac).

    beta_diversity.py -i <normalized biom file> -m <distance metric> -o <beta_div_output_path> -t <rep_set.tre>

The output of the command is a distance matrix that defines the distance between every pair of samples. I used the Bray-Curtis metric to calculate distance. This matrix can be visualized in a Principal Coordinates Analysis (PCoA) plot.

    principal_coordinates.py -i <beta_div_output_path>/<metric_normalized_otu_table.txt> -o <beta_div_coords.txt>
    make_2d_plots.py -i <beta_div_coords.txt> -m <mapping file>

The resulting PCoA plots are shown in the following charts. Figure 17 shows how microbial community similarity changes between the two sample collection dates; the overall community appears to have changed substantially between the two time points. Figure 18 shows the microbial community similarity among sample types. The computer mouse, countertop, station phone and chair armrest samples are plotted close together, meaning that they have similar microbial communities. The computer mouse and countertop samples in this group were collected in February, whereas the station phone and chair armrest samples were collected in April. The pie charts for these samples are also very similar. The pie charts show very different compositions for the computer mouse and countertop samples at the two time points, which can also be seen in the PCoA plots. For example, the two purple circles lie far from each other on the PC1-PC2 and PC1-PC3 plots in Figure 18.

Figure 17. PCoA plot for the bacterial community collected in the hospital room. Communities were characterized by samples collected in February and April (legend: April, February). Bray-Curtis is used as the distance metric.
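As a worked illustration of the Bray-Curtis metric used above, the following Python sketch computes the dissimilarity between two samples as the sum of absolute count differences divided by the total count of both samples, and assembles a distance matrix. The function names and the example counts are hypothetical; this is a minimal sketch, not the beta_diversity.py implementation.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples (0 = identical, 1 = disjoint)."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must list the same OTUs")
    numerator = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    denominator = sum(counts_a) + sum(counts_b)
    if denominator == 0:
        return 0.0  # two empty samples are treated as identical
    return numerator / denominator


def distance_matrix(table):
    """Pairwise distances for a dict mapping sample name to OTU counts.

    Returns the sorted sample names and a square matrix in that order.
    """
    samples = sorted(table)
    matrix = [[bray_curtis(table[row], table[col]) for col in samples] for row in samples]
    return samples, matrix


if __name__ == "__main__":
    # Hypothetical OTU counts for three samples.
    otu_table = {
        "mouse": [10, 0, 5],
        "countertop": [5, 5, 5],
        "floor": [0, 20, 0],
    }
    names, distances = distance_matrix(otu_table)
    print(names)
    for row in distances:
        print(["%.3f" % value for value in row])
```

The resulting square matrix has the same shape as the distance matrix that principal_coordinates.py takes as input.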

Figure 18. PCoA plot for the bacterial community collected in the hospital room. Communities were characterized by the type of sample collected (legend: cold tap water faucet handle, hot tap water faucet handle, computer mouse, countertop, station phone, chair armrest, corridor floor). Bray-Curtis is used as the distance metric.

Jackknifed Beta Diversity Analysis

Question: How to compare samples to each other?
Script:

    jackknifed_beta_diversity.py -i <biom file> -t <rep_set.tre> -m <mapping file> -o <Jackknife_Output folder> -e <rarefaction_depth>

Output: 3D PCoA plots with Emperor

This script performs the following steps:

i. Compute a beta diversity distance matrix from the full data set
ii. Perform multiple rarefactions at a single depth (the -e option sets the rarefaction depth)
iii. Compute distance matrices for all the rarefied OTU tables
iv. Build UPGMA trees for all the rarefactions
v. Compare all the trees to get consensus and support values for branching
vi. Perform principal coordinates analysis on all the rarefied distance matrices
vii. Generate plots of the principal coordinates

Emperor is an interactive next-generation tool for the analysis, visualization and interpretation of high-throughput microbial ecology datasets (Vázquez-Baeza, Pirrung, Gonzalez, & Knight, 2013). After running the script, three sub-folders for each distance metric and 3D PCoA plots are created. The unweighted_unifrac/emperor_pcoa_plot folder contains an html file with the 3D PCoA plots shown in Figure 19. Each point represents one of the samples, and distances between samples were calculated using unweighted UniFrac. Samples that lie close to each other have communities with very similar overall phylogenetic trees.

Figure 19. 3D PCoA Plots for HMP samples

The jackknife analysis creates a large collection of distance matrices on which statistics can be computed.

Question: How to analyze distance matrices?
Script:

    dissimilarity_mtx_stats.py -i <Jackknife_Output folder/unweighted_unifrac/rare_dm> -o <stat_output_folder>

Output: Three files, means.txt, medians.txt and stdevs.txt, holding the mean, median and standard deviation of the distance between each pair of samples.
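The summary that dissimilarity_mtx_stats.py produces can be pictured as an element-by-element statistic over the stack of jackknifed distance matrices. The Python sketch below shows that idea with the standard library only. The function name is hypothetical, and the use of the population standard deviation is an assumption; the QIIME script may use a different convention.

```python
import statistics


def distance_matrix_stats(matrices):
    """Element-wise mean, median and standard deviation over square matrices.

    `matrices` is a list of equally sized square matrices (lists of lists),
    one per rarefaction. Population standard deviation is an assumption.
    """
    size = len(matrices[0])
    means = [[0.0] * size for _ in range(size)]
    medians = [[0.0] * size for _ in range(size)]
    stdevs = [[0.0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            # Collect the same cell from every jackknifed matrix.
            values = [matrix[row][col] for matrix in matrices]
            means[row][col] = statistics.mean(values)
            medians[row][col] = statistics.median(values)
            stdevs[row][col] = statistics.pstdev(values)
    return means, medians, stdevs


if __name__ == "__main__":
    # Three hypothetical 2 x 2 distance matrices from three rarefactions.
    jackknifed = [
        [[0.0, 0.2], [0.2, 0.0]],
        [[0.0, 0.4], [0.4, 0.0]],
        [[0.0, 0.6], [0.6, 0.0]],
    ]
    means, medians, stdevs = distance_matrix_stats(jackknifed)
    print(means)
    print(medians)
    print(stdevs)
```

A small standard deviation for a pair of samples indicates that their distance is stable across rarefactions.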

Question: Are the samples in an individual category closer to each other than they are to samples outside the category?
Script:

    make_distance_boxplots.py -m <mapping file> -o <BoxPlot_Output_Folder> -d stat_output_folder/means.txt -f <category> --save_raw_data

Output: Boxplot as a pdf file

In Figure 20, the first and second boxplots represent all within-category distances and all between-category distances, respectively.

Figure 20. Distance Boxplot for Surface type

Question: How to compare between samples grouped at different field states of a mapping file field?
Script:

    make_distance_comparison_plots.py -m <mapping file> -d <unweighted_unifrac_otu_table.txt> -f <category from mapping file> -c <comparison_groups> -o <output_folder> -a <label_type> -t <plot_type>

Output: Distance Comparison Plot

Figure 21 shows the boxplots that allow comparison among surface types. Countertop, Corridor Floor and Station Phone were taken as comparison groups and compared with the other surface types.

Figure 21. Distance Comparison among surface types

Make Bootstrapped Tree

Question: How to make a bootstrapped tree?
Script:

    make_bootstrapped_tree.py -m <Jackknife_Output folder/unweighted_unifrac/upgma_cmp/master_tree.tre> -s <Jackknife_Output folder/unweighted_unifrac/upgma_cmp/jackknife_support.txt> -o <Jackknife_Output folder/unweighted_unifrac/upgma_cmp/tree.pdf>

Figure 22. Jackknifed UPGMA clustering (using the weighted UniFrac metric) showing the similarity of bacterial communities based on 16S rRNA genes. Tip labels: St. Phone February; Cold T. W. F. H. February; Countertop April; Cold T. W. F. H. April; Comp. Mouse April; Corr. Floor April; Corr. Floor February; St. Phone April; Ch. Armrest April; Hot T. W. F. H. April; Countertop February; Comp. Mouse February; Ch. Armrest February; Hot T. W. F. H. February.

Comparing Categories

In the HMP data, seven different points in a room were sampled: countertop, computer mouse, station phone, chair armrest, corridor floor, hot tap water faucet and cold tap water faucet. Plots show how the microbial composition of one sample differs from that of the others, but statistical support is still needed. The adonis and anosim (analysis of similarities) tests can provide it.

Adonis (a permutational multivariate analysis of variance, PERMANOVA) is a nonparametric method that takes a beta diversity distance matrix, a mapping file and a category in the mapping file that defines the sample grouping. It computes an R² value (effect size), which gives the percentage of variation explained by the supplied mapping file category, and a p-value, which indicates statistical significance. Anosim is a method that tests whether two or more groups of samples are significantly different. Anosim works only with a categorical variable, which is used to do the grouping.

Question: Are samples grouped by a parameter in the mapping file (e.g. sample type) significantly different from each other?

Script 1: compare_categories.py --method adonis -i <metric_normalized_otu_table.txt> -m <mapping file> -c <comparingcategory> -o <adonis_out_folder>

Script 2: compare_categories.py --method anosim -i <metric_normalized_otu_table.txt> -m <mapping file> -c <comparingcategory> -o <anosim_out_folder>

Output: a p-value and an R² value (adonis) or an R statistic (anosim). The p-value indicates the statistical significance of the grouping of samples by the parameter. The R² value indicates the percentage of variation in distances that is explained by the grouping.

Adonis and anosim were applied to the sample_type_s and date_s categories in the HMP data. Samples grouped by date_s or by sample_type_s did not differ significantly in microbial composition (p = 0.2, p = 0.58).

Conclusion

As a preliminary exploration, a small data set from HMP was analyzed. The data were collected from seven different points (countertop, computer mouse, station phone, chair armrest, corridor floor, hot tap water faucet and cold tap water faucet) in the same room (S10) at two different time points (27/02/2013 and 17/04/2014). For each sample, the number and kinds of microbes present, the change in diversity between samples and the microbial composition of sample groupings were investigated using the QIIME pipeline. Moreover, significant changes in abundance among samples were investigated. Visualization and statistical tools were used to draw conclusions.

REFERENCES

Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., ... Knight, R. (2010). QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7(5).

Fierer, N., Breitbart, M., Nulton, J., Salamon, P., Lozupone, C., Jones, R., ... Jackson, R. B. (2007). Metagenomic and small-subunit rRNA analyses reveal the genetic diversity of bacteria, archaea, fungi, and viruses in soil. Applied and Environmental Microbiology, 73(21).

Paulson, J. N., Stine, O. C., Corrada Bravo, H., & Pop, M. (2013). Robust methods for differential abundance analysis in marker gene surveys. Nature Methods, 10(12).

Paulson, J. N., Stine, O. C., Bravo, H. C., & Pop, M. (2013). Differential abundance analysis for microbial marker-gene surveys. Nature Methods, 10(12).

Shogan, B. D., Smith, D. P., Packman, A. I., Kelley, S. T., Landon, E. M., Bhangar, S., ... Gilbert, J. (2013). The Hospital Microbiome Project: Meeting report for the 2nd Hospital Microbiome Project, Chicago, USA, January 15(th). Standards in Genomic Sciences, 8(3).

Vázquez-Baeza, Y., Pirrung, M., Gonzalez, A., & Knight, R. (2013). EMPeror: a tool for visualizing high-throughput microbial community data. GigaScience, 2(1), 16.
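
Worked example for the Comparing Categories section. The adonis test described there partitions squared distances into within-group and between-group parts and then shuffles the group labels to obtain a p-value. The block below is a minimal sketch of that idea in plain Python, under stated assumptions: it is not the code that compare_categories.py or the R vegan package runs, the function names (sum_sq, permanova) are hypothetical, and the six toy samples with one-dimensional positions are invented for illustration only.

```python
import random
from itertools import combinations


def sum_sq(dist, members):
    """Sum of squared pairwise distances among `members`, divided by their count."""
    members = list(members)
    total = sum(dist[i][j] ** 2 for i, j in combinations(members, 2))
    return total / len(members)


def permanova(dist, labels, permutations=999, seed=0):
    """Return (R2, pseudo-F, p-value) for one grouping of the samples.

    dist   : square, symmetric distance matrix (list of lists)
    labels : group label of each sample, in the same order as dist
    Assumes at least two groups and a nonzero within-group sum of squares.
    """
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum_sq(dist, range(n))

    def pseudo_f(lab):
        # Within-group sum of squares for this labelling.
        ss_within = sum(
            sum_sq(dist, [i for i in range(n) if lab[i] == g]) for g in groups
        )
        ss_between = ss_total - ss_within
        f = (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))
        return f, ss_between / ss_total

    f_obs, r2 = pseudo_f(labels)

    # Permutation test: shuffle labels and count statistics at least as large.
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(shuffled)[0] >= f_obs:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return r2, f_obs, p_value


# Toy data (hypothetical): two well-separated groups of three samples each.
positions = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
dist = [[abs(a - b) for b in positions] for a in positions]
labels = ["A", "A", "A", "B", "B", "B"]

r2, f_stat, p_value = permanova(dist, labels)
print(f"R2={r2:.3f} pseudo-F={f_stat:.1f} p={p_value:.3f}")
```

In this toy case the grouping explains almost all of the variation, so R² is close to 1. Because six samples can be split into two groups of three in only 10 distinct ways, the permutation p-value cannot fall far below 0.1 even for perfectly separated groups, which is worth keeping in mind when reading p-values from small designs.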

Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME. Peter Sterk EBI Metagenomics Course 2014

Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME. Peter Sterk EBI Metagenomics Course 2014 Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME Peter Sterk EBI Metagenomics Course 2014 1 Taxonomic analysis using next-generation sequencing Objective we want to

More information

CBC Data Therapy. Metagenomics Discussion

CBC Data Therapy. Metagenomics Discussion CBC Data Therapy Metagenomics Discussion General Workflow Microbial sample Generate Metaomic data Process data (QC, etc.) Analysis Marker Genes Extract DNA Amplify with targeted primers Filter errors,

More information

Microbiomes and metabolomes

Microbiomes and metabolomes Microbiomes and metabolomes Michael Inouye Baker Heart and Diabetes Institute Univ of Melbourne / Monash Univ Summer Institute in Statistical Genetics 2017 Integrative Genomics Module Seattle @minouye271

More information

Introduc)on to QIIME on the IPython Notebook

Introduc)on to QIIME on the IPython Notebook Strategies and Techniques for Analyzing Microbial Population Structures Introduc)on to QIIME on the IPython Notebook Rob Knight Adam Robbins- Pianka Will Van Treuren Yoshiki Vázquez- Baeza ( @yosmark )

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. MBQC base beta diversity, major protocol variables, and taxonomic profiles.

Nature Biotechnology: doi: /nbt Supplementary Figure 1. MBQC base beta diversity, major protocol variables, and taxonomic profiles. Supplementary Figure 1 MBQC base beta diversity, major protocol variables, and taxonomic profiles. A) Multidimensional scaling of MBQC sample Bray-Curtis dissimilarities (see Fig. 1). Labels indicate centroids

More information

Microbiomics I August 24th, Introduction. Robert Kraaij, PhD Erasmus MC, Internal Medicine

Microbiomics I August 24th, Introduction. Robert Kraaij, PhD Erasmus MC, Internal Medicine Microbiomics I August 24th, 2017 Introduction Robert Kraaij, PhD Erasmus MC, Internal Medicine r.kraaij@erasmusmc.nl Welcome to Microbiomics I Infection & Immunity MSc students Only first day no practicals

More information

What is metagenomics?

What is metagenomics? Metagenomics What is metagenomics? Term first used in 1998 by Jo Handelsman "the application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments,

More information

Metagenomics Computational Genomics

Metagenomics Computational Genomics Metagenomics 02-710 Computational Genomics Metagenomics Investigation of the microbes that inhabit oceans, soils, and the human body, etc. with sequencing technologies Cooperative interactions between

More information

Introduction to Bioinformatics analysis of Metabarcoding data

Introduction to Bioinformatics analysis of Metabarcoding data Introduction to Bioinformatics analysis of Metabarcoding data Theoretical part Alvaro Sebastián Yagüe Experimental design Sampling Sample processing Sequencing Sequence processing Experimental design Sampling

More information

mothur Workshop for Amplicon Analysis Michigan State University, 2013

mothur Workshop for Amplicon Analysis Michigan State University, 2013 mothur Workshop for Amplicon Analysis Michigan State University, 2013 Tracy Teal MMG / ICER tkteal@msu.edu Kevin Theis Zoology / BEACON theiskev@msu.edu mothur Mission to develop a single piece of open-source,

More information

Robert Edgar. Independent scientist

Robert Edgar. Independent scientist Robert Edgar Independent scientist robert@drive5.com www.drive5.com Reads FASTQ format Millions of reads Many Gb USEARCH commands "UPARSE pipeline" OTU sequences FASTA format >Otu1 GATTAGCTCATTCGTA >Otu2

More information

Conducting Microbiome study, a How to guide

Conducting Microbiome study, a How to guide Conducting Microbiome study, a How to guide Sam Zhu Supervisor: Professor Margaret IP Joint Graduate Seminar Department of Microbiology 15 December 2015 Why study Microbiome? ü Essential component, e.g.

More information

Microbiome: Metagenomics 4/4/2018

Microbiome: Metagenomics 4/4/2018 Microbiome: Metagenomics 4/4/2018 metagenomics is an extension of many things you have already learned! Genomics used to be computationally difficult, and now that s metagenomics! Still developing tools/algorithms

More information

HMP Data Set Documentation

HMP Data Set Documentation HMP Data Set Documentation Introduction This document provides detail about files available via the DACC website. The goal of the HMP consortium is to make the metagenomics sequence data generated by the

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Fig. S1 - Nationwide contributions of the most abundant genera. The figure shows log 10 of the relative percentage of genera, forming 80% of total abundance. (Russian

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature09944 Supplementary Figure 1. Establishing DNA sequence similarity thresholds for phylum and genus levels Sequence similarity distributions of pairwise alignments of 40 universal single

More information

Package GUniFrac. February 13, 2018

Package GUniFrac. February 13, 2018 Type Package Title Generalized UniFrac Distances Version 1.1 Date 2018-02-14 Author Jun Chen Maintainer Jun Chen Package GUniFrac February 13, 2018 Generalized UniFrac distances for

More information

Lecture 8: Predicting and analyzing metagenomic composition from 16S survey data

Lecture 8: Predicting and analyzing metagenomic composition from 16S survey data Lecture 8: Predicting and analyzing metagenomic composition from 16S survey data What can we tell about the taxonomic and functional stability of microbiota? Why? Nature. 2012; 486(7402): 207 214. doi:10.1038/nature11234

More information

A FRAMEWORK FOR ANALYSIS OF METAGENOMIC SEQUENCING DATA

A FRAMEWORK FOR ANALYSIS OF METAGENOMIC SEQUENCING DATA A FRAMEWORK FOR ANALYSIS OF METAGENOMIC SEQUENCING DATA A. MURAT EREN Department of Computer Science, University of New Orleans, 2000 Lakeshore Drive, New Orleans, LA 70148, USA Email: aeren@uno.edu MICHAEL

More information

Microbiome Analysis. Research Day 2012 Ranjit Kumar

Microbiome Analysis. Research Day 2012 Ranjit Kumar Microbiome Analysis Research Day 2012 Ranjit Kumar Human Microbiome Microorganisms Bad or good? Human colon contains up to 100 trillion bacteria. Human microbiome - The community of bacteria that live

More information

Supplementary Online Content

Supplementary Online Content Supplementary Online Content Pannaraj PS, Li F, Cerini C, et al. Association between breast milk bacterial communities and establishment and development of the infant gut microbiome. JAMA Pediatr. Published

More information

Chapter 7. Motif finding (week 11) Chapter 8. Sequence binning (week 11)

Chapter 7. Motif finding (week 11) Chapter 8. Sequence binning (week 11) Course organization Introduction ( Week 1) Part I: Algorithms for Sequence Analysis (Week 1-11) Chapter 1-3, Models and theories» Probability theory and Statistics (Week 2)» Algorithm complexity analysis

More information

Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity

Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity He et al. Microbiome (2015) 3:20 DOI 10.1186/s40168-015-0081-x RESEARCH Open Access Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity Yan He

More information

Parts of a standard FastQC report

Parts of a standard FastQC report FastQC FastQC, written by Simon Andrews of Babraham Bioinformatics, is a very popular tool used to provide an overview of basic quality control metrics for raw next generation sequencing data. There are

More information

dbcamplicons pipeline Amplicons

dbcamplicons pipeline Amplicons dbcamplicons pipeline Amplicons Matthew L. Settles Genome Center Bioinformatics Core University of California, Davis settles@ucdavis.edu; bioinformatics.core@ucdavis.edu Microbial community analysis Goal:

More information

Development of NGS metabarcoding. characterization of aerobiological samples. Lucia Muggia

Development of NGS metabarcoding. characterization of aerobiological samples. Lucia Muggia Development of NGS metabarcoding for the characterization of aerobiological samples Lucia Muggia Alberto Pallavicini, Elisa Banchi, Claudio G. Ametrano, David Stankovic, Silvia Ongaro, Enrico Tordoni,

More information

dbcamplicons pipeline Amplicons

dbcamplicons pipeline Amplicons dbcamplicons pipeline Amplicons Matthew L. Settles Genome Center Bioinformatics Core University of California, Davis settles@ucdavis.edu; bioinformatics.core@ucdavis.edu Microbial community analysis Goal:

More information

I AM NOT A METAGENOMIC EXPERT. I am merely the MESSENGER. Blaise T.F. Alako, PhD EBI Ambassador

I AM NOT A METAGENOMIC EXPERT. I am merely the MESSENGER. Blaise T.F. Alako, PhD EBI Ambassador I AM NOT A METAGENOMIC EXPERT I am merely the MESSENGER Blaise T.F. Alako, PhD EBI Ambassador blaise@ebi.ac.uk Hubert Denise Alex Mitchell Peter Sterk Sarah Hunter http://www.ebi.ac.uk/metagenomics Blaise

More information

Sequencing Errors, Diversity Estimates, and the Rare Biosphere

Sequencing Errors, Diversity Estimates, and the Rare Biosphere Sequencing Errors, Diversity Estimates, and the Rare Biosphere or Living in the shadow of Errares Susan Huse Marine Biological Laboratory June 13, 2012 Consistent Community Profile across samples and environments

More information

Evaluation of the liver abscess microbiome and liver abscess prevalence in cattle reared for production of natural branded beef

Evaluation of the liver abscess microbiome and liver abscess prevalence in cattle reared for production of natural branded beef Evaluation of the liver abscess microbiome and liver abscess prevalence in cattle reared for production of natural branded beef K.L. Huebner, J.N. Martin C.J. Weissend, K.L. Holzer, M. Weinroth, Z. Abdo,

More information

OMNIgene GUT stabilizes the microbiome profile at ambient temperature for 60 days and during transport

OMNIgene GUT stabilizes the microbiome profile at ambient temperature for 60 days and during transport OMNIgene GUT stabilizes the microbiome profile at ambient temperature for 60 days and during transport Evgueni Doukhanine, Anne Bouevitch, Ashlee Brown, Jessica Gage LaVecchia, Carlos Merino and Lindsay

More information

Carl Woese. Used 16S rrna to develop a method to Identify any bacterium, and discovered a novel domain of life

Carl Woese. Used 16S rrna to develop a method to Identify any bacterium, and discovered a novel domain of life METAGENOMICS Carl Woese Used 16S rrna to develop a method to Identify any bacterium, and discovered a novel domain of life His amazing discovery, coupled with his solitary behaviour, made many contemporary

More information

Infectious Disease Omics

Infectious Disease Omics Infectious Disease Omics Metagenomics Ernest Diez Benavente LSHTM ernest.diezbenavente@lshtm.ac.uk Course outline What is metagenomics? In situ, culture-free genomic characterization of the taxonomic and

More information

An introduction into 16S rrna gene sequencing analysis. Stefan Boers

An introduction into 16S rrna gene sequencing analysis. Stefan Boers An introduction into 16S rrna gene sequencing analysis Stefan Boers Microbiome, microbiota or metagenomics? Microbiome The entire habitat, including the microorganisms, their genomes (i.e., genes) and

More information

choose MBL-REGISTER user: dm00834 password: dm00834 http://register.mbl.edu/ stamps.mbl.edu this uses the username and password on your STAMPS name badge Strategies for Analysis of Microbial Population

More information

COMPARING MICROBIAL COMMUNITY RESULTS FROM DIFFERENT SEQUENCING TECHNOLOGIES

COMPARING MICROBIAL COMMUNITY RESULTS FROM DIFFERENT SEQUENCING TECHNOLOGIES COMPARING MICROBIAL COMMUNITY RESULTS FROM DIFFERENT SEQUENCING TECHNOLOGIES Tyler Bradley * Jacob R. Price * Christopher M. Sales * * Department of Civil, Architectural, and Environmental Engineering,

More information

Diversity Profiling Service: Sample preparation guide

Diversity Profiling Service: Sample preparation guide Diversity Profiling Service: Sample preparation guide CONTENTS 1 Overview: Microbial Diversity Profiling at AGRF... 2 2 Submission types to the Microbial Diversity Profiling Service... 3 2.1 Diversity

More information

Introduction to OTU Clustering. Susan Huse August 4, 2016

Introduction to OTU Clustering. Susan Huse August 4, 2016 Introduction to OTU Clustering Susan Huse August 4, 2016 What is an OTU? Operational Taxonomic Units a.k.a. phylotypes a.k.a. clusters aggregations of reads based only on sequence similarity, independent

More information

Carl Woese. Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life

Carl Woese. Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life METAGENOMICS Carl Woese Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life His amazing discovery, coupled with his solitary behaviour, made many contemporary

More information

Experimental Design Microbial Sequencing

Experimental Design Microbial Sequencing Experimental Design Microbial Sequencing Matthew L. Settles Genome Center Bioinformatics Core University of California, Davis settles@ucdavis.edu; bioinformatics.core@ucdavis.edu General rules for preparing

More information

NGS part 2: applications. Tobias Österlund

NGS part 2: applications. Tobias Österlund NGS part 2: applications Tobias Österlund tobiaso@chalmers.se NGS part of the course Week 4 Friday 13/2 15.15-17.00 NGS lecture 1: Introduction to NGS, alignment, assembly Week 6 Thursday 26/2 08.00-09.45

More information

Diversity Profiling Service: Sample preparation guide

Diversity Profiling Service: Sample preparation guide Diversity Profiling Service: Sample preparation guide CONTENTS 1 Overview: Microbial Diversity Profiling at AGRF... 2 2 Submission types to the Microbial Diversity Profiling Service... 3 2.1 Diversity

More information

Bioinformatics for Microbial Biology

Bioinformatics for Microbial Biology Bioinformatics for Microbial Biology Chaochun Wei ( 韦朝春 ) ccwei@sjtu.edu.cn http://cbb.sjtu.edu.cn/~ccwei Fall 2013 1 Outline Part I: Visualization tools for microbial genomes Tools: Gbrowser Part II:

More information

SMRT Analysis Barcoding Overview (v6.0.0)

SMRT Analysis Barcoding Overview (v6.0.0) SMRT Analysis Barcoding Overview (v6.0.0) Introduction This document applies to PacBio RS II and Sequel Systems using SMRT Link v6.0.0. Note: For information on earlier versions of SMRT Link, see the document

More information

Jianguo (Jeff) Xia, Assistant Professor McGill University, Quebec Canada June 26, 2017

Jianguo (Jeff) Xia, Assistant Professor McGill University, Quebec Canada   June 26, 2017 Jianguo (Jeff) Xia, Assistant Professor McGill University, Quebec Canada jeff.xia@mcgill.ca www.xialab.ca June 26, 2017 Metabolomics http://metaboanalyst.ca Systems transcriptomics http://networkanalyst.ca

More information

Protist diversity along a salinity gradient in a coastal lagoon

Protist diversity along a salinity gradient in a coastal lagoon The following supplement accompanies the article Protist diversity along a salinity gradient in a coastal lagoon Sergio Balzano*, Elsa Abs, Sophie C. Leterme *Corresponding author: sergio.balzano@nioz.nl

More information

Supplementary Figure 1: Seasonal CO2 flux from the S1 bog The seasonal CO2 flux from 1.2 m diameter collars during (a) fall 2014, (b) winter 2015,

Supplementary Figure 1: Seasonal CO2 flux from the S1 bog The seasonal CO2 flux from 1.2 m diameter collars during (a) fall 2014, (b) winter 2015, Supplementary Figure 1: Seasonal CO2 flux from the S1 bog The seasonal CO2 flux from 1.2 m diameter collars during (a) fall 2014, (b) winter 2015, (c) and summer 2015 across temperature treatments. Black

More information

mothur tutorial STAMPS, 2013 Kevin R. Theis Department of Zoology BEACON Center for the Study of Evolution in Action Michigan State University

mothur tutorial STAMPS, 2013 Kevin R. Theis Department of Zoology BEACON Center for the Study of Evolution in Action Michigan State University mothur tutorial STAMPS, 2013 Kevin R. Theis Department of Zoology BEACON Center for the Study of Evolution in Action Michigan State University mothur Mission to develop a single piece of open-source, expandable

More information

Fungal ITS Bioinformatics Efforts in Alaska

Fungal ITS Bioinformatics Efforts in Alaska Fungal ITS Bioinformatics Efforts in Alaska D. Lee Taylor ltaylor@iab.alaska.edu Institute of Arctic Biology University of Alaska Fairbanks Shawn Houston Minnesota Supercomputing Institute University of

More information

Report on database pre-processing

Report on database pre-processing Multiscale Immune System SImulator for the Onset of Type 2 Diabetes integrating genetic, metabolic and nutritional data Work Package 2 Deliverable 2.3 Report on database pre-processing FP7-600803 [D2.3

More information

TECHNIQUES FOR STUDYING METAGENOME DATASETS METAGENOMES TO SYSTEMS.

TECHNIQUES FOR STUDYING METAGENOME DATASETS METAGENOMES TO SYSTEMS. TECHNIQUES FOR STUDYING METAGENOME DATASETS METAGENOMES TO SYSTEMS. Ian Jeffery I.Jeffery@ucc.ie What is metagenomics Metagenomics is the study of genetic material recovered directly from environmental

More information

SANBio BIOINFORMATICS TRAINING COURSE THE MICROBIOME: ANALYSIS OF NGS DATA CBIO-PIPELINE SAMSON, KM

SANBio BIOINFORMATICS TRAINING COURSE THE MICROBIOME: ANALYSIS OF NGS DATA CBIO-PIPELINE SAMSON, KM SANBio BIOINFORMATICS TRAINING COURSE THE MICROBIOME: ANALYSIS OF NGS DATA CBIO-PIPELINE SAMSON, KM 10/23/2017 Microbiome : Analysis of NGS Data 1 Outline Background Wet Lab! Raw reads Quality Assessment

More information

Measuring the human gut microbiome: new tools and non alcoholic fatty liver disease

Measuring the human gut microbiome: new tools and non alcoholic fatty liver disease Western University Scholarship@Western Electronic Thesis and Dissertation Repository July 2016 Measuring the human gut microbiome: new tools and non alcoholic fatty liver disease Ruth G. Wong The University

More information

Functional analysis using EBI Metagenomics

Functional analysis using EBI Metagenomics Functional analysis using EBI Metagenomics Contents Tutorial information... 2 Tutorial learning objectives... 2 An introduction to functional analysis using EMG... 3 What are protein signatures?... 3 Assigning

More information

A Guide to Enterotypes across the Human Body: Meta- Analysis of Microbial Community Structures in Human Microbiome Datasets

A Guide to Enterotypes across the Human Body: Meta- Analysis of Microbial Community Structures in Human Microbiome Datasets A Guide to Enterotypes across the Human Body: Meta- Analysis of Microbial Community Structures in Human Microbiome Datasets Omry Koren 1., Dan Knights 2., Antonio Gonzalez 2., Levi Waldron 3,4, Nicola

More information

Getting of the representative sequences from the clusters (consensus/most abundant) *(MAFFT) Identification of OTUs *(BLAST)

Getting of the representative sequences from the clusters (consensus/most abundant) *(MAFFT) Identification of OTUs *(BLAST) Illumina pair-end data (R1 & R2 FASTQ) FASTA FASTQ TEXT joining of pair-end data *(fastq-join) v2.0 Quality filtering/sequence trimming/removing of ambiguous bases Grouping sequences by BARCODE motives

More information

QIIME allows analysis of high-throughput community sequencing data

QIIME allows analysis of high-throughput community sequencing data nature methods QIIME allows analysis of high-throughput community sequencing data J Gregory Caporaso, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic D Bushman, Elizabeth K Costello, Noah Fierer,

More information

Analysis of milk microbial profiles using 16s rrna gene sequencing in milk somatic cells and fat

Analysis of milk microbial profiles using 16s rrna gene sequencing in milk somatic cells and fat Analysis of milk microbial profiles using 16s rrna gene sequencing in milk somatic cells and fat Juan F. Medrano Anna Cuzco* Alma Islas-Trejo Armand Sanchez* Olga Francino* Dept. of Animal Science University

More information

Ganatum: a graphical single-cell RNA-seq analysis pipeline

Ganatum: a graphical single-cell RNA-seq analysis pipeline Ganatum: a graphical single-cell RNA-seq analysis pipeline User Manual February 28, 2017 University of Hawaii 2017 Contents 1. Introduction... 1 2. Upload... 1 3. Batch-effect removal... 4 4. Outlier removal...

More information

Enabling reproducible data analysis for metagenomics. eresearch Africa Conference 2017 Gerrit Botha CBIO H3ABioNet 3 May 2017

Enabling reproducible data analysis for metagenomics. eresearch Africa Conference 2017 Gerrit Botha CBIO H3ABioNet 3 May 2017 Enabling reproducible data analysis for metagenomics eresearch Africa Conference 2017 Gerrit Botha CBIO H3ABioNet 3 May 2017 Outline 16S rrna analysis Current CBIO 16S rrna analysis setup H3ABioNet hackathon

More information

Bioinformatic tools for metagenomic data analysis

Bioinformatic tools for metagenomic data analysis Bioinformatic tools for metagenomic data analysis MEGAN - blast-based tool for exploring taxonomic content MG-RAST (SEED, FIG) - rapid annotation of metagenomic data, phylogenetic classification and metabolic

More information

Metagenomic species profiling using universal phylogenetic marker genes

Metagenomic species profiling using universal phylogenetic marker genes Metagenomic species profiling using universal phylogenetic marker genes Shinichi Sunagawa, Daniel R. Mende, Georg Zeller, Fernando Izquierdo-Carrasco, Simon A. Berger, Jens Roat Kultima, Luis Pedro Coelho,

More information

Data Basics. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis

Data Basics. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis Data Basics Josef K Vogt Slides by: Simon Rasmussen 2017 Generalized NGS analysis Sample prep & Sequencing Data size Main data reductive steps SNPs, genes, regions Application Assembly: Compare Raw Pre-

More information

Analysis of Microbial Diversity in Disturbed Soil

Analysis of Microbial Diversity in Disturbed Soil The University of Akron IdeaExchange@UAkron Honors Research Projects The Dr. Gary B. and Pamela S. Williams Honors College Spring 2017 Analysis of Microbial Diversity in Disturbed Soil Tyler G. Sanda University

More information

A Methodology Study for Metagenomics using Next Generation Sequencers

A Methodology Study for Metagenomics using Next Generation Sequencers A Methodology Study for Metagenomics using Next Generation Sequencers Presenter: Sushmita Singh An ABRF 2011-12 DSRG Study Definitions?! Metagenomics: Metagenomics is the study of metagenomes, genetic

More information

Introduction to Microbial Community Analysis. Tommi Vatanen CS-E Statistical Genetics and Personalised Medicine

Introduction to Microbial Community Analysis. Tommi Vatanen CS-E Statistical Genetics and Personalised Medicine Introduction to Microbial Community Analysis Tommi Vatanen CS-E5890 - Statistical Genetics and Personalised Medicine Structure of the lecture Motivation: human microbiome Terminology Data types, analysis

More information

David Jacob Meltzer m. Supervisor: Dr. Umer Zeeshan Ijaz

David Jacob Meltzer m. Supervisor: Dr. Umer Zeeshan Ijaz AMPLIpyth: A Python Pipeline for Amplicon Processing David Jacob Meltzer 0803837m MSc Bioinformatics, Polyomics and Systems Biology Supervisor: Dr. Umer Zeeshan Ijaz A report submitted in partial fulfillment

More information

Getting of the representative sequences from the clusters (consensus/most abundant) *(MAFFT) Identification of OTUs *(BLAST)

Getting of the representative sequences from the clusters (consensus/most abundant) *(MAFFT) Identification of OTUs *(BLAST) Illumina pair-end data (R1 & R2 FASTQ) FASTA FASTQ TEXT joining of pair-end data *(fastq-join) v2.0 Quality filtering/sequence trimming/removing of ambiguous bases Grouping sequences by BARCODE motives

More information

Supplementary Information for

Supplementary Information for Supplementary Information for Microbial community dynamics and stability during an ammonia- induced shift to syntrophic acetate oxidation Jeffrey J. Werner 1,2, Marcelo L. Garcia 3, Sarah D. Perkins 3,

More information

Evaluation of the liver abscess microbiome and liver abscess prevalence in cattle reared for production of natural branded beef

Evaluation of the liver abscess microbiome and liver abscess prevalence in cattle reared for production of natural branded beef Evaluation of the liver abscess microbiome and liver abscess prevalence in cattle reared for production of natural branded beef K.L. Huebner, J.N. Martin C.J. Weissend, K.L. Holzer, M. Weinroth, Z. Abdo,

More information

Chapter 2: Access to Information

Chapter 2: Access to Information Chapter 2: Access to Information Outline Introduction to biological databases Centralized databases store DNA sequences Contents of DNA, RNA, and protein databases Central bioinformatics resources: NCBI

More information

Microbiome analysis of skin undergoing acne treatments

Microbiome analysis of skin undergoing acne treatments Microbiome analysis of skin undergoing acne treatments Groups Sample size Time points Head Site Code Healthy, No treatment Acne, Receiving Spironolactone 4 0 2 0,1 Forehead Cheek Nose Chin Fh Ck No Ch

More information

Evaluation of a Short-Term Scientific Mission (STSM) Cost Action ES1406 KEYSOM soil biodiversity of European transect

Evaluation of a Short-Term Scientific Mission (STSM) Cost Action ES1406 KEYSOM soil biodiversity of European transect Evaluation of a Short-Term Scientific Mission (STSM) Cost Action ES1406 KEYSOM soil biodiversity of European transect Name: KEYSOM soil biodiversity of European transect COST STSM Reference Number: COST-STSM-ES1406-35328

More information

Methods for comparing multiple microbial communities. james robert white, October 1 st, 2007

Methods for comparing multiple microbial communities. james robert white, October 1 st, 2007 Methods for comparing multiple microbial communities. james robert white, whitej@umd.edu Advisor: Mihai Pop, mpop@umiacs.umd.edu October 1 st, 2007 Abstract We propose the development of new software to

More information

Runs of Homozygosity Analysis Tutorial

Runs of Homozygosity Analysis Tutorial Runs of Homozygosity Analysis Tutorial Release 8.7.0 Golden Helix, Inc. March 22, 2017 Contents 1. Overview of the Project 2 2. Identify Runs of Homozygosity 6 Illustrative Example...............................................

More information

The virome of the human gut: metagenomic analysis of changes associated with diet

The virome of the human gut: metagenomic analysis of changes associated with diet The virome of the human gut: metagenomic analysis of changes associated with diet James Lewis Gary Wu Frederic Bushman Diet, Genetic Factors, and the Gut Microbiome in Crohn s Disease University of Pennsylvania

More information

AP Statistics Scope & Sequence

AP Statistics Scope & Sequence AP Statistics Scope & Sequence Grading Period Unit Title Learning Targets Throughout the School Year First Grading Period *Apply mathematics to problems in everyday life *Use a problem-solving model that

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION ARTICLE NUMBER: 16088 DOI: 10.1038/NMICROBIOL.2016.88 Species-function relationships shape ecological properties of the human gut microbiome Sara Vieira-Silva 1,2*, Gwen Falony

More information

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction Paper SAS1774-2015 Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction ABSTRACT Xiangxiang Meng, Wayne Thompson, and Jennifer Ames, SAS Institute Inc. Predictions, including regressions

More information

A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome

A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome Allali et al. BMC Microbiology (2017) 17:194 DOI 10.1186/s12866-017-1101-8 RESEARCH ARTICLE Open Access A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the

More information

Result Tables The Result Table, which indicates chromosomal positions and annotated gene names, promoter regions and CpG islands, is the best way for

Result Tables The Result Table, which indicates chromosomal positions and annotated gene names, promoter regions and CpG islands, is the best way for Result Tables The Result Table, which indicates chromosomal positions and annotated gene names, promoter regions and CpG islands, is the best way for you to discover methylation changes at specific genomic

More information

Microbial community structure and a core microbiome in biological rapid sand filters at Danish waterworks

Microbial community structure and a core microbiome in biological rapid sand filters at Danish waterworks Downloaded from orbit.dtu.dk on: Jan 24, 2019 Gülay, Arda; Musovic, Sanin; Albrechtsen, Hans-Jørgen; Smets, Barth F. Publication date: 2013 Document Version Publisher's PDF, also known as Version of record

More information

Quality Filtering of Illumina Sequences. Susan Huse Brown University August 6, 2015

Quality Filtering of Illumina Sequences. Susan Huse Brown University August 6, 2015 Quality Filtering of Illumina Sequences Susan Huse Brown University August 6, 2015 Illumina FASTQ Files File naming: NA10831_ATCACG_L002_R1_001.fastq.gz FA1_S1_L001_R1_001.fastq.gz Sample_Barcode/Index_Lane_Read#_Set#.fastq.gz

More information
