Introduction to QIIME on the IPython Notebook


Strategies and Techniques for Analyzing Microbial Population Structures. Introduction to QIIME on the IPython Notebook. Rob Knight, Adam Robbins-Pianka, Will Van Treuren, Yoshiki Vázquez-Baeza (@yosmark), Luke Ursell

A microbe-dominated world. Pace NR. The universal nature of biochemistry. Proc Natl Acad Sci U S A. 2001 Jan 30;98(3):805-8.

Vast microbial diversity in every ecosystem, including our own. Question: how human are we?
Human: 10 trillion human cells; 20,000 human genes.
Microbiota: 100 trillion microbial cells; 2-20 million microbial genes.
Our genomes are 99.9% the same, but our microbes...?

How do we assay this diversity?

QIIME workflow overview (www.qiime.org)
- Inputs: sequencing output (454, Illumina, Sanger) as fastq, fasta, qual, or sff/trace files, plus a metadata mapping file
- Pre-processing: e.g., remove primer(s), demultiplex, quality filter
- Denoise 454 data: PyroNoise, Denoiser
- Database submission (in development)
- Pick OTUs and representative sequences: reference-based (BLAST, UCLUST, USEARCH) or de novo (e.g., UCLUST, CD-HIT, MOTHUR, USEARCH)
- Assign taxonomy: BLAST, RDP Classifier
- Align sequences: e.g., PyNAST, INFERNAL, MUSCLE, MAFFT
- Build phylogenetic tree: e.g., FastTree, RAxML, ClearCut
- Build 'OTU table', i.e., a sample by observation matrix
- Core outputs: the OTU (or other sample by observation) table and the phylogenetic tree (evolutionary relationships between OTUs)
- α-diversity and rarefaction: e.g., Phylogenetic Diversity, Chao1, Observed Species
- β-diversity and rarefaction: e.g., weighted and unweighted UniFrac, Bray-Curtis, Jaccard
- Interactive visualizations: e.g., PCoA plots, distance histograms, taxonomy charts, rarefaction plots, network visualization, jackknifed hierarchical clustering
Legend: 'upstream' steps are currently supported for marker-gene data only. 'Downstream' steps are currently supported for general sample by observation data. Each step or input is marked as required or optional.

Samples to sequences
- Inputs: sequencing output (454, Illumina, Sanger) as fastq, fasta, qual, or sff/trace files, plus a metadata mapping file
- Pre-processing: e.g., remove primer(s), demultiplex, quality filter
- Denoise 454 data: PyroNoise, Denoiser
- Database submission (in development)

Error-correcting codes allow multiplex sequencing

Pooled, barcoded reads:
>GCACCTGAGGACAGGCATGAGGAA
>GCACCTGAGGACAGGGGAGGAGGA
>TCACATGAACCTAGGCAGGACGAA
>CTACCGGAGGACAGGCATGAGGAT
>TCACATGAACCTAGGCAGGAGGAA
>GCACCTGAGGACACGCAGGACGAC
>CTACCGGAGGACAGGCAGGAGGAA
>CTACCGGAGGACACACAGGAGGAA
>GAACCTTCACATAGGCAGGAGGAT
>TCACATGAACCTAGGGGCAAGGAA
>GCACCTGAGGACAGGCAGGAGGAA

After demultiplexing, each read is labeled with its sample ID:
>PC.634_1 FLP3FBN01ELBSX
CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGGCTAC
GCATCATCGCCTTGGTGGGCCGTTACCTCACCAACTAGCTAATGCGCCGCAG
GTCCATCCATGTTCACGCCTTGATGGGCGCTTTAATATACTGAGCATGCGCT
CTGTATACCTATCCGGTTTTAGCTACCGTTTCCAGCAGTTATCCCGGACACA
TGGGCTAGG
>PC.354_3 FLP3FBN01EEWKD
TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCCTATC
CATCGAAGGCTTGGTGGGCCGTTACCCCGCCAACAACCTAATGGAACGCATC
CCCATCGATGACCGAAGTTCTTTAATAGTTCTACCATGCGGAAGAACTATGC
CATCGGGTATTAATCTTTCTTTCGAAAGGCTATCCCCGAGTCATCGGCAGGT
TGGATACGTGTTACTCACCCGTGCGCCGGT

Hamady M, et al. (2008) Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature Methods.
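The sketch below shows how a barcode routes each pooled read back to its sample. It is a stand-in and not QIIME's decoder. The barcodes in Hamady et al. come from error-correcting codes, which have their own decoding procedures. Here the barcodes are hypothetical 12-nt strings, and a read is assigned to the nearest barcode by Hamming distance. Reads that are too far from every barcode, or equally close to two barcodes, are rejected.

```python
# Hypothetical sample-to-barcode map; a real run reads this from the mapping file.
BARCODES = {
    "S1": "ACGTACGTACGT",
    "S2": "TGCATGCATGCA",
    "S3": "GGGGCCCCAAAA",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, barcodes: dict, max_mismatches: int = 1):
    """Return the sample whose barcode best matches the read's prefix, or None.

    Reads needing more than `max_mismatches` corrections, or tied between two
    barcodes, are treated as unassignable.
    """
    length = len(next(iter(barcodes.values())))
    prefix = read[:length]
    scored = sorted((hamming(prefix, bc), sample) for sample, bc in barcodes.items())
    best_dist, best_sample = scored[0]
    if best_dist > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous: two barcodes are equally close
    return best_sample


print(assign_sample("TGCATGCATGCT" + "TTGGACCGTG", BARCODES))  # one sequencing error
```

A single mismatch in the barcode of the second read is corrected, so the read still goes to S2.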

Sequences to OTUs and phylogeny
- Pick OTUs and representative sequences: reference-based (BLAST, UCLUST, USEARCH) or de novo (e.g., UCLUST, CD-HIT, MOTHUR, USEARCH)
- Assign taxonomy: BLAST, RDP Classifier
- Align sequences: e.g., PyNAST, INFERNAL, MUSCLE, MAFFT
- Build phylogenetic tree: e.g., FastTree, RAxML, ClearCut
- Build 'OTU table', i.e., a sample by observation matrix

OTU picking: de novo
A clustering algorithm groups the experimental sequences by similarity to each other, without a reference. Each resulting cluster becomes an OTU (OTU1, OTU2, OTU3, ...). Example experimental sequences:
TTGGAAGATGTCTCAGTTCCAG
TTGGAAGATGTCTCAGTTCCAG
TTGGAAGATGTCTCAGTTCCAG
TTGGAAGATGTCTCAGTTCCAG
TTGGGCCGTATGTCAGTCCCTA
TTGGAAGATGTCTCAGTTCCAG
TTGGGCCGTATGTCAGTCCCTA
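Here is a minimal sketch of greedy centroid clustering, which resembles the idea behind UCLUST-style de novo picking. Several simplifications are assumed. Reads are equal-length and already aligned. "Identity" is the fraction of matching positions. Reads are processed in input order, whereas real tools align sequences and usually sort them first (e.g., by abundance).

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Each read joins the first centroid it matches at >= threshold identity;
    otherwise it founds a new OTU with itself as centroid.
    Returns a list of OTUs, each a list of read indices."""
    centroids = []  # representative sequence of each OTU
    otus = []       # member read indices of each OTU
    for i, seq in enumerate(seqs):
        for otu_index, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[otu_index].append(i)
                break
        else:
            centroids.append(seq)
            otus.append([i])
    return otus


A = "TTGGAAGATGTCTCAGTTCCAG"
B = "TTGGGCCGTATGTCAGTCCCTA"
print(greedy_cluster([A, A, A, A, B, A, B]))  # the seven reads listed above
```

The two distinct sequences on the slide share only 13 of 22 positions, so they fall into two OTUs.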

OTU picking: closed reference
Experimental sequences are compared against a set of reference sequences. Sequences that hit a reference are assigned to that reference's OTU (e.g., OTU1). Sequences that fail to hit any reference are left out of the OTU table.
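The closed-reference step can be sketched the same way, with the same simplifying assumptions: equal-length, pre-aligned reads and positional identity in place of a real aligner. The reference IDs are hypothetical. Each read goes to its best-scoring reference if that score clears the threshold; otherwise it is recorded as a failure.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(seqs, references, threshold=0.97):
    """Assign reads to their best-matching reference OTU.

    Returns (assignments, failures): assignments maps reference ID to read
    indices; failures lists reads that hit no reference and are dropped.
    """
    assignments, failures = {}, []
    for i, seq in enumerate(seqs):
        best_score, best_id = max((identity(seq, ref), ref_id)
                                  for ref_id, ref in references.items())
        if best_score >= threshold:
            assignments.setdefault(best_id, []).append(i)
        else:
            failures.append(i)
    return assignments, failures


A = "TTGGAAGATGTCTCAGTTCCAG"
B = "TTGGGCCGTATGTCAGTCCCTA"
C = "CTGGGCCGTGTCTCAGTCCCAA"  # first 22 nt of read PC.634_1 above
refs = {"RefOTU1": A, "RefOTU2": C}  # hypothetical reference set
print(closed_reference([A, B, C, A], refs))
```

Read B matches no reference at 97% identity, so it is lost. That loss is the main trade-off of closed-reference picking.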

OTU picking: open reference
Experimental sequences are first compared against the reference sequences. Sequences that hit a reference are assigned to reference OTUs (OTU1, OTU2, OTU3). Sequences that fail to hit are then clustered de novo by a clustering algorithm into new OTUs (OTU4, OTU5, OTU6).
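The sketch below shows only the two-stage idea of open-reference picking: a closed-reference pass, then greedy de novo clustering of the failures. QIIME's real workflow has more steps. The assumptions are the same as in the sketches above (equal-length reads, positional identity). The "New.OTU" naming scheme is hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference(seqs, references, threshold=0.97):
    """Closed-reference assignment first, then de novo clustering of failures.
    Returns a dict mapping OTU ID to member read indices."""
    otus, failures = {}, []
    for i, seq in enumerate(seqs):
        score, ref_id = max((identity(seq, ref), rid) for rid, ref in references.items())
        if score >= threshold:
            otus.setdefault(ref_id, []).append(i)
        else:
            failures.append(i)
    # De novo step: greedy centroid clustering of reads that missed every reference.
    centroids = {}  # new OTU ID -> centroid sequence
    for i in failures:
        for otu_id, centroid in centroids.items():
            if identity(seqs[i], centroid) >= threshold:
                otus[otu_id].append(i)
                break
        else:
            otu_id = f"New.OTU{len(centroids) + 1}"  # hypothetical naming scheme
            centroids[otu_id] = seqs[i]
            otus[otu_id] = [i]
    return otus


A = "TTGGAAGATGTCTCAGTTCCAG"
B = "TTGGGCCGTATGTCAGTCCCTA"
C = "CTGGGCCGTGTCTCAGTCCCAA"
print(open_reference([A, B, C, B], {"OTU1": A}))
```

Unlike closed-reference picking, no read is discarded here: B and C become new OTUs.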

Computing alpha and beta diversity
Inputs: the OTU (or other sample by observation) table and the phylogenetic tree (evolutionary relationships between OTUs).
- α-diversity and rarefaction: e.g., Phylogenetic Diversity, Chao1, Observed Species
- β-diversity and rarefaction: e.g., weighted and unweighted UniFrac, Bray-Curtis, Jaccard
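Rarefaction subsamples every sample to the same number of reads, so that diversity values are not driven by differences in sequencing depth. A rarefaction plot repeats this at increasing depths. This sketch subsamples one sample's counts without replacement. It then computes Observed Species and Chao1, using the bias-corrected form of Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen once and twice. The OTU IDs and counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts: dict, depth: int, seed: int = 0) -> dict:
    """Subsample `depth` reads without replacement from one sample's OTU counts."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer reads than the rarefaction depth")
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def observed_otus(counts: dict) -> int:
    """Number of OTUs with a nonzero count."""
    return sum(1 for n in counts.values() if n > 0)


def chao1(counts: dict) -> float:
    """Bias-corrected Chao1 richness estimate."""
    f1 = sum(1 for n in counts.values() if n == 1)  # singletons
    f2 = sum(1 for n in counts.values() if n == 2)  # doubletons
    return observed_otus(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


sample = {"OTU1": 10, "OTU2": 1, "OTU3": 1, "OTU4": 2}
print(observed_otus(sample), chao1(sample))  # prints: 4 4.5
print(rarefy(sample, depth=5))
```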

Comparing microbial communities
- Who's there? How many are there? α (i.e., within-sample) diversity
- How similar are any two samples? Treatments? β (i.e., between-sample) diversity
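Two of the non-phylogenetic β-diversity metrics named above fit in a few lines. Bray-Curtis uses abundances. Jaccard (as a distance) uses presence/absence only. Samples are represented as hypothetical dicts of OTU counts.

```python
def bray_curtis(u: dict, v: dict) -> float:
    """sum |u_i - v_i| / sum (u_i + v_i) over the union of OTUs."""
    otus = set(u) | set(v)
    num = sum(abs(u.get(o, 0) - v.get(o, 0)) for o in otus)
    den = sum(u.get(o, 0) + v.get(o, 0) for o in otus)
    return num / den


def jaccard(u: dict, v: dict) -> float:
    """1 - |shared OTUs| / |all OTUs|, counting only OTUs present (count > 0)."""
    a = {o for o, n in u.items() if n > 0}
    b = {o for o, n in v.items() if n > 0}
    return 1 - len(a & b) / len(a | b)


gut = {"OTU1": 6, "OTU2": 4}
tongue = {"OTU1": 2, "OTU3": 8}
print(bray_curtis(gut, tongue), jaccard(gut, tongue))
```

Both distances are 0 for identical communities and 1 for communities that share no OTUs.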

Phylogenetic Diversity (PD): a qualitative, phylogenetic α-diversity metric. PD is the sum of the branch lengths covered by a sample. Faith DP (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation 61:1-10.
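As a minimal sketch, PD adds up every branch on the paths from a sample's observed tips to the root, counting each branch once. The tree is hypothetical and stored as a child-to-(parent, branch length) map. This version includes the branches all the way up to the root.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("n2", 3.0),
    "n1": ("root", 0.5),
    "n2": ("root", 1.5),
}


def covered_branches(tips, tree):
    """Nodes whose parent branch lies on a path from any observed tip to the root."""
    covered = set()
    for tip in tips:
        node = tip
        while node in tree:  # the root has no entry, so the walk stops there
            covered.add(node)
            node = tree[node][0]
    return covered


def faith_pd(tips, tree):
    """Sum of branch lengths covered by the observed tips."""
    return sum(tree[node][1] for node in covered_branches(tips, tree))


print(faith_pd({"A", "B"}, TREE), faith_pd({"A", "C"}, TREE))  # prints: 3.5 6.0
```

Both samples contain two OTUs, but {A, C} spans more of the tree, so its PD is higher.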

Unweighted UniFrac: a qualitative, phylogenetic β-diversity metric. D is the fraction of observed branch length that is unique to either sample: identical communities have D = 0.0, related communities D ~ 0.5, and unrelated communities D = 1.0. Lozupone and Knight, 2005, Appl Environ Microbiol 71:8228.
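Unweighted UniFrac can be sketched on the same kind of hypothetical tree. Take the branches covered by either sample. D is the length of the branches covered by only one of them, divided by the total observed branch length. This is an illustration, not an optimized implementation.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("n2", 3.0),
    "n1": ("root", 0.5),
    "n2": ("root", 1.5),
}


def covered_branches(tips, tree):
    """Nodes whose parent branch lies on a path from any observed tip to the root."""
    covered = set()
    for tip in tips:
        node = tip
        while node in tree:
            covered.add(node)
            node = tree[node][0]
    return covered


def branch_length(nodes, tree):
    return sum(tree[n][1] for n in nodes)


def unweighted_unifrac(tips_a, tips_b, tree):
    """Unique branch length / observed branch length (0 = identical, 1 = disjoint)."""
    a, b = covered_branches(tips_a, tree), covered_branches(tips_b, tree)
    return branch_length(a ^ b, tree) / branch_length(a | b, tree)


print(unweighted_unifrac({"A", "B"}, {"A", "C"}, TREE))  # prints: 0.8125
```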

Clustering by UniFrac distance
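One common way to cluster samples from a distance matrix is UPGMA (average linkage). It repeatedly merges the two closest clusters and averages their distances to every other cluster, weighted by cluster size. The sketch below applies it to a hypothetical matrix of UniFrac-like distances and returns the dendrogram as nested tuples.

```python
from itertools import combinations


def upgma(matrix):
    """UPGMA clustering of a symmetric dict-of-dicts distance matrix."""
    size = {s: 1 for s in matrix}  # cluster -> number of samples it contains
    d = {frozenset((a, b)): matrix[a][b] for a, b in combinations(matrix, 2)}
    while len(size) > 1:
        closest = min(d, key=d.get)          # pair of clusters with smallest distance
        x, y = sorted(closest, key=repr)     # deterministic child order
        merged = (x, y)
        for other in size:
            if other in (x, y):
                continue
            # Size-weighted average of the two merged clusters' distances.
            d[frozenset((merged, other))] = (
                size[x] * d[frozenset((x, other))] + size[y] * d[frozenset((y, other))]
            ) / (size[x] + size[y])
        d = {k: v for k, v in d.items() if x not in k and y not in k}
        size[merged] = size.pop(x) + size.pop(y)
    return next(iter(size))


# Hypothetical distances between four samples.
pairs = {("A", "B"): 0.1, ("A", "C"): 0.8, ("A", "D"): 0.9,
         ("B", "C"): 0.7, ("B", "D"): 0.85, ("C", "D"): 0.2}
matrix = {s: {} for s in "ABCD"}
for (a, b), dist in pairs.items():
    matrix[a][b] = matrix[b][a] = dist
print(upgma(matrix))  # prints: (('A', 'B'), ('C', 'D'))
```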

Overview (www.qiime.org):
1. Extract DNA and amplify the marker gene with barcoded primers.
2. Pool amplicons and sequence.
3. Assign reads to samples by their barcodes.
4. Assign millions of sequences from thousands of samples to OTUs (RefSeq 1 to RefSeq 10 in the figure).
5. Compute UniFrac distances and compare samples.

Key QIIME files
- Mapping file: per-sample metadata, user-defined
- OTU table: sample x OTU matrix, central to downstream analyses (now in BIOM format)
- Parameters file: defines analyses, for use with the workflow scripts (optional)

Parameters can be set in a few ways
- qiime_config files: located through the environment variable $QIIME_CONFIG_FP, or in the user's home directory
- Parameter files
- Command line

Mapping file

Mapping file: always run check_id_map.py! Marked columns are required fields.
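The mapping file is tab-separated. Its header starts with #SampleID, includes BarcodeSequence and LinkerPrimerSequence, and ends with Description. The sketch below illustrates the kinds of checks a validator makes: required columns, consistent field counts, and duplicate sample IDs or barcodes. It is a simplified example and not a reimplementation of check_id_map.py. The sample rows are hypothetical.

```python
def duplicates(values):
    """Values that occur more than once, sorted."""
    return sorted({v for v in values if values.count(v) > 1})


def check_mapping(text: str) -> list:
    """Return human-readable problems found in a tab-separated mapping file."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    # Later lines starting with '#' are comments, not samples.
    rows = [line.split("\t") for line in lines[1:] if not line.startswith("#")]
    problems = []
    if header[0] != "#SampleID":
        problems.append("first column must be #SampleID")
    for column in ("BarcodeSequence", "LinkerPrimerSequence"):
        if column not in header:
            problems.append(f"missing required column {column}")
    if header[-1] != "Description":
        problems.append("last column must be Description")
    complete = []
    for row in rows:
        if len(row) != len(header):
            problems.append(f"sample {row[0]}: expected {len(header)} fields, found {len(row)}")
        else:
            complete.append(row)
    ids = [row[0] for row in complete]
    if duplicates(ids):
        problems.append("duplicate sample IDs: " + ", ".join(duplicates(ids)))
    if "BarcodeSequence" in header:
        col = header.index("BarcodeSequence")
        barcodes = [row[col] for row in complete]
        if duplicates(barcodes):
            problems.append("duplicate barcodes: " + ", ".join(duplicates(barcodes)))
    return problems


good = ("#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tBodySite\tDescription\n"
        "S1\tACGTACGTACGT\tGTGCCAGCMGCCGCGGTAA\tgut\tfirst\n"
        "S2\tTGCATGCATGCA\tGTGCCAGCMGCCGCGGTAA\ttongue\tsecond\n")
print(check_mapping(good))  # prints: []
```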

OTU table (classic format): sample x OTU matrix
- OTU identifiers
- Sample identifiers
- Optional per-OTU taxonomic information
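In the classic tab-separated layout, each row is an OTU and each column a sample. A header line begins with "#OTU ID", and an optional last column ("Consensus Lineage") holds taxonomy. The sketch below parses that layout. The example banner line and its contents are an assumed illustration, so other "#" comment lines are simply skipped.

```python
def parse_classic_otu_table(text: str):
    """Return (sample_ids, counts[otu][sample], taxonomy[otu]) from a classic table."""
    samples, counts, taxonomy = [], {}, {}
    has_lineage = False
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#OTU ID"):
            has_lineage = fields[-1] == "Consensus Lineage"
            samples = fields[1:-1] if has_lineage else fields[1:]
            continue
        if line.startswith("#"):
            continue  # other comment lines, e.g. a version banner
        otu_id = fields[0]
        values = fields[1:1 + len(samples)]
        counts[otu_id] = {s: float(v) for s, v in zip(samples, values)}
        if has_lineage:
            taxonomy[otu_id] = [t.strip() for t in fields[-1].split(";")]
    return samples, counts, taxonomy


example = ("# QIIME v1.3.0 OTU table\n"
           "#OTU ID\tPC.354\tPC.355\tConsensus Lineage\n"
           "0\t0\t1\tRoot;Bacteria;Firmicutes\n"
           "1\t2\t0\tRoot;Bacteria;Bacteroidetes\n")
samples, counts, taxonomy = parse_classic_otu_table(example)
print(samples, counts["1"])
```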

OTU tables are now in Biological Observation Matrix (.biom) format (QIIME 1.4.0-dev and later). Google "biom format", or see http://biom-format.org. See convert_biom.py for translating between classic and BIOM OTU tables.

Sample x observation contingency matrix: observation counts, for example of OTUs in samples or of functions in metagenomes. The same structure covers many data types:
- Samples x OTUs: marker gene (e.g., 16S) surveys
- Samples x taxa: marker gene (e.g., 16S) surveys
- Genomes x ortholog groups: comparative genomics
- Metagenomes x functions: metagenomics
- Samples x metabolites: metabolomics
- ... and metatranscriptomics

The Biological Observation Matrix (BIOM) Format, or: How I Learned To Stop Worrying and Love the Ome-ome. A JSON-based format for representing arbitrary sample x observation contingency tables with optional metadata. McDonald et al., GigaScience (2012). http://www.biom-format.org
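The sketch below writes a small OTU table as BIOM-style JSON in the sparse layout. Rows are observations (OTUs), columns are samples, and "data" lists only nonzero [row, column, value] triples. The field names follow the BIOM 1.0 JSON layout as I understand it, and the "format" and "generated_by" strings are placeholders. Check the details against biom-format.org before relying on them.

```python
import json
from datetime import datetime, timezone


def to_biom_json(otu_ids, sample_ids, dense, taxonomy=None):
    """Build a minimal sparse BIOM 1.0-style table; dense[i][j] = count of OTU i in sample j."""
    data = [[i, j, value]
            for i, row in enumerate(dense)
            for j, value in enumerate(row) if value != 0]
    all_ints = all(isinstance(v, int) for row in dense for v in row)
    table = {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",  # placeholder version string
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "hypothetical example script",
        "date": datetime.now(timezone.utc).isoformat(),
        "matrix_type": "sparse",
        "matrix_element_type": "int" if all_ints else "float",
        "shape": [len(otu_ids), len(sample_ids)],
        "rows": [{"id": o, "metadata": {"taxonomy": taxonomy[o]} if taxonomy else None}
                 for o in otu_ids],
        "columns": [{"id": s, "metadata": None} for s in sample_ids],
        "data": data,
    }
    return json.dumps(table)


text = to_biom_json(["OTU1", "OTU2"], ["S1", "S2"], [[5, 0], [0, 3]])
print(json.loads(text)["data"])  # prints: [[0, 0, 5], [1, 1, 3]]
```

The sparse layout stores only nonzero counts, which keeps mostly-zero OTU tables small.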

Running QIIME
- Native installation on Mac (OS X) or Linux (qiime-deploy), from laptops to a 16,000+ core compute cluster
- Ubuntu VirtualBox
- Cloud-based installations
http://ncar.janus.rc.colorado.edu/

Amazon Elastic Compute Cloud (EC2)

Moving Pictures of the Human Microbiome: two subjects sampled daily, one for six months and one for 18 months. Four body sites: tongue, palm of left hand, palm of right hand, and gut (via fecal swabs). Caporaso JG et al. (2011) Moving pictures of the human microbiome. Genome Biology 12: R50.

Moving Pictures of the Human Microbiome: investigate the relative temporal variability of body sites. Is there a temporal core microbiome? Technical point: do we observe the same conclusions on 454 and Illumina data?

Moving Pictures of the Human Microbiome: QIIME tutorial. A small subset of the full data set (~0.1% of the full sequence collection) keeps run time short. The samples were sequenced across six Illumina GAIIx lanes, with a subset also sequenced on 454.

Tutorial: click on the link in the wiki and find the notebook with your user name. It will look something like wvtreuren_stamps_2013.ipynb. Click this link; it will open in a new window. Don't do anything else until we complete the next 4 slides.

IPython reference: IPython acts like a hybrid Python/bash environment. We interact with the IPython notebook through its cells.

IPython reference: commands prefixed by a '!' character are issued to the shell (just like what your terminal runs). Commands not prefixed with '!' are issued to Python and behave as they normally would in Python. Each cell of the notebook is executable: Shift+Enter (or the play button) executes or re-executes the commands in that cell. First click in the cell to give it focus, then press Shift+Enter or click the play button.
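The '!' prefix is IPython syntax, so it does not run in a plain Python interpreter. As a rough sketch, the closest plain-Python equivalent of a `!` line is `subprocess.run(..., shell=True)`. This also shows why a `!cd` inside a cell does not change the notebook's working directory: each shell command runs in its own subprocess, and IPython provides the %cd magic for changing directories.

```python
import os
import subprocess

# A notebook line such as `!echo hello` is handed to a shell.
# Outside IPython, a rough plain-Python equivalent is:
result = subprocess.run("echo hello", shell=True, capture_output=True, text=True)
print(result.stdout.strip())  # prints: hello

# The shell runs in a separate subprocess, so its `cd` does not affect
# the working directory of this Python process.
before = os.getcwd()
subprocess.run("cd ..", shell=True)
print(os.getcwd() == before)  # prints: True

# Lines without '!' are ordinary Python and keep their state between cells.
n_samples = 2
print(n_samples * 10)  # prints: 20
```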

IPython reference: each executable cell has a prefix that shows its status. "In [ ]:" means it hasn't been run, "In [1]:" (or another number) means it has been run, and "In [*]:" means it is still running.

Tree building (figure): the experimental sequences are aligned, the alignment is masked, and a phylogeny is built from the masked, aligned sequences.
Experimental sequences:
TTGGAAGATGTCTCAGTTCCAGA
TTGGGCCGTATGTCAGTCCCTAAGGAG
CTGGGCCGTGTCTCAGTCCCAATCA
TTGGAAGATGTCTCAGTTCCAGGGGCTATAA
TTGGGCCGTATGTCAGTCCCTACGTAACA
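Masking removes alignment columns that carry little reliable phylogenetic signal before the tree is built. QIIME's filtering can also apply a predefined positional mask. This sketch shows only one simple criterion: dropping columns whose fraction of gap characters exceeds a cutoff. The cutoff value and the toy alignment are assumptions for illustration.

```python
def mask_alignment(aligned, max_gap_fraction=0.5):
    """Drop alignment columns whose fraction of gaps ('-' or '.') exceeds the cutoff."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        col for col in range(length)
        if sum(s[col] in "-." for s in aligned) / len(aligned) <= max_gap_fraction
    ]
    return ["".join(s[c] for c in keep) for s in aligned]


toy = ["AC-GT", "A--GT", "ACTG-"]  # hypothetical three-sequence alignment
print(mask_alignment(toy))  # prints: ['ACGT', 'A-GT', 'ACG-']
```

Column 3 is gapped in two of three sequences, so it is removed. The other columns are kept, and a tree builder such as FastTree would then run on the masked alignment.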

In the ancient times of... 2012, we used KiNG for viewing 3D plots in QIIME.

It's 2013! Emperor

Description: 3D visualization tool; cross-platform; integrates with QIIME and its workflows; use-case driven; easy to use; in active development. http://www.khronos.org/webgl/ http://www.oracle.com/

http://24.media.tumblr.com/tumblr_m6q4dgigkw1qzjxifo1_1280.jpg

Issues, suggestions, feature requests? Contact us at www.github.com/qiime/emperor, or contact the QIIME Forum at http://groups.google.com/group/qiime-forum

Now try the Taxa Summary Plots and OTU Category Significance sections on your own.