Introduction to Microbial Community Analysis. Tommi Vatanen, CS-E5890 - Statistical Genetics and Personalised Medicine



Structure of the lecture
Motivation: the human microbiome. Terminology. Data types and analysis. Examples from the Human Microbiome Project (HMP) and the DIABIMMUNE project.


Our microbial selves: microbes are in, on & around us
There are more microbial cells than human cells in the human body (1-2 kg, mostly in the gut). There are thousands of species, each containing thousands of genes, and microbial genes outnumber human genes 100:1. Under ideal conditions, microbes aid digestion, make nutrients (e.g. vitamin K), keep pathogens out and train the immune system. Under non-ideal conditions, they can predispose to, exacerbate or directly cause deviations from health.

What is metagenomics?
Microbiome: the total collection of microorganisms within a community (also called the microbial community or microbiota).
Metagenome: the total genomic potential of a microbial community.
Metagenomics: the study of uncultured microorganisms from the environment, which can include humans or other living hosts.
Meta'omics: the total biomolecular repertoire of a microbial community.

Sequencing techniques
Massively parallel DNA sequencing revolutionized the study of microbial communities. There is no need to isolate bacteria in the lab: you purify the DNA and sequence it. This has led to a golden age of microbial community studies.

What to do with your metagenome?
Basic science: a reservoir of gene and protein functional information, and a comprehensive snapshot of microbial ecology and evolution.
Translational science: a public health tool for monitoring population health and epidemiology, and a diagnostic or prognostic biomarker for host disease.

Examples of metagenomic studies: Global ocean sampling 2003/2004 - ongoing

The NIH Human Microbiome Project (HMP): a comprehensive microbial survey
What is a normal human microbiome? 300 healthy human subjects; multiple body sites (15 in males, 18 in females); multiple visits; clinical metadata. www.hmpdacc.org

DIABIMMUNE study on the infant gut microbiome
The study follows the developing infant gut microbiome in Finland, Estonia and Russian Karelia. It includes 222 infants who are at genetic risk for autoimmune diseases, with monthly stool samples collected from birth until 3 years of age. Clinical metadata: diet, antibiotics, mode of birth, vaccinations. https://pubs.broadinstitute.org/diabimmune/

Talking about microbes: Phylogenies OTU = operational taxonomic unit

Talking about microbes: relative abundance
Absolute abundance is always masked in data obtained by the techniques discussed here. Information is measured as relative abundances (e.g. "30 % of the bacteria are XXX").
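The following is a minimal Python sketch of why absolute abundance is masked. Each sample's read counts are divided by that sample's total, so two samples whose absolute counts differ by a constant factor end up with identical profiles. The function name and the toy counts are illustrative assumptions.

```python
import numpy as np

def relative_abundance(counts):
    """Convert a taxa x samples count matrix to relative abundances (each column sums to 1)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0)
    if np.any(totals == 0):
        raise ValueError("every sample needs at least one read")
    return counts / totals

# Toy data: sample 2 contains exactly twice as many cells of every taxon as sample 1
counts = np.array([[30, 60],
                   [50, 100],
                   [20, 40]])
rel = relative_abundance(counts)
print(rel[:, 0])                           # [0.3 0.5 0.2]
print(np.allclose(rel[:, 0], rel[:, 1]))   # True: the twofold difference is invisible
```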

Talking about microbes: abundance vs. prevalence
[Figure panels: abundant but not prevalent; prevalent but not abundant; abundant and prevalent.]
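The two axes can be computed from a taxa × samples table of relative abundances, as in this sketch. It assumes a common convention not stated on the slide: prevalence is the fraction of samples in which a taxon exceeds a detection threshold, and abundance is the taxon's mean relative abundance. The toy rows mimic the three panels.

```python
import numpy as np

def mean_abundance_and_prevalence(rel_abundance, detection_threshold=0.0):
    """Return (mean relative abundance per taxon, prevalence per taxon).

    Prevalence = fraction of samples in which the taxon exceeds the
    detection threshold (an assumed convention for this sketch).
    """
    rel = np.asarray(rel_abundance, dtype=float)
    mean_abundance = rel.mean(axis=1)
    prevalence = (rel > detection_threshold).mean(axis=1)
    return mean_abundance, prevalence

# Toy profiles: 3 taxa (rows) x 4 samples (columns); each column sums to 1
rel = np.array([
    [0.90, 0.00, 0.00, 0.00],  # abundant (where present) but not prevalent
    [0.05, 0.10, 0.05, 0.10],  # prevalent but not abundant
    [0.05, 0.90, 0.95, 0.90],  # abundant and prevalent
])
mean_ab, prev = mean_abundance_and_prevalence(rel)
```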

Talking about microbes: diversity
Diversity: broadly, a community's number and distribution of organisms (also called community composition or structure). Alpha-diversity refers to the diversity within a single community (sample). Beta-diversity refers to the dissimilarity between two communities.

Talking about microbes: alpha-diversity (1-sample) scenarios
[Figure panels: not diverse; qualitatively diverse; taxonomically diverse; phylogenetically diverse; quantitatively diverse.]

Talking about microbes: measures for alpha-diversity
Richness: the number of unique taxa. Richness estimates ask how many taxa went unobserved, e.g. Chao1: $S_{\text{Chao1}} = S_{\text{obs}} + \frac{f_1^2}{2 f_2}$. Here $S_{\text{obs}}$ is the observed richness, $f_1$ is the number of singleton taxa (observed only once, i.e. one read) and $f_2$ is the number of doubleton taxa.
Diversity as considered in information theory (entropy), e.g. Shannon's diversity index: $H = -\sum_i p_i \ln p_i$, where $p_i$ is the relative abundance of taxon $i$.
Many other measures exist, such as Simpson, McIntosh and Berger-Parker; see vegan::diversity() in R.
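The three measures above are sketched below in Python for one sample's vector of read counts per taxon. Two details are assumptions rather than slide content. The Shannon index uses the natural logarithm; any other base only rescales $H$. The classic Chao1 formula is undefined when $f_2 = 0$, so the sketch then falls back to the bias-corrected form $S_{\text{obs}} + f_1 (f_1 - 1)/2$.

```python
import math
from collections import Counter

def richness(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)

def chao1(counts):
    """Chao1 = S_obs + f1^2 / (2 f2); f1 = singletons, f2 = doubletons.

    Assumption: when f2 == 0 this sketch uses the bias-corrected form
    S_obs + f1 (f1 - 1) / 2 instead of dividing by zero.
    """
    freq = Counter(c for c in counts if c > 0)  # how many taxa have 1 read, 2 reads, ...
    s_obs, f1, f2 = richness(counts), freq[1], freq[2]
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2

def shannon(counts):
    """Shannon index H = -sum_i p_i ln p_i over taxa with p_i > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

sample = [10, 5, 1, 1, 2, 2, 1]  # toy read counts per taxon
print(richness(sample))          # 7
print(chao1(sample))             # 7 + 3**2 / (2 * 2) = 9.25
```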

Alpha-diversity of the gut microbiome increases during the first years of life
[Figure: microbiome complexity and stability at birth, at 3 years, in adults and in the elderly.] Kostic, A. D., Xavier, R. J., & Gevers, D. (2014). The microbiome in inflammatory bowel disease: current status and the future ahead. Gastroenterology, 146(6), 1489-1499.

Increasing diversity in DIABIMMUNE
Diversity increases during the first three years of life as new microbes colonize the gut with the increasing complexity of diet, environmental exposures, etc.

Talking about microbes: beta-diversity (2-sample) scenarios
[Figure panels comparing sample 1 and sample 2: qualitatively and taxonomically diverse; quantitatively and taxonomically diverse; quantitatively and phylogenetically diverse.]

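Talking about microbes: measures for beta-diversity
Jaccard index, the proportion of shared taxa: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$, where $A$ and $B$ are the sets of taxa observed in the two samples.
Bray-Curtis dissimilarity: $BC_{ij} = 1 - \frac{2 C_{ij}}{S_i + S_j}$. Here $C_{ij}$ is the sum of the lesser counts for only those species in common between both samples, and $S_i$ and $S_j$ are the total numbers of individuals (reads) counted in each sample.
In R: vegan::vegdist.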
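Both beta-diversity measures are sketched below in Python for two count vectors over the same taxa. The Jaccard index is computed on presence/absence; its dissimilarity counterpart is $1 - J$. For Bray-Curtis, summing the element-wise minimum over all taxa equals $C_{ij}$, because a taxon missing from either sample contributes a minimum of zero. The function names and toy vectors are illustrative.

```python
import numpy as np

def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| on presence/absence of taxa."""
    present_a, present_b = np.asarray(a) > 0, np.asarray(b) > 0
    union = np.sum(present_a | present_b)
    if union == 0:
        raise ValueError("both samples are empty")
    return np.sum(present_a & present_b) / union

def bray_curtis(a, b):
    """BC = 1 - 2 C / (S_a + S_b), C = sum of the lesser counts of shared taxa."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    c = np.minimum(a, b).sum()  # non-shared taxa contribute min = 0
    return 1.0 - 2.0 * c / (a.sum() + b.sum())

a = [6, 7, 4]    # toy counts for three taxa in sample A
b = [10, 0, 6]   # and in sample B
print(jaccard_index(a, b))  # 2 shared taxa out of 3 observed taxa
print(bray_curtis(a, b))    # 1 - 2 * 10 / (17 + 16)
```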

UniFrac beta-diversity accounts for the phylogeny
Raw weighted UniFrac metric: $u = \sum_{i=1}^{n} b_i \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right|$. Here $n$ is the total number of branches in the tree and $b_i$ is the length of branch $i$. $A_i$ and $B_i$ are the numbers of descendants of branch $i$ from communities A and B respectively, and $A_T$ and $B_T$ are the total numbers of sequences from communities A and B respectively.
Lozupone, C.; Knight, R. (2005). "UniFrac: A New Phylogenetic Method for Comparing Microbial Communities". Applied and Environmental Microbiology 71.
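The raw weighted UniFrac formula can be evaluated directly, as in the Python sketch below. To stay self-contained, it assumes the tree is already flattened into one `(branch_length, tips_below_branch)` entry per branch; real tools parse a Newick tree instead. The sketch computes only the raw, unnormalized metric.

```python
def raw_weighted_unifrac(branches, counts_a, counts_b):
    """u = sum_i b_i * |A_i / A_T - B_i / B_T|.

    branches: list of (branch_length, set of tip names below that branch)
    counts_a, counts_b: dict tip name -> sequence count in community A / B
    """
    a_total = sum(counts_a.values())
    b_total = sum(counts_b.values())
    u = 0.0
    for length, tips in branches:
        a_i = sum(counts_a.get(t, 0) for t in tips)  # descendants of branch i from A
        b_i = sum(counts_b.get(t, 0) for t in tips)  # descendants of branch i from B
        u += length * abs(a_i / a_total - b_i / b_total)
    return u

# Toy tree ((a:1, b:1):1, c:2), written as one entry per branch
branches = [
    (1.0, {"a"}),
    (1.0, {"b"}),
    (1.0, {"a", "b"}),  # internal branch above the (a, b) clade
    (2.0, {"c"}),
]
u = raw_weighted_unifrac(branches, {"a": 2, "c": 2}, {"b": 1, "c": 3})
print(u)  # 1.5
```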

Talking about microbes: ordination
Ordination is a projection of high-dimensional data into fewer dimensions. Principal component analysis (PCA) chooses the new dimensions so that they capture the maximum variance. Principal coordinates analysis (PCoA) performs the ordination from a (dis)similarity matrix instead of the raw data. Nonmetric multidimensional scaling (NMDS) based on UniFrac beta-diversity is widely used in microbial community analysis (Hamady, 2009).
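Classical PCoA (metric multidimensional scaling) turns a distance matrix into coordinates. It double-centres the squared distances and keeps the leading eigenvectors, as in the NumPy sketch below. The toy check uses Euclidean distances, for which PCoA recovers the original configuration up to rotation or reflection. In practice the input would be, e.g., a UniFrac or Bray-Curtis matrix. Clipping negative eigenvalues, which arise for non-Euclidean distances, is a simplifying choice of this sketch.

```python
import numpy as np

def pcoa(distances, n_components=2):
    """Classical principal coordinates analysis of a symmetric distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J with J = I - 11^T / n
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero here
    return eigvecs * np.sqrt(np.clip(eigvals, 0, None))

# Toy check: Euclidean distances between the corners of a 3 x 4 rectangle
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = pcoa(dist)
```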

t-distributed stochastic neighbor embedding (t-SNE)
t-SNE is a modern, distance/similarity matrix-based technique for visualizing high-dimensional data. It finds a mapping (visualization) that is faithful to the local neighborhoods in the original data: data points that are similar in the input tend to be close in the visualization. In R: Rtsne::Rtsne.

What aspects of a human host most influence microbial community composition?
[Figure (Rob Knight): ~5,200 microbial communities profiled by 16S sequencing; closer points are more similar communities.]

What about the infant gut microbiome?
Variation in the infant gut microbiome is dominated by age. In DIABIMMUNE, Russian infants seem to have a microbiota distinct from that of Finnish and Estonian infants.

Two big questions of microbial community analysis Who is there? What are they doing?

How to obtain data on microbes?
Cultivate single strains of bacteria: traditional microbiology plus sequencing.
Sequencing-based methods for studying microbial communities: purify all DNA and sequence it. Amplicon-based methods target specific regions or genes of interest, while shotgun sequencing covers all DNA material.
Sequencing methods differ in read length (short vs. long reads). Errors are more problematic than in, e.g., human genome analysis, because a sequencing error can be mistaken for a genuinely new organism or variant.

Sequencing as a tool for microbial community analysis (amplicon vs. shotgun)
Common steps: lyse cells, extract and fragment DNA, sequence short DNA reads, and summarize the result as a features × samples table of relative abundances.
Amplicon sequencing targets the 16S (18S, ITS) rRNA gene. The gene is conserved across bacteria, which allows PCR amplification, but some of its regions are variable, which permits genus-level identification.
Shotgun reads are either mapped to reference genomes or assembled into contigs. For example, the overlapping reads AGCTACAGC, ACAGCACGGCAT and GGCATCATC assemble into the contig AGCTACAGCACGGCATCATC.
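The contig example can be reproduced with a toy greedy overlap assembler. It repeatedly merges the pair of sequences with the longest suffix-prefix overlap. This is only an illustration of the idea: real metagenome assemblers use graph-based methods and must handle sequencing errors and repeats. The minimum overlap of 3 and the function names are assumptions of the sketch.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

print(greedy_assemble(["AGCTACAGC", "ACAGCACGGCAT", "GGCATCATC"]))
# ['AGCTACAGCACGGCATCATC']
```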

Typical microbiome community analysis tasks
[Figure: typical analysis tasks for metagenomic data and 16S data, and statistics.]


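Metagenomic methods: the 16S rRNA gene
16S rRNA is a structural component of the prokaryotic ribosome. Its gene is used as a molecular clock to infer phylogeny for three reasons. It is large, giving a good scale for mutations. Portions of it are constant, allowing amplification. Sequencing it is relatively cheap. (Woese, 1987; Pace, 1997; Ley, 2006)
[Figure: 16S rRNA structure with variable regions V2 and V6 marked; image: George Rice, Montana State University.]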

Microbiome composition analysis: phylotypes and binning
Binning: the nontrivial assignment of reads to phylotypes or OTUs (i.e. clustering or classification).
Phylotype or operational taxonomic unit (OTU): organisms whose marker sequences are identical to within some tolerance (e.g. 97% sequence identity), roughly corresponding to species.

Microbiome composition analysis: operational taxonomic unit (OTU) binning
[Figure: toy sequences binned by open-reference binning (clustering) vs. closed-reference binning (classification).]
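The toy Python sketch below combines both binning modes. Each read is first classified against reference sequences (closed reference), and the reads that fail to match any reference within the threshold are clustered de novo by greedy centroid clustering (the open-reference step). Sequence identity is simplified to the fraction of matching positions in equal-length, pre-aligned sequences. Real pipelines align sequences of different lengths and usually sort reads by abundance before clustering. All names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions (toy: assumes equal-length, pre-aligned sequences)."""
    if len(a) != len(b):
        raise ValueError("toy identity requires equal-length, aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def closed_reference(reads, references, threshold=0.97):
    """Classify each read to its best reference if identity >= threshold."""
    assigned, failures = {}, []
    for read in reads:
        best_id, best_ref = max((identity(read, seq), name) for name, seq in references.items())
        if best_id >= threshold:
            assigned[read] = best_ref
        else:
            failures.append(read)
    return assigned, failures

def de_novo_greedy(reads, threshold=0.97):
    """Join the first centroid within the threshold, otherwise start a new OTU."""
    centroids, assigned = [], {}
    for read in reads:
        for k, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                assigned[read] = f"denovo{k}"
                break
        else:
            centroids.append(read)
            assigned[read] = f"denovo{len(centroids) - 1}"
    return assigned

def open_reference(reads, references, threshold=0.97):
    """Closed-reference classification first; failures are clustered de novo."""
    assigned, failures = closed_reference(reads, references, threshold)
    assigned.update(de_novo_greedy(failures, threshold))
    return assigned

references = {"refA": "AAAAAAAAAA", "refB": "CCCCCCCCCC"}
reads = ["AAAAAAAAAT", "CCCCCCCCCC", "GGGGGGGGGG", "GGGGGGGGGT"]
otus = open_reference(reads, references, threshold=0.9)
```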

QIIME for analysing amplicon sequencing data
QIIME (pronounced "chime") is a modular, open-source bioinformatics pipeline for analysing microbial amplicon sequencing data. The homepage qiime.org contains documentation, tutorials and other resource material. QIIME provides a huge collection of scripts for many different analysis tasks.


Profiling microbial communities by metagenomic shotgun sequencing
[Figure: short reads matched against reference genomes.]

Indexing microbial pangenomes
NCBI isolate genomes (Archaea: 300; Bacteria: 12,926; Viruses: 4,646; Eukaryota: 2,177) are reduced to bags of protein-coding genes (49.0 million genes in total). These are organized into species pangenomes (7,677 species, containing 18.6 million gene clusters), and finally into core genes and marker genes. Tools: RepoPhlAn, ChocoPhlAn (http://metaref.org).

MetaPhlAn (Metagenomic Phylogenetic Analysis)
[Figure: short reads matched to reference genomes via clade-specific marker genes.]
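The idea of marker-based profiling can be shown with a deliberately simplified sketch. Count the reads hitting each clade-specific marker, normalize by marker length, average per clade, and rescale to relative abundances. This is not MetaPhlAn's implementation: MetaPhlAn uses a more robust average across a clade's markers, whereas this sketch takes a plain mean. The marker ids, lengths and clades are hypothetical.

```python
from collections import defaultdict

def marker_based_profile(marker_hits, marker_length, marker_clade):
    """Toy clade profile from reads hitting clade-specific marker genes.

    marker_hits:   marker id -> number of reads mapped to that marker
    marker_length: marker id -> marker length (nucleotides)
    marker_clade:  marker id -> clade the marker is specific to
    Returns clade -> relative abundance (values sum to 1).
    """
    coverages = defaultdict(list)
    for marker, clade in marker_clade.items():
        # Length-normalised coverage; markers without reads count as zero
        coverages[clade].append(marker_hits.get(marker, 0) / marker_length[marker])
    # Plain mean per clade (a simplification of MetaPhlAn's robust averaging)
    clade_coverage = {clade: sum(v) / len(v) for clade, v in coverages.items()}
    total = sum(clade_coverage.values())
    if total == 0:
        raise ValueError("no reads hit any marker")
    return {clade: cov / total for clade, cov in clade_coverage.items()}

profile = marker_based_profile(
    marker_hits={"m1": 10, "m2": 20, "m3": 30},
    marker_length={"m1": 100, "m2": 200, "m3": 100},
    marker_clade={"m1": "clade_A", "m2": "clade_A", "m3": "clade_B"},
)
print(profile)  # clade_A about 0.25, clade_B about 0.75
```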

MetaPhlAn data: a species × samples table

Other software for taxonomic profiling
mOTU (metagenomic OTU): http://www.bork.embl.de/software/motu/
MEGAN: http://ab.inf.uni-tuebingen.de/software/megan6/
Kraken: https://ccb.jhu.edu/software/kraken/

Two big questions of microbial community analysis Who is there? What are they doing?

Metagenomic analysis: molecular functions in biological roles
[Figure: phylum abundance and pathway abundance across HMP subjects and body sites: nares, skin, oral (buccal mucosa, BM; supragingival plaque, SupP; tongue dorsum, TD), gut and vaginal. Source: http://hmpdacc.org/hmmrc]

Metagenomic analysis: molecular functions in biological roles
Orthology: grouping genes by conserved sequence features (COG, KO, FIGfam).
Structure: grouping genes by similar protein domains (Pfam, TIGRfam, SMART, EC).
Biological roles: grouping genes by pathway and process involvement (GO, KEGG, MetaCyc, SEED).
Warnecke, 2007; Turnbaugh, 2009; DeLong, 2006.
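Grouping genes into higher-level categories is, at its simplest, a many-to-many roll-up. The sketch below adds each gene family's abundance to every group it is annotated to. Summing is an assumption chosen for simplicity; dedicated tools such as HUMAnN2 use more involved rules to compute pathway abundance. The gene and pathway identifiers are hypothetical.

```python
from collections import defaultdict

def regroup(gene_abundance, gene_to_groups):
    """Sum gene-family abundances into higher-level groups (e.g. pathways).

    A gene annotated to several groups contributes to each of them;
    unannotated genes are pooled under "UNGROUPED".
    """
    grouped = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        for group in gene_to_groups.get(gene, ["UNGROUPED"]):
            grouped[group] += abundance
    return dict(grouped)

# Hypothetical identifiers, for illustration only
genes = {"gene1": 5.0, "gene2": 3.0, "gene3": 2.0}
mapping = {"gene1": ["pathwayA"], "gene2": ["pathwayA", "pathwayB"]}
print(regroup(genes, mapping))  # {'pathwayA': 8.0, 'pathwayB': 3.0, 'UNGROUPED': 2.0}
```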

From reads to genes (HUMAnN2)
Input: a quality-controlled metagenome (or metatranscriptome).
1. Rapidly identify the species in the community with MetaPhlAn2.
2. Nucleotide search: search the reads against the pangenomes of the identified species.
3. Translated search: search the still-unclassified reads against a non-redundant protein database.
4. Isolate novel reads for external assembly.
http://huttenhower.sph.harvard.edu/humann2
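The control flow of this tiered search is sketched below. A read is only sent to the slower translated search if the nucleotide search found nothing, and reads unmatched by both tiers are set aside. The two search functions are hypothetical stand-ins (dictionary lookups keyed on the exact read), not HUMAnN2's Bowtie 2 or DIAMOND steps.

```python
def tiered_assignment(reads, nucleotide_search, translated_search):
    """Toy tiered read-to-gene-family search.

    nucleotide_search / translated_search: hypothetical callables returning a
    gene-family id for a read, or None when the read has no hit.
    Returns (gene family -> read count, reads unassigned by both tiers).
    """
    counts, unclassified = {}, []
    for read in reads:
        family = nucleotide_search(read)        # tier 1: pangenomes of detected species
        if family is None:
            family = translated_search(read)    # tier 2: protein database
        if family is None:
            unclassified.append(read)           # candidates for external assembly
            continue
        counts[family] = counts.get(family, 0) + 1
    return counts, unclassified

# Stand-in lookups keyed on the exact read sequence (illustration only)
tier1 = {"ACGT": "familyA"}.get
tier2 = {"TTTT": "familyB", "ACGT": "familyC"}.get  # familyC is never reached for ACGT
counts, leftover = tiered_assignment(["ACGT", "TTTT", "GGGG", "ACGT"], tier1, tier2)
```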

[Figure: HUMAnN2 workflow. Quality-controlled RNA or DNA sequencing reads → taxonomic profiling (MetaPhlAn2) → list of abundant organisms → nucleotide-level pangenome mapping (Bowtie 2) against functionally annotated species pangenomes (ChocoPhlAn), giving organism-specific hits. Unmapped reads → organism-agnostic translated search (DIAMOND) against a universal protein reference database (UniRef), giving hits to protein families. The HUMAnN core algorithms then use a pathway collection (MetaCyc).]

Body site-specific signature pathways in the human microbiome
L-rhamnose degradation (RHAMCAT-PWY) emerged as a signature of the human gut microbiome across the >900 first-visit HMP1-II metagenomes analyzed. [Figure annotations: note the typically large abundance relative to other body sites; note the relatively small percentage of pathway copies that are unclassified.]

A pathway is a signature for area $i$ if $Q_1(\text{area}_i) > Q_3(\text{area}_j)$ for all $j \neq i$, which is very stringent. In total, there are 50 signature pathways across 4 major body areas.
[Figure: values plotted are the median ($Q_2$) abundance for samples from that area; the largest area corresponds to 2% relative abundance, and other areas are square-root scaled.]
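The signature rule translates directly into code: compare each area's first quartile with every other area's third quartile. The sketch assumes NumPy's default (linear interpolation) quartile definition, which may differ from the one used in the original analysis. The area names and values are toy data.

```python
import numpy as np

def signature_areas(abundance_by_area):
    """Areas i for which Q1(area i) > Q3(area j) for every other area j.

    abundance_by_area: area name -> 1-D array of the pathway's abundance
    in samples from that area.
    """
    q1 = {area: np.percentile(x, 25) for area, x in abundance_by_area.items()}
    q3 = {area: np.percentile(x, 75) for area, x in abundance_by_area.items()}
    return [
        area for area in abundance_by_area
        if all(q1[area] > q3[other] for other in abundance_by_area if other != area)
    ]

toy = {"gut": [5, 6, 7, 8], "oral": [1, 2, 3, 4], "skin": [0, 1, 1, 2]}
print(signature_areas(toy))  # ['gut']
```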

Which functions of the microbiome are disrupted in IBD?
Over six times as many microbial metabolic processes as microbes are disrupted in IBD. If there's a transit strike, everyone driving a bus in Helsinki is disrupted, not everyone named Virtanen or Doe. Likewise, the phylogenetic distribution of function is consistent but diffuse.
During IBD, microbes stop creating most amino acids, degrading complex carbohydrates and producing short-chain fatty acids. They start taking up more host products, dodging the immune system, and adhering to and invading host cells.

Confounding effects in real-world data
Biology is complicated: everything affects everything. Scientists cannot control everything, and in observational cohorts they do not even try to. Observed associations may therefore be explained by confounding factors.

Confounding effects in psychology
Classical example: drowning incidents and ice cream sales are highly positively correlated. Possible explanations:
Possibility #1: people drowning causes other people to purchase ice cream.
Possibility #2: purchasing ice cream causes people to drown.
Possibility #3: a third variable (a confounding variable) causes the increase in both ice cream sales and drowning incidents.
Here the weather confounds the relationship between ice cream sales and drowning incidents. Confounding variables are common in microbiome studies, since lots of environmental factors affect the gut microbiome.

Solution #1: post hoc checking of results
Suppose consumption of vegetables is correlated with species X. Check whether any other collected metadata (information about the study subjects) is correlated or associated with the consumption of vegetables.
No: you did not see any confounding factors, but there still might be some.
Yes: can you stratify your analyses to further confirm the finding? E.g. if females consume more vegetables and have more species X, does the correlation hold within females and within males separately?
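A stratified check can be as simple as recomputing the correlation within each stratum. In the constructed toy data below, sex drives both vegetable intake and species X. The pooled correlation is therefore strongly positive, while the within-sex correlations are essentially zero. The data and function name are hypothetical, and Pearson correlation is used for simplicity.

```python
import numpy as np

def stratified_correlations(x, y, strata):
    """Pearson correlation of x and y overall and within each stratum."""
    x, y, strata = np.asarray(x, dtype=float), np.asarray(y, dtype=float), np.asarray(strata)
    result = {"overall": np.corrcoef(x, y)[0, 1]}
    for level in np.unique(strata):
        mask = strata == level
        result[str(level)] = np.corrcoef(x[mask], y[mask])[0, 1]
    return result

# Constructed toy data: females eat more vegetables AND carry more species X
vegetables = [6, 7, 8, 9, 1, 2, 3, 4]
species_x = [0.31, 0.29, 0.29, 0.31, 0.11, 0.09, 0.09, 0.11]
sex = ["female"] * 4 + ["male"] * 4
res = stratified_correlations(vegetables, species_x, sex)
print(res)  # strong overall correlation, near-zero within each sex
```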

Solution #2: design and conduct a controlled experiment
Suppose consumption of vegetables is correlated with species X. Design an experiment where subjects are randomly assigned to consume 1) a lot of vegetables or 2) no vegetables. Control known confounders, e.g. make sure both groups contain the same number of males and females.
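One way to randomize while controlling a known confounder is stratified (blocked) randomization, sketched below. Subjects are shuffled within each stratum (here sex) and then assigned to the arms alternately, so each arm receives the same number of males and females, give or take one per stratum. The arm labels, subject ids and fixed seed are illustrative assumptions.

```python
import random

def stratified_randomization(subjects, stratum_of,
                             arms=("many vegetables", "no vegetables"), seed=0):
    """Randomly assign subjects to arms, balancing each stratum.

    Within each stratum the subjects are shuffled and assigned to the arms in
    turn, so arm sizes differ by at most one subject per stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for subject in subjects:
        strata.setdefault(stratum_of[subject], []).append(subject)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        for k, subject in enumerate(members):
            assignment[subject] = arms[k % len(arms)]
    return assignment

subjects = ["f1", "f2", "f3", "f4", "m1", "m2", "m3", "m4"]
sex = {s: ("female" if s.startswith("f") else "male") for s in subjects}
assignment = stratified_randomization(subjects, sex)
```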

Solution #3: statistical modeling
Test whether the correlation or association still holds after statistically correcting for the confounding effects. Linear models are easy to understand and computationally cheap.
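The simulation below reuses the ice cream example. Temperature drives both ice cream sales and drownings, and ice cream has no effect of its own. Regressing drownings on ice cream alone gives a clearly positive slope (about 0.2 by construction). Adding temperature as a covariate in an ordinary least-squares model shrinks the ice cream coefficient to roughly zero. The effect sizes, noise levels and seed are arbitrary assumptions.

```python
import numpy as np

def ols_coefficients(y, *covariates):
    """Ordinary least squares fit of y ~ 1 + covariates; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 5000
temperature = rng.normal(20, 5, n)                   # the confounder (weather)
ice_cream = 2.0 * temperature + rng.normal(0, 5, n)  # driven by temperature
drownings = 0.5 * temperature + rng.normal(0, 2, n)  # driven by temperature only

naive = ols_coefficients(drownings, ice_cream)[1]
adjusted = ols_coefficients(drownings, ice_cream, temperature)[1]
print(f"naive slope: {naive:.3f}, adjusted slope: {adjusted:.3f}")
```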

Lipid A biosynthesis in DIABIMMUNE infants
