Introduction to Microbiome Omics Technologies

1 BICF Education
Monthly Topics in Bioinformatics and Genomics; BICF Astrocyte Workflows in Sequence Variation, RNA-Seq, ChIP-Seq, CRISPR; BICF Data Resources; Public Resources; Bioinformatics skills for the bench scientist.
Nanocourses: Introduction to R (Dec 6, 7); GPU Programming (Dec 13, 14); Intermediate R (Jan 31, Feb 1); GWAS Analysis (TBD).

2 Introduction to Microbiome Omics Technologies
Brandi Cantarel, PhD, 11/16/2016

3 Introduction to Metagenomics
What is a microbiome? What is a metagenome? What is the relationship between the microbiome and its environment?
Whole Community Sequencing Methods vs Traditional Culture Methods
Genetic Diversity in a Microbiome
Sampling, DNA Extraction and Sequencing
Comparison of Sequencing Technologies
Data quality: error rates of sequencing, chimeras
Differences in profiles depending on sampling, DNA extraction and sequencing
Omics Technologies used to Study the Human Microbiome: Targeted DNA Sequencing; Whole Genome Shotgun Sequencing; Transcriptomics; Proteomics; Metabolomics

5 What is a Microbiome?
A term coined by Joshua Lederberg: the ecological community of commensal, symbiotic and pathogenic microorganisms. All plants and animals, from protists to humans, live in close association with microbial organisms. The hologenome theory proposes that the object of natural selection is not the individual organism, but the organism together with its associated microbial communities.

6 Emerging Microbiome Research
Late 17th century: Anton van Leeuwenhoek, the first "metagenomicist", directly studied organisms from pond water and from his own teeth.
1920s: Cell culture techniques evolved.
16S rRNA sequencing of cultured microbes: if an organism could not be cultured, it could not be classified.

7 Traditional Culture-Dependent Profiling
It is estimated that fewer than 1% of microorganisms can be grown in culture (Amann RI, Ludwig W, Schleifer KH. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev. Mar;59(1). Review.).
Discrepancies observed: the number of cells counted under the microscope conflicts with the number of colonies on plates; cellular activities in situ conflict with activities in culture; some cells are viable but unculturable.
Even if all microbes could be grown in culture, determining growth conditions for ALL microbes would be a daunting task.

8 What is a Metagenome?
The term "metagenomics" was first used in publication by Jo Handelsman, Jon Clardy, Robert M. Goodman, Sean F. Brady, and others. A metagenome is the collection of genes in a microbial community. Metagenomics is the study of genetic material recovered directly from an environmental sample; it offers culture-independent methods.

9 Earth Microbiome Project
The Earth Microbiome Project is a proposed, massively multidisciplinary effort to analyze microbial communities across the globe. The general premise is to examine microbial communities from their own perspective. … analysis portal for visualization of all information.

10 Microbiomes in Extreme Environments
The Extreme Microbiome Project (XMP) is a scientific effort to characterize, discover, and develop new pipelines and protocols for extremophiles and novel organisms.

11 Urban Microbiomes

12 Metaorganisms (Superorganisms)
Animal bodies (including humans) are superorganisms, composed of both microbial and animal cells. Microbes are important for digestion, immune development and other functions essential for survival.

13 Microbiomes in Health
Acne; antibiotic-associated diarrhea; asthma/allergies; autism; autoimmune diseases; cancer; dental cavities; depression and anxiety; diabetes; eczema; gastric ulcers; hardening of the arteries; inflammatory bowel diseases; malnutrition; obesity.

15 Sequencing Technologies
Ion Torrent: 400 bp reads; errors accumulate in homopolymer regions; ~$0.63/Mbp; hardware ~$70K per machine. Low upfront and maintenance costs make it attractive to independent labs.
Illumina HiSeq: 150/200 bp reads; $0.04/Mbp; hardware ~$1M. Used for WGS projects.
Illumina MiSeq: 250/300 bp reads; $0.05/Mbp; hardware ~$125K. Used for 16S projects (384 samples/run); desktop sequencer.
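The per-Mbp prices above turn into a per-sample cost with simple arithmetic: yield in Mbp = reads × read length / 10^6, and cost = yield × price per Mbp. A minimal sketch, assuming a hypothetical depth of 1M reads per sample (the slide does not state a depth) and ignoring hardware and library-preparation costs:

```python
# Per-Mbp prices quoted on the slide (USD). Hardware cost is not included.
PRICE_PER_MBP = {
    "Ion Torrent": 0.63,
    "Illumina HiSeq": 0.04,
    "Illumina MiSeq": 0.05,
}


def sequencing_cost(reads: int, read_length_bp: int, price_per_mbp: float) -> float:
    """Return the sequencing cost in USD for one sample."""
    megabases = reads * read_length_bp / 1_000_000
    return megabases * price_per_mbp


if __name__ == "__main__":
    # Hypothetical depth of 1M reads per sample at each platform's read length.
    for platform, length in [("Ion Torrent", 400), ("Illumina HiSeq", 150), ("Illumina MiSeq", 250)]:
        cost = sequencing_cost(1_000_000, length, PRICE_PER_MBP[platform])
        print(f"{platform}: ${cost:.2f}")
```

At this assumed depth the per-Mbp difference dominates: roughly $6 per sample on the HiSeq versus roughly $252 on the Ion Torrent.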

16 Third-Generation Long-Read Sequencing Technologies
Pacific Biosciences Single Molecule Real-Time (SMRT) sequencing: very high error rate, which can be reduced by taking the consensus of multiple reads; average read length > 1 kb; great for finishing genomes by Illumina/PacBio hybrid assembly; ~$2/Mbp.
Oxford Nanopore MinION: laptop-powered sequencing; average read length 5.4 kb; its light weight and low power usage make it interesting for in-the-field applications; potential for pathogen identification in ~4 hours in the clinic.
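Consensus helps because random errors rarely fall at the same position in independent reads of the same molecule, so a per-position majority vote recovers the true base. A minimal sketch, assuming the reads are already aligned to one another with equal length and no indels; this is a simplification, since real SMRT consensus tools must also model insertions and deletions.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str]) -> str:
    """Per-position majority vote over equal-length, pre-aligned reads.

    Assumes no indels: every read covers the same positions.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # most_common(1) gives the most frequent base in this column;
        # ties go to the base seen first.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical reads, each carrying at most one random error.
    reads = ["ACGTTA", "ACCTTA", "ACGTGA", "TCGTTA"]
    print(majority_consensus(reads))  # ACGTTA
```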

17 Quality Control
Negative controls are the best way to identify microbial lab contamination.
Sequencing errors: low-quality bases; homopolymer strings; reads that are too short after trimming.
Biological and technical replicates help to confirm group trends and to identify sample mislabeling and possibly compromised samples (Knights D, Kuczynski J, Koren O, Ley RE, Field D, Knight R, DeSantis TZ, Kelley ST. Supervised classification of microbiota mitigates mislabeling errors. ISME J. Apr;5(4). Epub 2010 Oct 7.).
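The sequencing-error checks listed above (low-quality bases, homopolymer strings, reads too short after trimming) can be combined into a single read filter. A minimal sketch, assuming Phred+33 quality encoding and illustrative thresholds (quality 20, homopolymer run of 8, minimum length 50) that are not taken from the slides; production pipelines use dedicated trimming tools.

```python
import re


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> tuple[str, str]:
    """Trim bases from the 3' end while their Phred score is below min_q.

    Assumes Phred+33 encoding and len(seq) == len(qual).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_qc(seq: str, qual: str, min_len: int = 50, max_homopolymer: int = 8) -> bool:
    """True if the trimmed read is long enough and has no long homopolymer run."""
    trimmed, _ = trim_3prime(seq, qual)
    if len(trimmed) < min_len:
        return False
    # One base followed by (max_homopolymer - 1) or more copies of itself.
    pattern = r"([ACGT])\1{%d,}" % (max_homopolymer - 1)
    return re.search(pattern, trimmed) is None


if __name__ == "__main__":
    read = "ACGT" * 15                          # 60 bp, no long homopolymers
    print(passes_qc(read, "I" * 60))            # True ('I' is Phred 40)
    print(passes_qc(read, "I" * 40 + "#" * 20)) # False: only 40 bp survive trimming
```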

18 Sampling
Sampling must be standardized. Samples should be: collected with sterile instruments or swabs; transported into a sterile tube without much interaction with the environment; stabilized according to the molecule of interest; frozen in time (HMP_Protocol_Version_9_ pdf).

19 Sources of Contamination
At collection (use a sampling protocol): host DNA; environmental DNA.
In the lab: use a negative control sample (water or stabilization buffer) to determine likely lab contamination.
Your microbiome forms a cloud around your body.
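One simple way to use the negative control is to flag taxa whose relative abundance in the blank is at least as high as their mean relative abundance in the real samples. A minimal sketch of that heuristic, assuming per-taxon read counts and a hypothetical `ratio` threshold; it is an illustration only, and dedicated statistical tools model contamination more rigorously.

```python
def relative(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon counts to fractions of the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


def flag_contaminants(samples: list[dict[str, int]], blank: dict[str, int],
                      ratio: float = 1.0) -> set[str]:
    """Flag taxa whose relative abundance in the negative control (blank) is at
    least `ratio` times their mean relative abundance across the samples.

    `ratio` is a hypothetical tuning parameter, not a published cutoff.
    """
    blank_rel = relative(blank)
    sample_rels = [relative(s) for s in samples]
    flagged = set()
    for taxon, blank_frac in blank_rel.items():
        mean_frac = sum(r.get(taxon, 0.0) for r in sample_rels) / len(sample_rels)
        if blank_frac >= ratio * mean_frac:
            flagged.add(taxon)
    return flagged


if __name__ == "__main__":
    # Hypothetical counts for two stool samples and one water blank.
    samples = [{"Bacteroides": 900, "Ralstonia": 100},
               {"Bacteroides": 800, "Ralstonia": 50, "Prevotella": 150}]
    blank = {"Ralstonia": 90, "Bacteroides": 10}
    print(flag_contaminants(samples, blank))  # {'Ralstonia'}
```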

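21 Relative Abundance vs Absolute Abundance
Abundance of chipmunks: in the first community, absolute 2 and relative 40%; in the second, absolute 2 and relative 20%. The same absolute count can therefore correspond to different relative abundances, depending on the total size of the community.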
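The chipmunk example is a single division: relative abundance is a taxon's count over the total count in its sample. Two chipmunks give 40% in a community of 5 animals and 20% in a community of 10 (community sizes inferred from the slide's percentages; the other species below are hypothetical). A minimal sketch:

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert absolute counts into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {taxon: n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical compositions: 2 chipmunks in each, different totals.
    habitat_a = {"chipmunk": 2, "squirrel": 3}
    habitat_b = {"chipmunk": 2, "squirrel": 5, "rabbit": 3}
    print(relative_abundance(habitat_a)["chipmunk"])  # 0.4
    print(relative_abundance(habitat_b)["chipmunk"])  # 0.2
```

Sequencing read counts behave like the relative numbers here: on their own they cannot tell you whether a taxon grew or the rest of the community shrank.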

22 Understanding Interactions between Microbial Communities and Environment
Experimental and computational techniques are necessary to make inferences about the community: community structure, gene content, expression, translation, metabolites.

23 Marker Genes Allow For Taxonomic Profiling

24 Marker Genes Allow For Taxonomic Profiling
A marker gene should be: present in all prokaryotic organisms compared; vertically and slowly evolving; amplifiable with a small set of universal primers; backed by an established database of reference sequences.

25 rRNAs as Phylogenetic Markers
Ribosomal RNAs are present in all living organisms: 16S in all prokaryotes, 18S in all eukaryotes. rRNAs are vertically and slowly evolving; they play a critical role in protein translation, are relatively conserved, and are rarely acquired horizontally. rRNAs are amplifiable with a small set of universal primers and have established reference databases.
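Universal primers work across many taxa because they target conserved regions and include degenerate (IUPAC ambiguity) bases. A minimal sketch of matching a degenerate primer against a sequence; the example primer GTGYCAGCMGCCGCGGTAA is the sequence commonly cited for the 515F 16S primer, but verify it against the primer's original publication before use, and the target sequences are made up.

```python
import re

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_primer(primer: str, sequence: str) -> list[int]:
    """Return 0-based start positions where the degenerate primer matches exactly."""
    # Each primer base becomes a character class, e.g. Y -> [CT].
    body = "".join(f"[{IUPAC[base]}]" for base in primer.upper())
    # A zero-width lookahead also reports overlapping matches.
    return [m.start() for m in re.finditer(f"(?={body})", sequence.upper())]


if __name__ == "__main__":
    primer = "GTGYCAGCMGCCGCGGTAA"
    print(find_primer(primer, "TTGTGCCAGCAGCCGCGGTAATAC"))  # [2]
    print(find_primer(primer, "TTGTGCCAGCGGCCGCGGTAATAC"))  # [] (G where M allows A/C)
```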

26 rRNA Reference Databases
Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, Brown CT, Porras-Alfaro A, Kuske CR, Tiedje JM. Ribosomal Database Project: data and tools for high throughput rRNA analysis. Nucl. Acids Res. 42(Database issue):D633-D642. doi: /nar/gkt1244
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucl. Acids Res. 2013;41(D1):D590-D596.
DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72.

27 Other Marker Genes
Internal Transcribed Spacer (ITS); RecA: response to DNA stress in bacteria; Cpn60: Chaperonin Database.

28 Overall Analysis Pipeline
Input sequences → QC (barcode/primer and quality trimming; minimum read length) → align sequences to a 16S reference DB → OTU clustering → taxonomic assignment → rarefaction → alpha diversity and beta diversity (PCoA, NMDS) → statistical analysis.
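OTU clustering groups sequences that lie within an identity threshold of one another, classically 97%. A minimal greedy, centroid-based sketch, assuming the sequences are already aligned to equal length (as after the alignment step) and using simple per-position identity; real clustering tools compute proper pairwise alignments and usually process sequences in order of decreasing abundance.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: list[str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Assign each sequence to the first centroid within `threshold` identity;
    otherwise it becomes a new centroid. Input order affects the result."""
    clusters: dict[str, list[str]] = {}
    for s in seqs:
        for centroid in clusters:
            if identity(s, centroid) >= threshold:
                clusters[centroid].append(s)
                break
        else:  # no centroid was close enough
            clusters[s] = [s]
    return clusters


if __name__ == "__main__":
    # Hypothetical 10 bp sequences; a 90% threshold suits this short length.
    seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTACGTAC"]
    for centroid, members in greedy_otus(seqs, threshold=0.9).items():
        print(centroid, len(members))
```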

29 Alpha Diversity
Species richness is the number of distinct organisms in a community; rarefaction is a method to assess species richness. Species evenness measures how equally abundance is distributed across the community, e.g. 2 taxa each at 50% abundance vs. the same 2 taxa in a 9:1 ratio. Alpha diversity is a measurement composed of richness and evenness.
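Richness and evenness can be made concrete with the Shannon index H = -Σ p_i ln p_i and Pielou's evenness J = H / ln S, where S is the richness; rarefaction subsamples every sample to the same read depth before counting taxa. The slides do not name a specific index, so this is a minimal sketch using these standard formulas, applied to the slide's 50/50 vs 9:1 example:

```python
import math
import random


def richness(counts: list[int]) -> int:
    """Number of taxa with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts: list[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def pielou_evenness(counts: list[int]) -> float:
    """J = H / ln(S); returned as 0.0 when fewer than 2 taxa are observed."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def rarefy(counts: list[int], depth: int, seed: int = 0) -> list[int]:
    """Subsample reads without replacement down to `depth` reads."""
    pool = [i for i, n in enumerate(counts) for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    out = [0] * len(counts)
    for i in picked:
        out[i] += 1
    return out


if __name__ == "__main__":
    even, skewed = [50, 50], [90, 10]
    print(richness(even), richness(skewed))  # 2 2
    print(round(pielou_evenness(even), 3), round(pielou_evenness(skewed), 3))  # 1.0 0.469
```

Both communities have the same richness; only evenness separates them.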

30 Beta Diversity
Beta-diversity measures, including absolute or relative overlap, describe how many taxa are shared between habitats. Beta diversity acts like a similarity (or dissimilarity) score between populations, allowing analysis by sample clustering or by dimensionality reduction such as PCoA. Beta diversity can be measured by simple taxa overlap or by abundance-weighted metrics such as Bray-Curtis dissimilarity.
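Both kinds of measure can be computed in a few lines: Jaccard distance uses only presence/absence overlap, while Bray-Curtis dissimilarity weights by abundance, BC = Σ|x_i − y_i| / Σ(x_i + y_i). A minimal sketch, assuming both inputs are count vectors over the same ordered list of taxa:

```python
def jaccard_distance(x: list[int], y: list[int]) -> float:
    """1 - |shared taxa| / |taxa present in either sample| (presence/absence only)."""
    a = {i for i, n in enumerate(x) if n > 0}
    b = {i for i, n in enumerate(y) if n > 0}
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def bray_curtis(x: list[int], y: list[int]) -> float:
    """Sum of absolute count differences divided by the sum of all counts."""
    if len(x) != len(y):
        raise ValueError("count vectors must cover the same taxa")
    denom = sum(x) + sum(y)
    return sum(abs(a - b) for a, b in zip(x, y)) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for three taxa in two samples.
    x, y = [10, 0, 5], [4, 6, 5]
    print(round(jaccard_distance(x, y), 3))  # 0.333
    print(bray_curtis(x, y))                 # 0.4
```

A matrix of such pairwise distances is the input to clustering or to ordination methods such as PCoA.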

31 UniFrac
A distance metric used for comparing biological communities. It differs from distance metrics such as Bray-Curtis in that it incorporates phylogenetic (tree-based) distances between observed organisms into the computation. Weighted UniFrac also incorporates taxon abundances.
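Unweighted UniFrac is the fraction of the observed branch length in the tree that leads to taxa seen in only one of the two communities. A minimal sketch, assuming a hypothetical tree encoded as a list of branches, each carrying its length and the set of leaf taxa below it; weighted UniFrac would additionally scale each branch by the difference in abundance beneath it.

```python
# (branch length, leaf taxa descending from that branch)
Branch = tuple[float, frozenset[str]]


def unweighted_unifrac(tree: list[Branch], sample_a: set[str], sample_b: set[str]) -> float:
    """Unique branch length / branch length observed in either community."""
    unique = observed = 0.0
    for length, leaves in tree:
        in_a = bool(leaves & sample_a)
        in_b = bool(leaves & sample_b)
        if in_a or in_b:
            observed += length       # branch leads to a taxon in at least one sample
            if in_a != in_b:
                unique += length     # branch leads to taxa in exactly one sample
    return unique / observed if observed else 0.0


if __name__ == "__main__":
    # Hypothetical tree ((A:1,B:1):1,C:2), one entry per branch.
    tree = [
        (1.0, frozenset({"A"})),
        (1.0, frozenset({"B"})),
        (1.0, frozenset({"A", "B"})),  # internal branch above A and B
        (2.0, frozenset({"C"})),
    ]
    print(unweighted_unifrac(tree, {"A", "B"}, {"A", "C"}))  # 0.6
```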

32 Sample Comparison based on OTU Composition PCoA

33 Taxonomic Assessment using 16S
16S is targeted sequencing of a single gene that acts as a marker for organisms.
Pros: well established; relatively inexpensive ($50-$100/sample); amplifies only bacteria, not host or environmental fungi, plants, etc.
Cons: amplifies only bacteria, not viruses, microbial fungi, archaea, etc. (although it can be paired with 18S and archaea-specific 16S); based on a very well conserved gene, making it hard to resolve species and strains; the choice of V-region can bias results.

34 Taxonomic Assignment using WGS
WGS (whole genome shotgun) sequencing aims to sequence the whole metagenome.
Pros: not biased by an amplicon primer set; not limited by the conservation of the amplicon; can also provide functional information.
Cons: environmental contamination, including host DNA; more expensive ($1000+/sample); complex data analysis that requires high-performance computing with high memory and compute capacity.

35 Taxonomic Assignment: Complex Analysis
All of the organisms are mixed together, so it is hard to bin all of the reads from one organism (strain or species) for deconvolution. Reads are short and can potentially share similarity with multiple taxa. Because of lateral gene transfer, not all of the genes in a genome share the same evolutionary history.

36 Lowest Common Ancestor Taxonomic Assignment
Reads can potentially share similarity with multiple taxa. The lowest common ancestor (LCA) approach allows taxonomic assignment when similarity is shared with multiple taxa: the read is assigned to the most specific taxon that contains all of its hits. It depends on the taxonomic tree and on similarity to reference genomes. Remember that there are different versions of bacterial taxonomy.
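When a read hits several references, LCA assigns it to the deepest rank shared by every hit's lineage, which is the longest common prefix of the root-to-leaf lineages. A minimal sketch, assuming each hit is already resolved to such a lineage (the example lineages are illustrative); real tools typically filter hits by alignment score before computing the LCA.

```python
def lowest_common_ancestor(lineages: list[list[str]]) -> list[str]:
    """Return the longest shared prefix of root-to-leaf lineages."""
    if not lineages:
        return []
    lca = []
    for ranks in zip(*lineages):  # stops at the shortest lineage
        if all(r == ranks[0] for r in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca


if __name__ == "__main__":
    base = ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales"]
    hits = [
        base + ["Bacteroidaceae", "Bacteroides", "Bacteroides fragilis"],
        base + ["Bacteroidaceae", "Bacteroides", "Bacteroides thetaiotaomicron"],
    ]
    print(lowest_common_ancestor(hits)[-1])  # Bacteroides
    hits.append(base + ["Prevotellaceae", "Prevotella", "Prevotella copri"])
    print(lowest_common_ancestor(hits)[-1])  # Bacteroidales
```

Adding a hit from a different family pushes the assignment up to the order level, which is why ambiguous short reads often end up at coarse ranks.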

37 Sources of Reference Genomes for Comparison

38 Strategies for Taxonomic Assignment of WGS
Composition-based taxonomic assignment: GC content; k-mer based.
Sequence alignment-based taxonomic assignment: DIAMOND, BLAT/BLAST, Melt, Kraken/Centrifuge.
Marker gene-based taxonomic assignment: MetaPhlAn2, PhyloSift.
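Composition-based methods summarize a read or contig by features such as GC content or k-mer frequencies and compare them with reference profiles. A minimal sketch of the feature extraction plus a nearest-profile assignment by cosine similarity, assuming hypothetical reference sequences; this illustrates the compositional idea only and is not how index-based tools such as Kraken or Centrifuge work.

```python
import math
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Normalized counts of all 4**k ACGT k-mers, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_by_composition(read: str, references: dict[str, str], k: int = 3) -> str:
    """Return the reference name whose k-mer profile is most similar to the read's."""
    profile = kmer_profile(read, k)
    return max(references, key=lambda name: cosine(profile, kmer_profile(references[name], k)))


if __name__ == "__main__":
    # Hypothetical reference "genomes" with very different composition.
    refs = {"gc_rich": "GCGCGGCCGCGCGGCCGGCG", "at_rich": "ATATTAATATAATTATATAA"}
    read = "GCGGCCGCGC"
    print(gc_content(read), assign_by_composition(read, refs))  # 1.0 gc_rich
```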

39 WGS Taxonomy Assignment and Visualization
Taxonomer
MEGAN: tool with WGS taxonomic assignment (based on BLAST) and functional assignment
MG-RAST: online tool with WGS taxonomic assignment and functional assignment
MetaCRAM

40 Comprehensive Functional Databases
KEGG; eggNOG/COG; Pfam; SEED (used by MG-RAST); MetaCyc; UniRef

41 Specialized Functional Databases
Antibiotic resistance genes; virulence factors; carbohydrate-active enzymes; phage; proteases; transporters

42 Available Web-based Analysis Pipelines
MG-RAST: preference given to public datasets; very easy to use.
EBI Metagenomics: includes data visualization and customizable sample comparisons.
DIAG
JGI Integrated Microbial Genomes: includes data visualization and customizable sample comparisons.
CloVR: cloud-based workflow manager; can run pipelines on your desktop; available on the Academic Cloud.

43 Many Paths for Functional Annotations
Reads → assemblies → ORFs → functional annotation; reads → ORFs → functional annotation; either path ends in comparing gene content.

44 Functional Profiling
High-throughput functional profiling allows for gross comparisons of the functional capability of samples. Broad functional categories tend to be very similar within an ecological niche. Profiling relies on alignments to functionally characterized proteins: homologous proteins tend to have similar broad enzymatic functions, i.e. kinase, hydrolase, transferase. However, homology does not guarantee the same biological function.
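Gross functional comparisons typically roll per-read hits up into broad categories and compare the resulting proportions. A minimal sketch, assuming each read has already been aligned to a characterized protein and that a protein-to-category lookup is available (in practice from a database such as KEGG or eggNOG; the identifiers below are hypothetical). Keeping the categories broad reflects the caveat that homology does not guarantee identical function.

```python
from collections import Counter


def category_profile(best_hits: list[str], protein_to_category: dict[str, str]) -> dict[str, float]:
    """Proportion of reads per functional category.

    Reads whose best-hit protein has no category are counted as 'unassigned'.
    """
    counts = Counter(protein_to_category.get(p, "unassigned") for p in best_hits)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical protein identifiers and category assignments.
    mapping = {"prot1": "kinase", "prot2": "hydrolase", "prot3": "kinase"}
    hits = ["prot1", "prot2", "prot3", "prot9"]
    print(category_profile(hits, mapping))
    # {'kinase': 0.5, 'hydrolase': 0.25, 'unassigned': 0.25}
```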

45 Metagenomics vs Metatranscriptomics
Metagenomics can give insight into gene content (functional potential). Metatranscriptomics can measure how expression changes in response to the environment, and can also show which organisms are the most functionally active.
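One common way to ask which organisms are most functionally active is to compare each taxon's share of the metatranscriptome with its share of the metagenome. A minimal sketch of that RNA/DNA ratio, assuming matched per-taxon DNA and RNA read counts from the same sample and a pseudocount of 1 (an illustrative choice); a ratio above 1 suggests a taxon is transcribing more than its genomic abundance alone predicts. This is one possible heuristic, not a method stated in the slides.

```python
def activity_ratios(dna_counts: dict[str, int], rna_counts: dict[str, int],
                    pseudocount: float = 1.0) -> dict[str, float]:
    """(taxon share of RNA reads) / (taxon share of DNA reads), with a pseudocount
    added to every taxon so taxa missing from one data set do not divide by zero."""
    taxa = set(dna_counts) | set(rna_counts)
    dna_total = sum(dna_counts.values()) + pseudocount * len(taxa)
    rna_total = sum(rna_counts.values()) + pseudocount * len(taxa)
    return {
        t: ((rna_counts.get(t, 0) + pseudocount) / rna_total)
        / ((dna_counts.get(t, 0) + pseudocount) / dna_total)
        for t in taxa
    }


if __name__ == "__main__":
    # Hypothetical counts: equal genomic abundance, very different transcription.
    dna = {"taxon_A": 499, "taxon_B": 499}
    rna = {"taxon_A": 899, "taxon_B": 99}
    for taxon, ratio in sorted(activity_ratios(dna, rna).items()):
        print(taxon, round(ratio, 2))  # taxon_A 1.8, then taxon_B 0.2
```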

46 Metatranscriptomics
Isolate RNA → remove ribosomal RNAs → sequence → QC → functional annotation (blastx) → sample comparison.

47 Metaproteomics
Like metagenomics and metatranscriptomics, metaproteomics is complicated by the lack of a complete reference set. To determine the protein sequence of peptide fragments, a metagenomic or reference genome database is necessary; unlike with sequencing, de novo sequence determination from MS/MS is not trivial. Samples contain a mixture of environmental and microbiome proteins.

48 Omics Pipeline
Human stool samples → density centrifugation to extract bacterial cells.
Genomic DNA: DNA extraction → 454 and HiSeq 2000 sequencing (~1M reads/sample) → metagenomic annotation pipeline → protein database.
Proteins: protein extraction and digestion for mass spectrometry → 2D LC-MS/MS (RP, SCX, RP; ~83K spectra/sample) → SEQUEST search against the protein database → filter, union.
Cantarel et al. (2011) PLoS One 6: e27173
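A database search such as the SEQUEST step matches spectra against peptides predicted from the protein database by an in silico digestion. The slide does not name the protease, so this minimal sketch assumes trypsin with the common rule (cleave after K or R, except when followed by P), no missed cleavages, and a hypothetical minimum peptide length of 6.

```python
import re


def tryptic_digest(protein: str, min_length: int = 6) -> list[str]:
    """In silico trypsin digest: split after K or R unless the next residue is P.

    Assumes no missed cleavages; peptides shorter than min_length are dropped.
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    peptides = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [p for p in peptides if len(p) >= min_length]


if __name__ == "__main__":
    # Hypothetical protein; the K before P is not cleaved.
    print(tryptic_digest("AAAAKPAAAARAAAAAKAAAAAAR"))
    # ['AAAAKPAAAAR', 'AAAAAK', 'AAAAAAR']
```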

49 Peptide Spectral Matching
Duncan MW, Aebersold R, Caprioli RM. The pros and cons of peptide-centric proteomics. Nat Biotechnol. Jul;28(7).
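Peptide-spectrum matching starts by restricting candidates to peptides whose theoretical mass agrees with the measured precursor mass. A minimal sketch of that filter using standard monoisotopic residue masses, assuming singly protonated ([M+H]+) precursors, no modifications, and a hypothetical 10 ppm tolerance; the fragment-ion scoring that search engines such as SEQUEST perform afterwards is not shown.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the termini
PROTON = 1.00728   # charge carrier for the [M+H]+ ion


def peptide_mh(peptide: str) -> float:
    """Theoretical [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def candidates(precursor_mh: float, peptides: list[str], tol_ppm: float = 10.0) -> list[str]:
    """Peptides whose [M+H]+ lies within tol_ppm of the observed precursor mass."""
    return [p for p in peptides
            if abs(peptide_mh(p) - precursor_mh) / precursor_mh * 1e6 <= tol_ppm]


if __name__ == "__main__":
    observed = peptide_mh("PEPTIDE")  # stand-in for a measured precursor
    # I and L have identical masses, so precursor mass alone cannot separate them.
    print(candidates(observed, ["PEPTIDE", "PEPTLDE", "GGGG"]))  # ['PEPTIDE', 'PEPTLDE']
```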

50 Meta-metabolomics
Animal and environmental metabolomic studies are (meta)metabolomic studies: it is difficult to know which organism produced a particular metabolite.

51 Meta-metabolomics
Marcobal A, Kashyap PC, Nelson TA, Aronov PA, Donia MS, Spormann A, Fischbach MA, Sonnenburg JL. A metabolomic view of how the human gut microbiota impacts the host metabolome using humanized and gnotobiotic mice. ISME J. Oct;7(10). Epub 2013 Jun 6.