MICROBIOMICS: Current and future tools of the trade
Ingeborg Klymiuk, Core Facility Molecular Biology
ZMF - CENTER FOR MEDICAL RESEARCH, Medical University Graz
MICROBIOMICS: Definition of OMIC technologies
OMIC technologies provide a holistic view of the molecules that make up a cell, tissue, sample or organism.
OMICS: universal detection of genes (genomics), mRNA (transcriptomics), proteins (proteomics), lipids (lipidomics) and metabolites (metabolomics) in a specific biological sample.
Non-targeted, non-biased approaches for biomarker discovery (bias can still enter anywhere from study design to analysis and can impact results).
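[Overview figure: tools for microbiome analysis. Culture-based: culturomics, selective growth, antibiotic assay, enzymatic assay. Sequence-based: targeted amplicon (16S amplicon NGS, LEA-Seq, PhyloChips), MetagenOMICS, TranscriptOMICS. Mass spectrometry-based: ProteOMICS, MetabolOMICS, LipidOMICS.]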
Conducting a microbiome study
The approaches for studying human-associated microbial communities are increasing.
DNA- or RNA-based analysis:
  community surveys (descriptive): identification of microorganisms (OTUs, operational taxonomic units); differences in OTU composition between sample groups
  identification of genes: functional identification of genetic potential, gene richness
  detection of rare OTUs and minor species: depth requirements
  transcriptional activity and functionality
  various constituents of a microbial community, such as eukaryotes, viruses and various groups of bacteria
  live/dead discrimination: propidium monoazide (PMA)
  duration, costs and sample volume of analysis
Workflow: biological question -> study subjects and controls -> sampling -> sample storage -> nucleic acid extraction (DNA, RNA) -> PCR, library preparation -> sequencing -> pipeline-specific analysis -> diversity analysis -> classification, clustering, modeling -> OMICS data analysis -> deposit and share data
Goodrich et al. Cell 158, July 17, 2014
Conducting a microbiome study
16S amplicon-based approach: Who is there?
  descriptive view of microbiome diversity
  assesses the general composition of the microbiota
  economical and therefore scales to large projects
  bacterial, archaeal, fungal diversity
  complex bacterial communities
Metagenomics: What can they do?
  portrays the functional potential of the microbiome
  gene content
  bacteria, archaea, fungi, viruses
  human/host background
Metatranscriptomics: What are they doing?
  describes active gene expression
  elucidates the active members
  bacteria, archaea, fungi, viruses
  human/host background
[Overview figure, section highlighted: targeted amplicon (16S amplicon NGS, LEA-Seq, PhyloChips), MetagenOMICS, TranscriptOMICS]
(16S) targeted amplicon
Sequence one or a few marker genes and use these markers to reveal the composition and diversity of the microbiota.
16S rRNA gene: highly conserved between different species of Bacteria and Archaea.
Internal transcribed spacer (ITS) region of the rRNA: marker for fungi (Bellemain et al. 10, Bokulich et al. 13).
Besides conserved primer-binding regions, hypervariable regions provide species-specific signature sequences.
Results are affected by primer choice, variable region and experimental setup: PCR amplification, cycle number and conditions, depth of analysis, and the platform used for sequencing.
Large databases of reference sequences and taxonomies (such as Greengenes - DeSantis et al. 06, SILVA - Quast et al. 13, and the Ribosomal Database Project - Cole et al. 09).
Risk of misclassification.
(16S) targeted amplicon workflow
DNA isolation -> amplicon preparation -> indexing, purification and pooling -> sequencing
(16S) targeted amplicon workflow: data analysis
Galaxy: mothur, QIIME
(optional) combine the two sets of reads
Quality filtering and trimming
Pick OTUs with uclust (similarity 0.97)
Taxonomy assignment
Representative sequence alignment
PCR bias, chimera removal (chimera.uchime)
Phylogenetic classification
Alpha diversity (Chao, Shannon, evenness, richness)
Beta diversity (UniFrac, Bray-Curtis, Euclidean, Pearson)
Diversity analysis, sample and group comparison of microbial communities; statistics and visualization (multivariate data analysis)
Goodrich et al. Cell 158, July 17, 2014
Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD. (2013): Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Applied and Environmental Microbiology. 79(17):5112-20.
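The alpha- and beta-diversity measures named on this slide can be illustrated with a minimal sketch. It assumes that OTU counts per sample are already available as plain lists in a shared OTU order; the sample values are invented, the Chao estimate uses the bias-corrected Chao1 form, and nothing here reproduces the mothur or QIIME implementations.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from singletons (f1) and doubletons (f2)."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples listed in the same OTU order."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


# Invented OTU count vectors for two samples (same OTU order in both).
sample_a = [50, 30, 15, 4, 1, 0]
sample_b = [10, 10, 40, 0, 2, 38]

print("Shannon A:", shannon(sample_a))
print("Chao1 A:", chao1(sample_a))
print("Bray-Curtis A vs B:", bray_curtis(sample_a, sample_b))
```

Alpha diversity is computed within one sample, beta diversity between two samples, which is why `bray_curtis` takes a pair of count vectors.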
Innovations for high-throughput amplicon sequencing
PCR and sequencing introduce sequence errors and sampling bias, giving poor estimates of microbial diversity.
Amplification of non-target DNA may result in an inefficient representation of the microbial pattern.
Low-diversity samples are problematic on the Illumina system.
1) increase amplicon diversity by spiking sequencing runs with sheared genomic DNA (PhiX174)
2) increase diversity with heterogeneity spacers (frameshift nucleotides)
3) low-error amplicon sequencing: LEA-Seq
Lundberg et al. Nature Methods 2013; Faith et al. Science 2013; Fadrosh et al. Microbiome 2014
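Why frameshift spacers help can be seen in a small sketch: without spacers every amplicon read shows the same base in each sequencing cycle, while spacers of different lengths shift the reads against each other. The amplicon start, the spacer pool and the spacer lengths of 0 to 7 nt are assumptions chosen for illustration, not the published primer designs.

```python
def distinct_bases_per_cycle(reads, cycles):
    """Number of different bases observed at each of the first `cycles` positions."""
    return [len({read[i] for read in reads}) for i in range(cycles)]


# Hypothetical amplicon start: every read begins with the same primer sequence.
AMPLICON = "GTGCCAGCAGCCGCGGTAATACG"

# Hypothetical spacers of 0 to 7 nt placed in front of the primer (assumed lengths).
SPACER_POOL = "ACGTACG"
spacers = [SPACER_POOL[:n] for n in range(8)]

plain_reads = [AMPLICON for _ in spacers]
spaced_reads = [spacer + AMPLICON for spacer in spacers]

plain_profile = distinct_bases_per_cycle(plain_reads, cycles=10)
spaced_profile = distinct_bases_per_cycle(spaced_reads, cycles=10)

print("without spacers:", plain_profile)
print("with spacers:   ", spaced_profile)
```

The profile without spacers contains only ones, which is the low-diversity situation described above; the spaced reads show several different bases in the early cycles.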
PhyloChip G3
PhyloChip G3 is a microbial community assessment tool that can simultaneously track high-abundance and low-abundance bacterial and archaeal taxa; currently available through Second Genome Inc. (San Francisco, CA).
Microarray-based technology with high chip-to-chip reproducibility.
16S full-length amplicon:
Bacteria: 27F 5'-agagtttgatcctggctcag-3', 1492R 5'-ggttaccttgttacgactt-3'
Archaea: 4Fa 5'-tccggttgatcctgcccg-3', 1492R 5'-ggttaccttgttacgactt-3'
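How the bacterial primer pair delimits the full-length amplicon can be sketched in a few lines. The primer sequences are the 27F and 1492R sequences from this slide; the template is a made-up toy sequence, and the search uses exact string matching only, which real primer binding does not require.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


PRIMER_27F = "AGAGTTTGATCCTGGCTCAG"
PRIMER_1492R = "GGTTACCTTGTTACGACTT"


def in_silico_amplicon(template, forward, reverse):
    """Return the region from the forward primer site to the reverse primer site.

    Exact matches only; returns None when either site is missing.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy template (invented): flanks + 27F site + insert + 1492R site + flank.
insert = "ACGT" * 5
toy_template = "TTT" + PRIMER_27F + insert + reverse_complement(PRIMER_1492R) + "CC"

product = in_silico_amplicon(toy_template, PRIMER_27F, PRIMER_1492R)
print(len(product), product)
```

The reverse primer is given 5' to 3' on the opposite strand, so its site on the template strand is the reverse complement.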
PhyloChip G3
25-mer oligos; 1,100,000 probes; analysis for 59,959 OTUs
Overcomes the depth bias: no saturation effect
Detection of rare OTUs; resolution down to subspecies level
Mainly used for environmental samples (spacecraft, clean rooms, hot spring systems, ...)
Human samples: high-diversity samples, gut samples, wherever high resolution and sensitivity are desired ("tackling the minority")
Used in various publications characterizing the diseased human intestinal tract, gastric samples, ...
Moissl-Eichinger et al. 2014
MetagenOMICS
Unbiased direct sequencing of the microbiome: the genomes of all microorganisms in a sample.
About 10^14 microorganisms inhabit the human gut.
Covers the ensemble of the genomes of human-associated microorganisms.
Provides much richer data on the functional potential present in microbial community genomes, at the cost of resolution.
Extends the analysis to organisms other than bacteria.
MetaHIT: defined by metagenomics a gene catalogue of 3.3 million non-redundant gut microbial genes, more than 100 times as many genes as are encoded by the human host genome.
MetagenOMICS: workflow
DNA isolation -> library preparation -> indexing, purification and pooling -> sequencing
MetagenOMICS: workflow
Steps: sequencing; QC; assembly; QC; binning; 16S phylogeny; draft genome; annotation and gene prediction; statistical analysis
Assembly, reference-based: MIRA, Newbler; de novo: SOAP, Velvet
Annotation: MG-RAST, a metagenome annotation system
Data storage, metadata, data sharing
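The binning step groups assembled contigs that probably come from the same genome. One simple composition signal is GC content; the sketch below is a toy illustration of that idea only, with invented contig names and sequences and an arbitrary bin width, and it is not how any specific binning tool works.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def bin_by_gc(contigs, width=0.1):
    """Group contig names into GC-content bins of the given width (toy binning)."""
    bins = {}
    for name, seq in contigs.items():
        key = int(gc_content(seq) / width)  # bin index, e.g. 2 for GC in [0.2, 0.3)
        bins.setdefault(key, []).append(name)
    return bins


# Invented contigs: two AT-rich, one GC-rich.
contigs = {
    "contig_1": "AAAG" * 25,
    "contig_2": "ATTC" * 25,
    "contig_3": "GGCA" * 25,
}

print(bin_by_gc(contigs))
```

Real binning combines several signals (for example oligonucleotide frequencies and coverage); GC content alone is used here only to make the grouping idea concrete.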
MetagenOMICS
Integrates microbial membership with biomolecular potential and activity in the human intestine.
Eliminates the danger of PCR/primer bias (missed phyla).
Genomes from the same species (by 16S) can show large genomic differences outside the 16S region and can carry different sets of gene clusters.
Includes fungal (<1% of stool) and viral (1×10^-5 % - 2% of stool) sequences with immense biological function (reviewed in Morgan et al. Gastroenterology 2014).
Lepage et al. Gut 2013; Dutilh et al. Nature 2014
MetatranscriptOMICS
Questions:
  Is a gene active?
  Is a gene more highly expressed than another gene in the same sample/treatment?
  Is a gene differentially expressed in response to experimental conditions?
  Does gene expression change over time?
  How do communities respond to changes in their environment?
Only a subset of the genes present is expressed.
Rare species might be highly transcriptionally active.
Characterizes the complete collection of transcribed sequences in a microbial community.
Analysis of the active fraction of the community.
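The question of differential expression between two conditions can be sketched with a simple depth normalization followed by a log2 ratio. The gene names and read counts are invented, counts per million with a pseudocount is only one of several possible normalizations, and no statistical test is included.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2(B / A) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


# Invented read counts for three hypothetical genes in two conditions.
control = {"gene_x": 100, "gene_y": 300, "gene_z": 600}
treated = {"gene_x": 400, "gene_y": 300, "gene_z": 300}

lfc = log2_fold_change(counts_per_million(control), counts_per_million(treated))
for gene, value in lfc.items():
    print(gene, round(value, 2))
```

Positive values indicate higher relative expression in the treated sample, negative values lower.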
MetatranscriptOMICS: workflow
RNA is less stable than DNA.
rRNA and tRNA depletion (stool: 98%) (MICROBExpress kit; LifeTech)
Mammalian (host cell) RNA is selectively removed; highly relevant for e.g. biopsy samples
A higher amount of starting material or an amplification step is necessary
Workflow: microbial community -> total RNA extraction -> mRNA enrichment -> cDNA synthesis -> optional: amplification -> library preparation for high-throughput sequencing -> metatranscriptome
Franzosa et al. 2014 PNAS; mod. from Warnecke et al. 2009
Which technology/system to choose?
Platforms and read numbers:
  Illumina HiSeq 2000/2500: ~2 billion reads of 2x125 b
  Illumina NextSeq 500: ~400 million reads of 2x150 bp
  Illumina MiSeq: ~25 million reads of 2x300 bp
  Roche 454 GS FLX: 1.3 million reads of 700 b
  Pacific Biosciences: 50k reads, 5/8.5 kb-20 kb read length
Applications: ChIP-sequencing; MeDIP-sequencing; transcriptomics (e.g. tag sequencing, RNA-Seq); ncRNA transcriptomics; exome sequencing; microbiome studies/metagenomics; de novo sequencing; transcriptome-wide full-length cDNA sequencing; amplicon sequencing; whole-genome re-sequencing; haplotyping
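The read numbers translate into total sequence yield by simple arithmetic. The sketch below assumes that each quoted "read" is a cluster sequenced from both ends (a read pair) and takes the NextSeq 500 read length as 2x150 bp; both are assumptions made for this calculation.

```python
# (read pairs per run, read length per end in bases); figures as quoted in the talk,
# with the NextSeq 500 read length taken as 150 (assumption).
PLATFORMS = {
    "HiSeq 2000/2500": (2_000_000_000, 125),
    "NextSeq 500": (400_000_000, 150),
    "MiSeq": (25_000_000, 300),
}


def paired_end_yield_bases(read_pairs, read_length):
    """Total bases for a paired-end run: pairs x 2 ends x bases per end."""
    return read_pairs * 2 * read_length


yields_gb = {
    name: paired_end_yield_bases(pairs, length) / 1e9
    for name, (pairs, length) in PLATFORMS.items()
}

for name, gigabases in yields_gb.items():
    print(f"{name}: {gigabases:.0f} Gb")
```

Comparing yield in gigabases rather than read counts makes the trade-off between many short reads and fewer long reads easier to see.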
Illumina: sequencing by synthesis
Scalable output: MiSeq, HiSeq, NextSeq
High multiplexing capacity
Read length increasing (300 bp MiSeq, 150 bp HiSeq)
Low error rates
Cost-efficient system (about 1400 for a MiSeq run of 384 samples)
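What multiplexing means per sample can be worked out directly. The sketch assumes about 25 million reads per MiSeq run (the figure quoted elsewhere in this talk) and an even split across the 384 samples; the run cost of about 1400 is used as given, without a currency.

```python
READS_PER_RUN = 25_000_000   # assumed MiSeq output per run
SAMPLES_PER_RUN = 384        # multiplexing level from the slide
RUN_COST = 1400              # as quoted; currency not stated in the source

# Even split across samples; real pools are never perfectly balanced.
reads_per_sample = READS_PER_RUN // SAMPLES_PER_RUN
cost_per_sample = RUN_COST / SAMPLES_PER_RUN

print("reads per sample:", reads_per_sample)
print("sequencing cost per sample:", round(cost_per_sample, 2))
```

The cost covers the sequencing run only; extraction, PCR and library preparation are not included.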
Pacific Biosciences: SMRT Cell
Small genomes, bacteria, archaea; metagenomics
No multiplexing capacity
One SMRT Cell: 200-400 Mb output
www.pacificbiosciences.com
Nanopores
The future in DNA sequencing
Single-molecule sequencing
Direct RNA molecule sequencing
High error rate (13%-15%)
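To compare the quoted error rate with the quality scores used on other platforms, it can be converted to the Phred scale, Q = -10 * log10(p). The read length of 1000 bases in the example is an assumption for illustration.

```python
import math


def phred_quality(error_probability):
    """Phred quality score Q = -10 * log10(p) for a per-base error probability p."""
    return -10 * math.log10(error_probability)


def expected_errors(read_length, error_probability):
    """Expected number of erroneous bases in a read of the given length."""
    return read_length * error_probability


for error_rate in (0.13, 0.15):
    print(
        f"error rate {error_rate:.0%}: Q = {phred_quality(error_rate):.1f}, "
        f"expected errors in a 1000-base read = {expected_errors(1000, error_rate):.0f}"
    )
```

An error rate of 13% to 15% therefore corresponds to a quality score below Q10, that is, more than one error per ten bases.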
[Overview figure, section highlighted: ProteOMICS, MetabolOMICS, LipidOMICS]
ProteOMICS: MALDI-TOF MS
Mass spectrometry techniques offer methods to directly analyze small molecules and to determine lipids, metabolites and proteins.
MALDI-TOF: routine application for bacterial classification (DSMZ).
Recording of mass spectra of large biomolecules (mainly ribosomal proteins).
Mass spectrometric fingerprints: identification of bacteria, yeasts and fungi by comparison with reference databases.
http://www.mayomedicallaboratories.com/articles/communique/2013/01-maldi-tof-massspectrometry/index.html
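Fingerprint identification by comparison with a reference database can be sketched as peak-list matching. The species names, peak positions (m/z), the tolerance and the scoring rule (fraction of reference peaks found in the sample) are all hypothetical; commercial identification software uses its own scoring, which is not reproduced here.

```python
def matched_fraction(sample_peaks, reference_peaks, tolerance=2.0):
    """Fraction of reference peaks that have a sample peak within the m/z tolerance."""
    matched = sum(
        1
        for ref in reference_peaks
        if any(abs(ref - peak) <= tolerance for peak in sample_peaks)
    )
    return matched / len(reference_peaks)


def best_match(sample_peaks, library, tolerance=2.0):
    """Name of the library entry with the highest matched fraction."""
    return max(
        library,
        key=lambda name: matched_fraction(sample_peaks, library[name], tolerance),
    )


# Hypothetical reference fingerprints (m/z values of prominent peaks).
library = {
    "species_A": [4365.0, 5096.0, 6255.0, 7274.0],
    "species_B": [3500.0, 5096.0, 6800.0, 8100.0],
}

# Hypothetical measured peak list.
sample = [4364.2, 5097.1, 6256.0, 7300.0]

print(best_match(sample, library))
```

The tolerance absorbs small calibration shifts between spectra; a peak counts as matched if it lies within that window.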
LipidOMICS / MetabolOMICS
Metabolomics and metabolome profiling for disease biomarkers
Understand rare taxa and taxa with genomic variations
Rare taxa can have important metabolic activities
Metabolomics provides a picture of actual metabolism rather than of metabolic potential
Short-chain fatty acids (SCFAs) derived from microbial metabolism in the gut play a central role in host homeostasis
Ursell et al. Gastroenterology 2014
Mass spectrometry
MetabolOMICS: semi-targeted methods
[Figure: total ion chromatogram (full MS, m/z 80-400; RT 2.68-18.86 min), relative abundance vs. time (min). Labelled fatty acid peaks: 15:0 (internal standard), 16:0, 18:0, 18:2, 18:1n9, 18:1n11, 20:4, 22:6.]
[Overview figure, section highlighted: culturomics, selective growth, antibiotic assay, enzymatic assay]
CulturOMICS
It is commonly accepted that c. 80% of the bacterial species found by molecular tools, e.g. in the human gut, are uncultured or even unculturable (Turnbaugh et al. 2007).
The German national academy Leopoldina in Berlin has recently recommended increasing the effort in taxonomic research.
Taxonomy is important for medicine, food technology and agriculture, for an optimal understanding and application of microorganisms.
Pure cultures are mandatory for taxonomic assessment.
CulturOMICS is an approach that allows extensive assessment of the microbial composition by high-throughput culture.
It complements metagenomic analysis and overcomes the depth bias and the DNA isolation/PCR bias.
CulturOMICS
Define high-throughput culture conditions (212 different conditions)
High-throughput automated colony screening
Identification by MALDI-TOF
Comparison of culturomics taxa with those found by 16S amplicon sequencing
31 new bacterial species and genera (http://www.ebi.ac.uk/embl/submission/index.html)
The culturomics approach yielded 340 bacterial species, seven phyla and 117 genera
Pyrosequencing identified 282 species, six phyla and 91 genera
51 phylotypes overlap between the methods
Definition of the most efficient culture conditions
Characterization by functionality and enzyme activity is possible
A minority population can have a substantial effect on the ecology of the gut microbiota and on human health
Later use of identified species as probiotics
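The small overlap between the two methods becomes clearer when worked through. The sketch assumes that the 51 overlapping phylotypes can be counted on the same level as the 340 and 282 species, which the slide does not state explicitly.

```python
culture_species = 340      # species found by culturomics
sequencing_species = 282   # species found by pyrosequencing
shared = 51                # overlap, treated here as species-level (assumption)

culture_only = culture_species - shared
sequencing_only = sequencing_species - shared
combined = culture_species + sequencing_species - shared  # inclusion-exclusion
jaccard = shared / combined

print("found only by culture:", culture_only)
print("found only by sequencing:", sequencing_only)
print("combined species:", combined)
print("Jaccard overlap:", round(jaccard, 3))
```

Under this assumption fewer than one in ten of the combined species is seen by both methods, which supports treating culture and sequencing as complementary.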
Challenge & impact of OMICs
[Figure: bioinformatics connecting proteomics, lipidomics, metabolomics and genomics (Hyb, amplicon, metagenome, transcriptome)]
Owyang & Wu 2014
Challenge & impact of OMICs
Integrative bioinformatics analysis & methods
Omics candidate list -> functional analysis -> candidate prioritization -> biomarker identification, drug target discovery
Reiss et al. 2011, Cell Host & Microbe
THANK YOU! Ingeborg Klymiuk Core Facility Molecular Biology ZMF - CENTER FOR MEDICAL RESEARCH Medical University Graz
Illumina Sequencing Systems http://systems.illumina.com/systems/sequencing.html
16S pros and cons

Advantage: Present in all bacteria and archaea (91).
Disadvantage: Present in multiple copy numbers in most organisms (91), which may lead to overestimation of the abundance of some organisms.

Advantage: Contains highly conserved regions suitable for universal primer design (37).
Disadvantage: A small number of organisms do not display as much conservation in these regions, leading to primer bias (37).

Advantage: Contains regions of high variability suitable as unique identifiers (42).
Disadvantage: Regions of variability are occasionally insufficient to provide species-level resolution, and may be biased toward certain species (39,42).

Advantage: Numerous well-curated databases allow sequence comparison and taxonomic assignment of organisms (45).
Disadvantage: Many databases contain sequences with errors (45).

Advantage: Well-studied primer pairs are available, which are capable of amplifying most organisms with high specificity for bacteria (38).
Disadvantage: May lack specificity for certain bacterial groups and result in inaccurate estimations of community composition (38).
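The copy-number disadvantage can be made concrete: a taxon with many 16S copies per genome contributes more reads per cell than a taxon with one copy. The sketch below divides read counts by an assumed copy number before computing relative abundance; the taxa, read counts and copy numbers are invented.

```python
def copy_number_corrected_abundance(read_counts, copies_per_genome):
    """Relative abundance after dividing each taxon's reads by its 16S copy number."""
    genome_equivalents = {
        taxon: read_counts[taxon] / copies_per_genome[taxon] for taxon in read_counts
    }
    total = sum(genome_equivalents.values())
    return {taxon: value / total for taxon, value in genome_equivalents.items()}


# Invented example: taxon_A carries 7 copies of the 16S gene, taxon_B one copy.
reads = {"taxon_A": 700, "taxon_B": 100}
copies = {"taxon_A": 7, "taxon_B": 1}

raw_total = sum(reads.values())
raw = {taxon: count / raw_total for taxon, count in reads.items()}
corrected = copy_number_corrected_abundance(reads, copies)

print("raw:", raw)
print("corrected:", corrected)
```

In this invented case the raw reads suggest that taxon_A dominates, while the corrected values show both taxa at equal cell abundance. The correction is only as good as the copy numbers, which are unknown for many organisms.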