CBC Data Therapy. Metagenomics Discussion


General Workflow
1. Microbial sample
2. Generate meta-omic data
3. Process data (QC, etc.)
4. Analysis

Marker Genes
1. Extract DNA
2. Amplify with targeted primers
3. Filter errors, build clusters
4. Diversity analysis

Metagenomics
1. Extract DNA
2. Sequence random fragments
3. QC, assemble, annotate
4. Diversity, function analysis

Metatranscriptomics
1. Extract RNA, subtract rRNA
2. Sequence cDNA
3. QC
4. Gene expression, function

Sequencing
- Sanger
- Ion Torrent
- Roche 454
- Illumina *Seq
- Pacific Biosciences
- Nanopore

Resources (16S)
- RDP II: Cole et al., NAR (2013)
- rrnDB: Stoddard et al., NAR (2016)
- SILVA: Quast et al., NAR (2015)

Resources (Genomes)
- GenBank Genomes
- PATRIC (host and bacterial)
- GOLD (JGI metagenomes)
- Ensembl Genomes

Resources (Metagenomes)
- EBI metagenomics
- MG-RAST
- HMP DACC

Resources (Function)
- KEGG
- CARD
- UniProtKB
- Gene Ontology

General Challenges/Considerations
- Sequencing errors
  - Error rates, error type (PacBio: 10% random; Illumina: 0.1% substitution)
- Chimeras
  - Amplification artifacts, cloning of restriction fragments
- 16S: different V regions give different results
- Different sequencing platforms / sampling conditions also give different results
- Workflow complexity / plethora of tools
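To make the per-base error rates above concrete, here is a minimal sketch that converts a rate into the expected number of errors per read and the chance that a read is error-free. It assumes errors are independent and uniform along the read, which is a simplification. The function names are hypothetical, and the read lengths in the usage lines (150 bp for Illumina, 10,000 bp for PacBio) are illustrative assumptions rather than figures from the slides.

```python
def expected_errors(read_length: int, per_base_error_rate: float) -> float:
    """Expected number of erroneous bases in one read (independent errors assumed)."""
    return read_length * per_base_error_rate


def prob_error_free(read_length: int, per_base_error_rate: float) -> float:
    """Probability that every base in the read is correct (independent errors assumed)."""
    return (1.0 - per_base_error_rate) ** read_length


# Illustrative read lengths (assumptions); error rates are the ones quoted on the slide.
illumina_errors = expected_errors(150, 0.001)    # short read, 0.1% substitution rate
pacbio_errors = expected_errors(10_000, 0.10)    # long read, 10% random error rate
illumina_clean = prob_error_free(150, 0.001)
```

Under these assumptions a short Illumina read is usually error-free, while a raw long read carries many errors, which is one reason the two platforms need different error-filtering strategies.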

General Challenges/Considerations
- Strain-level diversity in metagenomes will often be missed by amplicon (esp. short-read) and shotgun approaches
  - This may be especially important between samples
- Taxonomy
  - Database predictions (RDP)
- Functional annotation
  - Coverage versus accuracy

Marker Genes
- Eukaryotic organisms (protists, fungi)
  - 18S (http://www.arb-silva.de)
  - ITS (http://www.mothur.org/wiki/unite_its_database)
- Bacteria
  - CPN60 (http://www.cpndb.ca/cpndb/home.php)
  - ITS (Martiny, EnvMicro 2009)
  - RecA gene
- Viruses
  - Gp23 for T4-like bacteriophage
  - RdRp for picornaviruses
- Faster-evolving markers are used for strain-level differentiation

Marker Genes
- Focus on contamination reduction during preparation
- 16S rRNA contains 9 hypervariable regions (V1-V9)
- V4 was chosen because of its size (suitable for Illumina 150bp paired-end sequencing) and phylogenetic resolution
- Different V regions have different phylogenetic resolutions, giving rise to slightly different community composition results
- Sequencing: MiSeq capacity allows multiple samples to be combined into a single run
- The number of reads needed to differentiate samples depends on the nature of the study
- Unique DNA barcodes can be incorporated into your amplicons to differentiate samples
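The barcode idea above can be sketched in a few lines. This is a simplified illustration, not how QIIME or Mothur implement demultiplexing: it assumes every barcode has the same length, sits at the very start of the read, and must match exactly (no error correction). The function name, barcode sequences, and sample names are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact-match barcode at the start of each read.

    Assumes all barcodes share one length. Reads whose prefix matches no
    barcode are collected under the key "unassigned".
    """
    if not barcode_to_sample:
        raise ValueError("at least one barcode is required")
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)  # store the read with its barcode trimmed off
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "soil", "TGCA": "gut"}
pooled_reads = ["ACGTGGGG", "TGCACCCC", "ACGTTTTT", "NNNNAAAA"]
per_sample = demultiplex(pooled_reads, barcodes)
```

Real barcodes are usually designed so that a single sequencing error cannot turn one barcode into another; an exact-match lookup like this one simply discards such reads.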

Marker Genes
- QIIME (http://qiime.org)
- Mothur (http://www.mothur.org)

Bioinformatics: Overall Bioinformatics Workflow
Inputs
- Sequence data (fastq)
- Metadata about samples (mapping file)
1) Preprocessing: remove primers, demultiplex, quality filter, decontamination
2) OTU picking / representative sequences
3) Taxonomic assignment
4) Build OTU table (BIOM file)
5) Sequence alignment
6) Phylogenetic analysis
Processed data
- OTU table
- Phylogenetic tree
7) Downstream analysis and visualization
Outputs
- Knowledge discovery
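Step 4 of the workflow above, building the OTU table, amounts to counting reads per OTU per sample. The sketch below uses plain dictionaries to show the idea; it does not read or write the BIOM format, and the function names, OTU identifiers, and sample names are hypothetical.

```python
from collections import Counter, defaultdict


def build_otu_table(assignments):
    """Count reads per (OTU, sample) from an iterable of (sample, otu) pairs."""
    table = defaultdict(Counter)
    for sample, otu in assignments:
        table[otu][sample] += 1
    return {otu: dict(counts) for otu, counts in table.items()}


def relative_abundance(otu_table, sample):
    """Fraction of a sample's reads that fall in each OTU."""
    counts = {otu: row.get(sample, 0) for otu, row in otu_table.items()}
    total = sum(counts.values())
    if total == 0:
        return {otu: 0.0 for otu in counts}
    return {otu: n / total for otu, n in counts.items()}


# Hypothetical read-to-OTU assignments, one (sample, otu) pair per read.
read_assignments = [("s1", "OTU1"), ("s1", "OTU1"), ("s2", "OTU1"), ("s2", "OTU2")]
otu_table = build_otu_table(read_assignments)
s2_abundance = relative_abundance(otu_table, "s2")
```

Converting counts to relative abundances per sample is a common first normalisation before the downstream diversity analysis in step 7.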

QIIME versus Mothur
QIIME
- A Python interface that glues together many programs
- Wrappers for existing programs
- Large number of dependencies / VM available
- More scalable
- Steeper learning curve, but a more flexible workflow if you can write your own scripts
- http://www.ncbi.nlm.nih.gov/pubmed/24060131
Mothur
- Single program with minimal external dependencies
- Reimplementation of popular algorithms
- Easy to install and set up; works best on a single multi-core server with lots of memory
- Less scalable
- Easy to learn and works best with its built-in tools
- http://www.mothur.org/wiki/miseq_sop

Metagenomics
Goal: identify the relative abundance of different microbes in a given sample using metagenomics
Problems:
- Reads are all mixed together
- Reads can be short (~100bp)
- Lateral gene transfer
Two broad approaches:
1. Binning-based
2. Marker-based

Metagenomics
Binning-based methods attempt to bin reads into the genome from which they originated
- Composition-based
  - Uses GC composition or k-mers (e.g. Naïve Bayes Classifier)
  - Generally not very precise and not recommended
- Sequence-based
  - Compare reads to a large reference database using BLAST (or some other similarity search method)
  - Reads are assigned based on a best-hit or Lowest Common Ancestor (LCA) approach
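The two composition features named above, GC content and k-mer counts, are simple to compute. The sketch below only extracts the features; it does not implement a Naïve Bayes classifier or any binning step. The function names and the choice of k = 4 are illustrative assumptions.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq: str, k: int = 4) -> Counter:
    """Count every overlapping k-mer in the sequence; empty if the sequence is shorter than k."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Toy read, for illustration only.
read = "ACGTACGT"
read_gc = gc_content(read)
read_kmers = kmer_counts(read, k=4)
```

A read of ~100bp yields fewer than 100 overlapping 4-mers spread over 256 possible values, so its profile is sparse and noisy, which is consistent with the slide's caution that composition-based binning of short reads is not very precise.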

LCA
Use all BLAST hits above a threshold and assign taxonomy at the lowest level in the tree that covers these taxa.
Notable examples:
- MEGAN: http://ab.inf.uni-tuebingen.de/software/megan5/
  - One of the first metagenomic tools
  - Does functional profiling too
- MG-RAST: https://metagenomics.anl.gov/
  - Web-based pipeline (might need to wait a while for results)
- Kraken: https://ccb.jhu.edu/software/kraken/
  - Fastest binning approach to date and very accurate
  - Large computing requirements (e.g. >128GB RAM)
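The LCA rule described above can be sketched as a longest-common-prefix computation over taxonomic lineages. This is a simplified illustration and not the actual MEGAN implementation, which has additional parameters. It assumes each hit carries a score and a lineage listed from root to leaf; the function names, the score threshold, and the example hits are hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the longest root-to-leaf prefix shared by all lineages."""
    if not lineages:
        return []
    common = []
    for ranks in zip(*lineages):  # walk the lineages rank by rank
        if all(rank == ranks[0] for rank in ranks):
            common.append(ranks[0])
        else:
            break
    return common


def assign_read(hits, min_score=50.0):
    """Assign a read to the LCA of all hits scoring at or above min_score.

    hits is a list of (score, lineage) pairs. An empty result means the
    read could not be placed anywhere in the taxonomy.
    """
    kept = [lineage for score, lineage in hits if score >= min_score]
    return lowest_common_ancestor(kept)


# Hypothetical hits for one read.
example_hits = [
    (200.0, ["Bacteria", "Proteobacteria", "Escherichia"]),
    (180.0, ["Bacteria", "Proteobacteria", "Salmonella"]),
    (30.0, ["Archaea", "Euryarchaeota", "Methanobrevibacter"]),
]
placement = assign_read(example_hits)
```

With the default threshold the weak archaeal hit is ignored and the read is placed at the phylum level, because the two strong hits disagree below it. Lowering the threshold lets the weak hit in and the read can no longer be placed, which shows how sensitive the assignment is to the cutoff.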

Metagenomic Assembly
MetaSPAdes showed the best overall assembly size statistics while also capturing a relatively large fraction of the expected diversity. The tool is simple and convenient to use, being essentially identical to SPAdes, and is flexible regarding the format of the input data. A drawback may be its reduced sensitivity to microdiversity. However, for the majority of metagenome research questions, accurate and representative consensus genomes of species should be more than sufficient.
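The slide does not say which assembly size statistics were compared. N50 is one widely used example, so the sketch below computes it as an assumption about what such a comparison might include. The function name and the contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L contain at least half of all assembled bases."""
    if not contig_lengths:
        return 0
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached half of the total assembly size
            return length
    return 0


# Hypothetical contig lengths, for illustration only.
assembly = [2, 2, 2, 3, 3, 4, 8, 8]
assembly_n50 = n50(assembly)
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness or about how much of the community's diversity was recovered, which is why the slide mentions both size statistics and captured diversity.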