Carl Woese used 16S rRNA to develop a method to identify any bacterium, and discovered a novel domain of life

METAGENOMICS

Carl Woese
Used 16S rRNA to develop a method to identify any bacterium, and discovered a novel domain of life.
His amazing discovery, coupled with his solitary behaviour, made many contemporary biologists think he was crazy.

16S rRNA
One of the three structural RNAs that compose the prokaryotic ribosome (together with 5S and 23S, and with 52 proteins).
Eukaryotic ribosomes are composed of 79 proteins and 4 rRNAs (5S, 5.8S, 28S, 18S).
S stands for...

16S rRNA
S stands for SVEDBERG, after Theodor Svedberg, winner of the 1926 Nobel Prize in Chemistry and inventor of the ultracentrifuge.
It measures how fast a molecule sediments in an ultracentrifuge: 1 S = 10^-13 seconds.

16S rRNA
Translation is one of the most conserved biological processes.
Ribosomal RNAs and proteins tend to be very similar even in very different organisms.

16S rRNA
Woese used this characteristic to compare ribosomal sequences across eukaryotes and prokaryotes and to redraw the tree of life.

16S rRNA
It contains highly conserved regions, but also variable regions (V1-V9), in those parts of the molecule that are not fully constrained by their function in the ribosome.

16S rRNA
We can design PCR primers in the conserved regions and use them to amplify and sequence the variable regions.
This method, called universal bacterial PCR or panbacterial PCR, allows us to:
- amplify (almost) all bacteria
- identify them based on the V regions
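The idea of anchoring primers in conserved regions to recover a variable region can be sketched in a few lines. This is only a toy illustration: the primer strings and gene sequences below are hypothetical placeholders, not real 16S primers, and real in-silico PCR would also handle degenerate bases and the reverse strand.

```python
def extract_amplicon(sequence, forward_site, reverse_site):
    """Return the region between two conserved primer sites, or None if a site is missing."""
    start = sequence.find(forward_site)
    if start == -1:
        return None
    region_start = start + len(forward_site)
    end = sequence.find(reverse_site, region_start)
    if end == -1:
        return None
    return sequence[region_start:end]

# Hypothetical toy data: the flanking sites are identical in both taxa,
# while the region between them differs.
FORWARD_SITE = "ACGTACGT"
REVERSE_SITE = "TTGGCCAA"
genes = {
    "taxon_A": "GG" + FORWARD_SITE + "AAATTTCCC" + REVERSE_SITE + "GA",
    "taxon_B": "CT" + FORWARD_SITE + "AGATCTCGC" + REVERSE_SITE + "TC",
}

for name, gene in genes.items():
    print(name, extract_amplicon(gene, FORWARD_SITE, REVERSE_SITE))
```

The same pair of conserved sites recovers a different variable sequence from each taxon, which is what makes identification possible.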

16S panbacterial sequencing is the basis of microbial ecology studies, and it allowed us to understand that bacteria are everywhere.

Microbiome At the interface between bioinformatics, microbiology & ecology

The field of study named MICROBIOMICS includes all the strategies to study the MICROBIOTA.
Microbiota = the entire microbial community of a habitat
Microbiome = the genomes of a microbiota

How many human cells are in my body?
30 trillion = 3 × 10^13
How many cells are in my body?
70 trillion = 7 × 10^13
This means that over 50% of the cells that compose my body are NOT HUMAN! (Different studies suggest variable values, from 50 to 90%.)
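The "over 50%" figure follows directly from the two numbers quoted on the slide. A minimal check of the arithmetic, taking those two values as given:

```python
human_cells = 3e13   # human cells, as quoted in the slide
total_cells = 7e13   # all cells in the body, as quoted in the slide

non_human_cells = total_cells - human_cells
non_human_fraction = non_human_cells / total_cells

print(f"Non-human cells: {non_human_cells:.0e}")
print(f"Non-human fraction: {non_human_fraction:.0%}")
```

With these inputs the non-human share is 4/7 of all cells, a little under 60%.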

JEFFREY GORDON
Identified correlations between obesity and the microbiota.
He colonized germ-free twin mice, one with 'lean microbiota' and one with 'obese microbiota' (Nature 2006).

MICROBIOME STUDIES
Human microbiome studies (gut, mouth...)
Strong influence of the microbiota on:
- Immunity
- Infections
- Metabolism
- Happiness (serotonin)

Fecal microbiota transplantation (FMT)
The medical procedure of taking stool from a healthy donor and transplanting it into a patient who has a difficult-to-eliminate infection, mainly from Clostridium difficile.
FMT is currently considered an experimental treatment, but it has shown extremely promising results.

MICROBIOME STUDIES
- Ecology
- Agriculture
- Wastewater
- Biofuels
- Animals
- Plants

MICROBIOME STUDIES
Filarial nematodes cause terrible diseases.
Symbiotic bacteria are found in filarial nematodes, and they are necessary for the host's survival.
Antibiotic treatment kills the symbionts --> the worm dies --> patients are cured.

MICROBIOME STUDIES
Microbiome of a hydrothermal vent 2,000 m deep between Norway and Greenland.
A novel phylum of Archaea was found that could be the missing link between prokaryotes and eukaryotes.

Microbiome 1.0 Amplicon sequencing

How to study microbiomes? Classical approach
1. PCR with panbacterial primers to amplify a variable region of the 16S rRNA
2. Clone the PCR products
3. Sanger sequencing of a number of clones
4. Analysis of the obtained sequences

Next-gen sequencing can be used for microbiome studies
1. PCR with panbacterial primers (variable region of the 16S rRNA)
2. Use of the PCR product as a template for next-generation sequencing (many more reads)
3. Bioinformatic analysis

What technology?
When we choose the right next-gen technology for an experiment, the two main questions are:
- How long must the reads be?
- How many reads do we need?

How long must the reads be?
In metagenomics, longer reads are better.
Longer reads = more variable regions sequenced = more discriminatory power

How many reads do we need?
As the number of sequences obtained increases, a greater number of taxa can be identified.
After a certain number of sequences, which depends on the complexity of the community in the sample, a plateau is reached.
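The plateau described above can be illustrated with a small simulation. This is a sketch under stated assumptions: the community composition is invented, reads are drawn at random in proportion to taxon abundance, and the function name `rarefaction_curve` is hypothetical.

```python
import random

def rarefaction_curve(abundances, depths, seed=0):
    """Count the distinct taxa seen in the first d simulated reads, for each d in depths."""
    rng = random.Random(seed)
    taxa = list(abundances)
    weights = [abundances[taxon] for taxon in taxa]
    # Draw the deepest sample once; shallower samples are prefixes of it.
    reads = rng.choices(taxa, weights=weights, k=max(depths))
    return [len(set(reads[:depth])) for depth in depths]

# Hypothetical community: 50 taxa, each rarer than the previous one.
community = {f"taxon_{i}": 1.0 / (i + 1) for i in range(50)}
depths = [10, 100, 1000, 10000]

for depth, observed in zip(depths, rarefaction_curve(community, depths)):
    print(f"{depth:>6} reads -> {observed} taxa observed")
```

The number of observed taxa can never exceed the number of taxa in the community, so the curve flattens once most of them have been seen; a more complex community needs more reads to get there.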

How many reads do we need?
Enough reads to detect all the taxa present in the chosen sample.
This strongly depends on the complexity of the microbial community:
- Human gut: very complex
- Soil sample: extremely complex
- Arthropod sample: simple
In any case, fewer reads than in genome sequencing projects.

What technology?
We need long reads, and we need few reads.
It used to be Sanger, then it became 454 (long reads).

What technology?
Now even metagenomic studies are performed with Illumina.
The quality, quantity and low price of Illumina sequences win, even if the method is not optimal for microbiome studies.
The current microbiomics approaches are thus tailored for Illumina reads: a high number of short reads.
In the future, third-generation methods will make it possible to sequence the entire 16S rRNA... and the bioinformatics methods will change accordingly.

Multiplexing
The output of next-gen technologies is excessive for metagenomics.
Physical multiplexing was invented for 454: the plate is divided into up to 16 spaces, to load 16 different samples.

Multiplexing
Physical multiplexing helps but...
- No more than 16 samples
- Physical space on the sequencing plate is lost (not productive)

Biochemical multiplexing
During library preparation, the adapters are ligated together with a barcode sequence.
BARCODE: a short sequence (usually 8 nt)
A different BARCODE is assigned to each sample.

Biochemical multiplexing
Up to 96 samples (currently, but it depends on the technology) can be pooled in one sequencing run.
The sequences of each sample are then separated downstream with bioinformatic tools that recognize the sequence of each barcode.
The bioinformatics is very simple: the Illumina software discriminates barcodes.
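Separating pooled reads by barcode amounts to a dictionary lookup. The sketch below assumes, purely for simplicity, that the 8 nt barcode sits at the start of each read and must match exactly; the barcodes, sample names and the `demultiplex` function are hypothetical, and in practice the sequencer's own software does this step.

```python
def demultiplex(reads, barcode_to_sample, barcode_length=8):
    """Split pooled reads by their leading barcode; return (per-sample reads, unassigned reads)."""
    samples = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)         # barcode not recognized
        else:
            samples[sample].append(insert)  # keep the read without its barcode
    return samples, unassigned

# Hypothetical barcodes and pooled reads
barcodes = {"AAAAAAAA": "sample1", "CCCCCCCC": "sample2"}
pooled = ["AAAAAAAAACGT", "CCCCCCCCTTGA", "GGGGGGGGACGT"]

per_sample, unknown = demultiplex(pooled, barcodes)
print(per_sample)
print(unknown)
```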

BIOINFORMATICS of 16S metagenomics
0. Sequencing
1. Sample reduction
1.1 Quality control
1.2 Identical sequences merge
2. Selection of homologous 16S sub-sequences
3. OTUs classification
3.1 OTUs identification
3.2 OTUs annotation

BIOINFORMATICS of 16S metagenomics
0. SEQUENCING: barcoded paired-end reads are generated
- Barcoded, so multiple samples can be sequenced at once
- Paired-end, so we have more information (more nt per read)
Paired-end libraries for metagenomics are constructed so that the two reads overlap. This allows the two paired-end reads to be merged into a longer sequence.
[Figure: Read 1 (250 nt) and Read 2 (250 nt) with a 100 nt overlap, giving a merged read of 400 nt]

BIOINFORMATICS of 16S metagenomics
1. Sample reduction
1.1 Quality control: paired-end assembly and quality check
- Each pair of paired-end reads is assembled
- Read pairs whose number of mismatches is greater than a threshold are removed (the threshold is usually ZERO)
- Assembled reads shorter than a length threshold are removed
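The assembly and filtering step can be sketched as follows. This is a simplified illustration, not the algorithm of any particular tool: it assumes read 2 is the reverse complement of the fragment's far end, ignores base qualities, and uses hypothetical names and parameters (`merge_pair`, `min_overlap`, `max_mismatches`, `min_length`). With two 250 nt reads and a 100 nt overlap, the merged length would be 250 + 250 - 100 = 400 nt.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=10, max_mismatches=0, min_length=0):
    """Merge read1 with the reverse complement of read2; return None if the pair is rejected."""
    read2_rc = reverse_complement(read2)
    # Try the longest possible overlap first, then shorter ones.
    for overlap in range(min(len(read1), len(read2_rc)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], read2_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatches:
            merged = read1 + read2_rc[overlap:]
            # Discard assembled reads shorter than the length threshold.
            return merged if len(merged) >= min_length else None
    return None  # no acceptable overlap: the pair is removed

# Toy example: a 16 nt fragment read from both ends with a 4 nt overlap
fragment = "AAAAAACCCCGGGGGG"
read1 = fragment[:10]
read2 = reverse_complement(fragment[6:])
print(merge_pair(read1, read2, min_overlap=4))
```

Setting `max_mismatches=0` reproduces the strict threshold mentioned above: a single disagreement in the overlap removes the pair.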

BIOINFORMATICS of 16S metagenomics
1. Sample reduction
1.2 Identical sequences are merged
The selected assembled reads are compared all versus all and identical reads are merged: just one is maintained in the reads dataset, and the information about the merging is stored in a text file.
This step decreases the number of reads subjected to the next analyses, reducing the CPU power and time required to complete the analysis (metagenomics can be time consuming!).
A simple, optimized algorithm is used (we only look for sequences that are exactly identical).
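Because only exactly identical sequences are merged, this step does not need explicit pairwise comparisons: hashing each read is enough. A minimal sketch, with a hypothetical `dereplicate` function and toy reads; keeping the counts in memory stands in for the text file mentioned above.

```python
from collections import Counter

def dereplicate(reads):
    """Collapse identical reads; return (unique reads, count of each unique read)."""
    counts = Counter(reads)
    # Unique reads ordered by first appearance; the counts record what was merged.
    return list(counts), dict(counts)

reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC", "TTGA"]
unique_reads, read_counts = dereplicate(reads)
print(unique_reads)
print(read_counts)
```

The counts must be kept, since later abundance estimates depend on how many original reads each unique sequence represents.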

BIOINFORMATICS of 16S metagenomics
2. Selection of homologous 16S sub-sequences
Select the subset of reads to use for OTUs identification (see the next slide).
In order to compare the reads we must calculate their pairwise nucleotide distances, but an all-versus-all comparison of every read is computationally too expensive (the number of pairs grows quadratically with the number of reads).
Distances are only meaningful between homologous regions, and the Illumina reads are too short to cover the entire 16S gene.
We therefore have to select a gene sub-region (here V4) and perform the analysis using only the reads that align to that sub-region.

BIOINFORMATICS of 16S metagenomics
3.1 OTUs identification
The selected (homologous) reads are aligned all versus all and clustered on the basis of nucleotide similarity.
Reads with a nucleotide distance lower than a threshold (usually 3% for the V4 region of the 16S rDNA) are grouped in the same cluster.
Each cluster is defined as an Operational Taxonomic Unit (OTU).
Why OTUs and not species? What is a species?
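Threshold-based clustering can be sketched with a greedy centroid scheme. Note that this is a swapped-in simplification, not the all-versus-all procedure described above: each read is compared only with the first read of every existing cluster. Reads are assumed to be already aligned and of equal length, and the function names are hypothetical.

```python
def distance(a, b):
    """Fraction of mismatching positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads, threshold=0.03):
    """Assign each read to the first cluster whose centroid is closer than the threshold."""
    otus = []  # each OTU is a list of reads; its first read acts as the centroid
    for read in reads:
        for otu in otus:
            if distance(read, otu[0]) < threshold:
                otu.append(read)
                break
        else:
            otus.append([read])  # no close centroid: start a new OTU
    return otus

# Toy 100 nt reads
base = "ACGT" * 25
close_read = "TT" + base[2:]       # 2 mismatches vs base -> distance 0.02
far_read = "T" * 10 + base[10:]    # 8 mismatches vs base -> distance 0.08

for i, otu in enumerate(greedy_otus([base, close_read, far_read]), start=1):
    print(f"OTU{i}: {len(otu)} reads")
```

In a greedy scheme the result depends on input order, which is one reason real pipelines often process reads sorted by abundance.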

BIOINFORMATICS of 16S metagenomics
3.1 OTUs identification
Why OTUs and not species?
The concept of species is difficult to define: it derives mostly from historical and phenotypical reasons.
The concept of OTUs is:
- standardized (good)
- free of phenotypical meaning (good and bad)
A single species can comprise multiple OTUs (that is, multiple OTUs can belong to a single species).

BIOINFORMATICS of 16S metagenomics
3.2 OTUs annotation
All the reads belonging to an OTU are used to generate a consensus sequence: a sequence that contains, at each position, the base with the highest frequency in the alignment.
The consensus is compared to a 16S database and annotated.
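The consensus definition above translates directly into a column-by-column majority vote. A minimal sketch, assuming the reads of the OTU are already aligned to equal length; the `consensus` function is hypothetical, and ties are broken in favour of the base that appears first in the column.

```python
from collections import Counter

def consensus(aligned_reads):
    """Return the most frequent base at each alignment column."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_reads)
    )

otu_reads = ["ACGT", "ACGA", "ACCT"]
print(consensus(otu_reads))
```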

BIOINFORMATICS of 16S metagenomics
3.2 OTUs annotation: taxonomic annotation of the reads
The OTUs are aligned against a manually curated alignment of 16S sequences, representative of the known bacterial diversity (several 16S alignments are available in databases; the most used are SILVA and RDP).

SOFTWARE FOR 16S METAGENOMICS
Software packages such as MOTHUR perform all the steps:
1. Sample reduction
1.1 Quality control
1.2 Identical sequences merge
2. Selection of homologous 16S sub-sequences
3. OTUs classification
3.1 OTUs identification
3.2 OTUs annotation
Knowledge of the process is fundamental to interact with the software, set parameters and analyze results.
The pipelines are highly customizable.

BIOINFORMATICS of 16S metagenomics
Finally the output!

OTU     Sample1   Sample2   Sample3
OTU1    34        102       56
OTU2    5         306       78
OTU3    23        45        98
OTU4    12        36        112
...     ...       ...       ...

Note: the number of OTUs does not correspond to the number of species! Several OTUs can be assigned to the same species.
Note 2: discrimination at the species level is difficult. It is better to stop at the genus.

Comparative analysis of microbiomes 16S Microbiomics is comparative by nature

Comparative analysis of microbiomes
The obtained data can be used to compare the composition of different microbial communities.
Alpha-diversity is the measure of the diversity within a population (how many different types are in the sample).
Beta-diversity is the measure of the inter-population diversity.

Comparative analysis of microbiomes Several indexes and statistical tests are available to study alpha and beta diversities These methods are imported directly from ecology

Comparative analysis of microbiomes
Alpha-diversity
Many of the indexes used to study microbial alpha diversity are also used in ecology to quantify the biodiversity of a habitat, e.g.:
- Shannon index
- Simpson index
They take into account the number of species present and the abundance of each species.
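Both indexes can be computed from one column of an OTU table. The sketch below uses the standard textbook definitions, Shannon H = -Σ p_i ln p_i and Simpson in its "1 - Σ p_i²" form (other conventions for Simpson exist); the function names are hypothetical, and the counts are the three sample columns of the example OTU table from the output slide.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Simpson diversity in its 1 - sum(p^2) form (higher means more diverse)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Columns of the example OTU table (OTU1..OTU4)
otu_table = {
    "Sample1": [34, 5, 23, 12],
    "Sample2": [102, 306, 45, 36],
    "Sample3": [56, 78, 98, 112],
}

for sample, counts in otu_table.items():
    print(sample, round(shannon_index(counts), 3), round(simpson_index(counts), 3))
```

Both values grow with the number of taxa and with how evenly the reads are spread among them; a sample dominated by a single OTU scores low on both.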

Comparative analysis of microbiomes
Beta-diversity
In order to compare the composition of different microbial populations we can use the Bray-Curtis dissimilarity index to calculate a dissimilarity matrix, and then apply clustering methods to group the populations.
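The Bray-Curtis dissimilarity between two samples is 1 - 2·Σ min(a_i, b_i) / (Σ a_i + Σ b_i), where a_i and b_i are the counts of OTU i in each sample. A minimal sketch that builds the pairwise matrix from the example OTU table of the output slide; the function name is hypothetical and the subsequent clustering step is left out.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical, 1 = nothing shared)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))

# Columns of the example OTU table (OTU1..OTU4)
otu_table = {
    "Sample1": [34, 5, 23, 12],
    "Sample2": [102, 306, 45, 36],
    "Sample3": [56, 78, 98, 112],
}

names = list(otu_table)
matrix = [[bray_curtis(otu_table[r], otu_table[c]) for c in names] for r in names]

for name, row in zip(names, matrix):
    print(name, [round(value, 3) for value in row])
```

The index works on raw counts, so samples with very different sequencing depths look dissimilar for that reason alone; normalizing or subsampling the counts first is a common precaution.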

MICROBES = bacteria?
Additional approaches consider:
- The fungal community = MYCOBIOME (18S or ITS amplicon sequencing)
- The viral community = VIROME (???)
Could we design an unbiased approach to obtain the entire MICROBIOME?

Microbiome 2.0 Shotgun metagenomics

Next-gen sequencing technologies are so productive that a complex sample such as a microbial community can be fully sequenced without 'filter' steps.
This is the concept of shotgun metagenomics.
Ultra-deep sequencing is needed.

This approach hypothetically allows us to identify not just every taxon, but every gene present in the sample.
Consequently, it is possible to analyze the complexity of the metabolic reactions present in the sample.

Downstream bioinformatic analysis to discriminate what is present in a shotgun microbiome sample can be very challenging.
Ad-hoc software is now being developed specifically for these approaches.

Downstream analyses
Once you get your metagenomic reads, you can perform:
1. IDENTIFICATION AND QUANTIFICATION: know what species are in the sample and how many cells there are for each of them
2. FUNCTIONAL ANALYSIS: know what functions/genes are in the sequenced community (with or without knowing which species has which gene)
3. GROWTH ANALYSIS: know which population is growing. A closed reference genome and organisms with one single origin of replication are needed (very new and rare)

1. IDENTIFICATION AND QUANTIFICATION
It is possible to characterize and quantify the community in a sequenced sample.
APPROACH: look for reads corresponding to marker genes, that is, genes apt to be used to recognize an organism, for example the 16S rDNA.
It is the same concept as amplicon-based metagenomics, but it can be performed on more than one marker gene at the same time (more precise than the amplicon-based one).
There are two ways of calculating the taxonomy: based on sequence alignments and similarity, or using phylogeny.
[Figure: one of the output images of MetaPhlAn]

2. FUNCTIONAL ANALYSIS
Reads can be assembled to look for functions/genes.
Gene function can be annotated by sequence similarity or predicted.
Sometimes the assembly is so fragmented that you get only pieces of genes (function can still be predicted).
Whose gene is this? Functions can be assigned to a taxonomy by BINNING, which can be performed before or after the assembly.
Taxonomy assignment is based on:
- sequence similarity, based on alignment to known genomes
- sequence content: percentage of GC, presence of specific k-mers
- similar read coverage
[Figure labels: E. coli, B. subtilis, P. putida]
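The sequence-content features used for binning are simple to compute. A minimal sketch of the two mentioned above, GC percentage and k-mer counts, with hypothetical function names and toy contigs; a real binner would combine such features with coverage and feed them to a clustering method.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def kmer_profile(seq, k=4):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# Hypothetical contigs from an assembled metagenome
contigs = {
    "contig_1": "GGCCGGCCGCGCGGCC",
    "contig_2": "ATATTTAAATATTAAT",
}

for name, seq in contigs.items():
    top_kmer, top_count = kmer_profile(seq, k=2).most_common(1)[0]
    print(name, f"GC={gc_content(seq):.2f}", f"most common 2-mer: {top_kmer} ({top_count})")
```

Contigs from the same genome tend to share similar GC content and k-mer usage, which is why these features help group them.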

3. GROWTH ANALYSIS
Know which population is growing.
A closed reference genome and organisms with one single origin of replication are needed (very new and rare).
Peak-to-trough ratio: look at the differences in coverage between the origin of replication and other genome locations.
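The peak-to-trough idea can be sketched as the ratio between the highest and lowest read coverage along a circular genome. This is a rough illustration only: the coverage values are invented, the genome is assumed to be split into equal bins, and the smoothing is a plain circular moving average rather than the procedure of any published tool.

```python
def peak_to_trough_ratio(coverage, window=3):
    """Ratio of maximum to minimum smoothed coverage along a circular genome."""
    n = len(coverage)
    half = window // 2
    # Circular moving average: bins wrap around because the chromosome is circular.
    smoothed = [
        sum(coverage[(i + offset) % n] for offset in range(-half, half + 1)) / (2 * half + 1)
        for i in range(n)
    ]
    return max(smoothed) / min(smoothed)

# Hypothetical coverage per genome bin; bin 0 is near the origin of replication
growing = [20, 18, 16, 14, 12, 10, 12, 14, 16, 18]
resting = [10] * 10

print(round(peak_to_trough_ratio(growing), 2))
print(round(peak_to_trough_ratio(resting), 2))
```

A ratio close to 1 suggests a population that is not replicating, while higher values indicate that many cells have started copying their genome from the origin.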

Microbiome 3.0 Ad-hoc approaches

Mini metagenomics A complex sample is passed through a cell-sorter

Mini metagenomics
Cells of interest are selected. These can be all bacteria, all eukaryotic cells, all bacteria of one species, all leukocytes...
The sample is sequenced.
Shotgun metagenomics methods are used... on a much simpler community.

Symbiosis
The association of organisms living together.
A symbiotic consortium is a metagenome.
Specific bioinformatic methods can be used to disentangle a symbiotic consortium and obtain the genome of a symbiont.