Introduction to OTU Clustering. Susan Huse August 4, 2016
Transcription
1 Introduction to OTU Clustering Susan Huse August 4, 2016
2 What is an OTU? Operational Taxonomic Units, a.k.a. phylotypes, a.k.a. clusters: aggregations of reads based only on sequence similarity, independent of taxonomic name.
3 The Magical 3% NOT! 3% SSU OTUs = Species and 6% SSU OTUs = Genera
4 History of The 3% OTUs. Wayne et al (1987) IJSEM 37(4): Uses genome-level phylogeny to define species, using 70% DNA-DNA reassociation. Stackebrandt and Goebel (1994) IJSEM 44(4): 70% DNA-DNA reassociation ~ 97% similarity of full-length 16S. Therefore 97% similarity of any hypervariable region in any bacteria or fungus = species?
5 Maximum Intra-Taxon Distance for Full-length 16S. [Figure: Cumulative Percentage of Non-Singleton Taxa vs. maximum intra-taxon distance; series: Genus (827), Family (244), Order (92), Class (52), Phylum (31)] 18% of genera are > 10% OTUs; 30% of genera are < 3% OTUs. Fig 1A, Schloss & Westcott (2011) Appl Env Microbiol
6 Why OTUs? Because we can! We can't: do whole genome sequencing of communities; assign complete taxonomic names; derive phylogenetic trees from millions of short tags. Continuing to develop new methods that are more biologically or evolutionarily meaningful. Come back in 10 years.
7 Pros and Cons OTUs Novel organisms Taxonomy Universal names Insufficient taxonomy Does not lump together all order or family-level classifications Many names based on phenotype rather than genotype Meaning associated with names Independent of clustering width and algorithm Historically well-studied are split New areas are lumped
8 Cluster Width. Diameter: sequences are never more than D apart (CL). Radius: sequences are never more than R from the seed or reference (Ref, Greedy). Variable: only selected positions (Oligo); averaged radii (AL).
9 Primary Aggregation Methods. Complete linkage: no two sequences are farther apart than the clustering width. Average linkage: the average distance between sequences is less than the width. Greedy clusters: sequences are less than the width from OTU representatives built de novo. Reference: sequences are less than the width from a universal reference set of OTU representatives. Oligotyping: sequences are equivalent at the most information-rich nucleotide positions. Probabilistic: error modeling of likely clusters.
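The difference between the complete-linkage and average-linkage criteria can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names are hypothetical, and the distance is assumed to be the fraction of mismatched positions between pre-aligned, equal-length tags, which is far simpler than the alignment-based distances real tools use.

```python
from itertools import combinations


def frac_distance(a, b):
    """Fraction of mismatched positions (assumes pre-aligned, equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def satisfies_complete_linkage(cluster, width):
    """True if no two sequences in the cluster are farther apart than the width."""
    return all(frac_distance(a, b) <= width for a, b in combinations(cluster, 2))


def satisfies_average_linkage(cluster, width):
    """True if the mean pairwise distance within the cluster is within the width."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return True  # a single sequence trivially satisfies the criterion
    mean = sum(frac_distance(a, b) for a, b in pairs) / len(pairs)
    return mean <= width


# Three 10-nt tags: pairwise distances are 0.1, 0.1 and 0.2.
cluster = ["AAAAAAAAAA", "AAAAAAAAAT", "TAAAAAAAAA"]
complete_ok = satisfies_complete_linkage(cluster, 0.15)
average_ok = satisfies_average_linkage(cluster, 0.15)
```

With a width of 0.15 the same three tags fail the complete-linkage test (one pair is 0.2 apart) but pass the average-linkage test (mean distance about 0.13), which is one way to see why the two methods can return different OTU counts.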
10 De Novo Greedy Clustering. Most abundant reads are: likely true sequences; least likely to be errors or chimeras; and most likely to spawn errors and chimeras. Seed OTUs with the most abundant reads; assign variation to the most abundant source.
11 De Novo Greedy Clustering. Sort by abundance. First, Seq 1 is the seed of Cluster 1. Compare the next Seq to each cluster in order: if Distance(Seq, Seed) <= W, include it in the cluster; if Seq is not a member of any cluster, seed a new cluster. Edgar (2010) Bioinformatics 26(19):2460-1
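The greedy procedure on this slide can be sketched in a few lines of Python. This is an illustrative toy, not the implementation described in Edgar (2010): `greedy_cluster` and `frac_distance` are hypothetical names, and the distance is assumed to be a simple mismatch fraction over pre-aligned, equal-length tags.

```python
from collections import Counter


def frac_distance(a, b):
    """Fraction of mismatched positions (assumes pre-aligned, equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, width):
    """Abundance-sorted greedy clustering; returns {seed sequence: [member sequences]}."""
    counts = Counter(reads)
    # Most abundant unique sequence first; ties broken alphabetically for determinism.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    seeds = []      # seeds in the order they were created
    clusters = {}
    for seq in ordered:
        for seed in seeds:
            # Join the first cluster, in seed order, whose seed is within the width.
            if frac_distance(seq, seed) <= width:
                clusters[seed].append(seq)
                break
        else:
            # Not within the width of any existing seed: start a new cluster.
            seeds.append(seq)
            clusters[seq] = [seq]
    return clusters


reads = ["AAAAAAAAAA"] * 5 + ["AAAAAAAAAT"] * 2 + ["TTTTTTTTTT"] * 3
otus = greedy_cluster(reads, width=0.1)
```

In this toy input the rare one-mismatch variant is absorbed by the abundant `AAAAAAAAAA` seed, while `TTTTTTTTTT` seeds its own OTU, so `otus` holds two clusters.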
12 Greedy Seeds > W Each tag goes to OTU with nearest representative, not OTU with the nearest sequence
13 Closed Reference OTUs 1. Create a set of full-length reference sequences (e.g., Greengenes) 2. Cluster reference sequences 3. Select representative sequence for each cluster 4. Map each tag to nearest rep <= Width 5. Bin each tag to that reference's OTU#, or unclustered if > Width. Rideout et al (2014) PeerJ in press
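Steps 4 and 5 (mapping each tag to its nearest representative and binning it) can be sketched as follows. This is a simplified illustration under stated assumptions: the reference representatives are assumed to be already trimmed to the same region and length as the tags, the distance is a plain mismatch fraction, and `closed_reference_assign` is a hypothetical name rather than a function from any real pipeline.

```python
def frac_distance(a, b):
    """Fraction of mismatched positions (assumes pre-aligned, equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closed_reference_assign(tags, reference_reps, width):
    """Map each tag to the OTU id of its nearest representative, or None if > width."""
    assignments = {}
    for tag in tags:
        best_id, best_d = None, None
        for otu_id, rep in reference_reps.items():
            d = frac_distance(tag, rep)
            if best_d is None or d < best_d:
                best_id, best_d = otu_id, d
        # Bin to the nearest representative only when it lies within the width.
        within = best_d is not None and best_d <= width
        assignments[tag] = best_id if within else None
    return assignments


reference_reps = {"OTU1": "AAAAAAAAAA", "OTU2": "CCCCCCCCCC"}
tags = ["AAAAAAAAAT", "CCCCCCCCCC", "GGGGGGGGGG"]
binned = closed_reference_assign(tags, reference_reps, width=0.1)
```

Tags assigned `None` correspond to the "unclustered" bin on the slide.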
14 Open Reference OTUs 1. Perform Closed Reference OTU mapping 2. Perform de novo clustering on unclustered sequences 3. Subsequent datasets can be mapped to the reference and to an expanding set of de novo clusters.
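A minimal sketch of the open-reference idea, again in Python and under the same simplifying assumptions (mismatch-fraction distance on equal-length sequences, hypothetical function names, and hypothetical `denovoN` labels for new OTUs): tags are first mapped to the reference set, and whatever remains unclustered is greedily clustered de novo, with the new seeds added to an expanding set of representatives.

```python
from collections import Counter


def frac_distance(a, b):
    """Fraction of mismatched positions (assumes pre-aligned, equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest(seq, reps):
    """Return (otu_id, distance) of the closest representative, or (None, None)."""
    best = (None, None)
    for otu_id, rep in reps.items():
        d = frac_distance(seq, rep)
        if best[1] is None or d < best[1]:
            best = (otu_id, d)
    return best


def open_reference(reads, reference_reps, width):
    """Return (assignments, expanded_reps) after closed-reference then de novo steps."""
    counts = Counter(reads)
    assignments, unclustered = {}, []
    # Step 1: closed-reference mapping against the fixed reference set.
    for seq in counts:
        otu_id, d = nearest(seq, reference_reps)
        if d is not None and d <= width:
            assignments[seq] = otu_id
        else:
            unclustered.append(seq)
    # Step 2: greedy de novo clustering of the leftovers, most abundant first.
    denovo_reps = {}
    for seq in sorted(unclustered, key=lambda s: (-counts[s], s)):
        otu_id, d = nearest(seq, denovo_reps)
        if d is None or d > width:
            otu_id = f"denovo{len(denovo_reps) + 1}"
            denovo_reps[otu_id] = seq
        assignments[seq] = otu_id
    # Step 3: later datasets can be mapped against references plus de novo seeds.
    return assignments, {**reference_reps, **denovo_reps}


reads = ["AAAAAAAAAA"] * 3 + ["GGGGGGGGGG"] * 4 + ["GGGGGGGGGT"] * 2
assignments, expanded = open_reference(reads, {"OTU1": "AAAAAAAAAA"}, width=0.1)
```

The returned `expanded` dictionary plays the role of the "expanding set of de novo clusters" that subsequent datasets would be mapped against.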
15 Reference OTU Caveat. X% clustering of full-length reference sequences does not represent X% for each hypervariable region. Will vary by phylotype.
16 Oligotyping. Calculated entropy: alignment positions of greatest information. Oligotypes: identical in selected positions; ignore the rest as noise. [Figure: entropy by Alignment Position] Eren et al (2013) Methods in Ecol and Evol 4:1111-9
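The entropy step can be illustrated with a small Python sketch. It assumes Shannon entropy (base 2) computed per alignment column and a fixed number of selected positions; the function names are hypothetical and this is not the oligotyping software from Eren et al (2013), which involves further steps.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the characters in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def oligotypes(aligned_reads, n_positions):
    """Group reads by their bases at the n_positions highest-entropy columns."""
    length = len(aligned_reads[0])
    entropies = [column_entropy([r[i] for r in aligned_reads]) for i in range(length)]
    # Pick the most informative columns, then restore left-to-right order.
    ranked = sorted(range(length), key=lambda i: -entropies[i])
    selected = sorted(ranked[:n_positions])
    groups = {}
    for read in aligned_reads:
        key = "".join(read[i] for i in selected)  # all other positions are ignored
        groups.setdefault(key, []).append(read)
    return selected, groups


aligned = ["ACGT", "ACGT", "ATGT", "ATGA"]
positions, groups = oligotypes(aligned, n_positions=1)
```

Here the second column (index 1) carries the most information, so reads are split on it alone and the lone variation in the last column is treated as noise.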
17 Pros and Cons. Greedy, AL: standard de novo methods, abundance dependent, project-specific. Reference OTU: compare V-regions, universal IDs, dependent on reference database and reference clustering. Oligotyping / DADA2: increased sensitivity, no specific radius, project-specific.
18 The Problem of Inflation Clustering algorithms return more OTUs than predicted for mock communities. OTU inflation leads to: alpha diversity inflation beta diversity inflation Where does this inflation come from? How can we adjust for this inflation?
19 OTU inflation varied with sample depth. [Figure: MS-CL M2FN Rarefaction - PML; OTUs vs. Number of Sequences Sampled; series: 5K, 10K, 15K, 20K, 50K, 100K]
20 Low-Quality Reads and Errant OTUs Word to the wise: minimize adventurous OTUs
21 Different clustering algorithms have very different effects on the size and number of OTUs created
22 Inflation in Action >200,000 V6 reads of E. coli clone (454) 1,042 is a few more than the expected 2 Huse et al (2010) Environ Microbiol 12(7):
23 Sequencing Noise in E. coli. [Table: Distance, Tags, Seqs; values not recoverable from the transcript] Cluster at 3% using only tags within 3% of the correct sequence
24 Inflation with small sequence error. Only 129 OTUs, much better!!
25 Clustering V6 sequences within 3% of known templates (outliers removed)
Method: E. coli (2) / S. epidermidis (1) / Clone43 (43)
MS-CL: 129 / 89 / 694
MS-AL: 54 / 44 / 218
PW-CL: 6 / 5 / 308
PW-AL: 2 / 1 / 43
Alignment method: MS = Multiple Sequence Alignment, PW = Pairwise Alignment. Clustering method: CL = Complete Linkage clustering, AL = Average Linkage clustering.
26 Multiple Sequence Alignments. Regardless of clustering algorithm, an MSA cannot fully align tags whose sequences are too divergent; more tags cause more problems. 18,156 sequences and 392 positions
27 MSA Limitations: Distance Comparison. [Figure: Independent Distances vs. Subsampled Distances] X-axis: 1000 distances subsampled from MSA of 40,000 sequences. Y-axis: 1000 distances from MSA of only the 1,000 bacterial sequences.
28 Complete Linkage propagates OTUs. Cluster Count: [Figure: clusters labeled #1 through #13] Each V6 variant with 2nt distance will start a new OTU. Each V6 variant with 1nt distance can cluster with the seed or a 2nd variant.
29 Average Linkage collapses errors. Cluster Count: 1 (#1). Clusters tend to be heavily dominated by their most abundant sequence, which strongly weights the average and smooths the noise.
30 Greedy Seeds (UClust) Cluster Count: 1
31 Effect of Sampling Depth. [Figure: MS-CL M2FN Rarefaction - PML; OTUs vs. Number of Sequences Sampled; series: 5K, 10K, 15K, 20K, 50K, 100K] Previous algorithms differentially inflated OTUs at a sampling depth dependent upon total sample depth. Newer methods do not.
32 Errant OTUs. If a low-quality sequence is >3% from its source, it can create a new OTU. If the rate of an errant read = 1 in 10 thousand, and we have 1 million reads: (1 / 10,000) * 1,000,000 reads = 100 errant OTUs
33 Impact of Errant OTUs Errant OTUs as percent of OTUs decreases with diversity. If we have 100 errant OTUs: Mock community: 100 / 50 OTUs = +200% Diverse community: 100 / 2,000 OTUs = +5%
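The arithmetic behind these figures can be checked with a short Python sketch. It assumes, as the slides do, that every errant read founds its own OTU and that errant reads occur at 1 in 10,000; the function names are hypothetical.

```python
def expected_errant_otus(n_reads, reads_per_errant_read):
    """Expected errant OTUs if each errant read seeds its own OTU (an assumption)."""
    return n_reads / reads_per_errant_read


def relative_inflation_percent(errant_otus, true_otus):
    """Errant OTUs expressed as a percentage of the true OTU count."""
    return 100.0 * errant_otus / true_otus


errant = expected_errant_otus(1_000_000, 10_000)       # 100.0 errant OTUs
mock = relative_inflation_percent(errant, 50)          # 200.0 (%)
diverse = relative_inflation_percent(errant, 2_000)    # 5.0 (%)
```

The same absolute number of errant OTUs therefore triples the OTU count of a 50-OTU mock community but adds only 5% to a 2,000-OTU community.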
34 Relative Inflation Absolute number of errant OTUs will increase with sample size. Relative number of errant OTUs will decrease with sample complexity
35 What is the right cluster width? No.
36 What is the right cluster width? Distances vary by hypervariable region: full-length does not equal V1-V3, which does not equal V3-V5, which does not equal V6, etc. Distances vary by bacterial lineage. What specificity are you trying to gain? No single clustering width is the right answer; use several (and wave your hands a lot) or use other methods like Oligotyping.
37 Questions about Clustering How meaningful are clusters functionally? When is an errare rare and when is it an error? Should it be included in an existing cluster or start its own? How to assign sequences if OTUs overlap? What is the effect of residual low quality data or chimeras? How sensitive are alpha and beta diversity estimates to clustering results?
Sequencing Errors, Diversity Estimates, and the Rare Biosphere
Sequencing Errors, Diversity Estimates, and the Rare Biosphere or Living in the shadow of Errares Susan Huse Marine Biological Laboratory June 13, 2012 Consistent Community Profile across samples and environments
More informationQuality Filtering of Illumina Sequences. Susan Huse Brown University August 6, 2015
Quality Filtering of Illumina Sequences Susan Huse Brown University August 6, 2015 Illumina FASTQ Files File naming: NA10831_ATCACG_L002_R1_001.fastq.gz FA1_S1_L001_R1_001.fastq.gz Sample_Barcode/Index_Lane_Read#_Set#.fastq.gz
More informationRobert Edgar. Independent scientist
Robert Edgar Independent scientist robert@drive5.com www.drive5.com Reads FASTQ format Millions of reads Many Gb USEARCH commands "UPARSE pipeline" OTU sequences FASTA format >Otu1 GATTAGCTCATTCGTA >Otu2
More informationmothur tutorial STAMPS, 2013 Kevin R. Theis Department of Zoology BEACON Center for the Study of Evolution in Action Michigan State University
mothur tutorial STAMPS, 2013 Kevin R. Theis Department of Zoology BEACON Center for the Study of Evolution in Action Michigan State University mothur Mission to develop a single piece of open-source, expandable
More informationmothur Workshop for Amplicon Analysis Michigan State University, 2013
mothur Workshop for Amplicon Analysis Michigan State University, 2013 Tracy Teal MMG / ICER tkteal@msu.edu Kevin Theis Zoology / BEACON theiskev@msu.edu mothur Mission to develop a single piece of open-source,
More informationIntroduction to Bioinformatics analysis of Metabarcoding data
Introduction to Bioinformatics analysis of Metabarcoding data Theoretical part Alvaro Sebastián Yagüe Experimental design Sampling Sample processing Sequencing Sequence processing Experimental design Sampling
More informationIntroduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME. Peter Sterk EBI Metagenomics Course 2014
Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME Peter Sterk EBI Metagenomics Course 2014 1 Taxonomic analysis using next-generation sequencing Objective we want to
More informationCarl Woese. Used 16S rrna to develop a method to Identify any bacterium, and discovered a novel domain of life
METAGENOMICS Carl Woese Used 16S rrna to develop a method to Identify any bacterium, and discovered a novel domain of life His amazing discovery, coupled with his solitary behaviour, made many contemporary
More informationMetagenomics Computational Genomics
Metagenomics 02-710 Computational Genomics Metagenomics Investigation of the microbes that inhabit oceans, soils, and the human body, etc. with sequencing technologies Cooperative interactions between
More informationCarl Woese. Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life
METAGENOMICS Carl Woese Used 16S rrna to developed a method to Identify any bacterium, and discovered a novel domain of life His amazing discovery, coupled with his solitary behaviour, made many contemporary
More informationAssessing and Improving Methods Used in Operational Taxonomic Unit-Based Approaches for 16S rrna Gene Sequence Analysis
APPLIED AND ENVIRONMENTAL MICROBIOLOGY, May 2011, p. 3219 3226 Vol. 77, No. 10 0099-2240/11/$12.00 doi:10.1128/aem.02810-10 Copyright 2011, American Society for Microbiology. All Rights Reserved. Assessing
More informationMicrobiomes and metabolomes
Microbiomes and metabolomes Michael Inouye Baker Heart and Diabetes Institute Univ of Melbourne / Monash Univ Summer Institute in Statistical Genetics 2017 Integrative Genomics Module Seattle @minouye271
More informationLecture 01: Overview of Metagenomics
Lecture 01: Overview of Metagenomics 1 Culture Independent Techniques: Metagenomics Universal Gene census Shotgun Metagenome Sequencing Transcriptomics (shotgun mrna) Proteomics (protein fragments) Metabolomics
More informationchoose MBL-REGISTER user: dm00834 password: dm00834 http://register.mbl.edu/ stamps.mbl.edu this uses the username and password on your STAMPS name badge Strategies for Analysis of Microbial Population
More informationStability of operational taxonomic units: an important but neglected property for analyzing microbial diversity
He et al. Microbiome (2015) 3:20 DOI 10.1186/s40168-015-0081-x RESEARCH Open Access Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity Yan He
More informationMethods for the phylogenetic inference from whole genome sequences and their use in Prokaryote taxonomy. M. Göker, A.F. Auch, H. P.
Methods for the phylogenetic inference from whole genome sequences and their use in Prokaryote taxonomy M. Göker, A.F. Auch, H. P. Klenk Contents 1. Short introduction into the role of DNA DNA hybridization
More informationTECHNIQUES FOR STUDYING METAGENOME DATASETS METAGENOMES TO SYSTEMS.
TECHNIQUES FOR STUDYING METAGENOME DATASETS METAGENOMES TO SYSTEMS. Ian Jeffery I.Jeffery@ucc.ie What is metagenomics Metagenomics is the study of genetic material recovered directly from environmental
More informationBellerophon; a program to detect chimeric sequences in multiple sequence
Revised ms: BIOINF-03-0817 Bellerophon; a program to detect chimeric sequences in multiple sequence alignments. Thomas Huber 1 *, Geoffrey Faulkner 1 and Philip Hugenholtz 2 1 ComBinE group, Advanced Computational
More informationBioinformatics for Microbial Biology
Bioinformatics for Microbial Biology Chaochun Wei ( 韦朝春 ) ccwei@sjtu.edu.cn http://cbb.sjtu.edu.cn/~ccwei Fall 2013 1 Outline Part I: Visualization tools for microbial genomes Tools: Gbrowser Part II:
More informationNature Biotechnology: doi: /nbt Supplementary Figure 1. MBQC base beta diversity, major protocol variables, and taxonomic profiles.
Supplementary Figure 1 MBQC base beta diversity, major protocol variables, and taxonomic profiles. A) Multidimensional scaling of MBQC sample Bray-Curtis dissimilarities (see Fig. 1). Labels indicate centroids
More informationdbcamplicons pipeline Amplicons
dbcamplicons pipeline Amplicons Matthew L. Settles Genome Center Bioinformatics Core University of California, Davis settles@ucdavis.edu; bioinformatics.core@ucdavis.edu Microbial community analysis Goal:
More informationHmmUFOtu: An HMM and phylogenetic placement based ultra-fast taxonomic assignment and OTU picking tool for microbiome amplicon sequencing studies
Zheng et al. Genome Biology (2018) 19:82 https://doi.org/10.1186/s13059-018-1450-0 SOFTWARE HmmUFOtu: An HMM and phylogenetic placement based ultra-fast taxonomic assignment and OTU picking tool for microbiome
More informationdbcamplicons pipeline Amplicons
dbcamplicons pipeline Amplicons Matthew L. Settles Genome Center Bioinformatics Core University of California, Davis settles@ucdavis.edu; bioinformatics.core@ucdavis.edu Microbial community analysis Goal:
More informationFungal ITS Bioinformatics Efforts in Alaska
Fungal ITS Bioinformatics Efforts in Alaska D. Lee Taylor ltaylor@iab.alaska.edu Institute of Arctic Biology University of Alaska Fairbanks Shawn Houston Minnesota Supercomputing Institute University of
More informationCOMPARING MICROBIAL COMMUNITY RESULTS FROM DIFFERENT SEQUENCING TECHNOLOGIES
COMPARING MICROBIAL COMMUNITY RESULTS FROM DIFFERENT SEQUENCING TECHNOLOGIES Tyler Bradley * Jacob R. Price * Christopher M. Sales * * Department of Civil, Architectural, and Environmental Engineering,
More informationSUPPLEMENTARY INFORMATION
doi:10.1038/nature09944 Supplementary Figure 1. Establishing DNA sequence similarity thresholds for phylum and genus levels Sequence similarity distributions of pairwise alignments of 40 universal single
More informationAll subjects gave written informed consent before participating in this study, which
Supplementary Information Materials and Methods Sequence generation and analysis All subjects gave written informed consent before participating in this study, which was approved by the Washington University
More informationSANBio BIOINFORMATICS TRAINING COURSE THE MICROBIOME: ANALYSIS OF NGS DATA CBIO-PIPELINE SAMSON, KM
SANBio BIOINFORMATICS TRAINING COURSE THE MICROBIOME: ANALYSIS OF NGS DATA CBIO-PIPELINE SAMSON, KM 10/23/2017 Microbiome : Analysis of NGS Data 1 Outline Background Wet Lab! Raw reads Quality Assessment
More informationReport on database pre-processing
Multiscale Immune System SImulator for the Onset of Type 2 Diabetes integrating genetic, metabolic and nutritional data Work Package 2 Deliverable 2.3 Report on database pre-processing FP7-600803 [D2.3
More informationGenome Jigsaw: Implications of 16S Ribosomal RNA Gene Fragment Position for Bacterial Species Identification
Wilfrid Laurier University Scholars Commons @ Laurier Theses and Dissertations (Comprehensive) 2014 Genome Jigsaw: Implications of 16S Ribosomal RNA Gene Fragment Position for Bacterial Species Identification
More informationBioinformatic tools for metagenomic data analysis
Bioinformatic tools for metagenomic data analysis MEGAN - blast-based tool for exploring taxonomic content MG-RAST (SEED, FIG) - rapid annotation of metagenomic data, phylogenetic classification and metabolic
More informationAdvisors: Prof. Louis T. Oliphant Computer Science Department, Hiram College.
Author: Sulochana Bramhacharya Affiliation: Hiram College, Hiram OH. Address: P.O.B 1257 Hiram, OH 44234 Email: bramhacharyas1@my.hiram.edu ACM number: 8983027 Category: Undergraduate research Advisors:
More information1 Abstract. 2 Introduction. 3 Requirements. Most Wanted Taxa from the Human Microbiome The Broad Institute
1 Abstract 2 Introduction The human body is home to an enormous number and diversity of microbes. These microbes, our microbiome, are increasingly thought to be required for normal human development, physiology,
More informationPractical Bioinformatics for Life Scientists. Week 14, Lecture 27. István Albert Bioinformatics Consulting Center Penn State
Practical Bioinformatics for Life Scientists Week 14, Lecture 27 István Albert Bioinformatics Consulting Center Penn State No homework this week Project to be given out next Thursday (Dec 1 st ) Due following
More informationWhat is metagenomics?
Metagenomics What is metagenomics? Term first used in 1998 by Jo Handelsman "the application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments,
More informationCBC Data Therapy. Metagenomics Discussion
CBC Data Therapy Metagenomics Discussion General Workflow Microbial sample Generate Metaomic data Process data (QC, etc.) Analysis Marker Genes Extract DNA Amplify with targeted primers Filter errors,
More informationDevelopment of NGS metabarcoding. characterization of aerobiological samples. Lucia Muggia
Development of NGS metabarcoding for the characterization of aerobiological samples Lucia Muggia Alberto Pallavicini, Elisa Banchi, Claudio G. Ametrano, David Stankovic, Silvia Ongaro, Enrico Tordoni,
More informationMetagenomic 3C, full length 16S amplicon sequencing on Illumina, and the diabetic skin microbiome
Also: Sunaina Melissa Gardiner UTS Catherine Burke UTS Michael Liu UTS Chris Beitel UTS, UC Davis Matt DeMaere UTS Metagenomic 3C, full length 16S amplicon sequencing on Illumina, and the diabetic skin
More informationBIOINFORMATICS TO ANALYZE AND COMPARE GENOMES
BIOINFORMATICS TO ANALYZE AND COMPARE GENOMES We sequenced and assembled a genome, but this is only a long stretch of ATCG What should we do now? 1. find genes What are the starting and end points for
More informationEvaluation of the liver abscess microbiome and liver abscess prevalence in cattle reared for production of natural branded beef
Evaluation of the liver abscess microbiome and liver abscess prevalence in cattle reared for production of natural branded beef K.L. Huebner, J.N. Martin C.J. Weissend, K.L. Holzer, M. Weinroth, Z. Abdo,
More informationPhylogenetic methods for taxonomic profiling
Phylogenetic methods for taxonomic profiling Siavash Mirarab University of California at San Diego (UCSD) Joint work with Tandy Warnow, Nam-Phuong Nguyen, Mike Nute, Mihai Pop, and Bo Liu Phylogeny reconstruction
More informationMicrobiome: Metagenomics 4/4/2018
Microbiome: Metagenomics 4/4/2018 metagenomics is an extension of many things you have already learned! Genomics used to be computationally difficult, and now that s metagenomics! Still developing tools/algorithms
More informationHMP Data Set Documentation
HMP Data Set Documentation Introduction This document provides detail about files available via the DACC website. The goal of the HMP consortium is to make the metagenomics sequence data generated by the
More informationIntroduction to metagenome assembly. Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014
Introduction to metagenome assembly Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014 Sequencing specs* Method Read length Accuracy Million reads Time Cost per M 454
More informationBIOINFORMATICS ORIGINAL PAPER
BIOINFORMATICS ORIGINAL PAPER Vol. 27 no. 16 2011, pages 2194 2200 doi:10.1093/bioinformatics/btr381 Sequence analysis Advance Access publication June 23, 2011 UCHIME improves sensitivity and speed of
More informationParts of a standard FastQC report
FastQC FastQC, written by Simon Andrews of Babraham Bioinformatics, is a very popular tool used to provide an overview of basic quality control metrics for raw next generation sequencing data. There are
More informationOligotyping analysis of the human oral microbiome
Oligotyping analysis of the human oral microbiome A. Murat Eren a, Gary G. Borisy b,1, Susan M. Huse c, and Jessica L. Mark Welch a,1 a Josephine Bay Paul Center for Comparative Molecular Biology and Evolution,
More informationUSEARCH software and documentation Copyright Robert C. Edgar All rights reserved.
USEARCH software and documentation Copyright 2010-11 Robert C. Edgar All rights reserved http://drive5.com/usearch robert@drive5.com Version 5.0 August 22nd, 2011 Contents Introduction... 3 UCHIME implementations...
More informationMicrobiomics I August 24th, Introduction. Robert Kraaij, PhD Erasmus MC, Internal Medicine
Microbiomics I August 24th, 2017 Introduction Robert Kraaij, PhD Erasmus MC, Internal Medicine r.kraaij@erasmusmc.nl Welcome to Microbiomics I Infection & Immunity MSc students Only first day no practicals
More informationGPU-Meta-Storms: Computing the similarities among massive microbial communities using GPU
GPU-Meta-Storms: Computing the similarities among massive microbial communities using GPU Xiaoquan Su $, Xuetao Wang $, JianXu, Kang Ning* Shandong Key Laboratory of Energy Genetics, CAS Key Laboratory
More informationEvaluation of the liver abscess microbiome and liver abscess prevalence in cattle reared for production of natural branded beef
Evaluation of the liver abscess microbiome and liver abscess prevalence in cattle reared for production of natural branded beef K.L. Huebner, J.N. Martin C.J. Weissend, K.L. Holzer, M. Weinroth, Z. Abdo,
More informationIntroduction to CGE tools
Introduction to CGE tools Pimlapas Leekitcharoenphon (Shinny) Research Group of Genomic Epidemiology, DTU-Food. WHO Collaborating Centre for Antimicrobial Resistance in Foodborne Pathogens and Genomics.
More informationChapter 7. Motif finding (week 11) Chapter 8. Sequence binning (week 11)
Course organization Introduction ( Week 1) Part I: Algorithms for Sequence Analysis (Week 1-11) Chapter 1-3, Models and theories» Probability theory and Statistics (Week 2)» Algorithm complexity analysis
More informationChIP. November 21, 2017
ChIP November 21, 2017 functional signals: is DNA enough? what is the smallest number of letters used by a written language? DNA is only one part of the functional genome DNA is heavily bound by proteins,
More information12/8/09 Comp 590/Comp Fall
12/8/09 Comp 590/Comp 790-90 Fall 2009 1 One of the first, and simplest models of population genealogies was introduced by Wright (1931) and Fisher (1930). Model emphasizes transmission of genes from one
More informationGene Expression Data Analysis
Gene Expression Data Analysis Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu BMIF 310, Fall 2009 Gene expression technologies (summary) Hybridization-based
More informationDistribution-Based Clustering: Using Ecology To Refine the Operational Taxonomic Unit
Distribution-Based Clustering: Using Ecology To Refine the Operational Taxonomic Unit The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.
More informationReducing the Effects of PCR Amplification and Sequencing Artifacts on 16S rrna-based Studies
Reducing the Effects of PCR Amplification and Sequencing Artifacts on 16S rrna-based Studies The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story
More informationa-dB. Code assigned:
This form should be used for all taxonomic proposals. Please complete all those modules that are applicable (and then delete the unwanted sections). For guidance, see the notes written in blue and the
More informationExploring Microbial Diversity and Taxonomy Using SSU rrna Hypervariable Tag Sequencing
Exploring Microbial Diversity and Taxonomy Using SSU rrna Hypervariable Tag Sequencing Susan M. Huse 1, Les Dethlefsen 2, Julie A. Huber 1, David Mark Welch 1, David A. Relman 2,3,4, Mitchell L. Sogin
More informationUTAX accurately predicts taxonomy of marker gene sequences
UTAX accurately predicts taxonomy of marker gene sequences Robert C. Edgar Independent Investigator Tiburon, California, USA. robert@drive5.com The UTAX algorithm accurately predicts the taxonomy of 16S
More informationDr. Sabri M. Naser Department of Biology and Biotechnology An-Najah National University Nablus, Palestine
Molecular identification of lactic acid bacteria Enterococcus, Lactobacillus and Streptococcus based on phes, rpoa and atpa multilocus sequence analysis (MLSA) Dr. Sabri M. Naser Department of Biology
More informationComputational Systems Biology Deep Learning in the Life Sciences
Computational Systems Biology Deep Learning in the Life Sciences 6.802 6.874 20.390 20.490 HST.506 Christina Ji April 6, 2017 DanQ: a hybrid convolutional and recurrent deep neural network for quantifying
More informationNature Biotechnology: doi: /nbt.3943
Supplementary Figure 1. Distribution of sequence depth across the bacterial artificial chromosomes (BACs). The x-axis denotes the sequencing depth (X) of each BAC and y-axis denotes the number of BACs
More informationSupplementary Information for
Supplementary Information for Microbial community dynamics and stability during an ammonia- induced shift to syntrophic acetate oxidation Jeffrey J. Werner 1,2, Marcelo L. Garcia 3, Sarah D. Perkins 3,
More informationNB! 6. Global key annotations
NB! This is a separate paragraph of the PlutoF manual. Please use current manual at http://unite.ut.ee/temp/plutof2/files/plutof_2.5_manual_small.pdf if needed or cited in this document. 6. Global key
More informationContact us for more information and a quotation
GenePool Information Sheet #1 Installed Sequencing Technologies in the GenePool The GenePool offers sequencing service on three platforms: Sanger (dideoxy) sequencing on ABI 3730 instruments Illumina SOLEXA
More informationIntroducing DOTUR, a Computer Program for Defining Operational Taxonomic Units and Estimating Species Richness
APPLIED AND ENVIRONMENTAL MICROBIOLOGY, Mar. 2005, p. 1501 1506 Vol. 71, No. 3 0099-2240/05/$08.00 0 doi:10.1128/aem.71.3.1501 1506.2005 Copyright 2005, American Society for Microbiology. All Rights Reserved.
More informationMachine Learning. HMM applications in computational biology
10-601 Machine Learning HMM applications in computational biology Central dogma DNA CCTGAGCCAACTATTGATGAA transcription mrna CCUGAGCCAACUAUUGAUGAA translation Protein PEPTIDE 2 Biological data is rapidly
More informationJust the Facts: A Basic Introduction to the Science Underlying NCBI Resources
National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools News About NCBI Site Map
More informationGenomic resources. for non-model systems
Genomic resources for non-model systems 1 Genomic resources Whole genome sequencing reference genome sequence comparisons across species identify signatures of natural selection population-level resequencing
More informationLecture 7. Next-generation sequencing technologies
Lecture 7 Next-generation sequencing technologies Next-generation sequencing technologies General principles of short-read NGS Construct a library of fragments Generate clonal template populations Massively
More informationConducting Microbiome study, a How to guide
Conducting Microbiome study, a How to guide Sam Zhu Supervisor: Professor Margaret IP Joint Graduate Seminar Department of Microbiology 15 December 2015 Why study Microbiome? ü Essential component, e.g.
More informationSequence Analysis. II: Sequence Patterns and Matrices. George Bell, Ph.D. WIBR Bioinformatics and Research Computing
Sequence Analysis II: Sequence Patterns and Matrices George Bell, Ph.D. WIBR Bioinformatics and Research Computing Sequence Patterns and Matrices Multiple sequence alignments Sequence patterns Sequence
More informationUsing Ensembles of Hidden Markov Models in metagenomic analysis
Using Ensembles of Hidden Markov Models in metagenomic analysis Tandy Warnow, UIUC Joint work with Siavash Mirarab, Nam-Phuong Nguyen, Mike Nute, Mihai Pop, and Bo Liu Computational Phylogenetics and Metagenomics
More informationApplications of Next Generation Sequencing in Metagenomics Studies
Applications of Next Generation Sequencing in Metagenomics Studies Francesca Rizzo, PhD Genomix4life Laboratory of Molecular Medicine and Genomics Department of Medicine and Surgery University of Salerno
More informationa-dB. Code assigned:
This form should be used for all taxonomic proposals. Please complete all those modules that are applicable (and then delete the unwanted sections). For guidance, see the notes written in blue and the
More informationALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG
Chapman & Hall/CRC Mathematical and Computational Biology Series ALGORITHMS IN BIO INFORMATICS A PRACTICAL INTRODUCTION WING-KIN SUNG CRC Press Taylor & Francis Group Boca Raton London New York CRC Press
More informationNGS part 2: applications. Tobias Österlund
NGS part 2: applications Tobias Österlund tobiaso@chalmers.se NGS part of the course Week 4 Friday 13/2 15.15-17.00 NGS lecture 1: Introduction to NGS, alignment, assembly Week 6 Thursday 26/2 08.00-09.45
More informationFigure S4 A-H : Initiation site properties and evolutionary changes
A 0.3 Figure S4 A-H : Initiation site properties and evolutionary changes G-correction not used 0.25 Fraction of total counts 0.2 0.5 0. tag 2 tags 3 tags 4 tags 5 tags 6 tags 7tags 8tags 9 tags >9 tags
More informationSequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es
Sequence assembly Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing project Unknown sequence { experimental evidence result read 1 read 4 read 2 read 5 read 3 read 6 read 7 Computational requirements