MICROBIOME SOFTWARE: END OF BEGINNING.
- Kelly Townsend
- 6 years ago
Transcription
1 MICROBIOME SOFTWARE: END OF BEGINNING.
- DR. CHARLES ROBERTSON, DIVISION OF INFECTIOUS DISEASES, UNIVERSITY OF COLORADO SCHOOL OF MEDICINE
- DR. DANIEL N. FRANK, DIVISION OF INFECTIOUS DISEASES, SCHOOL OF MEDICINE
- DR. J. KIRK HARRIS, DEPT. OF PEDIATRICS, SCHOOL OF MEDICINE & CHILDREN'S HOSPITAL CO
2 OVERVIEW
- Sequence Data -> Microbiome Sequence Analysis Tools -> Results
- Today: look at three items in the Black Box
3 OUR MENTOR: NORMAN PACE
- Nucleic acid biochemist
- Extensive ribozyme work: RNase P
- Invented the basis of all microbiome studies: the culture independent method
- Member NAS, election year 1991
NORM'S MENTOR: CARL WOESE
- Nucleic acid biochemist
- Discovered the Archaea
- Put forth the RNA world hypothesis
- Member NAS, election year 1988
4 MY BACKGROUND
- Extensive use of computers starting in 1968
- Itinerant programmer in the US and Europe
- BS Electrical Engineering & Computer Science, 1982, School of Engineering, University of Colorado, Boulder
- 2 years spent doing logic design at a supercomputer company
- 25 years in the Electronic Design Automation (EDA) business
  - Building commercial software to solve NP complete problems
  - Hardware description languages, Circuit Simulation, Logic Simulation, Placement, Routing, PCBs & ICs
  - Last position in the EDA industry: CEO
- PhD, 2008, Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder
- Current: Hardware/Software/Sequence Analysis
  - Have processed >200 MiSeq runs in the last 5 years (>2 billion sequences)
  - ~90% medical, ~10% environmental (Customers: primarily CU Boulder School of Engineering)
5 THE MICROBIOME PROCESS
- Sample -> Extract DNA -> Amplify One Molecule -> Sequence -> Identify & Count Each Sequence Type -> Community Composition
- Wet bench work, then computer informatics
- The primary topic of this presentation
6 The culture independent method: use DNA sequences to identify microbes
- Woese selected the ribosome
- Ribosome: complex machine that assembles proteins from amino acids per information encoded in chromosomes
- A heavily constrained portion of the information processing system
- The ribosome is a ribozyme
- Shape is everything: precise positioning of reactants and charged ions to get enzymatic activity
7 Small SubUnit rRNA: 16S
8 Small SubUnit rRNA: 16S
- Information rich molecule
  - Primary sequence
  - Secondary structure
- Information content is non-uniformly distributed across the entire molecule
- The Tree of Life cannot be reproduced with short sequences
- Amplicon access via universal primers
  - Desire uniform amplification of all kinds
  - Ever a compromise between length (cost per sequence) and primer locations
- Easy to identify phylum by short sequence
- No simple/consistent way to get to species with short sequences
9 Ordination The Human Microbiome Consortium Nature 486:
10 IN THE BEGINNING, WITH SANGER SEQUENCING
- Only use full length sequences for analysis
- The full length sequences had to be ALIGNED (NP complete)
- Why align? To assure comparison of homologous nucleotides
    G-CGTAATCGAAGGCCATTACGCTTGCGTAATGGCCCGATTACG-C
    GCC-TAATCG--GGCCATTACGCTTGCGTAATGGCCCGATTA-GGC
    GCCGT-ATCGAAGGCCATTAC-CTTG-GTAATGGCCCGAT-ACGGC
    GCCGTAATC---GGC-ATTACGCTTGCGTAAT-GCC-GATTACGGC
    GCCGTAATCGAAGGCCATTA-GCTTGC-TAATGGCCCGATTACGGC
- Build phylogenetic trees (NP complete)
  - Iteratively make informed guesses as to the shapes of trees & measure their probabilities
  - Informed trial and error!
[Figure: nucleotide diagram, layout not recoverable]
11
- 1990, pre and early Sanger: Woese, et al. PNAS, 1990 June; 87 (12):
- 1997, Sanger: Pace, NR. Science, 1997 May 2; 276(5313):
12 ADVENT OF NEXT GENERATION SEQUENCING
- Induced very rapid change due to very large decrease in price per sequence
- Sequences/Sample: Sanger/454/MiSeq: 96/8,000/100,000
- Other scientific disciplines suddenly very motivated to explore the microbiomes of their knowledge sub-domains
  - Ecologists
  - Geologists
  - Physicians
- A big problem arose: alignment & tree building software of that time did not scale well
- Existing analysis approaches (computers/software tools) could not cope with the onslaught of the large number of sequences in NGS datasets
13 SOLUTION: NEW TOOLS. THE PROGRAMMERS ARRIVE
- Adopt new languages and rapid prototyping software creation processes
  - E.g., the Python programming language
- Abandon NP complete processes
- Vigorously assert all of the following:
  - Full length sequences not always needed
  - Local (or no) alignment good enough
  - Just stop building phylogenetic trees (for the most part)
14 ITEM ONE IN THE BLACK BOX: NUMERICAL OTUS
- Per SOPs of Qiime and Mothur: create numerical OTUs
- Generate enumerated clusters of sequences that are sort of close ("close enough", say 3%)
- Pick a single sequence as a representative of each cluster
- Classify only the representative sequence, which is then attributed to all sequences in the cluster
- Less classification means faster dataset processing
15 CREATING CLUSTERS
- Intuitive example that has issues similar to sequence clustering
- Let the radius of a circle represent the size of an OTU, e.g. 3%
- OTU picking with fixed radius clusters:
  - Numerical OTUs are NOT canonical
  - Completely dependent on selection rules: order & packing heuristic
- We don't have a theoretical framework guided by biochemistry, biology, etc., to inform how the clusters are to be created: everyone is as correct as anyone else, but they are NOT DIRECTLY COMPARABLE.
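The order dependence of fixed radius OTU picking can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the names `mismatch_fraction` and `greedy_otus`, the toy 100-nucleotide sequences, and the use of a plain mismatch fraction on equal-length sequences in place of a real sequence distance. It is a generic greedy picker, not the algorithm used by Qiime or Mothur.

```python
def mismatch_fraction(a, b):
    """Toy distance: fraction of positions that differ (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("toy distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, radius=0.03):
    """Greedy fixed-radius clustering.

    Each sequence joins the first existing centroid within `radius`;
    otherwise it becomes the centroid of a new cluster. The result
    therefore depends on the order in which sequences arrive.
    """
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if mismatch_fraction(s, c) <= radius:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters


# Hypothetical toy data: seq_b is 2% from both neighbours,
# while seq_a and seq_c are 4% apart.
seq_a = "A" * 100
seq_b = "CC" + "A" * 98
seq_c = "CC" + "A" * 96 + "GG"

otus_order_1 = greedy_otus([seq_a, seq_b, seq_c])  # seq_a becomes the first centroid
otus_order_2 = greedy_otus([seq_b, seq_a, seq_c])  # seq_b becomes the first centroid

print(len(otus_order_1), len(otus_order_2))
```

The same three sequences yield two OTUs in the first input order and one OTU in the second, which is the "not canonical" point in miniature.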
16 PICKING ONE REPRESENTATIVE FOR THE CLUSTER
- Which dot is the single best representative for this OTU cluster? Why?
- Again: no theoretical biochemical/etc. framework to inform the selection of representatives of clusters
- Arguments can be made for various approaches, but the arguments are NOT based on biochemistry or biology: they are based on statistics or computer science (which means programmer convenience)
17 CLUSTERING YIELDS BIASED RESULTS
- Numerical OTUs do a great job of enumerating differences between sets of sequences
  - Great insights via ordination
- However: clustering usually superposes a model (3% species bins) that does not fit current observations based on the Big Tree
  - For medical analyses, different organisms often appear in a single cluster
- Clustering adds a bias to the results
  - The representative sequence does not appropriately match all of the sequences in the clusters
  - The true positions of individual sequences become fuzzier
18 ITEM TWO FROM THE BLACK BOX: CLASSIFICATION
- The RDP Classifier: Naïve Bayesian Classification
- Eliminated the need for 2 computationally intensive activities: alignment & tree building
- How does it work?
  - Start with unaligned sequence data and associated taxonomy lines (aka, The Training Set)
  - Use Bayes' Theorem to generate probability coefficients that allow very fast classification of unknown sequences
- Unknown sequence -> RDP Classifier -> Bacteria/Proteobacteria/ /E. coli
- Probabilistic Binning Cloud (Norm Pace)
- Bayes' Theorem: [equation image not captured in the transcription]
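For reference, Bayes' Theorem in its standard form, together with the naive independence assumption that gives this family of classifiers its name. The notation is assumed here and is not taken from the slide: G is a candidate taxon, S is the unknown sequence, and w_i are the words (k-mers) found in S.

```latex
P(G \mid S) = \frac{P(S \mid G)\, P(G)}{P(S)},
\qquad
P(S \mid G) \approx \prod_{i} P(w_i \mid G)
```

Taking logarithms turns the product into a sum of precomputed coefficients, one per word, which is why classifying an unknown sequence reduces to lookups and simple arithmetic.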
19 TRAINING SETS: UNALIGNED
- Makes use of unaligned reference sequences
- Classic alignment: retains correlation with the secondary structure!
    G-CGTAATCGAAGGCCATTACGCTTGCGTAATGGCCCGATTACG-C
    GCC-TAATCG--GGCCATTACGCTTGCGTAATGGCCCGATTA-GGC
    GCCGT-ATCGAAGGCCATTAC-CTTG-GTAATGGCCCGAT-ACGGC
    GCCGTAATC---GGC-ATTACGCTTGCGTAAT-GCC-GATTACGGC
    GCCGTAATCGAAGGCCATTA-GCTTGC-TAATGGCCCGATTACGGC
- RDP classifier training set: loses correlation with the secondary structure!
    GCGTAATCGAAGGCCATTACGCTTGCGTAATGGCCCGATTACGC..
    GCCTAATCGGGCCATTACGCTTGCGTAATGGCCCGATTAGGC...
    GCCGTATCGAAGGCCATTACCTTGGTAATGGCCCGATACGGC...
    GCCGTAATCGGCATTACGCTTGCGTAATGCCGATTACGGC...
    GCCGTAATCGAAGGCCATTAGCTTGCTAATGGCCCGATTACGGC..
- Divide into groups of 8 columns: 8-mers
- Using unaligned training sets changes precise boundaries into vague boundaries: noise.
[Figure: nucleotide diagram, layout not recoverable]
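A minimal Python sketch of word-based naive Bayes classification on unaligned training sequences follows. It is a teaching toy under stated assumptions, not the RDP Classifier's actual implementation: the names `kmers`, `train` and `classify`, the overlapping 4-mers (instead of 8-mers, to keep the toy data short), the simple additive smoothing, the equal priors, and the taxon labels and sequences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    """Set of overlapping words of length k present in an unaligned sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training_set, k):
    """Count, per taxon, how many training sequences contain each word.

    training_set is a list of (sequence, taxon) pairs; no alignment is used.
    """
    word_counts = defaultdict(Counter)
    n_seqs = Counter()
    for seq, taxon in training_set:
        n_seqs[taxon] += 1
        word_counts[taxon].update(kmers(seq, k))
    return word_counts, n_seqs


def classify(seq, model, k):
    """Return the best-scoring taxon and the log-score of every taxon."""
    word_counts, n_seqs = model
    scores = {}
    for taxon, m in n_seqs.items():
        # Assumed smoothing: (count + 0.5) / (m + 1); equal priors for all taxa.
        scores[taxon] = sum(
            math.log((word_counts[taxon][w] + 0.5) / (m + 1))
            for w in kmers(seq, k)
        )
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy training set with two made-up taxa.
toy_training_set = [
    ("ACGTACGTACGT", "TaxonA"),
    ("ACGTACGAACGT", "TaxonA"),
    ("TTGGCCAATTGG", "TaxonB"),
    ("TTGGCCTATTGG", "TaxonB"),
]
toy_model = train(toy_training_set, k=4)
call_1, _ = classify("ACGTACGTAC", toy_model, k=4)
call_2, _ = classify("TTGGCCAATT", toy_model, k=4)
print(call_1, call_2)
```

Note what the function returns: a label and a set of scores. Nothing in the output says which reference sequence drove the call.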
20 NAÏVE BAYES CLASSIFICATION: PROS/CONS
- Good
  - Very fast (Computer: just multiplications and additions)
  - Ubiquitous (Qiime/Mothur/RDP website)
- Bad (from our perspective)
  - Training sets are often unstable: don't get out what you put in
  - Creating a stable training set is a black art
  - To get the results you want, often have to add/delete apparently completely unrelated sequences
  - The result provides no clues whatsoever as to how the classifier came up with the answer: 100% oracle, 0% insight
    - Which sequence in the reference training set was closest to an unknown?
    - For very similar sequences, which few nucleotides were different?
  - Does not provide an AUDIT TRAIL: critical for clinical medicine & epidemiology! Which known species in the database was the basis for an unknown called as that species?
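The two audit questions above (which reference was closest, and which nucleotides differed) can be sketched in Python as a pairwise comparison. This is an illustration only, not the presenters' software: the function name `audit_trail`, the reference names and the toy sequences are hypothetical, and `difflib.SequenceMatcher` from the standard library is a generic text matcher standing in for a proper biological alignment.

```python
from difflib import SequenceMatcher


def audit_trail(query, references):
    """Find the most similar reference and list where the query differs from it.

    references maps a reference name to its sequence. Returns
    (best_name, similarity, differences), where each difference is
    (operation, position_in_reference, reference_bases, query_bases).
    """
    best_name, best_ratio = None, -1.0
    for name, ref in references.items():
        ratio = SequenceMatcher(None, ref, query, autojunk=False).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio

    best_ref = references[best_name]
    opcodes = SequenceMatcher(None, best_ref, query, autojunk=False).get_opcodes()
    differences = [
        (tag, i1, best_ref[i1:i2], query[j1:j2])
        for tag, i1, i2, j1, j2 in opcodes
        if tag != "equal"
    ]
    return best_name, best_ratio, differences


# Hypothetical references; the query differs from ref_1 by one substitution.
toy_references = {
    "ref_1": "GCCGTAATCGAAGGCCATTACG",
    "ref_2": "TTTTGGGGCCCCAAAATTTTGG",
}
toy_query = "GCCGTAATCGATGGCCATTACG"

closest, similarity, differences = audit_trail(toy_query, toy_references)
print(closest, round(similarity, 3), differences)
```

Unlike a bare taxonomy call, this output names the reference behind the answer and the exact positions that differ, which is the kind of record a clinical or epidemiological audit needs.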
21 NO UNIVERSAL CONSTANTS IN BIOLOGY
- Biology is an intrinsically observational activity
  - The process: collect and assemble anecdotes
  - Insight arises when a critical mass of anecdotes is accumulated
- No predictive mathematical formulations have been forthcoming, e.g.: no speed of light, no E = mc^2, not even an Ohm's law equivalent
  - How many kinds of microbes exist on the planet FOR CERTAIN?
  - How much sequence distance exists within ALL species level clades in the Big Tree FOR CERTAIN?
  - In retrospect, "3% species" should NOT have been enshrined in microbiome tools
- For new organisms: it must always come back to a pairwise comparison
  - As it was for Linnaeus, so it is still for us.
  - The new organism must be compared to the most similar organism that has already been documented
- The lack of a numeric predictive theoretical framework is at odds with Bioinformatics
  - Software demands very specific answers to questions like: what does "nearby" mean between two sequences?
22 SO WHAT?
- There are limits to the precision we can get with numerical OTUs and Bayesian classification
  - Where in the world are the error bars on these processes?
- Effective software solutions exist that are not based on numerical OTUs & Bayesian classification
- We are at the end of the beginning of microbiome analysis
- It is time to re-evaluate all of the fundamental assumptions to get to the future
- Next: the biggest "bleeding sore": the libraries of reference sequences we all use.
23 CURATED DATABASES OF FULL LENGTH 16S SEQUENCES
- Why not just NCBI/EMBL?
  - No attempt at all to place sequences in a phylogenetic context.
  - Submitted sequences not unambiguously derived from cultivated sources are assigned taxonomy "Environmental/Uncultivated"
- The two most commonly used curated 16S phylogenetic databases: Greengenes & Silva
- Greengenes
  - From the Pace Lab via Phil Hugenholtz to JGI.
  - Qiime default
  - Greengenes Database Consortium/2nd Genome: but current status unclear.
  - No updates since May, 2013
- Silva
  - Microbial Genomics Group at the Max Planck Institute for Marine Microbiology, Bremen and the Department of Microbiology at the Technical University Munich.
  - Mothur default
  - Well documented releases at somewhat irregular intervals; releases locked to EMBL versions.
  - Latest: Silva 128, Sept 28,
- Silva >>> Greengenes
24 THE PRECISION LIMIT: REFERENCE SEQUENCES
- The most significant microbiome tools limit: database content
  - All microbiome tools are vetted against, or make intrinsic use of, these curated 16S databases
- How do we know issues exist? Recent availability of many microbial GENOMES
  - rRNAs of microbial genomes are relatively "clean": uniform, consistent, little variation within
  - By comparison with genomes' rRNAs, many database sequences have non-subtle defects
  - Missing pieces, added pieces, perturbed secondary structures
- Database sequence defects source?
  - The mishmash of sequencing technologies over the ages: Sanger, 454, Illumina
  - We did not know what we did not know: best efforts at the time
- Sequence databases are the ultimate Hotel California
  - Sequences check into databases but they never leave.
  - Infinite academic collegiality is in force
  - No non-confrontational means to resolve issues.
25 DATABASE CURATION IS HARD, EXPENSIVE, UNDERFUNDED
- Even the best rRNA database is inadequate for calls to species level
- These databases are the equivalent of the literature and museums that Linnaeus used to deduce relationships: if we get them wrong, uncertainty propagates.
- Are genomes the silver bullet for high precision reference sequences?
  - Evaluation of rRNAs from genomes in Silva 128 finds some with defects: case-by-case scrutiny required!
  - Defects: missing pieces, added pieces, perturbed secondary structures, protein content
  - Errors likely due to assembly process errors (software as oracle, again!)
  - Most genomicists do not go to extra effort to verify structure of rRNAs (focus is on proteins)
  - But: genomes are clearly consistently better
- Current databases need to be re-evaluated in the light of the genomes' rRNA sequences!
- Fundamental career limiting disincentive for database work: the work is considered to be significant but NOT INNOVATIVE, therefore NO FUNDING
26 REFLECTIONS ON THE JOURNEY
- Phylogenetic analysis of full length sequences is still the gold standard
  - High volume analysis techniques must be characterized in light of phylogenetics
  - Taxonomic error bar characterization needed for microbiome analysis
- Numerical OTUs have provided and will continue to provide utility
  - BUT: to maximize biological, biochemical, and evolutionary insight we need the most precise taxonomy calls that can be attained
  - Let go of universal numeric constants!
- Reference Sequence Databases
  - New Focus, Means (IEEE style?), and Funding mechanism required
  - Change the quid pro quo for this work!
- Perhaps just a wee bit of rebalance of focus back toward biochemistry instead of software?
27 OUR SOFTWARE SPECIFIC FUNDING
- NIH R21HG (Frank)
- CIHR Genome Canada (Parkinson)
- NIH UH2DK (Li)
28 THE END
More informationOMNIgene GUT stabilizes the microbiome profile at ambient temperature for 60 days and during transport
OMNIgene GUT stabilizes the microbiome profile at ambient temperature for 60 days and during transport Evgueni Doukhanine, Anne Bouevitch, Ashlee Brown, Jessica Gage LaVecchia, Carlos Merino and Lindsay
More information1 Abstract. 2 Introduction. 3 Requirements. Most Wanted Taxa from the Human Microbiome The Broad Institute
1 Abstract 2 Introduction The human body is home to an enormous number and diversity of microbes. These microbes, our microbiome, are increasingly thought to be required for normal human development, physiology,
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Changhui (Charles) Yan Old Main 401 F http://www.cs.usu.edu www.cs.usu.edu/~cyan 1 How Old Is The Discipline? "The term bioinformatics is a relatively recent invention, not
More informationEngineering Genetic Circuits
Engineering Genetic Circuits I use the book and slides of Chris J. Myers Lecture 0: Preface Chris J. Myers (Lecture 0: Preface) Engineering Genetic Circuits 1 / 19 Samuel Florman Engineering is the art
More informationAdvisors: Prof. Louis T. Oliphant Computer Science Department, Hiram College.
Author: Sulochana Bramhacharya Affiliation: Hiram College, Hiram OH. Address: P.O.B 1257 Hiram, OH 44234 Email: bramhacharyas1@my.hiram.edu ACM number: 8983027 Category: Undergraduate research Advisors:
More informationFungal ITS Bioinformatics Efforts in Alaska
Fungal ITS Bioinformatics Efforts in Alaska D. Lee Taylor ltaylor@iab.alaska.edu Institute of Arctic Biology University of Alaska Fairbanks Shawn Houston Minnesota Supercomputing Institute University of
More informationExperimental Design Microbial Sequencing
Experimental Design Microbial Sequencing Matthew L. Settles Genome Center Bioinformatics Core University of California, Davis settles@ucdavis.edu; bioinformatics.core@ucdavis.edu General rules for preparing
More informationRNA ID missing Word ID missing Word DNA ID missing Word
Table #1 Vocab Term RNA ID missing Word ID missing Word DNA ID missing Word Definition Define Base pairing rules of A=T and C=G are used for this process DNA duplicates, or makes a copy of, itself. Synthesis
More informationConducting Microbiome study, a How to guide
Conducting Microbiome study, a How to guide Sam Zhu Supervisor: Professor Margaret IP Joint Graduate Seminar Department of Microbiology 15 December 2015 Why study Microbiome? ü Essential component, e.g.
More informationGenetics and Bioinformatics
Genetics and Bioinformatics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be Lecture 1: Setting the pace 1 Bioinformatics what s
More informationTextbook Reading Guidelines
Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum Last updated: May 1, 2009 Textbook Reading Guidelines Preface: Read the whole preface, and especially: For the students with Life Science
More informationComputational Biology
3.3.3.2 Computational Biology Today, the field of Computational Biology is a well-recognised and fast-emerging discipline in scientific research, with the potential of producing breakthroughs likely to
More informationESSENTIAL BIOINFORMATICS
ESSENTIAL BIOINFORMATICS Essential Bioinformatics is a concise yet comprehensive textbook of bioinformatics that provides a broad introduction to the entire field. Written specifically for a life science
More informationCourse Information. Introduction to Algorithms in Computational Biology Lecture 1. Relations to Some Other Courses
Course Information Introduction to Algorithms in Computational Biology Lecture 1 Meetings: Lecture, by Dan Geiger: Mondays 16:30 18:30, Taub 4. Tutorial, by Ydo Wexler: Tuesdays 10:30 11:30, Taub 2. Grade:
More informationFood Safety (Bio-)Informatics
Food Safety (Bio-)Informatics Henk C. den Bakker Assistant Professor in Bioinformatics and Epidemiology Center for Food Safety University of Georgia hcd82599@uga.edu Overview Short introduction of Food
More informationThe Basics of Understanding Whole Genome Next Generation Sequence Data
The Basics of Understanding Whole Genome Next Generation Sequence Data Heather Carleton-Romer, MPH, Ph.D. ASM-CDC Infectious Disease and Public Health Microbiology Postdoctoral Fellow PulseNet USA Next
More informationNext Generation Sequencing. Tobias Österlund
Next Generation Sequencing Tobias Österlund tobiaso@chalmers.se NGS part of the course Week 4 Friday 13/2 15.15-17.00 NGS lecture 1: Introduction to NGS, alignment, assembly Week 6 Thursday 26/2 08.00-09.45
More informationData Mining for Biological Data Analysis
Data Mining for Biological Data Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Data Mining Course by Gregory-Platesky Shapiro available at www.kdnuggets.com Jiawei Han
More informationWhat is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases.
What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases. Bioinformatics is the marriage of molecular biology with computer
More informationBasics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility
2018 ABRF Meeting Satellite Workshop 4 Bridging the Gap: Isolation to Translation (Single Cell RNA-Seq) Sunday, April 22 Basics of RNA-Seq (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly,
More informationMetagenomic 3C, full length 16S amplicon sequencing on Illumina, and the diabetic skin microbiome
Also: Sunaina Melissa Gardiner UTS Catherine Burke UTS Michael Liu UTS Chris Beitel UTS, UC Davis Matt DeMaere UTS Metagenomic 3C, full length 16S amplicon sequencing on Illumina, and the diabetic skin
More informationBioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine
Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Overview This lecture will
More informationThe application of hidden markov model in building genetic regulatory network
J. Biomedical Science and Engineering, 2010, 3, 633-637 doi:10.4236/bise.2010.36086 Published Online June 2010 (http://www.scirp.org/ournal/bise/). The application of hidden markov model in building genetic
More informationLesson Overview. Studying the Human Genome. Lesson Overview Studying the Human Genome
Lesson Overview 14.3 Studying the Human Genome THINK ABOUT IT Just a few decades ago, computers were gigantic machines found only in laboratories and universities. Today, many of us carry small, powerful
More informationIntroduction and Public Sequence Databases. BME 110/BIOL 181 CompBio Tools
Introduction and Public Sequence Databases BME 110/BIOL 181 CompBio Tools Todd Lowe March 29, 2011 Course Syllabus: Admin http://www.soe.ucsc.edu/classes/bme110/spring11 Reading: Chapters 1, 2 (pp.29-56),
More informationIntroduction to 'Omics and Bioinformatics
Introduction to 'Omics and Bioinformatics Chris Overall Department of Bioinformatics and Genomics University of North Carolina Charlotte Acquire Store Analyze Visualize Bioinformatics makes many current
More informationIntroduction to Algorithms in Computational Biology Lecture 1
Introduction to Algorithms in Computational Biology Lecture 1 Background Readings: The first three chapters (pages 1-31) in Genetics in Medicine, Nussbaum et al., 2001. This class has been edited from
More informationWHAT IS BIOCHEMISTRY
WHAT IS BIOCHEMISTRY Each part of every living being is biochemically connected. Biochemistry is at the heart of life science. It is a fascinating, diverse and sprawling discipline; which makes it near
More information