MICROBIOME SOFTWARE: END OF BEGINNING.


1 MICROBIOME SOFTWARE: END OF BEGINNING.
DR. CHARLES ROBERTSON, DIVISION OF INFECTIOUS DISEASES, UNIVERSITY OF COLORADO SCHOOL OF MEDICINE
DR. DANIEL N. FRANK, DIVISION OF INFECTIOUS DISEASES, SCHOOL OF MEDICINE
DR. J. KIRK HARRIS, DEPT. OF PEDIATRICS, SCHOOL OF MEDICINE & CHILDREN'S HOSPITAL CO

2 OVERVIEW
Sequence Data -> Microbiome Sequence Analysis Tools -> Results
Today: Look at three items in the Black Box

3 OUR MENTOR: NORMAN PACE
  Nucleic acid biochemist
  Extensive ribozyme work: RNase P
  Invented the basis of all microbiome studies: the culture-independent method
  Member NAS, election year 1991
NORM'S MENTOR: CARL WOESE
  Nucleic acid biochemist
  Discovered the Archaea
  Put forth the RNA world hypothesis
  Member NAS, election year 1988

4 MY BACKGROUND
Extensive use of computers starting in 1968
Itinerant programmer in the US and Europe
BS Electrical Engineering & Computer Science, 1982, School of Engineering, University of Colorado, Boulder
2 years spent doing logic design at a supercomputer company
25 years in the Electronic Design Automation (EDA) business
  Building commercial software to solve NP-complete problems
  Hardware description languages, Circuit Simulation, Logic Simulation, Placement, Routing, PCBs & ICs
  Last position in the EDA industry: CEO
PhD, 2008, Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder
Current: Hardware/Software/Sequence Analysis
  Have processed >200 MiSeq runs in the last 5 years (>2 billion sequences)
  ~90% medical
  ~10% environmental (Customers: primarily CU Boulder School of Engineering)

5 THE MICROBIOME PROCESS
Sample -> Extract DNA -> Amplify One Molecule -> Sequence -> Identify & Count Each Sequence Type -> Community Composition
The steps divide into wet bench work and computer informatics; the latter is the primary topic of this presentation

6 The culture-independent method uses DNA sequences to identify microbes
Woese selected the ribosome
  Ribosome: complex machine that assembles proteins from amino acids per information encoded in chromosomes
  A heavily constrained portion of the information processing system
The ribosome is a ribozyme
  Shape is everything
  Precise positioning of reactants and charged ions to get enzymatic activity

7 Small SubUnit rRNA: 16S

8 Small SubUnit rRNA: 16S
Information-rich molecule
  Primary sequence
  Secondary structure
Information content is non-uniformly distributed across the entire molecule
  The Tree of Life cannot be reproduced with short sequences
Amplicon access via universal primers
  Desire uniform amplification of all kinds
  Ever a compromise between length (cost per sequence) and primer locations
Easy to identify phylum by short sequence
No simple/consistent way to get to species with short sequences

9 Ordination The Human Microbiome Consortium Nature 486:

10 IN THE BEGINNING, WITH SANGER SEQUENCING
Only use full-length sequences for analysis
The full-length sequences had to be ALIGNED (NP-complete)
  Why align? To assure comparison of homologous nucleotides
  G-CGTAATCGAAGGCCATTACGCTTGCGTAATGGCCCGATTACG-C
  GCC-TAATCG--GGCCATTACGCTTGCGTAATGGCCCGATTA-GGC
  GCCGT-ATCGAAGGCCATTAC-CTTG-GTAATGGCCCGAT-ACGGC
  GCCGTAATC---GGC-ATTACGCTTGCGTAAT-GCC-GATTACGGC
  GCCGTAATCGAAGGCCATTA-GCTTGC-TAATGGCCCGATTACGGC
Build phylogenetic trees (NP-complete)
  Iteratively make informed guesses as to the shapes of trees & measure their probabilities
  Informed trial and error!
[Figure: secondary structure of the example sequence]
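
The "why align?" point can be illustrated with a small sketch. It is not from the slides: the function name `percent_identity` and the toy sequences are assumptions made for illustration. It shows how one deleted base shifts every downstream column, so a position-by-position comparison of unaligned sequences no longer compares homologous nucleotides.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of position-by-position matches, skipping gap ('-') columns."""
    compared = matches = 0
    for x, y in zip(a, b):  # zip stops at the end of the shorter sequence
        if x == "-" or y == "-":
            continue  # a gap column carries no nucleotide to compare
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy data: the second sequence is the first with its 4th base deleted.
reference = "GCCGTAATCGAAGGCCATTACG"
unaligned = "GCCTAATCGAAGGCCATTACG"
aligned = "GCC-TAATCGAAGGCCATTACG"  # same read, with a gap restoring the columns

print("unaligned:", percent_identity(reference, unaligned))
print("aligned:  ", percent_identity(reference, aligned))
```

With the gap in place every remaining column matches; without it, most columns downstream of the deletion disagree even though the two sequences differ by a single base.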

11 1990, pre- and early Sanger: Woese, et al. PNAS, 1990 June; 87 (12):
1997, Sanger: Pace, NR. Science, 1997 May 2;276(5313):

12 ADVENT OF NEXT GENERATION SEQUENCING
Induced very rapid change due to very large decrease in price per sequence
  Sequences/Sample: Sanger/454/MiSeq: 96/8,000/100,000
Other scientific disciplines suddenly very motivated to explore the microbiomes of their knowledge sub-domains
  Ecologists
  Geologists
  Physicians
A big problem arose: alignment & tree-building software of that time did not scale well
  Existing analysis approaches (computers/software tools) could not cope with the onslaught of the large number of sequences in NGS datasets

13 SOLUTION: NEW TOOLS. THE PROGRAMMERS ARRIVE
Adopt new languages and rapid-prototyping software creation processes
  E.g., the Python programming language
Abandon NP-complete processes
Vigorously assert all of the following:
  Full-length sequences not always needed
  Local (or no) alignment good enough
  Just stop building phylogenetic trees (for the most part)

14 ITEM ONE IN THE BLACK BOX: NUMERICAL OTUS
Per SOPs of Qiime and Mothur: create numerical OTUs
  Generate enumerated clusters of sequences that are sort of close ("close enough", say 3%)
  Pick a single sequence as a representative of each cluster
  Classify only the representative sequence, which is then attributed to all sequences in the cluster
Less classification means faster dataset processing

15 CREATING CLUSTERS
Intuitive example that has issues similar to sequence clustering
  Let the radius of a circle represent the size of an OTU, e.g. 3%
OTU picking with fixed-radius clusters:
  Numerical OTUs are NOT canonical
  Completely dependent on selection rules: order & packing heuristic
We don't have a theoretical framework guided by biochemistry, biology, etc., to inform how the clusters are to be created: everyone is as correct as anyone else, but they are NOT DIRECTLY COMPARABLE.
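
The order dependence described on this slide can be sketched in a few lines. This is an illustrative toy, not the Qiime or Mothur algorithm: the function `greedy_clusters`, the one-dimensional "sequences", and the radius of 3 are all assumptions standing in for real sequences and a 3% threshold.

```python
def greedy_clusters(points, radius):
    """Greedy fixed-radius clustering: each point joins the first existing
    seed within `radius`, otherwise it becomes the seed of a new cluster."""
    clusters = {}  # seed value -> list of members, in insertion order
    for p in points:
        for seed in clusters:
            if abs(p - seed) <= radius:
                clusters[seed].append(p)
                break
        else:
            # No existing seed was close enough: p starts its own cluster.
            clusters[p] = [p]
    return clusters


# The same four hypothetical "sequences", presented in two different orders.
print(greedy_clusters([0, 2, 4, 6], radius=3))
print(greedy_clusters([2, 0, 4, 6], radius=3))
```

The first ordering gives two clusters of two members each (seeds 0 and 4); the second gives one cluster of three and a singleton (seeds 2 and 6). Both results obey the radius rule, yet neither the cluster membership nor the seed that would serve as the representative is the same.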

16 PICKING ONE REPRESENTATIVE FOR THE CLUSTER
Which dot is the single best representative for this OTU cluster? Why?
Again: no theoretical biochemical/etc. framework to inform the selection of representatives of clusters
Arguments can be made for various approaches, but the arguments are NOT based on biochemistry or biology; they are based on statistics or computer science (which means programmer convenience)

17 CLUSTERING YIELDS BIASED RESULTS
Numerical OTUs do a great job of enumerating differences between sets of sequences
  Great insights via ordination
However:
  Clustering usually superimposes a model (3% species bins) that does not fit current observations based on the Big Tree
  For medical analyses, different organisms often appear in a single cluster
Clustering adds a bias to the results
  The representative sequence does not appropriately match all of the sequences in the cluster
  The true positions of individual sequences become fuzzier

18 ITEM TWO FROM THE BLACK BOX: CLASSIFICATION
The RDP Classifier: Naïve Bayesian Classification
Eliminated the need for 2 computationally intensive activities: alignment & tree building
How does it work?
  Start with unaligned sequence data and associated taxonomy lines (aka the Training Set)
  Use Bayes' Theorem to generate probability coefficients that allow very fast classification of unknown sequences
Unknown sequence -> RDP Classifier -> Bacteria/Proteobacteria/ /E. coli
"Probabilistic Binning Cloud" (Norm Pace)
Bayes' Theorem:
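
The formula itself did not survive in the transcription. For orientation, the standard statement of Bayes' theorem is given below in a notation chosen here, not taken from the slide: G is a candidate taxon, S the unknown sequence, and w_1 ... w_n the words (k-mers) found in S. The second line is the "naïve" independence assumption that gives this class of classifier its name.

```latex
P(G \mid S) = \frac{P(S \mid G)\, P(G)}{P(S)}
\qquad\text{with}\qquad
P(S \mid G) \approx \prod_{i=1}^{n} P(w_i \mid G)
```

Because P(S) is the same for every taxon, ranking taxa only requires the numerator, which is why classification reduces to multiplications (or additions of logarithms).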

19 TRAINING SETS: UNALIGNED
Makes use of unaligned reference sequences
Classic alignment: retains correlation with the secondary structure!
  G-CGTAATCGAAGGCCATTACGCTTGCGTAATGGCCCGATTACG-C
  GCC-TAATCG--GGCCATTACGCTTGCGTAATGGCCCGATTA-GGC
  GCCGT-ATCGAAGGCCATTAC-CTTG-GTAATGGCCCGAT-ACGGC
  GCCGTAATC---GGC-ATTACGCTTGCGTAAT-GCC-GATTACGGC
  GCCGTAATCGAAGGCCATTA-GCTTGC-TAATGGCCCGATTACGGC
RDP classifier training set: loses correlation with the secondary structure!
  Divide into groups of 8 columns: 8-mers
  GCGTAATCGAAGGCCATTACGCTTGCGTAATGGCCCGATTACGC..
  GCCTAATCGGGCCATTACGCTTGCGTAATGGCCCGATTAGGC...
  GCCGTATCGAAGGCCATTACCTTGGTAATGGCCCGATACGGC...
  GCCGTAATCGGCATTACGCTTGCGTAATGCCGATTACGGC...
  GCCGTAATCGAAGGCCATTAGCTTGCTAATGGCCCGATTACGGC..
[Figure: secondary structure of the example sequence]
Using unaligned training sets changes precise boundaries into vague boundaries: Noise.
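
A minimal sketch of word-based naïve Bayes classification on unaligned sequences follows. It is not the RDP Classifier's implementation: the helper names (`kmers`, `train`, `classify`), the simple smoothing term, the equal priors, the toy taxonomy lines, and the word length of 4 (the slide uses 8-mers, which are too long for sequences this short) are all assumptions for illustration.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Set of overlapping words of length k found in an unaligned sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training_set, k):
    """training_set: list of (taxonomy_line, unaligned_sequence) pairs."""
    counts = defaultdict(lambda: defaultdict(int))  # taxon -> word -> sequences containing it
    totals = defaultdict(int)                       # taxon -> number of training sequences
    for taxon, seq in training_set:
        totals[taxon] += 1
        for word in kmers(seq, k):
            counts[taxon][word] += 1
    return counts, totals


def classify(seq, model, k):
    """Return the best-scoring taxon and all log scores (equal priors assumed)."""
    counts, totals = model
    scores = {}
    for taxon, n in totals.items():
        # Smoothing (an assumption here) keeps unseen words from zeroing the product.
        scores[taxon] = sum(
            math.log((counts[taxon].get(word, 0) + 0.5) / (n + 1))
            for word in kmers(seq, k)
        )
    return max(scores, key=scores.get), scores


# Hypothetical two-entry training set.
training = [
    ("Bacteria/Proteobacteria/A", "GCCGTAATCGAAGGCCATTACG"),
    ("Bacteria/Firmicutes/B", "TTGACCGGTTAACCGGATCCAA"),
]
model = train(training, k=4)
best, scores = classify("GTAATCGAAGGCC", model, k=4)
print(best)
```

Note what the result contains: a taxon and a score. Nothing in it says which training sequence drove the call or where the unknown differs from it.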

20 NAÏVE BAYES CLASSIFICATION: PROS/CONS
Good
  Very fast (Computer: just multiplications and additions)
  Ubiquitous (Qiime/Mothur/RDP website)
Bad (from our perspective)
  Training sets are often unstable: you don't get out what you put in
    Creating a stable training set is a black art
    To get the results you want, you often have to add/delete apparently completely unrelated sequences
  The result provides no clues whatsoever as to how the classifier came up with the answer
    100% oracle, 0% insight
    Which sequence in the reference training set was closest to an unknown?
    For very similar sequences, which few nucleotides were different?
  Does not provide an AUDIT TRAIL: critical for clinical medicine & epidemiology!
    Which known species in the database was the basis for an unknown called as that species?
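
The two audit-trail questions above (which reference was closest, and which nucleotides differ) can be answered by a direct pairwise comparison. The sketch below is one hypothetical way to do that, not the presenters' software: the function names, reference names, and sequences are invented, and the sequences are assumed to be already aligned to equal length.

```python
def differences(query, reference):
    """List (1-based position, query base, reference base) for every mismatch."""
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (i + 1, q, r)
        for i, (q, r) in enumerate(zip(query, reference))
        if q != r
    ]


def closest_reference(query, references):
    """references: dict of name -> sequence. Returns (name, mismatch list)."""
    name = min(references, key=lambda n: len(differences(query, references[n])))
    return name, differences(query, references[name])


# Hypothetical pre-aligned reference sequences and an unknown.
references = {
    "ref_A": "GCCGTAATCGAAGGCC",
    "ref_B": "GCCGTTATCGTAGGAC",
}
name, diffs = closest_reference("GCCGTAATCGAAGGAC", references)
print(name, diffs)
```

Unlike a bare taxonomy call, this output names the specific reference behind the call and the exact positions where the unknown departs from it.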

21 NO UNIVERSAL CONSTANTS IN BIOLOGY
Biology is an intrinsically observational activity
  The process: collect and assemble anecdotes
  Insight arises when a critical mass of anecdotes is accumulated
No predictive mathematical formulations have been forthcoming, e.g.:
  No speed of light, no E = mc², not even an Ohm's law equivalent
  How many kinds of microbes exist on the planet FOR CERTAIN?
  How much sequence distance exists within ALL species-level clades in the Big Tree FOR CERTAIN?
  In retrospect, "3% species" should NOT have been enshrined in microbiome tools
For new organisms: it must always come back to a pairwise comparison
  As it was for Linnaeus, so it is still for us.
  The new organism must be compared to the most similar organism that has already been documented
The lack of a numeric predictive theoretical framework is at odds with Bioinformatics
  Software demands very specific answers to questions like: what does "nearby" mean between two sequences?

22 SO WHAT?
There are limits to the precision we can get with numerical OTUs and Bayesian classification
  Where in the world are the error bars on these processes?
Effective software solutions exist that are not based on numerical OTUs & Bayesian classification
We are at the end of the beginning of microbiome analysis
  It is time to re-evaluate all of the fundamental assumptions to get to the future
Next: the biggest "bleeding sore": the libraries of reference sequences we all use.

23 CURATED DATABASES OF FULL-LENGTH 16S SEQUENCES
Why not just NCBI/EMBL?
  No attempt at all to place sequences in a phylogenetic context.
  Submitted sequences not unambiguously derived from cultivated sources are assigned the taxonomy "Environmental/Uncultivated"
The two most commonly used curated 16S phylogenetic databases: Greengenes & Silva
Greengenes
  From the Pace Lab via Phil Hugenholtz to JGI.
  Qiime default
  Greengenes Database Consortium/2nd Genome: but current status unclear.
  No updates since May, 2013
Silva
  Microbial Genomics Group at the Max Planck Institute for Marine Microbiology, Bremen, and the Department of Microbiology at the Technical University Munich.
  Mothur default
  Well-documented releases at somewhat irregular intervals; releases locked to EMBL versions.
  Latest: Silva 128, Sept 28,
Silva >>> Greengenes

24 THE PRECISION LIMIT: REFERENCE SEQUENCES
The most significant limit on microbiome tools: database content
  All microbiome tools are vetted against, or make intrinsic use of, these curated 16S databases
How do we know issues exist?
  Recent availability of many microbial GENOMES
  rRNAs of microbial genomes are relatively "clean": uniform, consistent, little variation within
  By comparison with genomes' rRNAs, many database sequences have non-subtle defects
    Missing pieces, added pieces, perturbed secondary structures
Database sequence defects: source?
  The mishmash of sequencing technologies over the ages: Sanger, 454, Illumina
  We did not know what we did not know: best efforts at the time
Sequence databases are the ultimate Hotel California
  Sequences check into databases but they never leave.
  Infinite academic collegiality is in force
  No non-confrontational means to resolve issues.

25 DATABASE CURATION IS HARD, EXPENSIVE, UNDERFUNDED
Even the best rRNA database is inadequate for calls to species level
These databases are the equivalent of the literature and museums that Linnaeus used to deduce relationships: if we get them wrong, uncertainty propagates.
Are genomes the silver bullet for high-precision reference sequences?
  Evaluation of rRNAs from genomes in Silva 128 finds some with defects: case-by-case scrutiny required!
    Defects: missing pieces, added pieces, perturbed secondary structures, protein content
    Errors likely due to assembly process errors (software as oracle, again!)
    Most genomicists do not go to extra effort to verify the structure of rRNAs (focus is on proteins)
  But: genomes are clearly, consistently better
  Current databases need to be re-evaluated in the light of the genomes' rRNA sequences!
Fundamental career-limiting disincentive for database work:
  The work is considered to be significant but NOT INNOVATIVE; therefore, NO FUNDING

26 REFLECTIONS ON THE JOURNEY
Phylogenetic analysis of full-length sequences is still the gold standard
  High-volume analysis techniques must be characterized in light of phylogenetics
  Taxonomic error bar characterization needed for microbiome analysis
Numerical OTUs have provided, and will continue to provide, utility
  BUT: to maximize biological, biochemical, and evolutionary insight we need the most precise taxonomy calls that can be attained
  Let go of universal numeric constants!
Reference Sequence Databases
  New focus, means (IEEE style?), and funding mechanism required: change the quid pro quo for this work!
Perhaps just a wee bit of rebalance of focus back toward biochemistry instead of software?

27 OUR SOFTWARE-SPECIFIC FUNDING
NIH R21HG (Frank)
CIHR Genome Canada (Parkinson)
NIH UH2DK (Li)

28 THE END


More information

A New Database of Genetic and. Molecular Pathways. Minoru Kanehisa. sequencing projects have been. Mbp) and for several bacteria including

A New Database of Genetic and. Molecular Pathways. Minoru Kanehisa. sequencing projects have been. Mbp) and for several bacteria including Toward Pathway Engineering: A New Database of Genetic and Molecular Pathways Minoru Kanehisa Institute for Chemical Research, Kyoto University From Genome Sequences to Functions The Human Genome Project

More information
