(SHOTGUN) METAGENOMICS. Hélène Touzet, CNRS, CRIStAL

Size: px
Start display at page:

Download "(SHOTGUN) METAGENOMICS. Hélène Touzet, CNRS, CRIStAL"

Transcription

1 (SHOTGUN) METAGENOMICS Hélène Touzet, CNRS, CRIStAL

2 Shotgun sequencing for community samples Metagenomics potentially sequences all fragmented DNA in a community includes all microorganisms and viruses gives access to all genes across the entire genomes Metatranscriptomics potentially sequences all fragmented RNA in a community activity of the genes

3 Shotgun sequencing versus amplicon sequencing Who is there? more complete taxonomic information no bias due to PCR amplification (primers) access to the full genomes and genes captures genomes which lack amplicon targets, such as viruses What are they doing? functional information analysis of gene functions, metabolic pathways, etc. to identify the functional potential of the community more expensive new challenges in terms of data processing, storage and analysis

4 Content of this lecture Taxonomic analysis Some general ideas, principles and tools Functional analysis Some general ideas, principles and tools Not presented today : Richness, comparative analysis

5 short reads individual reads assembly filtering 16S reads selection of markers all reads one genome per organism possible / not possible? cf amplicons PhyloSift Metaphlan2 MG-rast MEGAN EBI kraken centrifuge... this afternoon taxonomic classification functional analysis

6 Taxonomic classification Input short reads from a single shotgun metagenomic sequencing experiment (FASTA or FASTQ) Output list of detected microbes and their relative abundances

7 Three main approaches 1. focus on a single phylogenetic marker 2. utilization of multiple markers 3. whole information contained in the reads

8 Approach 1 : One marker choice of the phylogenetic marker ubiquitous in the environment showing some differences between species identification of the reads corresponding to the marker processing of the extracted reads direct classification of the raw reads Qiime, MAPseq,... reconstruction of the full sequence of the marker gene before classification Emirge 2011, MATAM 2017

9 Approach 2 : Multiple markers How to choose the markers? all possible genes for all possible genomes redundant information high running-time due to the size of the database not such a good idea selection of a few universal phylogenetics markers phylosift Selection of clade-specific marker sequences Metaphlan2

10 Phylosift 37 families of elite marker genes congruent phylogenetic histories represent about 1% of an average bacterial genome 16S and 18S ribosomal RNA genes mitochondrial gene families eukaryote-specific gene families viral gene families

11 Metaphlan2 Metagenomic Phylogenetic Analysis successor of Metaphlan (2012, Human Microbiome Project) markers and quasi-markers coding sequences that unequivocally identify specific microbial clades at the species level or higher taxonomic levels markers : specific of the clade quasi-markers : show a minimal number of sequence hits in genomes outside the clade

12

13 pre-computed database of markers and pseudo markers + clades (LCA in the taxonomy) short read bacteria : 770,000 markers + 130,000 pseudomarkers from 13,000 genomes archaea : 460,000 markers + 4,600 pseudomarkers from 300 genomes eukaryotes : 22,400 markers + 2,550 pseudomarkes from 110 genomes virus : 38,800 markers + 23,000 pseudomarkers from 3500 genomes

14 Metaphlan2 pipeline 1. mapping of short reads on the marker database (Bowtie2) 2. calculation of the relative abundance of each taxonomic unit priority to (strict) markers quasi-markers are added only if the number of (strict) markers is < 200 normalization of the total number of reads in each clade by the nucleotide length of its markers 3. unclassified subclades : reads belonging to clades with no available sequenced genomes are reported as an unclassified subclade of the closest ancestor for which there is available sequence data

15 SampleID Metaphlan2_Analysis k Bacteria k Bacteria p Firmicutes k Bacteria p Bacteroidetes k Bacteria p Actinobacteria k Bacteria p Firmicutes c Negativicutes k Bacteria p Bacteroidetes c Bacteroidia k Bacteria p Firmicutes c Bacilli k Bacteria p Actinobacteria c Actinobacteria k Bacteria p Firmicutes c Negativicutes o Selenomonadales k Bacteria p Bacteroidetes c Bacteroidia o Bacteroidales k Bacteria p Firmicutes c Bacilli o Lactobacillales k Bacteria p Actinobacteria c Actinobacteria o Actinomycetales k Bacteria p Firmicutes c Negativicutes o Selenomonadales f Veillon k Bacteria p Bacteroidetes c Bacteroidia o Bacteroidales f Prevotel k Bacteria p Firmicutes c Bacilli o Lactobacillales f Streptococcac k Bacteria p Actinobacteria c Actinobacteria o Actinomycetales f Ac Kingdom Phylum Class Order Family Genus Species Strain

16 SampleID Metaphlan2_Analysis k Bacteria k Bacteria p Firmicutes k Bacteria p Bacteroidetes k Bacteria p Actinobacteria k Bacteria p Firmicutes c Negativicutes k Bacteria p Bacteroidetes c Bacteroidia k Bacteria p Firmicutes c Bacilli k Bacteria p Actinobacteria c Actinobacteria k Bacteria p Firmicutes c Negativicutes o Selenomonadales k Bacteria p Bacteroidetes c Bacteroidia o Bacteroidales k Bacteria p Firmicutes c Bacilli o Lactobacillales k Bacteria p Actinobacteria c Actinobacteria o Actinomycetales k Bacteria p Firmicutes c Negativicutes o Selenomonadales f Veillon k Bacteria p Bacteroidetes c Bacteroidia o Bacteroidales f Prevotel k Bacteria p Firmicutes c Bacilli o Lactobacillales f Streptococcac k Bacteria p Actinobacteria c Actinobacteria o Actinomycetales f Ac Kingdom Phylum Class Order Family Genus Species Strain

17 Installation and usage local installation only (python) tutorial hclust2 : construction of heatmap graphlan : visualisation with cladograms

18 Approach 3 : Raw information reference genomes + taxonomy no annotation, no phylogenetic markers no alignment between the read and the database GTGCATAGTGAAAACGGAGTTGCTGACGACGAAAGCGACATTGGGATCTGTCTGTTGTCATTCGCGG CGAGGCGGACACTGATTGACACGGTTTTGCAGAAGGTTAGGGGAATAGGTTAAATTGAGTGGCTTAA GGATTAAAGTGTAGTAAACTGTGATTAACGGAGACGGTTTTAAGACAGGAGTTCGCAAAATCAAGCG GTTATTCCTGGTGGTTTAGGCGTACAATGTCCTGAAGAATATTTAAGAAAAAAGCACCCCTCGTCGC CGCGGTCGACCATACCTTCGATTATCGCGCCCACTCTCCCATTAGTCGGCAGAGGTGGTTGTGTTGC ATATTCTAAGGCGTTACGCTGATGAATATTCTACGGAATTGCCATAGGCGTTGAACGCTACACGGAC TATAGAGCGGGTCATCGAAAGGTTATACTCTTGTAGTTAACATGTAGCCCGGCCCTATTAGTACAGC CATTCTCATTATTAAATTTTCTCTACAGCCAAACGACCAAGTGCATTTCCACGGAGCGCGATGGAGA AGCTCTGTAATAGGGACTAAAAGAGTGATGATAATCATGAGTGCCGCGTTATGGTGGTGTCGGAACA CCAGTCGTATTCCTTCTCGAGTTCCGTCCAGTTAAGCGTGACACTCCCAGTGTACCCGCAAACCGTG AGTCAATCGCATGTAGGATGGTCTCCAGACACCGGGGCACCAGTTTTCACGCCCAAAGCATAAACGA AGTCTTAGAACTGGACGTGCCGTTTCTCTGCGAATAATACCTCGAGCTGTACCGTTGTTGCGCTGCC CTCTTATCACATTTGCTTCGACGACTGC

19 Kraken exact k-mer matching complete bacterial, archaeal, and viral genomes in RefSeq NCBI Wood and Salzberg Genome Biology 2014, 15:R46 METHOD Kraken: ultrafast metagenomic sequence classification using exact alignments Derrick E Wood 1,2* and Steven L Salzberg 2,3 Open Access Abstract

20 all 31-mers present in the database + LCA (lowest common ancestor) AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAT AAAAAAAAAAAAAAAAAAAAAAAAAAAAACA e9 disctinct k-mers (oct 2017) << 4 31 = 4.6e18 all completed microbial genomes of the RefSeq database 47,768 bacteria + 1,034 archaea + 7,530 viruses Precomputed database

21 1. short read overlapping k-mers 3. assignation of the read 2. identification of the LCA in the taxonomy for each k-mer Read assignation

22 Performances of Kraken very fast excellent results with known/poor results with unknow species high memory demanding 500 GB of disk space to build the database 200 GB to store it Minikraken : reduced databases DB 4GB : 2.7% of k-mers from the original database DB 8GB : 5% of k-mers from the original database Centrifuge : space-efficient evolution of Kraken Burrows-Wheeler Transform

23 Installation and usage kraken : local installation only (C++) centrifuge : local installation only (C++) command line interface

24 Similar tools following the same paradigm LMAT, 2013 Scalable metagenomic taxonomy classification using a reference genome database. Ames SK, Hysom DA, Gardner SN, Lloyd GS, Gokhale MB, Allen JE. Bioinformatics, doi : /bioinformatics/btt389 Clark, 2015 CLARK : fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers R. Ounit, S. Wanamaker, T.J. Close, S. Lonardi BMC Genomics ; 16(1) : 236 One codex (commercial, free demo version) web server based on kraken algorithm registration required Kaiju, 2017 both web server and local installation (CLI) Menzel, P. et al. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7 :11257

25

26 Functional classification Input short reads from a single shotgun metagenomic sequencing experiment (FASTA or FASTQ) Output functions of the genes present in the community

27 How to annotate mixtures of short reads? ab initio approaches prediction of coding regions in short reads FragGeneScan homology-based approaches alignment of short reads to a large database of annotated sequences choice of the alignment tool, DNA /protein : BlastX, Diamond,... choice of the database : Eggnog, SEEDS, KEGG, Interpro,... built-in pipelines : MG-RAST, MEGAN, EBI,...

28 FragGeneScan predict protein-coding regions from environmental sequences HMM combining codon usage bias + start/stop codon models (like Glimmer or Genemark) sequencing error models included in the MG-Rast and EBI pipelines

29 Alignment tools Query : large set of short DNA reads Reference : large protein database Pre-NGS tools : BlastX, BLAT Diamond optimized to deal with sort reads order of magnitude faster than BlastX for this kind of data ( 1000)

30 Interpro developed at EBI since 1999 signatures for protein families, domains and functional sites collected from 14 databases mappings of InterPro entries to Gene Ontology (GO) terms (InterPro2GO)

31 KEGG collection of databases : genomes, genes, metabolic pathways, diseases,... KO entries : group of genes representing functional orthologs in the molecular networks

32 1 Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany, 2 Institute of Molecular Life Sciences, University of Zurich, Zurich 8057, Switzerland, 3 Bioinformatics/Systems Biology Group, Swiss Institute of Bioinformatics (SIB), Zurich 8057, Switzerland, 4 The Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N 2200, Denmark, 5 Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg 85764, Germany, 6 CUBE Division of Computational Systems Biology, EggNOG database of orthologous groups of proteins at different taxonomic levels + functional annotation clustering of the 9.6 million proteins from 2031 genomes viruses OG (orthologous groups) D286 D293 Nucleic Acids Research, 2016, Vol. 44, Database issue Published online 17 November 2015 doi: /nar/gkv1248 eggnog 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences Jaime Huerta-Cepas 1, Damian Szklarczyk 2,3,KristofferForslund 1,HelenCook 4, Davide Heller 2,3,MathiasC.Walter 5,ThomasRattei 6,DanielR.Mende 7, Shinichi Sunagawa 1,MichaelKuhn 8,LarsJuhlJensen 4,ChristianvonMering 2,3,* and Peer Bork 1,9,10,*

33 SEEDS initiated in 2003 for genome annotation hierarchical way to organize gene families > 1300 protein families

34 Pipelines for functional analysis aligner EggNOG Interpro KEGG SEED MG-RAST BLAT MEGAN CE BlastX Diamond EBI Interproscan

35 MG-RAST Metagenomics Rapid Annotation using Subsystem Technology

36 MG-RAST developed since 2007 (University of Chicago) extension of RAST ( The Project to Annotate 1000 Genomes) Rapid Annotation using Subsystems Technology subsystems : SEED framework also supports amplicons (16S, 18S, and ITS) and metatranscriptomics BMC Bioinformatics BioMed Central Software Open Access The metagenomics RAST server a public resource for the automatic phylogenetic and functional analysis of metagenomes F Meyer* 1,2, D Paarmann 2, M D'Souza 2, R Olson 1, EM Glass 1, M Kubal 2, T Paczian 1, A Rodriguez 2, R Stevens 1,2, A Wilke 2, J Wilkening 1 and RA Edwards 1,3 Address: 1 Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S Cass Avenue, Argonne, IL 60439, USA, 2 3

37 For shotgun metagenome and shotgun metatranscriptome datasets we perform a dereplication step. Figure 3.1: Details of the analysis pipeline for MG-RAST version 3 cleaning of the sequencing reads 3.1 Data hygiene rrna detection : SortmeRNA/ Silva + RDP classifier Preprocessing Protein coding gene calling : FragGeneScan (prokaryotes) After upload, data preprocessed by using SolexaQA [8] to trim low-quality regions from FASTQ data. Comparison Platform-specific toapproaches GenBank, are usedseed, for 454 data IMG, submitted UniProt, in FASTA format: KEGG, readsand more than eggnogs than two standard withdeviations BLATaway from the mean read length are discarded following [14]. All sequences submitted to the system are available, but discarded reads will not be analyzed further. Dereplication

38 Usage easy accessibility via a web-based interface account identification storage on the MG-rast server sequencing data + metadata status : private, public,... priority queue depending on the status, may be very slow 315,470 metagenomes containing 1,147 billion sequences and Tbp processed for 24,415 registered users.

39

40

41 MEGAN

42 MEGAN developed since 2007 (U. Tübingen) last release : MEGAN CE, 2017

43 MEGAN Community Edition - Interactive Exploration and Analysis of Microbiome Sequencing Data Fig 1. The compressed fastq files in a metagenome or metatranscriptome sequencing project are (1) compared against a protein reference database such as NCBI-nr using DIAMOND. (2) Taxonomic and functional analysis is then performed on the diamond files using Meganizer. (3) The resulting meganized diamond files remain on the server and are accessed via the MeganServer software. (4) Researchers work interactively with the data using MEGAN CE. Initial database : NCBI nr + NCBI taxonomy doi: /journal.pcbi g001 Design and Implementation Alignment of the reads on the database : Diamond This paper introduces MEGAN Community Edition (CE), which is a major update of our MEGAN software [11]. This release contains a large number of new features and has been substantially rewritten so as to support the analysis of many samples (hundreds) and many reads Exploration of the (billions). data This release (MEGAN includes a number of CE) command: line tools, in particular blast2rma, daa2rma and Meganizer, which can all be used to prepare input files for MEGAN CE. Taxonomic binning In addition, we : recommend LCA, the lowest use of DIAMOND common [10] for ultra-fast ancestor alignment of reads against NCBI-nr and MeganServer to allow web access to MEGAN files. Functional analysis : mapping to KEGG, SEED, EggNOG and RMA files InterPro2GO MEGAN CE analysisrequires that sequencing reads are first aligned against a suitable reference database, such as NCBI-nr in the case of a protein-based analysis, or Genbank for a DNA-based analysis, or the Silva database [24], say, when aligning 16S rrna reads. MEGAN CE can import readsand alignments in a number of different file formats and computes a compressed and indexed binary file in so-called RMA format that contains all reads, alignments, taxonomic and functional classifications. The file isindexed to allow quick access to reads and alignments bytax-

44 Lowest common ancestor with MEGAN (2007) origin of the LCA

45 Usage diamond : local installation MEGAN CE (Community Edition) : local installation JAVA graphical interface multiple viewers : KEGG Viewer (pathways), EggNog viewer, comparative analysis, rarefaction,... MEGAN Ultimate Edition

46 EBI Metagenomics

47 EBI Metagenomics first public release in 2013 close integration with the ENA (European Nucleotide Archive)

48 EBI Metagenomics pipeline cleaning and trimming of the short reads taxonomic analysis : Qiime, and then Mapseq on 16S reads functional analysis : InterPro + InterProScan + InterPro2GO

49 Data submission : ENA Webin data are stably archived accession numbers (prerequisite for many publications) active submission helpdesk training materials A major aim in the development of this resource has been to encourage metagenomics researchers to openly share their data as widely as possible, and to also describe their data in sufficient detail such that other scientists are able to extract maximum value from it.

50 8 Nucleic Acids Research, 2017 Figure 6. Growth of metagenomics data housed in ENA and processed by EBI Metagenomics (EMG). This graph shows the cumulative growth of environmental data in the two resources (ENA: solid lines, EMG: dashed lines) according to two different metrics: numbers of samples (blue) and number of bases (orange). to export the documents in numerous formats and to support concurrent versions for different pipeline/website releases. Furthermore, as this documentation is available in GitHub, it allows team members, external collaborators and users to contribute to the documentation serving contextual metadata and results have been developed to offer powerful entry points for browsing, searching and discovery of data from both a manual and programmatic perspective. Amorefundamentalchangeistheshifttowardsprovision of assembly of both user-submitted and publicly avail-

51 Usage webserver : upload of the data, analyses on the cloud programmatic access : REST API krona visualisation maybe very slow (several days)

52 Conclusion fast evolving field influence of the nature of the data sequencing technology and quality of the data complexity of the community coverage balance between performances and usability

53 Which tool is the best? With which parameters? 2016, MEGAN (older version), MG-RAST, One Codex 2017, 11 tools (including CLARK, Kraken, LMAT, Metaphlan2, PhyloSift, MGAN+Diamond) 2016, 14 tools (including CLARK, MetaPhan2, One codex, EBI, MG-Rast, kraken, LMAT, Megan)

54 Shotgun sequencing versus amplicon sequencing Comparing 16S rrna Marker Gene and Shotgun Metagenomics Datasets in the American Gut Project Using State of the Art Tools, E.R. Hyde, J. Sanders, A. Tripathi, Q. Zhu, R. Knight, 2017 There is some consistency between the 16S and shotgun metagenomics approaches although some obvious differences are noted. Large-scale differences in microbial biodiversity discovery between 16S amplicon and shotgun sequencing, Michael Tessler et al., Scientific Reports 2017 Overall the amplicon data were more robust across both biodiversity and community ecology analyses at different taxonomic scales.

Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME. Peter Sterk EBI Metagenomics Course 2014

Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME. Peter Sterk EBI Metagenomics Course 2014 Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME Peter Sterk EBI Metagenomics Course 2014 1 Taxonomic analysis using next-generation sequencing Objective we want to

More information

Bioinformatic tools for metagenomic data analysis

Bioinformatic tools for metagenomic data analysis Bioinformatic tools for metagenomic data analysis MEGAN - blast-based tool for exploring taxonomic content MG-RAST (SEED, FIG) - rapid annotation of metagenomic data, phylogenetic classification and metabolic

More information

Infectious Disease Omics

Infectious Disease Omics Infectious Disease Omics Metagenomics Ernest Diez Benavente LSHTM ernest.diezbenavente@lshtm.ac.uk Course outline What is metagenomics? In situ, culture-free genomic characterization of the taxonomic and

More information

Metagenome Analysis With MG- RAST

Metagenome Analysis With MG- RAST Metagenome Analysis With MG- RAST Folker Meyer, PhD Argonne National Laboratory and University of Chicago http://metagenomics.anl.gov Palm Springs, March 2013 Acknowledgements Team: Dion Antonopoulos Daniela

More information

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015 Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH BIOL 7210 A Computational Genomics 2/18/2015 The $1,000 genome is here! http://www.illumina.com/systems/hiseq-x-sequencing-system.ilmn Bioinformatics bottleneck

More information

Functional analysis using EBI Metagenomics

Functional analysis using EBI Metagenomics Functional analysis using EBI Metagenomics Contents Tutorial information... 2 Tutorial learning objectives... 2 An introduction to functional analysis using EMG... 3 What are protein signatures?... 3 Assigning

More information

Gene-centered resources at NCBI

Gene-centered resources at NCBI COURSE OF BIOINFORMATICS a.a. 2014-2015 Gene-centered resources at NCBI We searched Accession Number: M60495 AT NCBI Nucleotide Gene has been implemented at NCBI to organize information about genes, serving

More information

Introduction to Microbial Community Analysis. Tommi Vatanen CS-E Statistical Genetics and Personalised Medicine

Introduction to Microbial Community Analysis. Tommi Vatanen CS-E Statistical Genetics and Personalised Medicine Introduction to Microbial Community Analysis Tommi Vatanen CS-E5890 - Statistical Genetics and Personalised Medicine Structure of the lecture Motivation: human microbiome Terminology Data types, analysis

More information

ELE4120 Bioinformatics. Tutorial 5

ELE4120 Bioinformatics. Tutorial 5 ELE4120 Bioinformatics Tutorial 5 1 1. Database Content GenBank RefSeq TPA UniProt 2. Database Searches 2 Databases A common situation for alignment is to search through a database to retrieve the similar

More information

NCBI web resources I: databases and Entrez

NCBI web resources I: databases and Entrez NCBI web resources I: databases and Entrez Yanbin Yin Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/ 1 Homework assignment 1 Two parts: Extract the gene IDs reported in table

More information

Genome sequence of Acinetobacter baumannii MDR-TJ

Genome sequence of Acinetobacter baumannii MDR-TJ JB Accepts, published online ahead of print on 11 March 2011 J. Bacteriol. doi:10.1128/jb.00226-11 Copyright 2011, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved.

More information

Microbially Mediated Plant Salt Tolerance and Microbiome based Solutions for Saline Agriculture

Microbially Mediated Plant Salt Tolerance and Microbiome based Solutions for Saline Agriculture Microbially Mediated Plant Salt Tolerance and Microbiome based Solutions for Saline Agriculture Contents Introduction Abiotic Tolerance Approaches Reasons for failure Roots, microorganisms and soil-interaction

More information

Development of NGS metabarcoding. characterization of aerobiological samples. Lucia Muggia

Development of NGS metabarcoding. characterization of aerobiological samples. Lucia Muggia Development of NGS metabarcoding for the characterization of aerobiological samples Lucia Muggia Alberto Pallavicini, Elisa Banchi, Claudio G. Ametrano, David Stankovic, Silvia Ongaro, Enrico Tordoni,

More information

ABSTRACT METHODS FOR MICROBIAL GENOMICS. Professor Steven L. Salzberg Department of Computer Science

ABSTRACT METHODS FOR MICROBIAL GENOMICS. Professor Steven L. Salzberg Department of Computer Science ABSTRACT Title of Dissertation: COMPARATIVE AND COMPUTATIONAL METHODS FOR MICROBIAL GENOMICS Derrick Edward Wood, Doctor of Philosophy, 2014 Directed by: Professor Steven L. Salzberg Department of Computer

More information

Introduction to EMBL-EBI.

Introduction to EMBL-EBI. Introduction to EMBL-EBI www.ebi.ac.uk What is EMBL-EBI? Part of EMBL Austria, Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Luxembourg, the Netherlands,

More information

Korilog. high-performance sequence similarity search tool & integration with KNIME platform. Patrick Durand, PhD, CEO. BIOINFORMATICS Solutions

Korilog. high-performance sequence similarity search tool & integration with KNIME platform. Patrick Durand, PhD, CEO. BIOINFORMATICS Solutions KLAST high-performance sequence similarity search tool & integration with KNIME platform Patrick Durand, PhD, CEO Sequence analysis big challenge DNA sequence... Context 1. Modern sequencers produce huge

More information

Alignment-free d2 oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences

Alignment-free d2 oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences Published online 28 November 2016 Nucleic Acids Research, 2017, Vol. 45, No. 1 39 53 doi: 10.1093/nar/gkw1002 Alignment-free d2 oligonucleotide frequency dissimilarity measure improves prediction of hosts

More information

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Usage scenarios for sequence based function annotation Function prediction of newly cloned

More information

An introduction into 16S rrna gene sequencing analysis. Stefan Boers

An introduction into 16S rrna gene sequencing analysis. Stefan Boers An introduction into 16S rrna gene sequencing analysis Stefan Boers Microbiome, microbiota or metagenomics? Microbiome The entire habitat, including the microorganisms, their genomes (i.e., genes) and

More information

Genomics and High Performance Computing. Folker Meyer Argonne National Laboratory and University of Chicago

Genomics and High Performance Computing. Folker Meyer Argonne National Laboratory and University of Chicago Genomics and High Performance Computing Folker Meyer and University of Chicago Brief intro: I am a computer scientist turned computational biologist My CS friends tell me I am a biologist My BIO friends

More information

Product Applications for the Sequence Analysis Collection

Product Applications for the Sequence Analysis Collection Product Applications for the Sequence Analysis Collection Pipeline Pilot Contents Introduction... 1 Pipeline Pilot and Bioinformatics... 2 Sequence Searching with Profile HMM...2 Integrating Data in a

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Alla L Lapidus, Ph.D. SPbSU St. Petersburg Term Bioinformatics Term Bioinformatics was invented by Paulien Hogeweg (Полина Хогевег) and Ben Hesper in 1970 as "the study of

More information

RNA-Seq with the Tuxedo Suite

RNA-Seq with the Tuxedo Suite RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop The Basic Tuxedo Suite References Trapnell C, et al. 2009 TopHat: discovering splice junctions with

More information

Why learn sequence database searching? Searching Molecular Databases with BLAST

Why learn sequence database searching? Searching Molecular Databases with BLAST Why learn sequence database searching? Searching Molecular Databases with BLAST What have I cloned? Is this really!my gene"? Basic Local Alignment Search Tool How BLAST works Interpreting search results

More information

UCSC Genome Browser. Introduction to ab initio and evidence-based gene finding

UCSC Genome Browser. Introduction to ab initio and evidence-based gene finding UCSC Genome Browser Introduction to ab initio and evidence-based gene finding Wilson Leung 06/2006 Outline Introduction to annotation ab initio gene finding Basics of the UCSC Browser Evidence-based gene

More information

ONLINE BIOINFORMATICS RESOURCES

ONLINE BIOINFORMATICS RESOURCES Dedan Githae Email: d.githae@cgiar.org BecA-ILRI Hub; Nairobi, Kenya 16 May, 2014 ONLINE BIOINFORMATICS RESOURCES Introduction to Molecular Biology and Bioinformatics (IMBB) 2014 The larger picture.. Lower

More information

Bioinformatic methods for the analysis and comparison of metagenomes and metatranscriptomes

Bioinformatic methods for the analysis and comparison of metagenomes and metatranscriptomes Bioinformatic methods for the analysis and comparison of metagenomes and metatranscriptomes Ph.D. Thesis submitted to the Faculty of Technology, Bielefeld University, Germany for the degree of Dr. rer.

More information

Sequence Databases and database scanning

Sequence Databases and database scanning Sequence Databases and database scanning Marjolein Thunnissen Lund, 2012 Types of databases: Primary sequence databases (proteins and nucleic acids). Composite protein sequence databases. Secondary databases.

More information

Enabling reproducible data analysis for metagenomics. eresearch Africa Conference 2017 Gerrit Botha CBIO H3ABioNet 3 May 2017

Enabling reproducible data analysis for metagenomics. eresearch Africa Conference 2017 Gerrit Botha CBIO H3ABioNet 3 May 2017 Enabling reproducible data analysis for metagenomics eresearch Africa Conference 2017 Gerrit Botha CBIO H3ABioNet 3 May 2017 Outline 16S rrna analysis Current CBIO 16S rrna analysis setup H3ABioNet hackathon

More information

Protein Bioinformatics Part I: Access to information

Protein Bioinformatics Part I: Access to information Protein Bioinformatics Part I: Access to information 260.655 April 6, 2006 Jonathan Pevsner, Ph.D. pevsner@kennedykrieger.org Outline [1] Proteins at NCBI RefSeq accession numbers Cn3D to visualize structures

More information

ACCELERATING GENOMIC ANALYSIS ON THE CLOUD. Enabling the PanCancer Analysis of Whole Genomes (PCAWG) consortia to analyze thousands of genomes

ACCELERATING GENOMIC ANALYSIS ON THE CLOUD. Enabling the PanCancer Analysis of Whole Genomes (PCAWG) consortia to analyze thousands of genomes ACCELERATING GENOMIC ANALYSIS ON THE CLOUD Enabling the PanCancer Analysis of Whole Genomes (PCAWG) consortia to analyze thousands of genomes Enabling the PanCancer Analysis of Whole Genomes (PCAWG) consortia

More information

Introductie en Toepassingen van Next-Generation Sequencing in de Klinische Virologie. Sander van Boheemen Medical Microbiology

Introductie en Toepassingen van Next-Generation Sequencing in de Klinische Virologie. Sander van Boheemen Medical Microbiology Introductie en Toepassingen van Next-Generation Sequencing in de Klinische Virologie Sander van Boheemen Medical Microbiology Next-generation sequencing Next-generation sequencing (NGS), also known as

More information

Metagenomic Analysis in Human- Associated Projects

Metagenomic Analysis in Human- Associated Projects Metagenomic Analysis in Human- Associated Projects Wikimedia Commons Wikimedia Commons Daniel H. Huson Singapore Center for Environmental Life Science Engineering (SCELSE) ZBIT Center for Bioinformatics

More information

INTRODUCTION A clear cultivation bias exists in microbial phylogenetics. As of 2010, half of all

INTRODUCTION A clear cultivation bias exists in microbial phylogenetics. As of 2010, half of all Rachel L. Harris 1 Insights into the phylogeny and coding potential of microbial dark matter: a replication of phylogenetic anchoring methods described by Rinke et al., 2013 Rachel Harris, David Zhao,

More information

arxiv: v1 [q-bio.gn] 25 Nov 2015

arxiv: v1 [q-bio.gn] 25 Nov 2015 MetaScope - Fast and accurate identification of microbes in metagenomic sequencing data Benjamin Buchfink 1, Daniel H. Huson 1,2 & Chao Xie 2,3 arxiv:1511.08753v1 [q-bio.gn] 25 Nov 2015 1 Department of

More information

Basic Bioinformatics: Homology, Sequence Alignment,

Basic Bioinformatics: Homology, Sequence Alignment, Basic Bioinformatics: Homology, Sequence Alignment, and BLAST William S. Sanders Institute for Genomics, Biocomputing, and Biotechnology (IGBB) High Performance Computing Collaboratory (HPC 2 ) Mississippi

More information

I nternet Resources for Bioinformatics Data and Tools

I nternet Resources for Bioinformatics Data and Tools ~i;;;;;;;'s :.. ~,;;%.: ;!,;s163 ~. s :s163:: ~s ;'.:'. 3;3 ~,: S;I:;~.3;3'/////, IS~I'//. i: ~s '/, Z I;~;I; :;;; :;I~Z;I~,;'//.;;;;;I'/,;:, :;:;/,;'L;;;~;'~;~,::,:, Z'LZ:..;;',;';4...;,;',~/,~:...;/,;:'.::.

More information

Glossary of Commonly used Annotation Terms

Glossary of Commonly used Annotation Terms Glossary of Commonly used Annotation Terms Akela a general use server for the annotation group as well as other groups throughout TIGR. Annotation Notebook a link from the gene list page that is associated

More information

Course Presentation. Ignacio Medina Presentation

Course Presentation. Ignacio Medina Presentation Course Index Introduction Agenda Analysis pipeline Some considerations Introduction Who we are Teachers: Marta Bleda: Computational Biologist and Data Analyst at Department of Medicine, Addenbrooke's Hospital

More information

European Union Reference Laboratory for Genetically Modified Food and Feed (EURL GMFF)

European Union Reference Laboratory for Genetically Modified Food and Feed (EURL GMFF) Guideline for the submission of DNA sequences derived from genetically modified organisms and associated annotations within the framework of Directive 2001/18/EC and Regulation (EC) No 1829/2003 European

More information

Read Mapping and Variant Calling. Johannes Starlinger

Read Mapping and Variant Calling. Johannes Starlinger Read Mapping and Variant Calling Johannes Starlinger Application Scenario: Personalized Cancer Therapy Different mutations require different therapy Collins, Meredith A., and Marina Pasca di Magliano.

More information

Microbiome Analysis. Research Day 2012 Ranjit Kumar

Microbiome Analysis. Research Day 2012 Ranjit Kumar Microbiome Analysis Research Day 2012 Ranjit Kumar Human Microbiome Microorganisms Bad or good? Human colon contains up to 100 trillion bacteria. Human microbiome - The community of bacteria that live

More information

Bioinformatics for Proteomics. Ann Loraine

Bioinformatics for Proteomics. Ann Loraine Bioinformatics for Proteomics Ann Loraine aloraine@uab.edu What is bioinformatics? The science of collecting, processing, organizing, storing, analyzing, and mining biological information, especially data

More information

A proposal to the Gordon and Betty Moore Foundation

A proposal to the Gordon and Betty Moore Foundation INTEGRATING EVOLUTIONARY, ECOLOGICAL AND STATISTICAL APPROACHES TO METAGENOMICS A proposal to the Gordon and Betty Moore Foundation Jonathan A. Eisen University of California, Davis U. C. Davis Genome

More information

Bioinformatics and computational tools

Bioinformatics and computational tools Bioinformatics and computational tools Etienne P. de Villiers (PhD) International Livestock Research Institute Nairobi, Kenya International Livestock Research Institute Nairobi, Kenya ILRI works at the

More information

Integrating Evolutionary, Ecological and Statistical Approaches to Metagenomics. A proposal to the Gordon and Betty Moore Foundation

Integrating Evolutionary, Ecological and Statistical Approaches to Metagenomics. A proposal to the Gordon and Betty Moore Foundation Integrating Evolutionary, Ecological and Statistical Approaches to Metagenomics A proposal to the Gordon and Betty Moore Foundation Jonathan A. Eisen University of California, Davis U. C. Davis Genome

More information

MicroSEQ Rapid Microbial Identification System

MicroSEQ Rapid Microbial Identification System MicroSEQ Rapid Microbial Identification System Giving you complete control over microbial identification using the gold-standard genotypic method The MicroSEQ ID microbial identification system, based

More information

Phylogenetic methods for taxonomic profiling

Phylogenetic methods for taxonomic profiling Phylogenetic methods for taxonomic profiling Siavash Mirarab University of California at San Diego (UCSD) Joint work with Tandy Warnow, Nam-Phuong Nguyen, Mike Nute, Mihai Pop, and Bo Liu Phylogeny reconstruction

More information

Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis. Jenny Wu

Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis. Jenny Wu Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis Jenny Wu Outline Introduction to NGS data analysis in Cancer Genomics NGS applications in cancer research Typical NGS

More information

Access to Information from Molecular Biology and Genome Research

Access to Information from Molecular Biology and Genome Research Future Needs for Research Infrastructures in Biomedical Sciences Access to Information from Molecular Biology and Genome Research DG Research: Brussels March 2005 User Community for this information is

More information

Small Genome Annotation and Data Management at TIGR

Small Genome Annotation and Data Management at TIGR Small Genome Annotation and Data Management at TIGR Michelle Gwinn, William Nelson, Robert Dodson, Steven Salzberg, Owen White Abstract TIGR has developed, and continues to refine, a comprehensive, efficient

More information

Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory

Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Title SNP-VISTA: An Interactive SNPs Visualization Tool Permalink https://escholarship.org/uc/item/74z965bg Authors Shah, Nameeta

More information

Introduction to BIOINFORMATICS

Introduction to BIOINFORMATICS Introduction to BIOINFORMATICS Antonella Lisa CABGen Centro di Analisi Bioinformatica per la Genomica Tel. 0382-546361 E-mail: lisa@igm.cnr.it http://www.igm.cnr.it/pagine-personali/lisa-antonella/ What

More information

Introduction: Methods:

Introduction: Methods: Eason 1 Introduction: Next Generation Sequencing (NGS) is a term that applies to many new sequencing technologies. The drastic increase in speed and cost of these novel methods are changing the world of

More information

CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing

CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing Protein Cell 2012, 3(2): 148 152 DOI 10.1007/s13238-012-2015-8 RESEARCH ARTICLE CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing Guoguang Zhao 1,4*, Dechao Bu 1,4*,

More information

CHAPTER 21 LECTURE SLIDES

CHAPTER 21 LECTURE SLIDES CHAPTER 21 LECTURE SLIDES Prepared by Brenda Leady University of Toledo To run the animations you must be in Slideshow View. Use the buttons on the animation to play, pause, and turn audio/text on or off.

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Contents Cell biology Organisms and cells Building blocks of cells How genes encode proteins? Bioinformatics What is bioinformatics? Practical applications Tools and databases

More information

Assigning Sequences to Taxa CMSC828G

Assigning Sequences to Taxa CMSC828G Assigning Sequences to Taxa CMSC828G Outline Objective (1 slide) MEGAN (17 slides) SAP (33 slides) Conclusion (1 slide) Objective Given an unknown, environmental DNA sequence: Make a taxonomic assignment

More information

Sanger vs Next-Gen Sequencing

Sanger vs Next-Gen Sequencing Tools and Algorithms in Bioinformatics GCBA815/MCGB815/BMI815, Fall 2017 Week-8: Next-Gen Sequencing RNA-seq Data Analysis Babu Guda, Ph.D. Professor, Genetics, Cell Biology & Anatomy Director, Bioinformatics

More information

Guided tour to Ensembl

Guided tour to Ensembl Guided tour to Ensembl Introduction Introduction to the Ensembl project Walk-through of the browser Variations and Functional Genomics Comparative Genomics BioMart Ensembl Genome browser http://www.ensembl.org

More information

Next Gen Sequencing. Expansion of sequencing technology. Contents

Next Gen Sequencing. Expansion of sequencing technology. Contents Next Gen Sequencing Contents 1 Expansion of sequencing technology 2 The Next Generation of Sequencing: High-Throughput Technologies 3 High Throughput Sequencing Applied to Genome Sequencing (TEDed CC BY-NC-ND

More information

METAGENOMICS. Aina Maria Mas Calafell Genomics

METAGENOMICS. Aina Maria Mas Calafell Genomics METAGENOMICS Aina Maria Mas Calafell Genomics Introduction Microbial communities Primary role in biogeochemical systems Study of microbial communities 1.- Culture-based methodologies Only isolated microbes

More information

ELIXIR: data for molecular biology and points of entry for marine scientists

ELIXIR: data for molecular biology and points of entry for marine scientists ELIXIR: data for molecular biology and points of entry for marine scientists Guy Cochrane, EMBL-EBI EuroMarine 2018 General Assembly meeting 17-18 January 2018 www.elixir-europe.org Scales of molecular

More information

a-dB. Code assigned:

a-dB. Code assigned: This form should be used for all taxonomic proposals. Please complete all those modules that are applicable (and then delete the unwanted sections). For guidance, see the notes written in blue and the

More information

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow Technical Overview Import VCF Introduction Next-generation sequencing (NGS) studies have created unanticipated challenges with

More information

Entrez Gene: gene-centered information at NCBI

Entrez Gene: gene-centered information at NCBI D54 D58 Nucleic Acids Research, 2005, Vol. 33, Database issue doi:10.1093/nar/gki031 Entrez Gene: gene-centered information at NCBI Donna Maglott*, Jim Ostell, Kim D. Pruitt and Tatiana Tatusova National

More information

DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences

DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences PROCEEDINGS Open Access DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences Tarini Shankar Ghosh, Monzoorul Haque M, Sharmila S Mande * From Asia Pacific Bioinformatics

More information

Genome and DNA Sequence Databases. BME 110: CompBio Tools Todd Lowe April 5, 2007

Genome and DNA Sequence Databases. BME 110: CompBio Tools Todd Lowe April 5, 2007 Genome and DNA Sequence Databases BME 110: CompBio Tools Todd Lowe April 5, 2007 Admin Reading: Chapters 2 & 3 Notes available in PDF format on-line (see class calendar page): http://www.soe.ucsc.edu/classes/bme110/spring07/bme110-calendar.html

More information

L3: Short Read Alignment to a Reference Genome

L3: Short Read Alignment to a Reference Genome L3: Short Read Alignment to a Reference Genome Shamith Samarajiwa CRUK Autumn School in Bioinformatics Cambridge, September 2017 Where to get help! http://seqanswers.com http://www.biostars.org http://www.bioconductor.org/help/mailing-list

More information

A FRAMEWORK FOR ANALYSIS OF METAGENOMIC SEQUENCING DATA

A FRAMEWORK FOR ANALYSIS OF METAGENOMIC SEQUENCING DATA A FRAMEWORK FOR ANALYSIS OF METAGENOMIC SEQUENCING DATA A. MURAT EREN Department of Computer Science, University of New Orleans, 2000 Lakeshore Drive, New Orleans, LA 70148, USA Email: aeren@uno.edu MICHAEL

More information

Contents 16S rrna SEQUENCING DATA ANALYSIS TUTORIAL WITH QIIME... 5

Contents 16S rrna SEQUENCING DATA ANALYSIS TUTORIAL WITH QIIME... 5 QIIME Analysis 1 Contents 16S rrna SEQUENCING DATA ANALYSIS TUTORIAL WITH QIIME... 5 Report Overview... 5 How to Obtain Microbiome Data... 6 How to Setup QIIME... 7 Essential files for QIIME... 7 Sequence

More information

RNA-Seq Software, Tools, and Workflows

RNA-Seq Software, Tools, and Workflows RNA-Seq Software, Tools, and Workflows Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 1, 2016 Some mrna-seq Applications Differential gene expression analysis Transcriptional profiling Assumption:

More information

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers

BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers Web resources: NCBI database: http://www.ncbi.nlm.nih.gov/ Ensembl database: http://useast.ensembl.org/index.html UCSC

More information

BIOINFORMATICS IN AQUACULTURE. Aleksei Krasnov AKVAFORSK (Ås, Norway) Bergen, September 21, 2007

BIOINFORMATICS IN AQUACULTURE. Aleksei Krasnov AKVAFORSK (Ås, Norway) Bergen, September 21, 2007 BIOINFORMATICS IN AQUACULTURE Aleksei Krasnov AKVAFORSK (Ås, Norway) Bergen, September 21, 2007 Research area Functional genomics of salmonids Major in diseases, stress and toxicity Experience is in -

More information

CDC s Advanced Molecular Detection (AMD) Sequence Data Analysis and Management

CDC s Advanced Molecular Detection (AMD) Sequence Data Analysis and Management CDC s Advanced Molecular Detection (AMD) Sequence Data Analysis and Management Scott Sammons Technology Officer Office of Advanced Molecular Detection National Center for Emerging and Zoonotic Infectious

More information

Genome Annotation Genome annotation What is the function of each part of the genome? Where are the genes? What is the mrna sequence (transcription, splicing) What is the protein sequence? What does

More information

BIOINFORMATICS Introduction

BIOINFORMATICS Introduction BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea

More information

Chimp Sequence Annotation: Region 2_3

Chimp Sequence Annotation: Region 2_3 Chimp Sequence Annotation: Region 2_3 Jeff Howenstein March 30, 2007 BIO434W Genomics 1 Introduction We received region 2_3 of the ChimpChunk sequence, and the first step we performed was to run RepeatMasker

More information

ALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG

ALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG Chapman & Hall/CRC Mathematical and Computational Biology Series ALGORITHMS IN BIO INFORMATICS A PRACTICAL INTRODUCTION WING-KIN SUNG CRC Press Taylor & Francis Group Boca Raton London New York CRC Press

More information

Handling temperature bursts reaching 464 C: different microbial strategies. in the Sisters Peak hydrothermal chimney

Handling temperature bursts reaching 464 C: different microbial strategies. in the Sisters Peak hydrothermal chimney AEM Accepts, published online ahead of print on 16 May 2014 Appl. Environ. Microbiol. doi:10.1128/aem.01460-14 Copyright 2014, American Society for Microbiology. All Rights Reserved. Microbial strategies

More information

Course Descriptions. BIOL: Biology. MICB: Microbiology. [1]

Course Descriptions. BIOL: Biology. MICB: Microbiology.  [1] Course Descriptions BIOL: Biology http://www.calendar.ubc.ca/vancouver/courses.cfm?code=biol [1] BIOL 112 (3) Biology of the Cell The principles of cellular and molecular biology using bacterial and eukaryotic

More information

Outline. Introduction to ab initio and evidence-based gene finding. Prokaryotic gene predictions

Outline. Introduction to ab initio and evidence-based gene finding. Prokaryotic gene predictions Outline Introduction to ab initio and evidence-based gene finding Overview of computational gene predictions Different types of eukaryotic gene predictors Common types of gene prediction errors Wilson

More information

Genome Annotation. Stefan Prost 1. May 27th, States of America. Genome Annotation

Genome Annotation. Stefan Prost 1. May 27th, States of America. Genome Annotation Genome Annotation Stefan Prost 1 1 Department of Integrative Biology, University of California, Berkeley, United States of America May 27th, 2015 Outline Genome Annotation 1 Repeat Annotation 2 Repeat

More information

TreeSeq, a Fast and Intuitive Tool for Analysis of Whole Genome and Metagenomic Sequence Data

TreeSeq, a Fast and Intuitive Tool for Analysis of Whole Genome and Metagenomic Sequence Data RESEARCH ARTICLE TreeSeq, a Fast and Intuitive Tool for Analysis of Whole Genome and Metagenomic Sequence Data Bastiaan B. Wintermans 1 *, Bernd W. Brandt 2, Christina M. J. E. Vandenbroucke-Grauls 1,

More information

Molecular Biology: DNA sequencing

Molecular Biology: DNA sequencing Molecular Biology: DNA sequencing Author: Prof Marinda Oosthuizen Licensed under a Creative Commons Attribution license. SEQUENCING OF LARGE TEMPLATES As we have seen, we can obtain up to 800 nucleotides

More information

GeneMarkS-2: Raising Standards of Accuracy in Gene Recognition

GeneMarkS-2: Raising Standards of Accuracy in Gene Recognition GeneMarkS-2: Raising Standards of Accuracy in Gene Recognition Alexandre Lomsadze 1^, Shiyuyun Tang 2^, Karl Gemayel 3^ and Mark Borodovsky 1,2,3 ^ joint first authors 1 Wallace H. Coulter Department of

More information

BABELOMICS: Microarray Data Analysis

BABELOMICS: Microarray Data Analysis BABELOMICS: Microarray Data Analysis Madrid, 21 June 2010 Martina Marbà mmarba@cipf.es Bioinformatics and Genomics Department Centro de Investigación Príncipe Felipe (CIPF) (Valencia, Spain) DNA Microarrays

More information

MICROBIOME SOFTWARE: END OF BEGINNING.

MICROBIOME SOFTWARE: END OF BEGINNING. MICROBIOME SOFTWARE: END OF BEGINNING. DR. CHARLES ROBERTSON DIVISION OF INFECTIOUS DISEASES, UNIVERSITY OF COLORADO SCHOOL OF MEDICINE DR. DANIEL N. FRANK, DIVISION OF INFECTIOUS DISEASES, SCHOOL OF MEDICINE

More information

Sequence Databases. Chapter 2. caister.com/bioinformaticsbooks. Paul Rangel. Sequence Databases

Sequence Databases. Chapter 2. caister.com/bioinformaticsbooks. Paul Rangel. Sequence Databases Chapter 2 Paul Rangel Abstract DNA and Protein sequence databases are the cornerstone of bioinformatics research. DNA databases such as GenBank and EMBL accept genome data from sequencing projects around

More information

USING HPC CLASS INFRASTRUCTURE FOR HIGH THROUGHPUT COMPUTING IN GENOMICS

USING HPC CLASS INFRASTRUCTURE FOR HIGH THROUGHPUT COMPUTING IN GENOMICS USING HPC CLASS INFRASTRUCTURE FOR HIGH THROUGHPUT COMPUTING IN GENOMICS Claude SCARPELLI Claude.Scarpelli@cea.fr FUNDAMENTAL RESEARCH DIVISION GENOMIC INSTITUTE Intel DDN Life Science Field Day Heidelberg,

More information

Application Note. NGS Analysis of B-Cell Receptors & Antibodies by AptaAnalyzer -BCR

Application Note. NGS Analysis of B-Cell Receptors & Antibodies by AptaAnalyzer -BCR Reduce to the Best Application Note NGS Analysis of B-Cell Receptors & Antibodies by AptaAnalyzer -BCR The software AptaAnalyzer harnesses next generation sequencing (NGS) data to monitor the immune response

More information

De Novo Assembly of High-throughput Short Read Sequences

De Novo Assembly of High-throughput Short Read Sequences De Novo Assembly of High-throughput Short Read Sequences Chuming Chen Center for Bioinformatics and Computational Biology (CBCB) University of Delaware NECC Third Skate Genome Annotation Workshop May 23,

More information

Next Generation Sequencing for Metagenomics

Next Generation Sequencing for Metagenomics Next Generation Sequencing for Metagenomics Genève, 13.10.2016 Patrick Wincker, Genoscope-CEA Human and model organisms sequencing were initially based on the Sanger method Sanger shotgun sequencing was

More information

Genome Annotation. What Does Annotation Describe??? Genome duplications Genes Mobile genetic elements Small repeats Genetic diversity

Genome Annotation. What Does Annotation Describe??? Genome duplications Genes Mobile genetic elements Small repeats Genetic diversity Genome Annotation Genome Sequencing Costliest aspect of sequencing the genome o But Devoid of content Genome must be annotated o Annotation definition Analyzing the raw sequence of a genome and describing

More information

Training materials.

Training materials. Training materials Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ If you wish to re-use these materials, please credit Ensembl for their creation

More information

From AP investigative Laboratory Manual 1

From AP investigative Laboratory Manual 1 Comparing DNA Sequences to Understand Evolutionary Relationships. How can bioinformatics be used as a tool to determine evolutionary relationships and to better understand genetic diseases? BACKGROUND

More information

ISOLATION AND MOLECULAR CHARACTERIZATION OF THERMOTOLERANT BACTERIA FROM MANIKARAN THERMAL SPRING OF HIMACHAL PRADESH, INDIA ABSTRACT

ISOLATION AND MOLECULAR CHARACTERIZATION OF THERMOTOLERANT BACTERIA FROM MANIKARAN THERMAL SPRING OF HIMACHAL PRADESH, INDIA ABSTRACT ISOLATION AND MOLECULAR CHARACTERIZATION OF THERMOTOLERANT BACTERIA FROM MANIKARAN THERMAL SPRING OF HIMACHAL PRADESH, INDIA Ambika Verma 1*, & Poonam Shirkot 2 1*Ph.D Scholar, Department of Biotechnology,

More information

Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX

Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX Technical Overview Introduction RNA Sequencing (RNA-Seq) is one of the most commonly used next-generation sequencing (NGS)

More information

Sequencing the genomes of Nicotiana sylvestris and Nicotiana tomentosiformis Nicolas Sierro

Sequencing the genomes of Nicotiana sylvestris and Nicotiana tomentosiformis Nicolas Sierro Sequencing the genomes of Nicotiana sylvestris and Nicotiana tomentosiformis Nicolas Sierro Philip Morris International R&D, Philip Morris Products S.A., Neuchatel, Switzerland Introduction Nicotiana sylvestris

More information