HPC in Bioinformatics and Genomics. Daniel Kahn, Clément Rezvoy and Frédéric Vivien Lyon 1 University & INRIA HELIX team LIP-ENS & INRIA GRAAL team


1 HPC in Bioinformatics and Genomics
Daniel Kahn, Clément Rezvoy and Frédéric Vivien
Lyon 1 University & INRIA HELIX team; LIP-ENS & INRIA GRAAL team

2 Moore's law in genomics
- Exponential increase
- Doubling time ~20 months
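As a rough illustration of what a ~20-month doubling time implies, the growth factor after t months is 2^(t/20). The sketch below simply evaluates that formula; the function name and the chosen time spans are illustrative and not taken from the slides.

```python
def growth_factor(months: float, doubling_months: float = 20.0) -> float:
    """Factor by which the data volume grows after `months`,
    assuming pure exponential growth with the given doubling time."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Growth over one year and over ten years at a 20-month doubling time.
    print(f"1 year:   x{growth_factor(12):.2f}")
    print(f"10 years: x{growth_factor(120):.0f}")
```

With these assumptions the data volume grows by about half each year and by 2^6 = 64 over a decade.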

3 New high-throughput technologies
- Pyrosequencing (Roche 454 GS FLX): Mb per run (1 day); long reads (up to 400 bp); ~15 Gb raw data
- Illumina Genome Analyzer: 1,500 Mb per run (3 days); short reads (35 bp); ~1 Tb raw data
- Applied Biosystems SOLiD sequencer: 3,000 Mb per run (5 days); short reads (35 bp); ~15 Tb raw data

5 Uses of high-throughput sequencing
- Population genomics: for instance, the 1000 Genomes Project
- Individual sequencing
- Metagenomics: comprehensive appraisal of microbial communities and gene repertoires in various environments
- Phylogenomics: resolving the history of genes and species
Each of these is a computing challenge.

6 Large-scale protein sequence analysis
- All vs. all
- The challenge of protein modularity: most proteins are combinatorial arrangements of conserved modules (domains)
- Proteins shown in the figure: GerE, LuxR, FixJ, OmpR, Spo0A, NtrC, NifA

7 The ProDom project
- Need for an automated process in order to allow for comprehensive analysis
- Automatically decompose proteins into domains and cluster domain families, using MKDOM2
- Generate multiple alignments and trees for all families
- Automatically generate mutually consistent representations for all proteins

8 Resolving combinatorial proteins

9 The MKDOM2 program
Flowchart of one iteration:
- i-th iteration: a query is taken from the DB
- Internal repeat detection (yes / no)
- Query searched with PSI-BLAST: no match, matches, or repeat matches
- DB changes: remove newly found domains, split modified sequences, sort by size
- (i+1)-th iteration: next query from the updated DB
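The iteration above can be pictured with a toy loop. This is only a sketch under strong assumptions, not the MKDOM2 implementation: sequences are plain strings, an exact substring match stands in for the PSI-BLAST search, internal repeat detection is left out, and taking the shortest remaining sequence as the query is an assumption read from the "sort by size" step. The name `toy_mkdom` is hypothetical.

```python
def toy_mkdom(sequences, min_len=1):
    """Greedy domain clustering on strings.

    Assumptions (not from the slides): exact substring matching replaces
    PSI-BLAST, repeat detection is omitted, and the shortest sequence in
    the database is used as the query at each iteration.
    Returns a list of (domain, number_of_occurrences) pairs.
    """
    db = sorted((s for s in sequences if len(s) >= min_len), key=len)
    families = []
    while db:
        query = db.pop(0)          # shortest remaining sequence
        members = 1                # the query itself
        remaining = []
        for seq in db:
            parts = seq.split(query)        # cut out every match of the query
            members += len(parts) - 1       # one match per cut
            # Flanking fragments go back into the database.
            remaining.extend(p for p in parts if len(p) >= min_len)
        families.append((query, members))
        db = sorted(remaining, key=len)     # "sort by size" before next query
    return families


if __name__ == "__main__":
    print(toy_mkdom(["AB", "ABCD", "CDAB"]))
```

Each pass removes the newly found domain from the database and splits the sequences that contained it, which is why the database has to be re-sorted before the next query is chosen.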

10 Drawbacks of sequential MKDOM2
- Greedy algorithm
- Scales quadratically
- Data follow Moore's law: no longer tractable!
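To see why quadratic scaling combined with exponentially growing data becomes intractable, here is a small worked sketch. It assumes that "quadratically" refers to the database size and reuses the ~20-month doubling time quoted earlier in the deck; the function names are illustrative.

```python
def work_factor(months: float, doubling_months: float = 20.0,
                exponent: float = 2.0) -> float:
    """Growth of the computing work after `months`, assuming the data
    doubles every `doubling_months` and work ~ data ** exponent."""
    data_growth = 2.0 ** (months / doubling_months)
    return data_growth ** exponent


def work_doubling_time(doubling_months: float = 20.0,
                       exponent: float = 2.0) -> float:
    """Time for the work to double: (2**(t/T))**k = 2  =>  t = T / k."""
    return doubling_months / exponent


if __name__ == "__main__":
    print(work_doubling_time())   # months needed for the work to double
    print(work_factor(20.0))      # work growth while the data doubles once
```

Under these assumptions the work doubles every 10 months and quadruples each time the data doubles, so a sequential program falls behind faster than the data itself grows.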

11 Parallelization of MKDOM2
- Parallelization of the main loop: distribute sequences for independent family construction
- Difficulties:
  - Heterogeneous run times for the main loop
  - Possible dependencies between families
- Precalculate an all vs. all comparison in order to select independent queries
- Send batches of independent sequences before worker nodes are idle
- Verify family independence a posteriori
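One way to picture the selection of independent queries and the a posteriori check is sketched below. It assumes the precalculated all vs. all comparison is available as a mapping from each sequence id to the set of ids it matches, and it treats two queries as independent when their match sets do not overlap. The data layout and both function names are hypothetical; the slides do not specify them.

```python
def pick_independent_batch(candidates, hits, batch_size):
    """Greedily pick queries whose all-vs-all footprints do not overlap.

    candidates: sequence ids in priority order.
    hits: dict mapping an id to the set of ids it matches (assumed layout).
    """
    batch, claimed = [], set()
    for query in candidates:
        footprint = {query} | hits.get(query, set())
        if footprint.isdisjoint(claimed):
            batch.append(query)
            claimed |= footprint
            if len(batch) == batch_size:
                break
    return batch


def verify_independent(families):
    """A posteriori check: families built in parallel (sets of sequence ids)
    must not share any sequence."""
    seen = set()
    for family in families:
        if not seen.isdisjoint(family):
            return False
        seen |= family
    return True


if __name__ == "__main__":
    hits = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
    print(pick_independent_batch(["a", "b", "c", "d"], hits, batch_size=3))
    print(verify_independent([{"a", "b"}, {"d"}]))
```

The pre-selection only reduces the risk of conflicts, because the actual search may find matches the all vs. all comparison did not; the final check is what catches families that turned out to overlap.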

18 Speed-up on medium-scale test set
- 32 archaeal genomes
- 21.5 M amino acids

19 Large-scale test set
- 263 genomes
- 950,216 protein sequences
- 339 M amino acids
- Run on GRID 5000 (150 nodes)
- Half of the data set processed in only 20 hours

20 Database crunching

21 Increasing query sizes

22 Variable sizes of domain families

23 Heterogeneous run times ~1000-fold range

24 Large result queue

25 ... yet efficient node usage
- 86% processor usage
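A processor-usage figure of this kind can be read as busy time divided by available time. The sketch below simulates a simple master-worker dispatch in which each task goes to the worker that becomes free first, then reports that ratio. It is an illustration only: the task durations are made up, the dispatch policy is an assumption, and it does not reproduce the measurement behind the 86% figure.

```python
import heapq


def utilization(task_times, n_workers):
    """Fraction of worker time spent computing under greedy dispatch.

    Each task is handed, in order, to the worker that frees up first.
    Utilization = total work / (number of workers * makespan).
    Expects at least one task with a positive duration.
    """
    free_at = [0.0] * n_workers
    heapq.heapify(free_at)
    for duration in task_times:
        start = heapq.heappop(free_at)          # earliest available worker
        heapq.heappush(free_at, start + duration)
    makespan = max(free_at)
    return sum(task_times) / (n_workers * makespan)


if __name__ == "__main__":
    # A long task arriving last leaves the other worker idle at the end.
    print(utilization([1, 1, 1, 3], n_workers=2))
    # The same tasks with the long one first keep both workers busy.
    print(utilization([3, 1, 1, 1], n_workers=2))
```

The two calls show how the order in which heterogeneous tasks are dispatched changes the usage, which is one reason a wide range of run times makes efficient scheduling harder.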

26 Full-scale protein domain analysis
- To be scaled up 7-fold for full processing of UniProt today!
- Will require stable MPI usage of ~1000 processors over the grid
- Appropriate infrastructure not yet identified
- Another program, MPI_MKDOM3, is envisioned to make full use of the precalculated all vs. all comparison; it is required in order to further cope with Moore's law

27
- LIP-ENS Lyon: Clément REZVOY, Frédéric VIVIEN
- INRA Toulouse: Emmanuel COURCELLE, Daniel KAHN
- Lyon 1 University, INRIA HELIX project: Aurélie LAUGRAUD, Lauranne DUQUENNE, Daniel KAHN
- Support: PRABI, EU (EMBRACE & IMPACT), IN2P3, GRID 5000


More information

TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR)

TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR) tru TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR) Anton Bankevich Center for Algorithmic Biotechnology, SPbSU Sequencing costs 1. Sequencing costs do not follow Moore s law

More information

Next Generation Sequencing. Jeroen Van Houdt - Leuven 13/10/2017

Next Generation Sequencing. Jeroen Van Houdt - Leuven 13/10/2017 Next Generation Sequencing Jeroen Van Houdt - Leuven 13/10/2017 Landmarks in DNA sequencing 1953 Discovery of DNA double helix structure 1977 A Maxam and W Gilbert "DNA seq by chemical degradation" F Sanger"DNA

More information

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel. DNA Sequencing T TM variation DNA amplicon mendelian trio genomics NGS bioinformatics tumor-normal custom SNP resequencing target validation de novo prediction personalized comparative genomics exome private

More information

MOL204 Exam Fall 2015

MOL204 Exam Fall 2015 MOL204 Exam Fall 2015 Exercise 1 15 pts 1. 1A. Define primary and secondary bioinformatical databases and mention two examples of primary bioinformatical databases and one example of a secondary bioinformatical

More information

Stay Tuned Computational Science NeSI. Jordi Blasco

Stay Tuned Computational Science NeSI. Jordi Blasco Computational Science Team @ NeSI Jordi Blasco (jordi.blasco@nesi.org.nz) Outline 1 About NeSI CS Team Who we are? 2 Identify the Bottlenecks Identify the Most Popular Apps Profile and Debug 3 Tuning Increase

More information

Whole Genome Sequencing for food safety FSA Chief Scientific Advisor Report and 2013 Listeria pilot study

Whole Genome Sequencing for food safety FSA Chief Scientific Advisor Report and 2013 Listeria pilot study Whole Genome Sequencing for food safety FSA Chief Scientific Advisor Report and 2013 Listeria pilot study Dr Edward Hayes Date: July 2016, Version 1 Foodborne Pathogens 280,000 cases of Campylobacter,

More information

Product Applications for the Sequence Analysis Collection

Product Applications for the Sequence Analysis Collection Product Applications for the Sequence Analysis Collection Pipeline Pilot Contents Introduction... 1 Pipeline Pilot and Bioinformatics... 2 Sequence Searching with Profile HMM...2 Integrating Data in a

More information

Workflow Management System Simulation Workbench Accurate, scalable, and reproducible simulations.

Workflow Management System Simulation Workbench Accurate, scalable, and reproducible simulations. Workflow Management System Simulation Workbench Accurate, scalable, and reproducible simulations http://wrench-project.org Motivation Scientific Workflows are key to advances in science and engineering

More information

Cluster Workload Management

Cluster Workload Management Cluster Workload Management Goal: maximising the delivery of resources to jobs, given job requirements and local policy restrictions Three parties Users: supplying the job requirements Administrators:

More information

Textbook Reading Guidelines

Textbook Reading Guidelines Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum Last updated: May 1, 2009 Textbook Reading Guidelines Preface: Read the whole preface, and especially: For the students with Life Science

More information

Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME. Peter Sterk EBI Metagenomics Course 2014

Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME. Peter Sterk EBI Metagenomics Course 2014 Introduction to taxonomic analysis of metagenomic amplicon and shotgun data with QIIME Peter Sterk EBI Metagenomics Course 2014 1 Taxonomic analysis using next-generation sequencing Objective we want to

More information

Questionnaire on the use of High Throughput Sequencing, Bioinformatics and Computational Genomics (HTS-BCG) in the OIE Reference Centre network

Questionnaire on the use of High Throughput Sequencing, Bioinformatics and Computational Genomics (HTS-BCG) in the OIE Reference Centre network Questionnaire on the use of High Throughput Sequencing, Bioinformatics and Computational Genomics (HTS-BCG) in the OIE Reference Centre network Massimo Palmarini MRC-University of Glasgow Centre for Virus

More information

Computational Challenges of Medical Genomics

Computational Challenges of Medical Genomics Talk at the VSC User Workshop Neusiedl am See, 27 February 2012 [cbock@cemm.oeaw.ac.at] http://medical-epigenomics.org (lab) http://www.cemm.oeaw.ac.at (institute) Introducing myself to Vienna s scientific

More information

Predicting prokaryotic incubation times from genomic features Maeva Fincker - Final report

Predicting prokaryotic incubation times from genomic features Maeva Fincker - Final report Predicting prokaryotic incubation times from genomic features Maeva Fincker - mfincker@stanford.edu Final report Introduction We have barely scratched the surface when it comes to microbial diversity.

More information