Bioinformatics and computational tools

Etienne P. de Villiers (PhD)
International Livestock Research Institute, Nairobi, Kenya

International Livestock Research Institute (ILRI), Nairobi, Kenya
ILRI works at the crossroads of livestock and poverty, bringing high-quality science and capacity building to bear on poverty reduction and sustainable development. It is one of 15 centers supported by the Consultative Group on International Agricultural Research (CGIAR) that conduct food and environmental research to help alleviate poverty and increase food security.
ILRI biotech facilities: molecular biology laboratories (>6,000 sq m); state-of-the-art bioscience equipment, including 2 ABI sequencers (3130, 3730) and 1 Roche 454 GS FLX; a bioinformatics unit with a 64-CPU high-performance compute cluster; a BSL3 laboratory; flow cytometry and microscopy; diagnostics (nucleotide- and protein-based); vaccine technology/immunology; small and large animal units.
http://www.ilri.org
http://hub.africabiosciences.org/

Central dogma of molecular biology
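As a minimal sketch of the central dogma (DNA is transcribed to mRNA, which is translated to protein), the Python below assumes the input is the coding (sense) strand read 5' to 3', starts translation at the first base, and uses the standard genetic code (NCBI translation table 1). Splicing, start-codon search and the function names are illustrative simplifications, not part of the original slides.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated with
# bases in the order T, C, A, G, first position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons from the first base; stops at a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate(transcribe("ATGGCCTAA")))
```

Building the 64-codon table from the compact table-1 string avoids typing every codon by hand; the same two functions work on any in-frame coding sequence.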

Bioinformatics
Bioinformatics is the application of information technology and computer science to the field of molecular biology.

Bioinformatics
Genome (DNA) sequence, e.g. ACGGTGCGTAACGTCAGTCAGGTCAGTCAG
Bioinformatics (or computational biology) derives from the sequence: gene or protein properties, comparative analysis, and protein structure and function prediction.
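A sketch of two basic sequence properties computed from the example sequence on this slide: GC content and the reverse complement (the sequence of the opposite strand). The function names are illustrative, not from any particular toolkit, and only unambiguous A/C/G/T bases are assumed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def reverse_complement(seq: str) -> str:
    """Opposite strand, read 5' to 3': complement each base, then reverse."""
    return seq.upper().translate(COMPLEMENT)[::-1]

example = "ACGGTGCGTAACGTCAGTCAGGTCAGTCAG"  # sequence shown on the slide
print(round(gc_content(example), 3))
print(reverse_complement(example))
```

For the slide's 30-base example, 17 of the 30 bases are G or C, so gc_content returns about 0.567.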

The Sequencing Revolution: Next-Generation Sequencing
High-throughput sequencing in 2000: 96 sequences per hour.
High-throughput sequencing in 2010: 2.6 million sequences per hour.
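Worked arithmetic on the two throughput figures on this slide: the fold increase and the implied doubling time, assuming (our assumption) steady exponential growth between 2000 and 2010.

```python
import math

def fold_change(old: float, new: float) -> float:
    """How many times larger `new` is than `old`."""
    return new / old

def doubling_time_months(fold: float, years: float) -> float:
    """Months per doubling, assuming steady exponential growth over `years`."""
    return years * 12 / math.log2(fold)

throughput_fold = fold_change(96, 2_600_000)  # sequences per hour, 2000 vs 2010
doubling = doubling_time_months(throughput_fold, years=10)
print(f"{throughput_fold:,.0f}x more sequences per hour; "
      f"doubling every {doubling:.1f} months")
```

That is roughly a 27,000-fold increase in a decade, equivalent to throughput doubling about every 8 months.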

The Sequencing Revolution: Third-Generation Sequencing
Single-molecule sequencing (Pacific Biosciences, Oxford Nanopore): ~3,000 wells per chip, 1,500 bp per well, 10 bp per second; the $1,000 human genome.
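What the quoted single-molecule figures imply per chip, as a back-of-the-envelope sketch. It assumes (our assumption) that every well is active in parallel and every read runs to the full 1,500 bp; real runs fall short of both.

```python
WELLS_PER_CHIP = 3_000   # figures as quoted on the slide
BP_PER_WELL = 1_500
BP_PER_SECOND = 10

bases_per_chip = WELLS_PER_CHIP * BP_PER_WELL               # 4,500,000 bp
seconds_per_read = BP_PER_WELL / BP_PER_SECOND              # 150 s per read
bases_per_second_per_chip = WELLS_PER_CHIP * BP_PER_SECOND  # 30,000 bp/s

print(f"{bases_per_chip:,} bp per chip, {seconds_per_read:.0f} s per read, "
      f"{bases_per_second_per_chip:,} bp/s per chip")
```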

Sequencing the Human Genome (chart: log10 of price in US$ against year, 2000 to 2010)
2001: Human Genome Project, $3 billion, 11 years.
2001: Celera, $100 million, 3 years.
2007: 454, $1 million, 3 months.
2008: ABI SOLiD, $60,000, 2 weeks.
2009: Illumina, $40,000 to $50,000.
2010: $5,000, a few days.
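Worked arithmetic on the chart's end points, assuming (our assumption) a steady exponential decline between 2001 and 2010. It compares the Human Genome Project's total cost with a 2010 per-genome cost, so the result is a rough indication rather than a like-for-like comparison.

```python
import math

COST_2001_USD = 3_000_000_000  # Human Genome Project
COST_2010_USD = 5_000          # 2010 point on the chart

def halving_time_months(old_cost: float, new_cost: float, years: float) -> float:
    """Months per halving of cost, assuming a steady exponential decline."""
    return years * 12 / math.log2(old_cost / new_cost)

fold_reduction = COST_2001_USD / COST_2010_USD
months = halving_time_months(COST_2001_USD, COST_2010_USD, years=2010 - 2001)
print(f"{fold_reduction:,.0f}-fold cheaper; cost halved every {months:.1f} months")
```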

Next-Generation Sequencing: Current Projects
1000 Genomes Project (www.1000genomes.org): sequence the genomes of 2,500 people from diverse backgrounds to 4x coverage to identify human genetic variation.
Ensembl Genomes (www.ensemblgenomes.org): 234 species sequenced, from mammals and birds to parasites; more than 400 bacterial species sequenced.
Plant genomes: 18 sequenced (www.phytozome.org/).
BGI, China (www.genomics.cn): 1,000 plant and animal reference genomes project.
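The "4x coverage" target can be made concrete with the Lander-Waterman approximation: coverage c = N L / G for N reads of length L over a genome of size G, and the expected fraction of bases covered at least once is 1 - e^(-c). The genome size (~3.1 Gb) and the 100 bp read length below are illustrative assumptions, not figures from the 1000 Genomes Project, and the model ignores sequencing bias and repeats.

```python
import math

def reads_needed(coverage: float, genome_size: int, read_length: int) -> int:
    """Lander-Waterman: N = c * G / L, rounded up to whole reads."""
    return math.ceil(coverage * genome_size / read_length)

def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (Poisson approximation)."""
    return 1 - math.exp(-coverage)

n = reads_needed(4, genome_size=3_100_000_000, read_length=100)
print(n, round(fraction_covered(4), 3))
```

At 4x, roughly 1.8% of bases are expected to receive no reads at all.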

Cost of Computing (chart: performance in GigaFLOPS against year)
1988: Cray Y-MP, $40,000,000.
1998: Sun HPC1000, $1,000,000.
2010: Intel Core i7 desktop, $1,000 (point labelled 140 GigaFLOPS).

World Internet Connections

Cloud Computing
Cloud computing is a general term for computation as a service: customers rent hardware and storage only for the time needed to achieve their goals.
Amazon Elastic Compute Cloud (Amazon EC2) provides resizable compute capacity in the cloud, including high-performance computing (HPC) on demand: 23 GB of memory, 64 compute nodes, 1.7 terabytes of storage, for $1.60 per hour or $5,000 per year.
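Worked arithmetic on the two EC2 prices quoted on this slide, assuming (our assumption) that the $5,000 per year is a flat annual price for the same capacity. Running on demand all year would cost far more, and the annual price only pays off above a break-even number of hours.

```python
HOURLY_USD = 1.60        # on-demand price quoted on the slide
ANNUAL_USD = 5_000       # annual price quoted on the slide (assumed flat)
HOURS_PER_YEAR = 24 * 365

full_year_on_demand = HOURLY_USD * HOURS_PER_YEAR     # about $14,016
break_even_hours = ANNUAL_USD / HOURLY_USD            # about 3,125 hours
break_even_utilisation = break_even_hours / HOURS_PER_YEAR

print(f"Always-on, on demand: ${full_year_on_demand:,.0f} per year")
print(f"Annual price pays off above {break_even_hours:,.0f} h "
      f"({break_even_utilisation:.0%} utilisation)")
```

In other words, renting by the hour is cheaper for workloads that need the capacity for less than about a third of the year, which is the "only for the time needed" argument above.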

Distributed computing
Distributed computing is any computing that involves multiple computers remote from each other. A central server sends out and receives back the work units (essentially just protein structures and sequences). The client uses spare CPU cycles on a user's computer to run the simulation algorithm on the assigned structure. Results are returned automatically and exchanged for a new work unit on a daily basis. Clients can run anywhere: at home, in the lab or office.
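A toy, single-machine sketch of the work-unit pattern described above: a queue plays the central server, and threads play volunteer clients that pull a unit, compute a result, hand it back and return for the next one. simulate() is a hypothetical stand-in for the real folding simulation; a real system would add networking, deadlines and result validation.

```python
import queue
import threading

def simulate(work_unit: str) -> int:
    """Hypothetical stand-in for the simulation: returns the sequence length."""
    return len(work_unit)

def client(work_queue: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """A volunteer client: fetch a unit, process it, return the result, repeat."""
    while True:
        try:
            unit_id, unit = work_queue.get_nowait()
        except queue.Empty:
            return  # no work left on the server
        result = simulate(unit)
        with lock:
            results[unit_id] = result

def run_server(work_units: list, n_clients: int = 4) -> list:
    """The central server: hand out all units, then collect results in order."""
    work_queue: queue.Queue = queue.Queue()
    for item in enumerate(work_units):
        work_queue.put(item)
    results: dict = {}
    lock = threading.Lock()
    clients = [threading.Thread(target=client, args=(work_queue, results, lock))
               for _ in range(n_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    return [results[i] for i in range(len(work_units))]

print(run_server(["MKTAYIAK", "GSHM", "MVLSPADK"]))
```

Because clients pull work only when they are idle, faster machines automatically process more units, which is why this pull model suits volunteers with very different hardware.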

Distributed computing: Folding@home
Goal: understand how existing proteins attain their specific, functional three-dimensional structures.
Approach: distributed computing, through a screensaver installed on volunteers' computers.
In 2009 it was running on 40,000 CPUs, delivering 5 PFLOPS; the fastest standalone supercomputer, Tianhe-1A, reaches 2.5 PFLOPS.

Metagenomics
Metagenomics is the sequencing and analysis of DNA from organisms recovered from an environment, using next-generation sequencing technologies and without the need to culture them.
Examples: the Sargasso Sea community survey; acid mine drainage biofilm; human gut communities; the symbiotic community of a marine worm; the AVID project.
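Because metagenomic reads arrive without labels, one common first step is to assign each read to a reference taxon by the k-mers (substrings of length k) it shares with reference sequences. The sketch below is a deliberately simplified version of that idea; the reference sequences and taxon names are hypothetical, and real classifiers use large curated databases and taxonomic trees.

```python
from collections import Counter

def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(references: dict, k: int) -> dict:
    """Map each k-mer to the set of reference taxa whose sequence contains it."""
    index = {}
    for taxon, sequence in references.items():
        for kmer in kmers(sequence, k):
            index.setdefault(kmer, set()).add(taxon)
    return index

def classify(read: str, index: dict, k: int):
    """Taxon with the most taxon-specific k-mers in the read, or None (ties broken arbitrarily)."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer, set())
        if len(taxa) == 1:  # ignore k-mers shared between taxa
            votes[next(iter(taxa))] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy references; real databases hold whole genomes
references = {"virus_A": "ACGTACGTTTGACCA", "virus_B": "GGGCCCTTTAAAGGG"}
index = build_index(references, k=5)
print(classify("ACGTTTGAC", index, k=5))
```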

From Sequence (Genomics/Metagenomics) to Impact (diagram)
Data: (meta)genome sequencing; compilation of complete genomes and metagenomes in databases; annotation and curation of metadata.
Extraction of important biological information: phylogenetic analysis, geographical mapping, sequence variation analysis, protein modeling, primers and microarrays, discovery of new microorganisms and pathways.
Impact: diagnostics, global disease surveillance, vaccine development, drug development, improved drug selection, environmental sustainability, better control tools.

AVID: Arbovirus Incident & Diversity project
A project funded by Google.org's Predict and Prevent initiative; pilot project on Rift Valley fever virus.
The virus is transmitted by mosquitoes and infects both animals and humans. It is deadly to both humans and livestock, and outbreaks occur every 5-6 years.
A complex mix of species, subspecies and populations is involved. Can we understand its dynamics?

AVID Questions
Where is the virus between outbreaks: in the environment, in vectors or in reservoirs?
What is the diversity of the virus, its vectors and its reservoirs, and how do these interact?
What is the distribution of other pathogens? Are there novel pathogens and variants?
For example: does a particular virus variant occur in a particular vector variant associated with a particular mammalian variant?
Viral gene flow.
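One standard way to put a number on "What is the diversity?" is the Shannon index H = -sum(p_i ln p_i), where p_i is the fraction of reads (or detections) assigned to taxon or variant i. The slide does not say which diversity measure AVID used, and the counts below are hypothetical.

```python
import math

def shannon_diversity(counts: dict) -> float:
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h

# Hypothetical read counts per mosquito variant in one sample
sample = {"vector_1": 50, "vector_2": 30, "vector_3": 20}
print(round(shannon_diversity(sample), 3))
```

H is 0 when a single variant makes up the whole sample and rises to ln(k) when k variants are equally common, so values can be compared across samples, sites or seasons.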

AVID Strategy
Collect samples in specific areas: human blood, livestock, wildlife, mosquitoes, water, soil. Each sample is collected with a full metadata description (location, date/time, eco-, geo- and socio-descriptors).
Amplify sequences from multiple points on multiple possible genomes: virus, insect, mammal, others.
Sequence these amplicons from 1,000s of samples simultaneously using next-generation sequencing.
Analyse the sequences, looking for distribution and co-occurrence.
Refine the primers for a simple (RT-)PCR approach.
Move the diagnostic sequences on to high-throughput PCR diagnostics.
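A minimal sketch of the "distribution and co-occurrence" analysis, assuming each sample has already been reduced to the set of virus, vector and host variants detected in it: count how often each pair of variants is found together, and compute a Jaccard index (samples with both, divided by samples with either). Sample IDs and variant names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_occurrence(samples: dict) -> Counter:
    """For each unordered pair of variants, count the samples containing both."""
    pairs = Counter()
    for detected in samples.values():
        for a, b in combinations(sorted(detected), 2):
            pairs[(a, b)] += 1
    return pairs

def jaccard(samples: dict, a: str, b: str) -> float:
    """Samples with both a and b, divided by samples with either."""
    with_a = {s for s, d in samples.items() if a in d}
    with_b = {s for s, d in samples.items() if b in d}
    either = with_a | with_b
    return len(with_a & with_b) / len(either) if either else 0.0

samples = {
    "S001": {"virus_A", "vector_X"},
    "S002": {"virus_A", "vector_X", "host_cattle"},
    "S003": {"vector_Y"},
}
print(co_occurrence(samples).most_common(1))
print(jaccard(samples, "virus_A", "host_cattle"))
```

Raw counts favour common variants, so in practice one would also ask whether a pair co-occurs more often than chance, for example with a permutation test.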

AVID Data Management and BioBank
Data management is one of the biggest challenges: the project cannot achieve its goals without great data integration.
All samples are biobanked with full data descriptors.
Opportunity to share samples across projects? Wildlife samples are very expensive, and everyone is collecting them for their own purposes!
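A sketch of how a biobank record could enforce the "full data descriptors" requirement at entry time, using the sample types and descriptor categories listed in the strategy. The SampleRecord class, its field names and its validation rules are hypothetical, not AVID's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Sample types taken from the AVID strategy slide
SAMPLE_TYPES = {"human blood", "livestock", "wildlife", "mosquito", "water", "soil"}

@dataclass
class SampleRecord:
    """Hypothetical biobank record: one sample plus its required descriptors."""
    sample_id: str
    sample_type: str
    latitude: float
    longitude: float
    collected_at: datetime
    descriptors: dict = field(default_factory=dict)  # eco/geo/socio descriptors

    def __post_init__(self) -> None:
        # Reject incomplete or implausible records when they are created
        if not self.sample_id:
            raise ValueError("sample_id is required")
        if self.sample_type not in SAMPLE_TYPES:
            raise ValueError(f"unknown sample type: {self.sample_type!r}")
        if not (-90.0 <= self.latitude <= 90.0 and -180.0 <= self.longitude <= 180.0):
            raise ValueError("coordinates out of range")

record = SampleRecord("S0001", "mosquito", -1.29, 36.82,
                      datetime(2010, 5, 1, 6, 30), {"habitat": "flooded grassland"})
print(record.sample_id, record.sample_type)
```

Validating at entry time means a record with a missing or malformed descriptor never reaches the shared catalogue, which is what makes samples safe to share across projects.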

Thank You