Introduction to Bioinformatics

Similar documents
Third Generation Sequencing

Introduction to Bioinformatics

Overview of Next Generation Sequencing technologies. Céline Keime

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

Introduction to Bioinformatics

Matthew Tinning Australian Genome Research Facility. July 2012

ELE4120 Bioinformatics. Tutorial 5

Next Generation Sequencing. Jeroen Van Houdt - Leuven 13/10/2017

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP

Molecular Biology. IMBB 2017 RAB, Kigali - Rwanda May 02 13, Francesca Stomeo

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday September 15, 2014

Introduction to BIOINFORMATICS

MARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305

Introduction to Bioinformatics

Outline General NGS background and terms 11/14/2016 CONFLICT OF INTEREST. HLA region targeted enrichment. NGS library preparation methodologies

Advanced Technology in Phytoplasma Research

Pioneering Clinical Omics

Gene-centered resources at NCBI

Applications of Next Generation Sequencing in Metagenomics Studies

EURL WORKING GROUP ON WHOLE GENOME SEQUENCING AND PULSENET INTERNATIONAL

DNA Sequencing. Happiness Kumburu BSU- workshop Nov, 2016

What is Bioinformatics?

Grundlagen der Bioinformatik Summer Lecturer: Prof. Daniel Huson

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

Contact us for more information and a quotation

Chapter 2: Access to Information

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Tuesday December 16, 2014

Introduction to BIOINFORMATICS

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies

DNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI)

Using New ThiNGS on Small Things. Shane Byrne

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014

DNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI)

Genomic region (ENCODE) Gene definitions

Biology 644: Bioinformatics

Development of quantitative targeted RNA-seq methodology for use in differential gene expression

CBC Data Therapy. Metatranscriptomics Discussion

Next generation sequencing in diagnostic laboratories: opportunities and challenges

BIOINFORMATICS AND SYSTEM BIOLOGY (INTERNATIONAL PROGRAM)

Sequencing techniques

Bioinformatics. Ingo Ruczinski. Some selected examples... and a bit of an overview

Introduct op Biosciences

Molecular methods to characterize the microbiota in the mouse tissues

Introduction to NGS. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis

Computational Biology and Bioinformatics

Genetics and Bioinformatics

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Aaron Liston, Oregon State University Botany 2012 Intro to Next Generation Sequencing Workshop

Research school methods seminar Genomics and Transcriptomics

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

Introduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks

Fundamentals of Bioinformatics: computation, biology, computational biology

Introduction to Bioinformatics

The Journey of DNA Sequencing. Chromosomes. What is a genome? Genome size. H. Sunny Sun

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Bioinformatics. Dick de Ridder. Tuinbouw Digitaal, 12/11/15

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Genetics Lecture 21 Recombinant DNA

Use of Drosophila Melanogaster as a Model System in the Study of Human Sodium- Dependent Multivitamin Transporter. Michael Brinton BIOL 230W.

Overview of sequencing Technologies

MATH 5610, Computational Biology

Biology 252 Nucleic Acid Methods

Integrated M.Tech. in Biotechnology (B.Tech + M.Tech) programme. Semester 1

Sequencing techniques and applications

High Throughput Sequencing Technologies. UCD Genome Center Bioinformatics Core Monday 15 June 2015

FUNCTIONAL BIOINFORMATICS

Next-generation sequencing Technology Overview

Textbook Reading Guidelines

Overview of Health Informatics. ITI BMI-Dept

Computational methods in bioinformatics: Lecture 1

Cory Brouwer, Ph.D. Xiuxia Du, Ph.D. Anthony Fodor, Ph.D.

Elixir: European Bioinformatics Research Infrastructure. Rolf Apweiler

'Bioinformatics in academia as related to ehealth' - including the "Genomic Virtual Lab"

Protein Bioinformatics Part I: Access to information

Molecular Basis of Inheritance

About Strand NGS. Strand Genomics, Inc All rights reserved.

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1

The University of California, Santa Cruz (UCSC) Genome Browser

Whole Genome Sequencing for TB diagnostics. Adam Witney. Institute for Infection and Immunity St George s, University of London

Sequence Based Function Annotation

NCBI web resources I: databases and Entrez

Understanding the science and technology of whole genome sequencing

Engineering Genetic Circuits

Genome 373: Genomic Informatics. Elhanan Borenstein

VALLIAMMAI ENGINEERING COLLEGE

Klinisk kemisk diagnostik BIOINFORMATICS

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Next Generation Sequencing. Tobias Österlund

ECS 234: Introduction to Computational Functional Genomics ECS 234

Gene Identification in silico

Single Cell Genomics

DESIGNER GENES - BIOTECHNOLOGY

Class XII Chapter 6 Molecular Basis of Inheritance Biology

Introduction to Microarray Data Analysis and Gene Networks. Alvis Brazma European Bioinformatics Institute

Bioinformatics Programming and Analysis CSC Dr. Garrett Dancik

GREG GIBSON SPENCER V. MUSE

Targeted Sequencing in the NBS Laboratory

Course Information. Introduction to Algorithms in Computational Biology Lecture 1. Relations to Some Other Courses

Transcription:

Introduction to Bioinformatics IMBB 2017 RAB, Kigali - Rwanda May 02 13, 2017 Joyce Nzioki

Plan for the Week Introduction to Bioinformatics Raw sanger sequence data Introduction to CLC Bio Quality Control De novo assembly Resolving conflicts BLAST and Biological databases DNA Barcoding Nucleotide sequence Analysis MSA and Phylogenetics Sequence depositing

What is Bioinformatics Bioinformatics is an interdisciplinary science that develops and improves on methods of storing, retrieving, organizing and analyzing biological data. This computational techniques are to solve biological problemsand discoverthewealth of biological information hidden in biological data.

Bioinformatics The design, construction and use of software tools to generate, store, annotate and analyse data and information relating to Molecular Biology.

Bioinformatics The design, construction and use of software tools to generate, store, annotate and analyse data and information relating to Molecular Biology. Here we consider the use of bioinformatics tools rather than their design and construction. Here we consider the access, storage and analysis of data and information items rather than the generation and annotation.

Bioinformatics Experiment Analysis Hypothesis DATA Sequence Structure RESULT Function Evolution Pathway Interaction Mutation expression

Major types of Bioinformatics Data Literature and ontologies Genomes Gene expression Protein sequence DNA & RNA sequence Protein structure DNA & RNA structure Protein families, motifs and domains Chemical entities Protein interactions Pathways Systems

Bioinformatics Research areas Include but not limited to Organization, classification, dissemination and analysis of biological andbiomedical data Biological sequence analysis and phylogenetics. Genome organization andevolution Regulation of gene expression andepiginetics Biological pathways and network in healthy and disease states Protein structure prediction fromsequence Modelling and prediction of the biophysical properties of biomolecules for binding prediction and drug design Design of biomolecularstructure andfunction With applications to Biology, Medicine, Agriculture and Industry

Where did bioinformatics come from? Bioinformatics arose as molecular biology begun to be transformed by the emergence of molecular sequence and structural data Recap: The key dogmas of molecular biology DNA sequence determined protein sequence Protein sequence determines protein structure Protein structure determines protein function Regulatory mechanisms (e.g. gene expression) determines the amount of a particular function in space and time Bioinformatics is now essential for the archiving, organization and analysis of data related to these processes

Bioinformatics involves the application of computer algorithms, computer models and computer databases with the broad goal of understanding the action of genes, transcripts, proteins and large collections in this entities The integration of information learned about this three biological processes gives insight Into the biology of organisms

How does it look like on a computer

A cdna sequence (reading frame) >gi 14456711 ref NM_000558.3 Homo sapiens hemoglobin, alpha 1 (HBA1), mrna ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCC GCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCAC CACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCG ACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCAC AAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGC CGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACC GTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCC GTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC A protein sequence >gi 4504347 ref NP_000549.1 alpha 1 globin [Homo sapiens] MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAH VDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

How do we actually do Bioinformatics? Prepackage tools and databases vmany online and open source vsome are commercial Tool development vmostly on UNIX environment vknowledge of programming requires(python, Perl, R, C, Java) vmay require specialized or high performance computing resources

History of Bioinformatics

History of Bioinformatics

Sequencing DNA sequencing is a process of determining the order of nucleotides within a DNA molecule.

History of DNA sequencing 1976: Maxam Gilbert sequencing 1977: Sanger sequencing (dideoxy chain termination) 1986: Flourescently labelled ddntps 1987: Applied Biosystems (ABI 370) 1988: Capillary gell electrophoresis 1999: Applied Biosystems ABI 3700 DNA Analyzer 2005 > : Next generation sequencing

Next Generation Sequencing Illumina MiniSeq Illumina MiSeq Illumina NextSeq Ion PGM PacBio RS II PacBio Sequel Ion Proton ONT MinION CTLGH Introduction to Bioinformatics, 13-17 Feb 2017, Nairobi Illumina HiSeq Illumina NovaSeq Ion S5 ONT PromethION Intro to NGS Sequencing Technologies ONT SmidgION Bert Overduin 14

Applications of Bioinformatics Microbial genome applications Molecular medicine Personalized medicine Preventive medicine Gene therapy Drug development Antibiotic resistance Evolutionary studies Biotechnology Climate change studies Crop improvement Forensic analysis Insect resistance Improve nutritional quality Development of drought resistant varieties Veterinary science Bioengineering Agriculture biotechnology.

Limitations of Bioinformatics Bioinformatics is a science of inference hence: Quality of bioinformatics predictions depends on the quality of data and sophistication of algorithms. Sequence data may have errors which subsequently leads to errors in downstream analysis. Many exhaustive algorithms cannot be used due to computational limitations. Trade-off between specificity and sensitivity

Why bioinformatics then In most cases biologics /wet lab is needed to validate bioinformatic predictions Bioinformatics can: Reduce data to a small set of testable predictions Assign a degree of confidence to each prediction The biologist will often have to choose the appropriate degree of confidence, depending on: Cost of validating predictions. Benefit expected from the right predictions. Data mining - the process by which testable hypothesis are generated regarding the function or structure of a gene or protein of interest by identifying homologs in better characterized organisms. Bioinformatics as in sillico biology: Allows for exploration of domains that cannot be addressed manually e.g study of past evolutionary events / patterns.

The End Acknowledging Bert Overduin University or Edinburgh and EBI online courses for some slides

Thank you IMBB 2017 RAB, Kigali - Rwanda May 02 13, 2017 Joyce Nzioki j.n.njuguna@cgiar.org