Introduction to Bioinformatics


Introduction to Bioinformatics Alla L Lapidus, Ph.D. SPbSU St. Petersburg

The term "bioinformatics"
The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper in 1970 as "the study of informatic processes in biotic systems". Paulien Hogeweg is a Dutch theoretical biologist and complex systems researcher who studies biological systems as dynamic information processing systems at many interconnected levels.

Definitions of bioinformatics
- Bioinformatics is the use of IT in biotechnology for storing, warehousing and analyzing DNA sequence data. It requires knowledge of many fields: biology, mathematics, computer science, the laws of physics and chemistry, and of course sound knowledge of IT to analyze biotech data.
- Bioinformatics is not limited to computing on data; in reality it can be used to solve many biological problems and to find out how living things work.
- Bioinformatics is an interdisciplinary field that develops and improves upon methods for storing, retrieving, organizing and analyzing biological data. A major activity in bioinformatics is developing software tools to generate useful biological knowledge.
- Bioinformatics comprises the mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information.
- Bioinformatics develops algorithms and software to analyze and record biological data, for example data on genes, proteins, drug ingredients and metabolic pathways.
Additions:
1. Bioinformatics is a SCIENCE.
2. Its task is not only to develop algorithms and to store, retrieve, organize and analyze biological data, but also to CURATE data.

Biology + computer science + information technology
- Biologists: collect molecular data (DNA and protein sequences, gene expression, etc.)
- Bioinformaticians: answer biological questions by analyzing molecular data
- Computer scientists (plus mathematicians, statisticians, etc.): develop tools, software and algorithms to store and analyze the data

Bioinformatics is being used in the following fields:
- Microbial genome applications
- Molecular medicine
- Personalised medicine
- Preventative medicine
- Gene therapy
- Drug development
- Antibiotic resistance
- Evolutionary studies
- Waste cleanup
- Biotechnology
- Climate change studies
- Alternative energy sources
- Crop improvement
- Forensic analysis
- Bio-weapon creation
- Insect resistance
- Improving nutritional quality
- Development of drought-resistant varieties
- Veterinary science

Bottlenecks in bioinformatics
- education of biologists in the use of advanced computing tools
- recruitment of computer scientists into the field of bioinformatics
- limited availability of developed databases of biological information
- need for efficient and intelligent search engines for complex databases
- need for innovative NGS pipelines

Central dogma of molecular biology: DNA makes RNA makes protein
DNA -> RNA -> PROTEIN
- DNA: DNA sequence, DNA mutations, DNA rearrangements, comparative analysis
- RNA: RNA sequence, mRNA (level)
- Protein: protein sequence, protein structure, protein function
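The "DNA makes RNA makes protein" flow can be illustrated with a minimal Python sketch. The function names and the example sequence are made up for illustration, and the codon table is deliberately partial; a real tool would use the full 64-codon genetic code.

```python
# Minimal sketch of DNA -> RNA -> protein.
# CODON_TABLE is intentionally partial (illustration only).
CODON_TABLE = {
    "AUG": "M",                          # methionine, the usual start codon
    "GCU": "A", "GCC": "A",              # alanine
    "AAA": "K",                          # lysine
    "UGG": "W",                          # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(dna: str) -> str:
    """Return the mRNA matching the coding strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # "X" marks a codon missing from the partial table above.
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCAAATGGTAA")
protein = translate(mrna)
print(mrna, protein)
```

The sketch ignores the template strand, splicing and alternative genetic codes; it only shows the direction of information flow.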

Sequencing projects
DNA (DNA QC) -> Libraries (lib QC) -> Run (run QC) -> fastq data (data QC) -> Data analysis -> Results interpretation -> Data applications
LIMS: Lab Information Management Software

OMICS: genomics, transcriptomics, proteomics, systems biology, lipidomics, metabolomics

Complex data sets
- DNA: one-dimensional information
- RNA: two-dimensional information
- proteins: three-dimensional information
- living systems: four-dimensional information

Microbial genome applications
- Genome assembly
- Re-sequencing
- Comparative analysis
- Evolutionary studies
- Antibiotic resistance
- Waste cleanup
- Biotechnology

Genome Assembly
Genome assembly is a very complex computational problem, mainly because of the enormous amount of data that has to be put together. Ideally, an assembly program should produce one contig for every chromosome of the genome being sequenced. Because of the complex nature of genomes, these ideal conditions are almost never met, which leads to gaps in the assembled genome.

De Novo assembly - puzzle without the picture

Assembly Challenges
- Presence of repeats. Repeats are identical sequences that occur at different locations in the genome, often in varying lengths and in multiple copies. They can be tandem or interspersed. Reads originating from different copies of a repeat appear identical to the assembler, causing errors in the assembly.
- Contaminants in samples (e.g. from bacteria or human).
- PCR artefacts (e.g. chimeras and mutations).
- Sequencing errors, such as homopolymer errors (e.g. in a run of 2+ copies of the same base).
- MIDs (multiplex indexes) and primers/adapters still present in the raw reads.
- Polyploid genomes.
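The repeat problem can be made concrete with a small Python sketch. The toy genome, the reads and the helper name `occurrences` are invented for illustration; real genomes and reads are far longer.

```python
def occurrences(genome: str, read: str) -> list[int]:
    """Return every start position at which `read` occurs in `genome`."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # allow overlapping hits
    return positions


# Toy genome containing the repeat ACGGATCCGA twice (positions 2 and 13).
genome = "TTACGGATCCGATACGGATCCGAAT"

inside_repeat = occurrences(genome, "GGATCC")     # read lies inside the repeat
spanning_boundary = occurrences(genome, "CGATAC")  # read crosses the repeat edge

print(inside_repeat, spanning_boundary)
```

A read that lies entirely inside the repeat matches more than one location, so the assembler cannot tell which copy it came from; a read that spans the repeat boundary into unique sequence has a single placement.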

Assembly algorithms
Overlap-Layout-Consensus: find overlaps between all reads, lay out the reads according to those overlaps, then build a consensus sequence.
Problems caused by new sequencing technologies:
- Hard to find overlaps between short reads
- Impossible to scale up
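The overlap step can be sketched in Python as follows. This is a naive illustration under assumed names (`overlap`, `all_overlaps`, `min_length`) and toy reads, not the method of any particular assembler. It compares every ordered pair of reads, which is the part that becomes expensive as the number of reads grows.

```python
from itertools import permutations


def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_length`.
    """
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def all_overlaps(reads: list[str], min_length: int = 3) -> dict:
    """Compare every ordered pair of reads (quadratic in the number of reads)."""
    result = {}
    for a, b in permutations(reads, 2):
        length = overlap(a, b, min_length)
        if length > 0:
            result[(a, b)] = length
    return result


reads = ["ACGTCG", "GTCGTA", "CGTAAC"]
print(all_overlaps(reads))
```

With short reads the usable overlaps are short as well, so `min_length` has to be small and spurious overlaps become more likely.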

De Bruijn graph
Example: sequence ACGTCGTA, k=2; nodes: AC, CG, GT, TC, TA
De Bruijn graph assemblers: ABySS, ALLPATHS-LG, EULER, IDBA, Velvet
(Modified from Andrey Prjibelski)
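The example on this slide can be reproduced with a short Python sketch. It assumes the convention the slide appears to use: nodes are k-mers, and an edge joins two k-mers that are adjacent in the sequence. The function name `de_bruijn` is illustrative, and duplicate edges are collapsed, so edge multiplicity is not tracked.

```python
def de_bruijn(sequence: str, k: int) -> dict[str, list[str]]:
    """Build a simple de Bruijn-style graph from one sequence.

    Nodes are k-mers; an edge goes from each k-mer to the k-mer that
    starts one base later. Repeated edges are stored only once.
    """
    graph: dict[str, list[str]] = {}
    for i in range(len(sequence) - k):
        left = sequence[i:i + k]
        right = sequence[i + 1:i + k + 1]
        graph.setdefault(left, [])
        graph.setdefault(right, [])
        if right not in graph[left]:
            graph[left].append(right)
    return graph


graph = de_bruijn("ACGTCGTA", 2)
for node, successors in graph.items():
    print(node, "->", successors)
```

The node GT has two successors (TC and TA) because CGT occurs twice in the sequence; this branching is how repeats show up in the graph.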

[Figure: comparison of IDBA-UD, SPAdes and Velvet on an E. coli isolate dataset and an E. coli single-cell dataset]

SPAdes pipeline
Input data -> Error correction -> Assembly -> Postprocessing -> Contigs

Gene Prediction and Genome Annotation
[Figure: genes delimited by ATG start codons and TAA stop codons]
- Based on similarity to known genes: blastx (NCBI)
- Gene finding programs: Glimmer for most prokaryotic genomes; GeneMark for both prokaryotic and eukaryotic genomes
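The start/stop codon idea can be sketched in Python as a naive open reading frame (ORF) scan. This is only an illustration under assumed names (`find_orfs`, `min_codons`); it is not how Glimmer or GeneMark work, since those programs rely on statistical models. The sketch also looks at the forward strand only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs on the forward strand.

    An ORF starts at the first ATG seen in a reading frame and ends at
    the next in-frame stop codon; `end` is exclusive and includes the stop.
    `min_codons` counts the codons before the stop, including ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


dna = "CCATGAAATGGTAACC"
orfs = find_orfs(dna)
print([(start, end, dna[start:end]) for start, end in orfs])
```

A fuller version would also scan the reverse complement and apply a much larger minimum length to reduce false positives.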

Re-sequencing: characterizing the genetic variation of species or populations
Re-sequencing of bacterial and archaeal isolates is possible if reference genomes are available. This approach can help to better understand bacterial community structure and gene function in bacteria under selective pressure or in mutagenized strains.

Climate change studies
Increasing levels of carbon dioxide emission are thought to contribute to global climate change. One approach to decreasing atmospheric carbon dioxide is to study the genomes of microbes that use carbon dioxide as their sole carbon source.

Human microbiome
Projects: MetaHIT (Europe), Human Microbiome Project (US)
The human microbiome includes viruses, fungi and bacteria, their genes and their environmental interactions, and is known to influence human physiology. There is very broad variation in these bacteria between people, which severely limits our ability to define a normal microflora profile for comparing healthy people with those who have any kind of health issue.
Researchers have found that children with autism harbor significantly fewer types of gut bacteria than children who are not affected by the disorder. Prevotella species, especially P. copri (which helps break down protein and carbohydrate foods), were the most dramatically reduced among samples from autistic children.

Bioinformatics in clinics
- explore the causes of diseases at the molecular level
- explain disease phenomena at the gene/pathway level
- use computational techniques (data mining, machine learning, etc.) to analyze and interpret data faster
- enhance the accuracy of the results
- reduce the cost and time of drug discovery

To improve drug discovery we need to discover (read "develop") efficient bioinformatics algorithms and approaches for:
- target identification
- target validation
- lead identification
- lead optimization

Bioinformatics and Health Informatics
If bioinformatics is the study of the flow of information in the biological sciences, health informatics is the study of information in patient care.

Medicine: informatics pipeline workflow
Patient -> Physician order -> Sample -> Sequence
- Tier 1: base calling, alignment, variant calling
- Tier 2: genome annotation, medical knowledgebase
- Tier 3: clinical report -> EHR

Huge need for bioinformatics tools: simple pipelines/protocols and easy-to-read reports
Sample sequencing -> Data analysis -> Patient treatment

Team work to set up a cancer sequencing facility
- molecular oncologists (pathway analysis)
- IT infrastructure (storage, updates, etc.)

Each baby to be sequenced at birth: a personal reference? (GATTACA, 1997)

De Bruijn graph in a nutshell
[Figure: graph whose nodes are the words Don't, trouble, till, you, troubles, and, it, doubles, others, only, too]

THANK YOU!