BIOINFORMATICS

Péter Antal, Ádám Arany, Bence Bolgár, András Gézsi, Gergely Hajós, Gábor Hullám, Péter Marx, András Millinghoffer, László Poppe, Péter Sárközy

The Bioinformatics book covers new topics in the rapidly expanding field of bioinformatics, from next-generation sequencing to drug discovery and metagenomics. The first two chapters give an overview of genetic measurement methods. The next four chapters discuss topics related to the effects of genetic variants, from protein modeling to gene regulatory networks. Standard statistical analyses of association studies are discussed in the following two chapters. The subsequent chapters illustrate the systems biology approach with a systems-based biomarker analysis method, graph-based network science, dynamical-systems-based approaches and a Bayesian causal inference method. The next chapter discusses text-mining methods in biomedicine, especially their application in interpretation and translation. The following chapter discusses the decision-theoretic approach to study design, especially multi-stage, sequential study design, and introduces the concepts of the value of information and the expected value of an experiment. Next, the heterogeneity of biomedical big data sources is surveyed, together with data and knowledge fusion methods and a discussion of semantic publishing, which can lead to a new unification of biomedicine. Subsequently, bioinformatic workflow methods are summarized. Finally, drug discovery methods are reviewed with an outlook on personalized medicine, and the final chapter presents the main steps and workflows in metagenomics.

Keywords: genotyping, next-generation sequencing methods, protein modeling, gene regulatory networks, omic networks, study design, data and knowledge fusion, workflow systems, association study, biomarker analysis, medical decision support systems, semantic publishing, similarity-based drug discovery, metagenomics.
Budapest University of Technology & Economics and Semmelweis University
Typotex Kiadó, 2014

COPYRIGHT: Péter Antal, Ádám Arany, Bence Bolgár, András Gézsi, Gergely Hajós, Gábor Hullám, Péter Marx, András Millinghoffer, László Poppe, Péter Sárközy, Budapest University of Technology and Economics, Semmelweis University
Creative Commons Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0)
Terms of use: This work can be reproduced, circulated, published and performed for non-commercial purposes without restriction by indicating the author's name, but it cannot be modified.
Scientific reviewers: Viktor Molnár, András Antos
ISBN
Prepared under the editorship of Typotex Kiadó
Responsible manager: Zsuzsa Votisky
Prepared within the framework of the project "Konzorcium a biotechnológia aktív tanulásáért" (Consortium for the Active Studying of Biotechnology)
Grant No. TÁMOP /A/1-11/

Contents

1 DNA recombinant measurement technology, noise and error models
   Historic overview; Clinical aspects of genome sequencing; Partial Genetic Association Studies; Genome-Wide Association Studies; First generation automated Sanger sequencing; Next generation sequencing technologies; Pyrosequencing and pH-based sequencing; Reversible terminator based sequencing; Nanopore based sequencing; Error characteristics of Next Generation Sequencing; Carry forward/incomplete extension; Homopolymer errors; Capture technologies; PCR capture; Emulsion PCR; Bridge amplification; Targeted resequencing; De novo sequencing; Next generation sequencing workflows; Filtering; Mapping; Assembly; Variant calling; Paired-end sequencing; Multiplexing samples; The post-processing, haplotype reconstruction, and imputation; Genome; Genotype; Single nucleotide polymorphisms; Types of point mutation; Haplotypes and recombination; Linkage Disequilibrium; Haplotype reconstruction; Imputation

2 Genotyping platforms
   Sample preparation; Regions of interest; Primer Design; PCR; Probe-tag based genotyping; Sanger sequencing; Real-time quantitative polymerase chain reaction; SNP arrays; Genotyping vs. gene expression; Call rate and accuracy

3 Comparative protein modeling and molecular docking
   Introduction; The protein structure gap; Methods of protein modeling; Comparative protein modeling; Steps of homology modeling; Tools for homology modeling; Molecular docking; Protein-ligand interaction predictions; Protein-biomacromolecule interaction predictions

4 Methods of determining structure of proteins and protein structure databases
   Introduction; Protein identification tools; Simple protein analyses; Levels and problems of protein structure predictions; Experimental methods to determine the secondary structure of proteins; Protein circular dichroism (CD); Synchrotron radiation circular dichroism (SRCD); Experimental methods to determine atomic structures of proteins; Protein X-ray crystallography; Protein NMR spectroscopy; Protein electron microscopy, electron diffraction and electron crystallography; Protein neutron crystallography

5 Quantitative models of the functional effects of genetic variants
   Introduction; Variants; SNP, indel; Alternative splicing; Levels of regulation; Different regulatory elements; microRNA; miRNA development; miRNA regulatory methods; Transcription factors; Epigenetics; Methylation; Histone modifications

6 Mathematical models of gene regulatory networks
   Introduction; Learning networks; Representation; Types of network learning algorithms; TF, miRNA, mRNA regulatory networks

7 Standard analysis of genetic association studies
   Introduction; Genetic data transformation; Filtering; Standard test for Hardy-Weinberg equilibrium; Phenotype data transformation; Transformation; Discretization; Univariate analysis methods; Standard association tests; Cochran-Armitage test for trend; Odds ratios; Univariate Bayesian methods; Multivariate analysis methods; Logistic regression; Haplotype association; Analysis of statistical power

8 Analyzing gene expression studies
   Introduction; Pre-processing; Background correction; Normalization; Summarization; Filtering; Data analysis; Clustering; Differential expression; Biological interpretation of results

9 Biomarker analysis
   Notation; Introduction; Background; Bayesian multilevel analysis of relevance; Multivariate scalability: k-MBS and k-MBG features; A knowledge-rich aggregation of input features; Interaction, redundancy based on posterior decomposition; Relevance for multiple targets; Conditional and contextual relevance; Posteriors for the predictive power of input features; Algorithmic aspects and applications; Summary

10 Network biology
   Introduction; Biological networks; Basics of graph theory; Network analysis; Network topology; Network models and dynamics; Assortativity, degree distribution and scale-free networks; Tasks and challenges; An application to drug discovery

11 Dynamic modeling in cell biology
   Biochemical concepts and their computational representations; Modeling with ordinary differential equations; Stochastic modeling; Hybrid methods; Reaction-diffusion systems; Model fitting; Whole-cell simulation; Overview

12 Causal inference in biomedicine
   Notation; Introduction; Representing independence and causal relations by Bayesian networks; Constraint-based inference of causal relations and models; Learning complete causal domain models; Bayesian inference of causal features; Edges: direct pairwise dependencies; Pairwise causal relations; MBG subnetworks; Ordering of the variables; Effect modifiers

13 Text mining methods in bioinformatics
   Introduction; Biomedical text mining; Constructing the corpus; Constructing the vocabulary; Text mining tasks; Basic techniques; Pattern matching; Document representation; Methods for named entity recognition; Methods for relation extraction; Lexicalized probabilistic context-free grammars; Difficulties in biomedical text mining; Text mining and knowledge management

14 Experimental design: from the basics to active learning extensions
   Introduction; The elements of experimental design; Phases of biomedical DoE; Types of biological experiments; A decision theoretic approach to DoE; Expected value of an experiment; Adaptive designs and budgeted learning; A Bayesian treatment of sequential decision processes; Approaches to target variable selection; Gene Prioritization; Active learning; Other practical tasks relying on bioinformatics

15 Big data in biomedicine
   Introduction; The first wave of biomedical big data; Post-genomic big data: the second wave; The common big data; The health-related common big data in biomedicine; Bioinformatic challenges of common big data

16 Analysis of heterogeneous biomedical data through information fusion
   Introduction; Information fusion and data fusion; Types of data fusion; Early fusion; Intermediate fusion; Late fusion; Similarity-based data fusion

17 The Bayesian Encyclopedia
   Introduction; The three worlds of data, knowledge and computation; From fragmentation problems to workflow for unification; Data repositories with semantic technologies; Semantic publishing for the literature world; Causal Bayesian network-based data analytic knowledge bases; Examples for links between worlds; Prospects for the Bayesian Encyclopedia

18 Bioinformatical workflow systems case study
   Overview of tasks; Data model and representation; Use cases and architecture; Implementation details of the server; Postprocessing steps

19 Computational aspects of pharmaceutical research
   Overview of the process; Chemoinformatical background; Screening criteria; Method; Fragment-based design; Drug repositioning

20 Metagenomics
   Introduction; Metagenome analysis; Community profiling; Functional metagenomics; Metagenomics step by step; Sampling; Sequencing; Assembly; Binning; Gene calling and functional inference


More information

Discovery of Transcription Factor Binding Sites with Deep Convolutional Neural Networks

Discovery of Transcription Factor Binding Sites with Deep Convolutional Neural Networks Discovery of Transcription Factor Binding Sites with Deep Convolutional Neural Networks Reesab Pathak Dept. of Computer Science Stanford University rpathak@stanford.edu Abstract Transcription factors are

More information

240EQ222 - Genetic Engineering

240EQ222 - Genetic Engineering Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2017 295 - EEBE - Barcelona East School of Engineering 713 - EQ - Department of Chemical Engineering MASTER'S DEGREE IN CHEMICAL ENGINEERING

More information

Reading Between the Genes: Computational Models to Discover Function from Noncoding DNA

Reading Between the Genes: Computational Models to Discover Function from Noncoding DNA Reading Between the Genes: Computational Models to Discover Function from Noncoding DNA Yves A. Lussier, Joanne Berghout, Francesca Vitali, Kenneth S. Ramos Center for Biomedical Informatics and Biostatistics,

More information

Genome research in eukaryotes

Genome research in eukaryotes Functional Genomics Genome and EST sequencing can tell us how many POTENTIAL genes are present in the genome Proteomics can tell us about proteins and their interactions The goal of functional genomics

More information

4.1. Genetics as a Tool in Anthropology

4.1. Genetics as a Tool in Anthropology 4.1. Genetics as a Tool in Anthropology Each biological system and every human being is defined by its genetic material. The genetic material is stored in the cells of the body, mainly in the nucleus of

More information

ELE4120 Bioinformatics. Tutorial 5

ELE4120 Bioinformatics. Tutorial 5 ELE4120 Bioinformatics Tutorial 5 1 1. Database Content GenBank RefSeq TPA UniProt 2. Database Searches 2 Databases A common situation for alignment is to search through a database to retrieve the similar

More information

Name Class Date. Practice Test

Name Class Date. Practice Test Name Class Date 12 DNA Practice Test Multiple Choice Write the letter that best answers the question or completes the statement on the line provided. 1. What do bacteriophages infect? a. mice. c. viruses.

More information

Lecture 11: Bioinformatics tools and databases Vladimir Rogojin. Fall 2015

Lecture 11: Bioinformatics tools and databases Vladimir Rogojin. Fall 2015 Introduction to Computational and Systems Biology Lecture 11: Bioinformatics tools and databases Vladimir Rogojin Department of Computer Science, Åbo Akademi http://users.abo.fi/ipetre/compsysbio Fall

More information

Genome Sequence Assembly

Genome Sequence Assembly Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:

More information

Whole genome sequencing in drug discovery research: a one fits all solution?

Whole genome sequencing in drug discovery research: a one fits all solution? Whole genome sequencing in drug discovery research: a one fits all solution? Marc Sultan, September 24th, 2015 Biomarker Development, Translational Medicine, Novartis On behalf of the BMD WGS pilot team:

More information

CHAPTER 20 DNA TECHNOLOGY AND GENOMICS. Section A: DNA Cloning

CHAPTER 20 DNA TECHNOLOGY AND GENOMICS. Section A: DNA Cloning Section A: DNA Cloning 1. DNA technology makes it possible to clone genes for basic research and commercial applications: an overview 2. Restriction enzymes are used to make recombinant DNA 3. Genes can

More information

3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome

3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome Lectures 30 and 31 Genome analysis I. Genome analysis A. two general areas 1. structural 2. functional B. genome projects a status report 1. 1 st sequenced: several viral genomes 2. mitochondria and chloroplasts

More information

Why can GBS be complicated? Tools for filtering, error correction and imputation.

Why can GBS be complicated? Tools for filtering, error correction and imputation. Why can GBS be complicated? Tools for filtering, error correction and imputation. Edward Buckler USDA-ARS Cornell University http://www.maizegenetics.net Many Organisms Are Diverse Humans are at the lower

More information

Overview of Health Informatics. ITI BMI-Dept

Overview of Health Informatics. ITI BMI-Dept Overview of Health Informatics ITI BMI-Dept Fellowship Week 5 Overview of Health Informatics ITI, BMI-Dept Day 10 7/5/2010 2 Agenda 1-Bioinformatics Definitions 2-System Biology 3-Bioinformatics vs Computational

More information

General Biology 1004 Chapter 10 Lecture Handout, Summer 2005 Dr. Frisby

General Biology 1004 Chapter 10 Lecture Handout, Summer 2005 Dr. Frisby Slide 1 CHAPTER 10 Molecular Biology of the Gene PowerPoint Lecture Slides for Essential Biology, Second Edition & Essential Biology with Physiology Presentation prepared by Chris C. Romero Neil Campbell,

More information

Chapter 20: Biotechnology

Chapter 20: Biotechnology Name Period The AP Biology exam has reached into this chapter for essay questions on a regular basis over the past 15 years. Student responses show that biotechnology is a difficult topic. This chapter

More information

Introductory Next Gen Workshop

Introductory Next Gen Workshop Introductory Next Gen Workshop http://www.illumina.ucr.edu/ http://www.genomics.ucr.edu/ Workshop Objectives Workshop aimed at those who are new to Illumina sequencing and will provide: - a basic overview

More information

PLINK gplink Haploview

PLINK gplink Haploview PLINK gplink Haploview Whole genome association software tutorial Shaun Purcell Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA Broad Institute of Harvard & MIT, Cambridge,

More information

Gene-Environment Interactions In Complex Human Diseases

Gene-Environment Interactions In Complex Human Diseases Gene-Environment Interactions In Complex Human Diseases Xiaobin Wang, MD, MPH, ScD Director and The Mary Ann & J. Milburn Smith Research Professor The Mary Ann & J. Milburn Smith Child Health Research

More information

Phasing of 2-SNP Genotypes based on Non-Random Mating Model

Phasing of 2-SNP Genotypes based on Non-Random Mating Model Phasing of 2-SNP Genotypes based on Non-Random Mating Model Dumitru Brinza and Alexander Zelikovsky Department of Computer Science, Georgia State University, Atlanta, GA 30303 {dima,alexz}@cs.gsu.edu Abstract.

More information

Introduction to Molecular Biology

Introduction to Molecular Biology Introduction to Molecular Biology Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 2-1- Important points to remember We will study: Problems from bioinformatics. Algorithms used to solve

More information

Mutation entries in SMA databases Guidelines for national curators

Mutation entries in SMA databases Guidelines for national curators 1 Mutation entries in SMA databases Guidelines for national curators GENERAL CONSIDERATIONS Role of the curator(s) of a national database Molecular data can be collected by many different ways. There are

More information

Research school methods seminar Genomics and Transcriptomics

Research school methods seminar Genomics and Transcriptomics Research school methods seminar Genomics and Transcriptomics Stephan Klee 19.11.2014 2 3 4 5 Genetics, Genomics what are we talking about? Genetics and Genomics Study of genes Role of genes in inheritence

More information

8/21/2014. From Gene to Protein

8/21/2014. From Gene to Protein From Gene to Protein Chapter 17 Objectives Describe the contributions made by Garrod, Beadle, and Tatum to our understanding of the relationship between genes and enzymes Briefly explain how information

More information

Sept 2. Structure and Organization of Genomes. Today: Genetic and Physical Mapping. Sept 9. Forward and Reverse Genetics. Genetic and Physical Mapping

Sept 2. Structure and Organization of Genomes. Today: Genetic and Physical Mapping. Sept 9. Forward and Reverse Genetics. Genetic and Physical Mapping Sept 2. Structure and Organization of Genomes Today: Genetic and Physical Mapping Assignments: Gibson & Muse, pp.4-10 Brown, pp. 126-160 Olson et al., Science 245: 1434 New homework:due, before class,

More information

Sequencing Millions of Animals for Genomic Selection 2.0

Sequencing Millions of Animals for Genomic Selection 2.0 Proceedings, 10 th World Congress of Genetics Applied to Livestock Production Sequencing Millions of Animals for Genomic Selection 2.0 J.M. Hickey 1, G. Gorjanc 1, M.A. Cleveland 2, A. Kranis 1,3, J. Jenko

More information

Next Generation Sequencing. Jeroen Van Houdt - Leuven 13/10/2017

Next Generation Sequencing. Jeroen Van Houdt - Leuven 13/10/2017 Next Generation Sequencing Jeroen Van Houdt - Leuven 13/10/2017 Landmarks in DNA sequencing 1953 Discovery of DNA double helix structure 1977 A Maxam and W Gilbert "DNA seq by chemical degradation" F Sanger"DNA

More information

Gene Expression Technology

Gene Expression Technology Gene Expression Technology Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Gene expression Gene expression is the process by which information from a gene

More information

ICH Topic E16 Genomic Biomarkers Related to Drug Response: Context, Structure and Format of Qualification Submissions. Step 3

ICH Topic E16 Genomic Biomarkers Related to Drug Response: Context, Structure and Format of Qualification Submissions. Step 3 European Medicines Agency June 2009 EMEA/CHMP/ICH/380636/2009 ICH Topic E16 Genomic Biomarkers Related to Drug Response: Context, Structure and Format of Qualification Submissions Step 3 NOTE FOR GUIDANCE

More information

Genome Sequencing. I: Methods. MMG 835, SPRING 2016 Eukaryotic Molecular Genetics. George I. Mias

Genome Sequencing. I: Methods. MMG 835, SPRING 2016 Eukaryotic Molecular Genetics. George I. Mias Genome Sequencing I: Methods MMG 835, SPRING 2016 Eukaryotic Molecular Genetics George I. Mias Department of Biochemistry and Molecular Biology gmias@msu.edu Sequencing Methods Cost of Sequencing Wetterstrand

More information

Pyrosequencing. Alix Groom

Pyrosequencing. Alix Groom Pyrosequencing Alix Groom Pyrosequencing high-throughput CpG methylation analysis platform real-time, sequence-based detection and quantification % methylation at multiple adjacent CpG sites 80-100 bases

More information

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015 Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH BIOL 7210 A Computational Genomics 2/18/2015 The $1,000 genome is here! http://www.illumina.com/systems/hiseq-x-sequencing-system.ilmn Bioinformatics bottleneck

More information

Feature Selection of Gene Expression Data for Cancer Classification: A Review

Feature Selection of Gene Expression Data for Cancer Classification: A Review Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 50 (2015 ) 52 57 2nd International Symposium on Big Data and Cloud Computing (ISBCC 15) Feature Selection of Gene Expression

More information

NCBI web resources I: databases and Entrez

NCBI web resources I: databases and Entrez NCBI web resources I: databases and Entrez Yanbin Yin Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/ 1 Homework assignment 1 Two parts: Extract the gene IDs reported in table

More information