ALGORITHMS IN BIOINFORMATICS: A PRACTICAL INTRODUCTION. WING-KIN SUNG. Chapman & Hall/CRC Mathematical and Computational Biology Series. CRC Press


Chapman & Hall/CRC Mathematical and Computational Biology Series
ALGORITHMS IN BIOINFORMATICS
A PRACTICAL INTRODUCTION
WING-KIN SUNG
CRC Press, Taylor & Francis Group
Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
A CHAPMAN & HALL BOOK

Contents

Preface

1 Introduction to Molecular Biology
  1.1 DNA, RNA, and Protein
    1.1.1 Proteins
    1.1.2 DNA
    1.1.3 RNA
  1.2 Genome, Chromosome, and Gene
    1.2.1 Genome
    1.2.2 Chromosome
    1.2.3 Gene
    1.2.4 Complexity of the Organism versus Genome Size
    1.2.5 Number of Genes versus Genome Size
  1.3 Replication and Mutation of DNA
  1.4 Central Dogma (from DNA to Protein)
    1.4.1 Transcription (Prokaryotes)
    1.4.2 Transcription (Eukaryotes)
    1.4.3 Translation
  1.5 Post-Translation Modification (PTM)
  1.6 Population Genetics
  1.7 Basic Biotechnological Tools
    1.7.1 Restriction Enzymes
    1.7.2 Sonication
    1.7.3 Cloning
    1.7.4 PCR
    1.7.5 Gel Electrophoresis
    1.7.6 Hybridization
    1.7.7 Next Generation DNA Sequencing
  1.8 Brief History of Bioinformatics
  1.9 Exercises

2 Sequence Similarity
  2.1 Introduction
  2.2 Global Alignment Problem
    2.2.1 Needleman-Wunsch Algorithm
    2.2.2 Running Time Issue
    2.2.3 Space Efficiency Issue
    2.2.4 More on Global Alignment
  2.3 Local Alignment
  2.4 Semi-Global Alignment
  2.5 Gap Penalty
    2.5.1 General Gap Penalty Model
    2.5.2 Affine Gap Penalty Model
    2.5.3 Convex Gap Model
  2.6 Scoring Function
    2.6.1 Scoring Function for DNA
    2.6.2 Scoring Function for Protein
  2.7 Exercises

3 Suffix Tree
  3.1 Introduction
  3.2 Suffix Tree
  3.3 Simple Applications of a Suffix Tree
    3.3.1 Exact String Matching Problem
    3.3.2 Longest Repeated Substring Problem
    3.3.3 Longest Common Substring Problem
    3.3.4 Longest Common Prefix (LCP)
    3.3.5 Finding a Palindrome
    3.3.6 Extracting the Embedded Suffix Tree of a String from the Generalized Suffix Tree
    3.3.7 Common Substring of 2 or More Strings
  3.4 Construction of a Suffix Tree
    3.4.1 Step 1: Construct the Odd Suffix Tree
    3.4.2 Step 2: Construct the Even Suffix Tree
    3.4.3 Step 3: Merge the Odd and the Even Suffix Trees
  3.5 Suffix Array
    3.5.1 Construction of a Suffix Array
    3.5.2 Exact String Matching Using a Suffix Array
  3.6 FM-Index
    3.6.1 Definition
    3.6.2 The occ Data Structure
    3.6.3 Exact String Matching Using the FM-Index
  3.7 Approximate Searching Problem
  3.8 Exercises

4 Genome Alignment
  4.1 Introduction
  4.2 Maximum Unique Match (MUM)
    4.2.1 How to Find MUMs
  4.3 MUMmer1: LCS
    4.3.1 Dynamic Programming Algorithm in O(n^2) Time
    4.3.2 An O(n log n)-time Algorithm
  4.4 MUMmer2 and MUMmer3
    4.4.1 Reducing Memory Usage
    4.4.2 Employing a New Alternative Algorithm for Finding MUMs
    4.4.3 Clustering Matches
    4.4.4 Extension of the Definition of MUM
  4.5 Mutation Sensitive Alignment
    4.5.1 Concepts and Definitions
    4.5.2 The Idea of the Heuristic Algorithm
    4.5.3 Experimental Results
  4.6 Dot Plot for Visualizing the Alignment
  4.7 Further Reading
  4.8 Exercises

5 Database Search
  5.1 Introduction
    5.1.1 Biological Database
    5.1.2 Database Searching
    5.1.3 Types of Algorithms
  5.2 Smith-Waterman Algorithm
  5.3 FastA
    5.3.1 FastP Algorithm
    5.3.2 FastA Algorithm
  5.4 BLAST
    5.4.1 BLAST1
    5.4.2 BLAST2
    5.4.3 BLAST1 versus BLAST2
    5.4.4 BLAST versus FastA
    5.4.5 Statistics for Local Alignment
  5.5 Variations of the BLAST Algorithm
    5.5.1 MegaBLAST
    5.5.2 BLAT
    5.5.3 PatternHunter
    5.5.4 PSI-BLAST (Position-Specific Iterated BLAST)
  5.6 Q-gram Alignment based on Suffix ARrays (QUASAR)
    5.6.1 Algorithm
    5.6.2 Speeding Up and Reducing the Space for QUASAR
    5.6.3 Time Analysis
  5.7 Locality-Sensitive Hashing
  5.8 BWT-SW
    5.8.1 Aligning Query Sequence to Suffix Tree
    5.8.2 Meaningful Alignment
  5.9 Are Existing Database Searching Methods Sensitive Enough?
  5.10 Exercises

6 Multiple Sequence Alignment
  6.1 Introduction
  6.2 Formal Definition of the Multiple Sequence Alignment Problem
  6.3 Methods for Solving the MSA Problem
  6.4 Dynamic Programming Method
  6.5 Center Star Method
  6.6 Progressive Alignment Method
    6.6.1 ClustalW
    6.6.2 Profile-Profile Alignment
    6.6.3 Limitation of Progressive Alignment Construction
  6.7 Iterative Method
    6.7.1 MUSCLE
    6.7.2 Log-Expectation (LE) Score
  6.8 Further Reading
  6.9 Exercises

7 Phylogeny Reconstruction
  7.1 Introduction
    7.1.1 Mitochondrial DNA and Inheritance
    7.1.2 The Constant Molecular Clock
    7.1.3 Phylogeny
    7.1.4 Applications of Phylogeny
    7.1.5 Phylogenetic Tree Reconstruction
  7.2 Character-Based Phylogeny Reconstruction Algorithm
    7.2.1 Maximum Parsimony
    7.2.2 Compatibility
    7.2.3 Maximum Likelihood Problem
  7.3 Distance-Based Phylogeny Reconstruction Algorithm
    7.3.1 Additive Metric and Ultrametric
    7.3.2 Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
    7.3.3 Additive Tree Reconstruction
    7.3.4 Nearly Additive Tree Reconstruction
    7.3.5 Can We Apply Distance-Based Methods Given a Character-State Matrix?
  7.4 Bootstrapping
  7.5 Can Tree Reconstruction Methods Infer the Correct Tree?
  7.6 Exercises

8 Phylogeny Comparison
  8.1 Introduction
  8.2 Similarity Measurement
    8.2.1 Computing MAST by Dynamic Programming
    8.2.2 MAST for Unrooted Trees
  8.3 Dissimilarity Measurements
    8.3.1 Robinson-Foulds Distance
    8.3.2 Nearest Neighbor Interchange Distance (NNI)
    8.3.3 Subtree Transfer Distance (STT)
    8.3.4 Quartet Distance
  8.4 Consensus Tree Problem
    8.4.1 Strict Consensus Tree
    8.4.2 Majority Rule Consensus Tree
    8.4.3 Median Consensus Tree
    8.4.4 Greedy Consensus Tree
    8.4.5 R* Tree
  8.5 Further Reading
  8.6 Exercises

9 Genome Rearrangement
  9.1 Introduction
  9.2 Types of Genome Rearrangements
  9.3 Computational Problems
  9.4 Sorting an Unsigned Permutation by Reversals
    9.4.1 Upper and Lower Bound on an Unsigned Reversal Distance
    9.4.2 4-Approximation Algorithm for Sorting an Unsigned Permutation
    9.4.3 2-Approximation Algorithm for Sorting an Unsigned Permutation
  9.5 Sorting a Signed Permutation by Reversals
    9.5.1 Upper Bound on Signed Reversal Distance
    9.5.2 Elementary Intervals, Cycles, and Components
    9.5.3 The Hannenhalli-Pevzner Theorem
  9.6 Further Reading
  9.7 Exercises

10 Motif Finding
  10.1 Introduction
  10.2 Identifying Binding Regions of TFs
  10.3 Motif Model
  10.4 The Motif Finding Problem
  10.5 Scanning for Known Motifs
  10.6 Statistical Approaches
    10.6.1 Gibbs Motif Sampler
    10.6.2 MEME
  10.7 Combinatorial Approaches
    10.7.1 Exhaustive Pattern-Driven Algorithm
    10.7.2 Sample-Driven Approach
    10.7.3 Suffix Tree-Based Algorithm
    10.7.4 Graph-Based Method
  10.8 Scoring Function
  10.9 Motif Ensemble Methods
    10.9.1 Approach of MotifVoter
    10.9.2 Motif Filtering by the Discriminative and Consensus Criteria
    10.9.3 Sites Extraction and Motif Generation
  10.10 Can Motif Finders Discover the Correct Motifs?
  10.11 Motif Finding Utilizing Additional Information
    10.11.1 Regulatory Element Detection Using Correlation with Expression
    10.11.2 Discovery of Regulatory Elements by Phylogenetic Footprinting
  10.12 Exercises

11 RNA Secondary Structure Prediction
  11.1 Introduction
    11.1.1 Base Interactions in RNA
    11.1.2 RNA Structures
  11.2 Obtaining RNA Secondary Structure Experimentally
  11.3 RNA Structure Prediction Based on Sequence Only
  11.4 Structure Prediction with the Assumption That There is No Pseudoknot
  11.5 Nussinov Folding Algorithm
  11.6 ZUKER Algorithm
    11.6.1 Time Analysis
    11.6.2 Speeding up Multi-Loops
    11.6.3 Speeding up Internal Loops
  11.7 Structure Prediction with Pseudoknots
    11.7.1 Definition of a Simple Pseudoknot
    11.7.2 Akutsu's Algorithm for Predicting an RNA Secondary Structure with Simple Pseudoknots
  11.8 Exercises

12 Peptide Sequencing
  12.1 Introduction
  12.2 Obtaining the Mass Spectrum of a Peptide
  12.3 Modeling the Mass Spectrum of a Fragmented Peptide
    12.3.1 Amino Acid Residue Mass
    12.3.2 Fragment Ion Mass
  12.4 De Novo Peptide Sequencing Using Dynamic Programming
    12.4.1 Scoring by Considering y-ions
    12.4.2 Scoring by Considering y-ions and b-ions
  12.5 De Novo Sequencing Using Graph-Based Approach
  12.6 Peptide Sequencing via Database Search
  12.7 Further Reading
  12.8 Exercises

13 Population Genetics
  13.1 Introduction
    13.1.1 Locus, Genotype, Allele, and SNP
    13.1.2 Genotype Frequency and Allele Frequency
    13.1.3 Haplotype and Phenotype
    13.1.4 Technologies for Studying the Human Population
    13.1.5 Bioinformatics Problems
  13.2 Hardy-Weinberg Equilibrium
  13.3 Linkage Disequilibrium
    13.3.1 D and D'
    13.3.2 r²
  13.4 Genotype Phasing
    13.4.1 Clark's Algorithm
    13.4.2 Perfect Phylogeny Haplotyping Problem
    13.4.3 Maximum Likelihood Approach
    13.4.4 Phase Algorithm
  13.5 Tag SNP Selection
    13.5.1 Zhang et al.'s Algorithm
    13.5.2 IdSelect
  13.6 Association Study
    13.6.1 Categorical Data Analysis
    13.6.2 Relative Risk and Odds Ratio
    13.6.3 Linear Regression
    13.6.4 Logistic Regression
  13.7 Exercises

References

Index

M. Phil. (Computer Science) Programme < >

M. Phil. (Computer Science) Programme < > M. Phil. (Computer Science) Programme Department of Information and Communication Technology, Fakir Mohan University, Vyasa Vihar, Balasore-756019, Odisha. MPCS11: Research Methodology Unit

More information

Challenging algorithms in bioinformatics

Challenging algorithms in bioinformatics Challenging algorithms in bioinformatics 11 October 2018 Torbjørn Rognes Department of Informatics, UiO torognes@ifi.uio.no What is bioinformatics? Definition: Bioinformatics is the development and use

More information

Introduction. CS482/682 Computational Techniques in Biological Sequence Analysis

Introduction. CS482/682 Computational Techniques in Biological Sequence Analysis Introduction CS482/682 Computational Techniques in Biological Sequence Analysis Outline Course logistics A few example problems Course staff Instructor: Bin Ma (DC 3345, http://www.cs.uwaterloo.ca/~binma)

More information

What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases.

What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases. What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases. Bioinformatics is the marriage of molecular biology with computer

More information

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology. G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic

More information

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA advanced analysis of gene expression microarray data aidong zhang State University of New York at Buffalo, USA World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI Contents

More information

Data Mining for Biological Data Analysis

Data Mining for Biological Data Analysis Data Mining for Biological Data Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Data Mining Course by Gregory-Platesky Shapiro available at www.kdnuggets.com Jiawei Han

More information

Biology 644: Bioinformatics

Biology 644: Bioinformatics Processes Activation Repression Initiation Elongation.... Processes Splicing Editing Degradation Translation.... Transcription Translation DNA Regulators DNA-Binding Transcription Factors Chromatin Remodelers....

More information

Computational Genomics. Irit Gat-Viks & Ron Shamir & Haim Wolfson Fall

Computational Genomics. Irit Gat-Viks & Ron Shamir & Haim Wolfson Fall Computational Genomics Irit Gat-Viks & Ron Shamir & Haim Wolfson Fall 2015-16 1 What s in class this week Motivation Administrata Some very basic biology Some very basic biotechnology Examples of our type

More information

Scoring Alignments. Genome 373 Genomic Informatics Elhanan Borenstein

Scoring Alignments. Genome 373 Genomic Informatics Elhanan Borenstein Scoring Alignments Genome 373 Genomic Informatics Elhanan Borenstein A quick review Course logistics Genomes (so many genomes) The computational bottleneck Python: Programs, input and output Number and

More information

Textbook Reading Guidelines

Textbook Reading Guidelines Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum Last updated: May 1, 2009 Textbook Reading Guidelines Preface: Read the whole preface, and especially: For the students with Life Science

More information

03-511/711 Computational Genomics and Molecular Biology, Fall

03-511/711 Computational Genomics and Molecular Biology, Fall 03-511/711 Computational Genomics and Molecular Biology, Fall 2011 1 Study questions These study problems are intended to help you to review for the final exam. This is not an exhaustive list of the topics

More information

Software Metrics. Practical Approach. A Rigorous and. Norman Fenton. James Bieman THIRD EDITION. CRC Press CHAPMAN & HALIVCRC INNOVATIONS IN

Software Metrics. Practical Approach. A Rigorous and. Norman Fenton. James Bieman THIRD EDITION. CRC Press CHAPMAN & HALIVCRC INNOVATIONS IN CHAPMAN & HALIVCRC INNOVATIONS IN SOFTWARE ENGINEERING AND SOFTWARE DEVELOPMENT Software Metrics A Rigorous and Practical Approach THIRD EDITION Norman Fenton Queen Mary University of London. UK James

More information

Examination Assignments

Examination Assignments Bioinformatics Institute of India H-109, Ground Floor, Sector-63, Noida-201307, UP. INDIA Tel.: 0120-4320801 / 02, M. 09818473366, 09810535368 Email: info@bii.in, Website: www.bii.in INDUSTRY PROGRAM IN

More information

BIOINFORMATICS AND SYSTEM BIOLOGY (INTERNATIONAL PROGRAM)

BIOINFORMATICS AND SYSTEM BIOLOGY (INTERNATIONAL PROGRAM) BIOINFORMATICS AND SYSTEM BIOLOGY (INTERNATIONAL PROGRAM) PROGRAM TITLE DEGREE TITLE Master of Science Program in Bioinformatics and System Biology (International Program) Master of Science (Bioinformatics

More information

Data Mining and Applications in Genomics

Data Mining and Applications in Genomics Data Mining and Applications in Genomics Lecture Notes in Electrical Engineering Volume 25 For other titles published in this series, go to www.springer.com/series/7818 Sio-Iong Ao Data Mining and Applications

More information

Optimization of Process Parameters of Global Sequence Alignment Based Dynamic Program - an Approach to Enhance the Sensitivity.

Optimization of Process Parameters of Global Sequence Alignment Based Dynamic Program - an Approach to Enhance the Sensitivity. Optimization of Process Parameters of Global Sequence Alignment Based Dynamic Program - an Approach to Enhance the Sensitivity of Alignment Dr.D.Chandrakala 1, Dr.T.Sathish Kumar 2, S.Preethi 3, D.Sowmya

More information

Why do we need statistics to study genetics and evolution?

Why do we need statistics to study genetics and evolution? Why do we need statistics to study genetics and evolution? 1. Mapping traits to the genome [Linkage maps (incl. QTLs), LOD] 2. Quantifying genetic basis of complex traits [Concordance, heritability] 3.

More information

Introduction Genetics in Human Society The Universality of Genetic Principles Model Organisms Organizing the Study of Genetics The Concept of the

Introduction Genetics in Human Society The Universality of Genetic Principles Model Organisms Organizing the Study of Genetics The Concept of the Introduction Genetics in Human Society The Universality of Genetic Principles Model Organisms Organizing the Study of Genetics The Concept of the Gene Genetic Analysis Molecular Foundations of Genetics

More information

Tutorial for Stop codon reassignment in the wild

Tutorial for Stop codon reassignment in the wild Tutorial for Stop codon reassignment in the wild Learning Objectives This tutorial has two learning objectives: 1. Finding evidence of stop codon reassignment on DNA fragments. 2. Detecting and confirming

More information

An Analytical Upper Bound on the Minimum Number of. Recombinations in the History of SNP Sequences in Populations

An Analytical Upper Bound on the Minimum Number of. Recombinations in the History of SNP Sequences in Populations An Analytical Upper Bound on the Minimum Number of Recombinations in the History of SNP Sequences in Populations Yufeng Wu Department of Computer Science and Engineering University of Connecticut Storrs,

More information

Dynamic Programming Algorithms

Dynamic Programming Algorithms Dynamic Programming Algorithms Sequence alignments, scores, and significance Lucy Skrabanek ICB, WMC February 7, 212 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

Reveal Motif Patterns from Financial Stock Market

Reveal Motif Patterns from Financial Stock Market Reveal Motif Patterns from Financial Stock Market Prakash Kumar Sarangi Department of Information Technology NM Institute of Engineering and Technology, Bhubaneswar, India. Prakashsarangi89@gmail.com.

More information

Analysis of Biological Sequences SPH

Analysis of Biological Sequences SPH Analysis of Biological Sequences SPH 140.638 swheelan@jhmi.edu nuts and bolts meet Tuesdays & Thursdays, 3:30-4:50 no exam; grade derived from 3-4 homework assignments plus a final project (open book,

More information

RESEARCH METHODOLOGY, BIOSTATISTICS AND IPR

RESEARCH METHODOLOGY, BIOSTATISTICS AND IPR MB 401: RESEARCH METHODOLOGY, BIOSTATISTICS AND IPR Objectives: The overall aim of the course is to deepen knowledge regarding basic concepts of Biostatistics, the research process in occupational therapy

More information

ESSENTIAL BIOINFORMATICS

ESSENTIAL BIOINFORMATICS ESSENTIAL BIOINFORMATICS Essential Bioinformatics is a concise yet comprehensive textbook of bioinformatics that provides a broad introduction to the entire field. Written specifically for a life science

More information

AC Algorithms for Mining Biological Sequences (COMP 680)

AC Algorithms for Mining Biological Sequences (COMP 680) AC-04-18 Algorithms for Mining Biological Sequences (COMP 680) Instructor: Mathieu Blanchette School of Computer Science and McGill Centre for Bioinformatics, 332 Duff Building McGill University, Montreal,

More information

03-511/711 Computational Genomics and Molecular Biology, Fall

03-511/711 Computational Genomics and Molecular Biology, Fall 03-511/711 Computational Genomics and Molecular Biology, Fall 2010 1 Study questions These study problems are intended to help you to review for the final exam. This is not an exhaustive list of the topics

More information

Applicazioni biotecnologiche

Applicazioni biotecnologiche Applicazioni biotecnologiche Analisi forense Sintesi di proteine ricombinanti Restriction Fragment Length Polymorphism (RFLP) Polymorphism (more fully genetic polymorphism) refers to the simultaneous occurrence

More information

Course Information. Introduction to Algorithms in Computational Biology Lecture 1. Relations to Some Other Courses

Course Information. Introduction to Algorithms in Computational Biology Lecture 1. Relations to Some Other Courses Course Information Introduction to Algorithms in Computational Biology Lecture 1 Meetings: Lecture, by Dan Geiger: Mondays 16:30 18:30, Taub 4. Tutorial, by Ydo Wexler: Tuesdays 10:30 11:30, Taub 2. Grade:

More information

Classification and Learning Using Genetic Algorithms

Classification and Learning Using Genetic Algorithms Sanghamitra Bandyopadhyay Sankar K. Pal Classification and Learning Using Genetic Algorithms Applications in Bioinformatics and Web Intelligence With 87 Figures and 43 Tables 4y Spri rineer 1 Introduction

More information

RNA Structure Prediction. Algorithms in Bioinformatics. SIGCSE 2009 RNA Secondary Structure Prediction. Transfer RNA. RNA Structure Prediction

RNA Structure Prediction. Algorithms in Bioinformatics. SIGCSE 2009 RNA Secondary Structure Prediction. Transfer RNA. RNA Structure Prediction Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri RNA Structure Prediction Secondary

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Goals of this course Learn about Software tools Databases Methods (Algorithms) in

More information

Engineering. Software VACLAV RAJLICH. The Current Practice. 0\ CRC Press Taylor & Francis Group CHAPMAN & HALL/CRC INNOVATIONS IN

Engineering. Software VACLAV RAJLICH. The Current Practice. 0\ CRC Press Taylor & Francis Group CHAPMAN & HALL/CRC INNOVATIONS IN CHAPMAN & HALL/CRC INNOVATIONS IN SOFTWARE ENGINEERING AND SOFTWARE DEVELOPMENT Software Engineering The Current Practice VACLAV RAJLICH 0\ CRC Press Taylor & Francis Group Boca Raton London New York CRC

More information

Machine Learning. HMM applications in computational biology

Machine Learning. HMM applications in computational biology 10-601 Machine Learning HMM applications in computational biology Central dogma DNA CCTGAGCCAACTATTGATGAA transcription mrna CCUGAGCCAACUAUUGAUGAA translation Protein PEPTIDE 2 Biological data is rapidly

More information

Intelligence and. Vivek Kaie

Intelligence and. Vivek Kaie Enterprise Performance Intelligence and Decision Patterns Vivek Kaie /0\ CRC Press \CtJ Taylor & Francis Croup V- 'S Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an

More information

Introduction to Algorithms in Computational Biology Lecture 1

Introduction to Algorithms in Computational Biology Lecture 1 Introduction to Algorithms in Computational Biology Lecture 1 Background Readings: The first three chapters (pages 1-31) in Genetics in Medicine, Nussbaum et al., 2001. This class has been edited from

More information

12/8/09 Comp 590/Comp Fall

12/8/09 Comp 590/Comp Fall 12/8/09 Comp 590/Comp 790-90 Fall 2009 1 One of the first, and simplest models of population genealogies was introduced by Wright (1931) and Fisher (1930). Model emphasizes transmission of genes from one

More information

11/22/13. Proteomics, functional genomics, and systems biology. Biosciences 741: Genomics Fall, 2013 Week 11

11/22/13. Proteomics, functional genomics, and systems biology. Biosciences 741: Genomics Fall, 2013 Week 11 Proteomics, functional genomics, and systems biology Biosciences 741: Genomics Fall, 2013 Week 11 1 Figure 6.1 The future of genomics Functional Genomics The field of functional genomics represents the

More information

Park /12. Yudin /19. Li /26. Song /9

Park /12. Yudin /19. Li /26. Song /9 Each student is responsible for (1) preparing the slides and (2) leading the discussion (from problems) related to his/her assigned sections. For uniformity, we will use a single Powerpoint template throughout.

More information

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005 Bioinformatics is the recording, annotation, storage, analysis, and searching/retrieval of nucleic acid sequence (genes and RNAs), protein sequence and structural information. This includes databases of

More information

Bioinformatics: Sequence Analysis. COMP 571 Luay Nakhleh, Rice University

Bioinformatics: Sequence Analysis. COMP 571 Luay Nakhleh, Rice University Bioinformatics: Sequence Analysis COMP 571 Luay Nakhleh, Rice University Course Information Instructor: Luay Nakhleh (nakhleh@rice.edu); office hours by appointment (office: DH 3119) TA: Leo Elworth (DH

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool 14.06.2010 Table of contents 1 History History 2 global local 3 Score functions Score matrices 4 5 Comparison to FASTA References of BLAST History the program was designed by Stephen W. Altschul, Warren

More information

VALLIAMMAI ENGINEERING COLLEGE

VALLIAMMAI ENGINEERING COLLEGE VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur 603 203 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER BM6005 BIO INFORMATICS Regulation 2013 Academic Year 2018-19 Prepared

More information

Imaging informatics computer assisted mammogram reading Clinical aka medical informatics CDSS combining bioinformatics for diagnosis, personalized

Imaging informatics computer assisted mammogram reading Clinical aka medical informatics CDSS combining bioinformatics for diagnosis, personalized 1 2 3 Imaging informatics computer assisted mammogram reading Clinical aka medical informatics CDSS combining bioinformatics for diagnosis, personalized medicine, risk assessment etc Public Health Bio

More information

GREG GIBSON SPENCER V. MUSE

GREG GIBSON SPENCER V. MUSE A Primer of Genome Science ience THIRD EDITION TAGCACCTAGAATCATGGAGAGATAATTCGGTGAGAATTAAATGGAGAGTTGCATAGAGAACTGCGAACTG GREG GIBSON SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc.

More information

CSE/Beng/BIMM 182: Biological Data Analysis. Instructor: Vineet Bafna TA: Nitin Udpa

CSE/Beng/BIMM 182: Biological Data Analysis. Instructor: Vineet Bafna TA: Nitin Udpa CSE/Beng/BIMM 182: Biological Data Analysis Instructor: Vineet Bafna TA: Nitin Udpa Today We will explore the syllabus through a series of questions? Please ASK All logistical information will be given

More information

BIOINFORMATICS Introduction

BIOINFORMATICS Introduction BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea

More information

Course Competencies Template - Form 112

Course Competencies Template - Form 112 Course Competencies Template - Form 112 GENERAL INFORMATION Name: Drs. Susan Neimand and Edwin Ginés- Candelaria Course Prefix/Number: PCB 3060 Number of Credits: 3 Degree Type Phone #: (305) 237-6152,

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Dr. Taysir Hassan Abdel Hamid Lecturer, Information Systems Department Faculty of Computer and Information Assiut University taysirhs@aun.edu.eg taysir_soliman@hotmail.com

More information

An Introduction to Population Genetics

An Introduction to Population Genetics An Introduction to Population Genetics THEORY AND APPLICATIONS f 2 A (1 ) E 1 D [ ] = + 2M ES [ ] fa fa = 1 sf a Rasmus Nielsen Montgomery Slatkin Sinauer Associates, Inc. Publishers Sunderland, Massachusetts

More information

Introduction to Cellular Biology and Bioinformatics. Farzaneh Salari

Introduction to Cellular Biology and Bioinformatics. Farzaneh Salari Introduction to Cellular Biology and Bioinformatics Farzaneh Salari Outline Bioinformatics Cellular Biology A Bioinformatics Problem What is bioinformatics? Computer Science Statistics Bioinformatics Mathematics...

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Changhui (Charles) Yan Old Main 401 F http://www.cs.usu.edu www.cs.usu.edu/~cyan 1 How Old Is The Discipline? "The term bioinformatics is a relatively recent invention, not

More information

MARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305

MARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305 MARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305 UNIT-1 MARINE GENOMICS AND PROTEOMICS 1. Define genomics? 2. Scope and functional genomics? 3. What is Genetics? 4. Define functional genomics? 5. What

More information

Bioinformatics. Microarrays: designing chips, clustering methods. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute

Bioinformatics. Microarrays: designing chips, clustering methods. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Microarrays: designing chips, clustering methods Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Course Syllabus Jan 7 Jan 14 Jan 21 Jan 28 Feb 4 Feb 11 Feb 18 Feb 25 Sequence

More information

Bioinformatics Support of Genome Sequencing Projects. Seminar in biology

Bioinformatics Support of Genome Sequencing Projects. Seminar in biology Bioinformatics Support of Genome Sequencing Projects Seminar in biology Introduction The Big Picture Biology reminder Enzyme for DNA manipulation DNA cloning DNA mapping Sequencing genomes Alignment of

More information

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score Midterm 1 Results 10 Midterm 1 Akey/ Fields Median - 69 8 Number of Students 6 4 2 0 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 Exam Score Quick review of where we left off Parental type: the

More information

Péter Antal Ádám Arany Bence Bolgár András Gézsi Gergely Hajós Gábor Hullám Péter Marx András Millinghoffer László Poppe Péter Sárközy BIOINFORMATICS

Péter Antal Ádám Arany Bence Bolgár András Gézsi Gergely Hajós Gábor Hullám Péter Marx András Millinghoffer László Poppe Péter Sárközy BIOINFORMATICS Péter Antal Ádám Arany Bence Bolgár András Gézsi Gergely Hajós Gábor Hullám Péter Marx András Millinghoffer László Poppe Péter Sárközy BIOINFORMATICS The Bioinformatics book covers new topics in the rapidly

More information

Introduction to Molecular Biology

Introduction to Molecular Biology Introduction to Molecular Biology Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 2-1- Important points to remember We will study: Problems from bioinformatics. Algorithms used to solve

More information

Bioinformatics. Ingo Ruczinski. Some selected examples... and a bit of an overview

Bioinformatics. Ingo Ruczinski. Some selected examples... and a bit of an overview Bioinformatics Some selected examples... and a bit of an overview Department of Biostatistics Johns Hopkins Bloomberg School of Public Health July 19, 2007 @ EnviroHealth Connections Bioinformatics and

More information

CHAPTER 4 PATTERN CLASSIFICATION, SEARCHING AND SEQUENCE ALIGNMENT

CHAPTER 4 PATTERN CLASSIFICATION, SEARCHING AND SEQUENCE ALIGNMENT 92 CHAPTER 4 PATTERN CLASSIFICATION, SEARCHING AND SEQUENCE ALIGNMENT 4.1 INTRODUCTION The major tasks of pattern classification in the given DNA sample, query pattern searching in the target database

More information

Gene Identification in silico

Gene Identification in silico Gene Identification in silico Nita Parekh, IIIT Hyderabad Presented at National Seminar on Bioinformatics and Functional Genomics, at Bioinformatics centre, Pondicherry University, Feb 15 17, 2006. Introduction

More information

Article A Teaching Approach From the Exhaustive Search Method to the Needleman Wunsch Algorithm

Article A Teaching Approach From the Exhaustive Search Method to the Needleman Wunsch Algorithm Article A Teaching Approach From the Exhaustive Search Method to the Needleman Wunsch Algorithm Zhongneng Xu * Yayun Yang Beibei Huang, From the Department of Ecology, Jinan University, Guangzhou 510632,

More information

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1 BSCI348S Fall 2003 Midterm 1 Multiple Choice: select the single best answer to the question or completion of the phrase. (5 points each) 1. The field of bioinformatics a. uses biomimetic algorithms to

More information

Computational Haplotype Analysis: An overview of computational methods in genetic variation study

Computational Haplotype Analysis: An overview of computational methods in genetic variation study Computational Haplotype Analysis: An overview of computational methods in genetic variation study Phil Hyoun Lee Advisor: Dr. Hagit Shatkay A depth paper submitted to the School of Computing conforming

More information

Lecture 7 Motif Databases and Gene Finding

Lecture 7 Motif Databases and Gene Finding Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 7 Motif Databases and Gene Finding Motif Databases & Gene Finding Motifs Recap Motif Databases TRANSFAC

More information

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015 ChIP-Seq Data Analysis J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015 What s the Question? Where do Transcription Factors (TFs) bind genomic DNA 1? (Where do other things bind DNA

More information

Creation of a PAM matrix

Creation of a PAM matrix Rationale for substitution matrices Substitution matrices are a way of keeping track of the structural, physical and chemical properties of the amino acids in proteins, in such a fashion that less detrimental

Multiple Attribute Decision Making

Multiple Attribute Decision Making: Methods and Applications. Gwo-Hshiung Tzeng, Jih-Jeng Huang. CRC Press, Taylor & Francis Group, Boca Raton London New York. CRC Press is an imprint of the

ChIP-Seq Tools. J Fass UCD Genome Center Bioinformatics Core Wednesday September 16, 2015

What's the Question? Where do Transcription Factors (TFs) bind genomic DNA? (Where do other things bind DNA or

Methods and tools for exploring functional genomics data

William Stafford Noble, Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington. Outline: Searching for

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1

Human single nucleotide polymorphisms: The majority of human sequence variation is due to substitutions that have occurred once in the

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014

What's the Question? Where do Transcription Factors (TFs) bind genomic DNA? (Where do other things bind

Grundlagen der Bioinformatik Summer Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik (Foundations of Bioinformatics), summer semester 2011, D. Huson, April 11, 2011. 1 Introduction. Lecturer: Prof. Daniel Huson. Office hours: Thursdays 17-18h (Sand 14, C310a). 1.1

ChIP-seq analysis 2/28/2018

Acknowledgements: Much of the content of this lecture is from: Furey (2012) ChIP-seq and beyond; Park (2009) ChIP-seq advantages + challenges; Landt et al. (2012) ChIP-seq guidelines

Single alignment: FASTA. 17 march 2017

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985.[1] FASTA is pronounced

Computational Genomics. Ron Shamir & Roded Sharan Fall

Ron Shamir & Roded Sharan, Fall 2012-13. Bioinformatics: the information science of biology: organize, store, analyze and visualize biological data. Responds to the explosion of biological

ENGR 213 Bioengineering Fundamentals April 25, A very coarse introduction to bioinformatics

In this exercise, you will get a quick primer on how DNA is used to manufacture proteins. You will learn a little bit about how the building blocks of these

Motif Discovery in Biological Sequences

San Jose State University, SJSU ScholarWorks, Master's Projects, Master's Theses and Graduate Research, 2008. Medha Pradhan, San Jose State University.

Marker types. Potato Association of America, Fredericton, August 9, Allen Van Deynze

Potato Association of America, Fredericton, August 9, 2009. Allen Van Deynze. Use of DNA Markers in Breeding: Germplasm Analysis, Fingerprinting of germplasm, Arrangement of diversity (clustering,

NEXT GENERATION SEQUENCING. Farhat Habib

HISTORY: Sanger. Dominant for last ~30 years. 1000bp longest read. Based on primers so not good for repetitive or SNPs sites.

A. I think it is DNA or RNA (circle your answer) because: B. I think it is DNA or RNA (circle your answer) because:

Name: Test Date: Block: Biology I: Unit 7 Molecular Genetics and Biotechnology, Review for Unit Test. Directions: You should use this as a guide to help you study for your test. You should also read through

COMPUTER RESOURCES II:

Using the computer to analyze data, using the internet, and accessing online databases. Bio 210, Fall 2006. Linda S. Huang, Ph.D., University of Massachusetts Boston. In the first computer

Sequence Alignment and Phylogenetic Tree Construction of Malarial Parasites

Sk. Mujaffor, Tripti Swarnkar, Raktima Bandyopadhyay. M.Tech (2nd Yr.), ITER, S O A University, mujaffor09@yahoo.in

Identifying Regulatory Regions using Multiple Sequence Alignments

Prerequisites: BLAST Exercise: Detecting and Interpreting Genetic Homology. Resources: ClustalW is available at http://www.ebi.ac.uk/tools/clustalw2/index.html

ChIP. November 21, 2017

Functional signals: is DNA enough? What is the smallest number of letters used by a written language? DNA is only one part of the functional genome. DNA is heavily bound by proteins,

Genetics and Bioinformatics

Kristel Van Steen, PhD 2, Montefiore Institute - Systems and Modeling, GIGA - Bioinformatics, ULg, kristel.vansteen@ulg.ac.be. Lecture 1: Setting the pace. 1 Bioinformatics: what's

Molecular Evolution. Wen-Hsiung Li

Introduction: Molecular Evolution: A Brief History of the Pre-DNA Era. Chapter 1: Gene Structure, Genetic Codes, and Mutation. Chapter 2: Dynamics of Genes in Populations

College- and Career Readiness Standards for Science Genetics

College- and Career Readiness, Genetics, Mississippi 2018. GEN.1 Structure and Function of DNA. GEN.1A Students will demonstrate that all cells contain genetic material in the form of DNA. GEN.1A.1 Model the

ReCombinatorics. The Algorithmics and Combinatorics of Phylogenetic Networks with Recombination. Dan Gusfield

Dan Gusfield. NCBS CS and BIO Meeting, December 19, 2016. SNP Data: A SNP is a Single Nucleotide Polymorphism

RNA-SEQUENCING ANALYSIS

Joseph Powell, SISG 2018. CONTENTS: Introduction to RNA sequencing, Data structure, Analyses, Transcript counting, Alternative splicing, Allele specific expression, Discovery. APPLICATIONS

COURSE OUTLINE Biology 103 Molecular Biology and Genetics

Degree Applicable. I. Catalog Statement. Glendale Community College, November 2014. Biology 103 is an extension of the study of molecular biology,

SolCAP. Executive Committee: David Douches, Walter De Jong, Robin Buell, David Francis, Alexandra Stone, Lukas Mueller, Allen Van Deynze

SolCAP: Solanaceae Coordinated Agricultural Project. Supported by the National Research Initiative Plant Genome Program of USDA CSREES for the Improvement of Potato and Tomato. Executive Committee: David

Klinisk kemisk diagnostik BIOINFORMATICS

Klinisk kemisk diagnostik - 2017. What is bioinformatics? Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological,

High-Throughput Bioinformatics: Re-sequencing and de novo assembly. Elena Czeizler

Elena Czeizler, 13.11.2015. Sequencing data: Current sequencing technologies produce large amounts of data: short reads. The outputted sequences

Introduction to Bioinformatics

Dan Lopresti, Computer Science and Engineering, Office PL 350, dal9@lehigh.edu. BioS 10, November 2018. Last year when I gave this talk. Motivation

Bioinformatics Practical Course. 80 Practical Hours

Course Description: This course presents major ideas and techniques for auxiliary bioinformatics and the advanced applications. Points included incorporate

Motivation From Protein to Gene

MOLECULAR BIOLOGY 2003-4, Topic B: Recombinant DNA - principles and tools. Construct a library - what for, how. Major techniques + principles. Bioinformatics - in brief. Chapter 7 (MCB).

Variation detection based on second generation sequencing data. Xin LIU Department of Science and Technology, BGI

Xin LIU, Department of Science and Technology, BGI, liuxin@genomics.org.cn, 2013.11.21. Outline: Summary of sequencing techniques, Data quality

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

www.clutchprep.com. CONCEPT: OVERVIEW OF GENOMICS. Genomics is the study of genomes in their entirety. Bioinformatics is the analysis of the information content of genomes: genes, regulatory sequences,
