Ana Teresa Freitas 2016/2017
1 Finding Regulatory Motifs in DNA Sequences Ana Teresa Freitas 2016/2017
2 Combinatorial Gene Regulation
A recent microarray experiment showed that when gene X is knocked out, 20 other genes are not expressed.
How can one gene have such drastic effects?
3 Regulatory Proteins
Gene X encodes a regulatory protein, a.k.a. a transcription factor (TF).
The 20 unexpressed genes rely on gene X's TF to induce transcription.
A single TF may regulate multiple genes.
TFs influence gene expression by binding to a specific location in the respective gene's regulatory region.
4 Regulatory Regions Every gene contains a regulatory region (RR) typically stretching bp upstream of the transcriptional start site Located within the RR are the Transcription Factor Binding Sites (TFBS), also known as motifs, specific for a given transcription factor
5 Transcription Factor Binding Sites
A TFBS can be located anywhere within the Regulatory Region (RR).
For a single TF to regulate multiple genes, those genes' RRs must contain corresponding TFBS.
TFBS may vary slightly across different regulatory regions, since non-essential bases can mutate.
6 Motif Logo
Motifs can mutate at non-important bases.
The five motifs listed here have mutations in positions 3 and 5:
TGGGGGA
TGAGAGA
TGGGGGA
TGAGAGA
TGAGGGA
Representations called motif logos illustrate the conserved regions of a motif.
7 Motif Logos: An Example
8 Motifs and Transcriptional Start Sites
[Figure: five genes, each with a motif instance upstream of its transcriptional start site: ATCCCG, TTCCGG, ATCCCG, ATGCCG, ATGCCC]
9 Identifying Motifs
Genes are turned on or off by regulatory proteins.
These proteins bind to upstream regulatory regions of genes to either attract or block an RNA polymerase.
Regulatory protein X binds to a short DNA sequence called a motif.
So finding the same motif in multiple genes' regulatory regions suggests a regulatory relationship amongst those genes.
10 Identifying Motifs: Complications
We do not know the motif sequence.
We do not know where it is located relative to the gene's start.
Motifs can differ slightly from one gene to the next.
How do we discern a real motif from random motifs?
11 The Motif Finding Problem
Given a random sample of DNA sequences:
cctgatagacgctatctggctatccacgtacgtaggtcctctgtgcgaatctatgcgtttccaaccat
agtactggtgtacatttgatacgtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc
aaacgtacgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt
agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtacgtataca
ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaacgtacgtc
Find the pattern that is implanted in each of the individual sequences, namely, the motif.
12 The Motif Finding Problem (cont'd)
Additional information:
The hidden sequence is of length 8.
The pattern is not exactly the same in each sequence because random point mutations may occur in the sequences.
13 The Motif Finding Problem (cont'd)
The patterns revealed with no mutations:
cctgatagacgctatctggctatccacgtacgtaggtcctctgtgcgaatctatgcgtttccaaccat
agtactggtgtacatttgatacgtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc
aaacgtacgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt
agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtacgtataca
ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaacgtacgtc
Consensus string: acgtacgt
14 The Motif Finding Problem (cont'd)
The patterns with 2 point mutations:
cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat
agtactggtgtacatttgatccatacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc
aaacgttagtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt
agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtccatataca
ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaccgtacggc
15 The Motif Finding Problem (cont'd)
The problem: can we still find the motif, now that each instance has 2 mutations?
What is the consensus sequence?
16 Defining Motifs
To define a motif, let's say we know where the motif starts in each sequence.
The motif start positions in their sequences can be represented as $s = (s_1, s_2, s_3, \ldots, s_t)$.
17 Motifs: Profiles and Consensus
Line up the patterns by their start indexes $s = (s_1, s_2, \ldots, s_t)$.
Construct the profile matrix with the frequencies of each nucleotide in each column.
The consensus nucleotide in each position has the highest score in its column.
Alignment:
  a G g t a c T t
  C c A t a c g t
  a c g t T A g t
  a c g t C c A t
  C c g t a c g G
Profile:
  A  3 0 1 0 3 1 1 0
  C  2 4 0 0 1 4 0 0
  G  0 1 4 0 0 0 3 1
  T  0 0 0 5 1 0 1 4
Consensus:
  A C G T A C G T
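The profile and consensus construction can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names `build_profile` and `consensus_of` are made up for this example, the five 8-mers are the aligned patterns from the slide, and ties between bases are broken in the order A, C, G, T (an assumption; the slide does not say how ties are handled).

```python
# The five aligned 8-mers from the slide. Upper case marks mutated bases
# on the slide; case is ignored here.
alignment = ["aGgtacTt", "CcAtacgt", "acgtTAgt", "acgtCcAt", "CcgtacgG"]

def build_profile(lmers):
    """Return {base: [count in column 0, count in column 1, ...]}."""
    length = len(lmers[0])
    profile = {base: [0] * length for base in "ACGT"}
    for lmer in lmers:
        for column, base in enumerate(lmer.upper()):
            profile[base][column] += 1
    return profile

def consensus_of(profile):
    """Most frequent base per column; ties go to the first of A, C, G, T."""
    length = len(profile["A"])
    return "".join(
        max("ACGT", key=lambda base: profile[base][column])
        for column in range(length)
    )

profile = build_profile(alignment)
for base in "ACGT":
    print(base, profile[base])
print(consensus_of(profile))  # ACGTACGT
```

Counting occurrences rather than storing fractions keeps the profile in integers; dividing each entry by the number of sequences gives the frequency form.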
18 Consensus
Consensus sequences help in finding motifs.
Think of the consensus as an ancestor motif, from which mutated motifs emerged.
The distance between a real motif and the consensus sequence is generally less than that between two real motifs.
19 Consensus (cont'd)
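The distance claim can be checked on the five mutated motif instances from the running example. The sketch below assumes that "distance" means Hamming distance (the number of mismatching positions), which the slide does not state explicitly; the helper name `hamming` is illustrative.

```python
from itertools import combinations

# The five mutated instances from the running example and their consensus.
instances = ["AGGTACTT", "CCATACGT", "ACGTTAGT", "ACGTCCAT", "CCGTACGG"]
consensus = "ACGTACGT"

def hamming(a, b):
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))

to_consensus = [hamming(motif, consensus) for motif in instances]
pairwise = [hamming(a, b) for a, b in combinations(instances, 2)]

mean_to_consensus = sum(to_consensus) / len(to_consensus)
mean_pairwise = sum(pairwise) / len(pairwise)

print(to_consensus, mean_to_consensus)  # [2, 2, 2, 2, 2] 2.0
print(pairwise, mean_pairwise)          # mean is 3.6
```

Each instance carries two mutations, so it sits at distance 2 from the consensus, while two instances mutated at different positions can be up to 4 apart.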
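20 Positional weight matrix (PWM)
Motifs can be summarized in a sequence probability matrix, called a positional weight matrix (PWM), position-specific scoring matrix (PSSM), or simply a motif.
For example, a 7-mer binding site: [table with columns Pos, A, C, G, T; the values were not preserved in this transcription]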
21 Log-likelihood matrix
A PWM can be transformed into a log-likelihood matrix by dividing each entry by the background probability of the corresponding base and taking the logarithm of the result.
For instance, if the background probability of T is 0.3 and the PWM entry for T at position 1 is 0.35, the log-likelihood entry for T at position 1 is ln(0.35/0.3) ≈ 0.154.
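As a rough sketch of this transformation for a single motif position: only the two values for T (0.35 in the PWM, 0.3 in the background) come from the slide. Every other number below is a made-up placeholder chosen so that each distribution sums to 1.

```python
import math

# One PWM column. Only T = 0.35 is taken from the slide; A, C and G are
# hypothetical values for illustration.
pwm_column = {"A": 0.40, "C": 0.10, "G": 0.15, "T": 0.35}

# Background base probabilities. Only T = 0.3 is taken from the slide;
# the rest are hypothetical.
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

# Log-likelihood (log-odds) entry: ln(P_motif(base) / P_background(base)).
log_odds = {
    base: math.log(pwm_column[base] / background[base]) for base in "ACGT"
}

for base in "ACGT":
    print(base, round(log_odds[base], 3))
```

Positive entries mark bases that are more likely at this position than in the background; negative entries mark bases that are less likely. A PWM entry of exactly 0 would need a pseudocount before taking the logarithm, which this sketch does not handle.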
22 Motif Logos: Information and Entropy
Conserved amino acid regions contain a high degree of information (high order == low entropy).
Variable amino acid regions contain a low degree of information (low order == high entropy).
Information content based on Shannon entropy (DNA): $D(i) = 2 + \sum_{k \in \{A,C,G,T\}} P_k(i) \log_2 P_k(i)$
The 2 comes from $\log_2 |A|$, where $|A|$ is the number of elements in the alphabet $A$ ($|A| = 4$ for DNA).
$P_k(i)$ is the probability of observing base $k$ in position $i$.
23 Motif Logos
For a position where every nucleotide has probability $P_k(i) = 1/4$, the information content is zero:
$D(i) = 2 + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4} = 0$
The size of each base printed in the logo is determined by multiplying the frequency of that base by the total information at that position:
height of base $k$ at position $l$ $= P_k(l)\, D(l)$
24 Visualization
Bases are stacked on top of each other in increasing order of their frequencies.
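The information content and letter heights can be computed directly from the formula. The sketch below is an illustration under one stated assumption: a base with probability 0 contributes nothing to the sum (the usual convention that 0 · log 0 = 0). The function names are illustrative.

```python
import math

def information_content(probs):
    """D = 2 + sum_k P_k * log2(P_k) for one position of a DNA motif.

    Bases with probability 0 are skipped (0 * log 0 is taken as 0).
    """
    return 2.0 + sum(p * math.log2(p) for p in probs.values() if p > 0)

def letter_heights(probs):
    """Height of each base in the logo: P_k * D at this position."""
    d = information_content(probs)
    return {base: p * d for base, p in probs.items()}

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
conserved = {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0}
# Column 1 of the running example alignment: A in 3 of 5 rows, C in 2 of 5.
mixed = {"A": 0.6, "C": 0.4, "G": 0.0, "T": 0.0}

print(information_content(uniform))    # 0.0 bits: no information
print(information_content(conserved))  # 2.0 bits: fully conserved
print(letter_heights(mixed))
```

A fully conserved position reaches the maximum of 2 bits, so its single letter is drawn at full height, while a uniform position disappears from the logo.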
25 Evaluating Motifs We have a guess about the consensus sequence, but how good is this consensus? Need to introduce a scoring function to compare different guesses and choose the best one.
26 Defining Some Terms
t: number of sample DNA sequences
n: length of each DNA sequence
DNA: sample of DNA sequences (t x n array)
l: length of the motif (l-mer)
$s_i$: starting position of an l-mer in sequence i
$s = (s_1, s_2, \ldots, s_t)$: array of the motif's starting positions
27 Parameters
In our sample sequences: l = 8, t = 5, n = 68
DNA:
cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat
agtactggtgtacatttgatccatacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc
aaacgttagtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt
agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtccatataca
ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaccgtacggc
s: $s_1 = 26$, $s_2 = 21$, $s_3 = 3$, $s_4 = 56$, $s_5 = 60$
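28 Scoring Function
Given $s = (s_1, \ldots, s_t)$ and DNA:
$\mathrm{Score}(s, DNA) = \sum_{i=1}^{l} \max_{k \in \{A,T,C,G\}} \mathrm{count}(k, i)$
where $\mathrm{count}(k, i)$ is the number of times nucleotide $k$ appears in column $i$ of the alignment (t rows, l columns).
Alignment:
  a G g t a c T t
  C c A t a c g t
  a c g t T A g t
  a c g t C c A t
  C c g t a c g G
Profile:
  A  3 0 1 0 3 1 1 0
  C  2 4 0 0 1 4 0 0
  G  0 1 4 0 0 0 3 1
  T  0 0 0 5 1 0 1 4
Consensus:
  a c g t a c g t
Score = 3 + 4 + 4 + 5 + 3 + 4 + 3 + 4 = 30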
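The scoring function translates almost directly into code. The following sketch applies it to the five sample sequences with the starting positions given on the Parameters slide; positions are 1-based, as on the slides, and the function name `score` is illustrative.

```python
# The five sample sequences with two point mutations per motif instance.
dna = [
    "cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat",
    "agtactggtgtacatttgatccatacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc",
    "aaacgttagtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt",
    "agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtccatataca",
    "ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaccgtacggc",
]

def score(s, dna, l):
    """Sum over the l columns of the count of the most frequent base.

    s holds one 1-based starting position per sequence.
    """
    total = 0
    for i in range(l):
        column = [seq[start - 1 + i] for seq, start in zip(dna, s)]
        total += max(column.count(base) for base in "acgt")
    return total

print(score((26, 21, 3, 56, 60), dna, 8))  # 30
```

The maximum possible score is l · t = 40, reached only when all t selected l-mers are identical.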
29 The Motif Finding Problem
If the starting positions $s = (s_1, s_2, \ldots, s_t)$ are given, the problem is easy even with mutations in the sequences, because we can simply construct the profile to find the motif (consensus).
But the starting positions s are usually not given. How can we align the patterns and compute the best profile matrix?
30 The Motif Finding Problem: Formulation
The Motif Finding Problem: given a set of DNA sequences, find a set of l-mers, one from each sequence, that maximizes the consensus score.
Input: a t x n matrix of DNA, and l, the length of the pattern to find.
Output: an array of t starting positions $s = (s_1, s_2, \ldots, s_t)$ maximizing Score(s, DNA).
31 The Motif Finding Problem: Brute Force Solution
Compute the scores for each possible combination of starting positions s.
The best score will determine the best profile and the consensus pattern in DNA.
The goal is to maximize Score(s, DNA) by varying the starting positions $s_i$, where $s_i \in [1, \ldots, n - l + 1]$ for $i \in [1, \ldots, t]$.
32 Brute Force Approach: Running Time
Varying $(n - l + 1)$ positions in each of $t$ sequences, we're looking at $(n - l + 1)^t$ sets of starting positions.
For each set of starting positions, the scoring function makes $l$ operations, so the complexity is $l\,(n - l + 1)^t = O(l\,n^t)$.
33 Pseudocode for Brute Force Motif Search
BruteForceMotifSearch(DNA, t, n, l)
  bestscore ← 0
  for each s = (s_1, s_2, ..., s_t) from (1, 1, ..., 1) to (n-l+1, ..., n-l+1)
    if (Score(s, DNA) > bestscore)
      bestscore ← Score(s, DNA)
      bestmotif ← (s_1, s_2, ..., s_t)
  return bestmotif
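A runnable Python version of the pseudocode might look like the sketch below. It differs from the slide in two labeled ways: starting positions are 0-based (Python indexing) rather than 1-based, and the function returns the best score alongside the positions. The three toy sequences are invented for this example and contain the implanted 4-mer `acgt`; they are small enough that the exhaustive search finishes instantly.

```python
from itertools import product

def score(s, dna, l):
    """Consensus score for 0-based starting positions s."""
    total = 0
    for i in range(l):
        column = [seq[start + i] for seq, start in zip(dna, s)]
        total += max(column.count(base) for base in "acgt")
    return total

def brute_force_motif_search(dna, l):
    """Try every combination of starting positions and keep the best one."""
    n = len(dna[0])
    best_score, best_motif = 0, None
    # product(...) enumerates all (n - l + 1)^t tuples of starting positions.
    for s in product(range(n - l + 1), repeat=len(dna)):
        current = score(s, dna, l)
        if current > best_score:
            best_score, best_motif = current, s
    return best_motif, best_score

# Hypothetical toy input: t = 3 sequences of length n = 8, motif length l = 4.
toy_dna = ["ttacgtcc", "gacgtaaa", "ccccacgt"]
best_motif, best_score = brute_force_motif_search(toy_dna, 4)
print(best_motif, best_score)  # (2, 1, 4) 12
```

Even here the loop visits 5^3 = 125 combinations; the count grows exponentially with the number of sequences, which is what makes the approach impractical on realistic inputs.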
34 Running Time of BruteForceMotifSearch
For t = 8, n = 1000, l = 10, the search must perform $l\,(n - l + 1)^t = 10 \cdot 991^8 \approx 9.3 \times 10^{24}$ computations.
Assuming each computation takes one cycle on a 3 GHz CPU, it would take roughly 98 million years to search all the possibilities.
This algorithm is not practical.
Let's explore some ways to speed it up.
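An estimate of this kind can be reproduced by evaluating $l\,(n - l + 1)^t$ directly. The sketch below assumes one computation per clock cycle and a 365-day year, both simplifications; the variable names are illustrative.

```python
# Parameters from the slide.
t, n, l = 8, 1000, 10
clock_hz = 3e9  # 3 GHz, one computation per cycle (an assumption)

# l operations for each of the (n - l + 1)^t sets of starting positions.
operations = l * (n - l + 1) ** t

seconds = operations / clock_hz
years = seconds / (365 * 24 * 3600)  # 365-day year (an assumption)

print(f"{operations:.3e} computations")
print(f"{years:.3e} years")
```

Python integers have arbitrary precision, so `operations` is exact; only the conversion to seconds and years uses floating point.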
35 Some Motif Finding Programs
CONSENSUS: Hertz, Stormo (1989)
GibbsDNA: Lawrence et al. (1993)
MEME: Bailey, Elkan (1995)
RandomProjections: Buhler, Tompa (2002)
MULTIPROFILER: Keich, Pevzner (2002)
MITRA: Eskin, Pevzner (2002)
Pattern Branching: Price, Pevzner (2003)
RISO: Carvalho et al. (2006)
MUSA: Mendes et al. (2006)
More informationUnit 6 DNA ppt 3 Gene Expression and Mutations Chapter 8.6 & 8.7 pg
Unit 6 DNA ppt 3 Gene Expression and Mutations Chapter 8.6 & 8.7 pg 248-255 Which genes are transcribed on the chromosomes are carefully regulated at many points. Watch this! https://www.youtube.com/watch?v=oewozs_jtgk
More informationLecture 7 Motif Databases and Gene Finding
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 7 Motif Databases and Gene Finding Motif Databases & Gene Finding Motifs Recap Motif Databases TRANSFAC
More informationExtraction of Hidden Markov Model Representations of Signal Patterns in. DNA Sequences
686 Extraction of Hidden Markov Model Representations of Signal Patterns in. DNA Sequences Tetsushi Yada The Japan Information Center of Science and Technology (JICST) 5-3 YonbancllO, Clliyoda-ku, Tokyo
More informationFrom Promoter Sequence to Expression: A Probabilistic Framework
From Promoter Sequence to Expression: A Probabilistic Framework Eran Segal omputer Science Department Stanford University Stanford, A 94305-9010 eran@cs.stanford.edu Nir Friedman School of omputer Science
More informationFinding subtle motifs with variable gaps in unaligned DNA sequences
Computer Methods and Programs in Biomedicine 70 (2003) 11 20 www.elsevier.com/locate/cmpb Finding subtle motifs with variable gaps in unaligned DNA sequences Yuh-Jyh Hu * Computer and Information Science
More informationFinding Patterns in Biological Sequences
Finding Patterns in Biological Sequences Broňa Brejová, Chrysanne DiMarco, Tomáš Vinař Department of Computer Science University of Waterloo Gina Holguin, Cheryl Patten Department of Biology University
More informationIn silico representation and discovery of transcription factor binding sites Giulio Pavesi, Giancarlo Mauri and Graziano Pesole
Giulio Pavesi is assistant professor of Computer Science at the University of Milan. His research interests are mainly focused on bioinformatics in general, and regulatory motif discovery in particular.
More informationLogoBar: Bar graph visualization of protein logos with gaps. Karolinska Institutet, and Dept. of Life Sciences, Södertörns högskola,
Bioinformatics Advance Access published November 3, 2005 The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
More informationComputational Analysis of Ultra-high-throughput sequencing data: ChIP-Seq
Computational Analysis of Ultra-high-throughput sequencing data: ChIP-Seq Philipp Bucher Wednesday January 21, 2009 SIB graduate school course EPFL, Lausanne Data flow in ChIP-Seq data analysis Level 1:
More informationBayesian Variable Selection and Data Integration for Biological Regulatory Networks
Bayesian Variable Selection and Data Integration for Biological Regulatory Networks Shane T. Jensen Department of Statistics The Wharton School, University of Pennsylvania stjensen@wharton.upenn.edu Gary
More informationSupplementary Data for DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.
Supplementary Data for DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding. Wenxiu Ma 1, Lin Yang 2, Remo Rohs 2, and William Stafford Noble 3 1 Department of Statistics,
More informationTextbook Reading Guidelines
Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum Last updated: May 1, 2009 Textbook Reading Guidelines Preface: Read the whole preface, and especially: For the students with Life Science
More informationComputational gene finding. Devika Subramanian Comp 470
Computational gene finding Devika Subramanian Comp 470 Outline (3 lectures) The biological context Lec 1 Lec 2 Lec 3 Markov models and Hidden Markov models Ab-initio methods for gene finding Comparative
More informationDiscovering Patterns from Sequences with Applications to Protein-Protein and Protein-DNA Interaction
Discovering Patterns from Sequences with Applications to Protein-Protein and Protein-DNA Interaction by Ho Yin Sze-To A thesis presented to the University of Waterloo in fulfillment of the thesis requirement
More information