Protein function prediction using sequence motifs: A research proposal
Asa Ben-Hur

Abstract

Protein function prediction, i.e. the classification of protein sequences according to their biological function, is an important task in bioinformatics. We propose a study of function prediction that uses protein sequence motifs as features. The result will be a system with state-of-the-art performance in function prediction whose results are interpretable and can provide biological insight into the relationships between different classes of proteins. As a proof of concept we will focus on enzymes, which have a well-developed system of classification, namely the Enzyme Commission (EC) numbering system.

1 Background

Advances in DNA sequencing are yielding a wealth of sequenced genomes. Yet understanding the function of the proteins coded by a specific genome is still an unsolved problem. The function of genes and gene products is determined mainly on the basis of sequence similarity (homology) [1]. Comparing entire proteomes to a database of known sequences leaves the function of a large percentage of genes undetermined: close to 40% of the known human genes do not have a functional classification by homology [2, 3]. Most homology-based methods depend on either global similarity to a known, closely related protein, or homology to a family of known proteins using BLAST, PSI-BLAST, profiles, or HMM methods [4, 5, 6]. The function of genes that have arisen recently by exon shuffling, or that have no closely related ortholog, remains undetermined by these approaches [7].

Motifs, on the other hand, represent short, highly conserved regions of proteins. Very often these motifs correspond to functional regions of a protein: catalytic sites, binding sites, structural motifs, etc. The presence of such motifs often reveals important clues to a protein's role even if the protein is not globally similar to any known protein.
The motifs for most catalytic sites and binding sites are conserved over much larger taxonomic distances and evolutionary time than the rest of the sequence. However, a single motif is often not sufficient to determine the function of a protein. The catalytic site or binding site of a protein might be composed of several regions that are not contiguous in sequence, but are close in the folded protein structure (for example in serine proteases). In addition, a motif representing a binding site might be common to several protein families that bind the same substrate. Therefore, a pattern
of motifs is required in general to classify a protein into a certain family of proteins. Manually constructed fingerprints are provided by the PRINTS database [8]. In this proposal we suggest a method for constructing such fingerprints automatically, and in a discriminative manner, using tools of supervised learning.

Most motif methods extract conserved elements (blocks) from multiple sequence alignments of groups of proteins. These conserved elements are then represented as discrete sequence motifs, Position Specific Scoring Matrices (PSSMs), or profiles. In this proposal we focus on discrete ungapped motifs and PSSMs. Several databases of blocks and motifs are publicly available, including PROSITE, BLOCKS+, PRINTS, and emotif [9, 10, 8, 11, 12]. Using a motif database as the basis for constructing a classifier will enable us to compare the merit of different databases and methods for constructing motifs, according to how well the resulting classifier performs.

2 Enzyme Classification

Enzymes represent a substantial fraction of the proteins in the SwissProt database [13], and have a well-established system of annotation. The current version of SwissProt contains over 35,000 enzymes. The function of an enzyme is specified by a name given to it by the Enzyme Commission (EC) [14]. The name corresponds to an EC number, which is of the form n1.n2.n3.n4, e.g. for alcohol oxidase. The first number is between 1 and 6 and indicates the general type of chemical reaction catalyzed by the enzyme; the main categories are oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases. The remaining numbers have meanings that are particular to each category. Consider, for example, the oxidoreductases (EC numbers starting with 1), which catalyze reactions in which hydrogen or oxygen atoms, or electrons, are transferred between molecules.
In these enzymes, n2 specifies the chemical group of the (electron) donor molecule, n3 specifies the (electron) acceptor, and n4 specifies the substrate.

The EC classification system specifies over 750 enzyme names, and a particular protein can have several enzymatic activities. As a machine learning problem it is therefore not a standard multi-class problem, since each pattern can have more than one class label; this type of problem is sometimes called a multi-label problem [15]. However, in our preliminary experiments we found that this multi-label problem can be reduced to a regular multi-class problem by treating a group of enzymes that share several activities as a class by itself; for example, there are 22 enzymes that have EC numbers and , and these can be perfectly distinguished from enzymes with the single EC number using a classifier that uses the motif content of the proteins. When we then looked at the SwissProt database, we found that these two groups are indeed recognized as distinct.
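The EC number format and the reduction from a multi-label to a multi-class problem described above can be sketched as follows. This is a minimal illustration, not the proposal's code: the function names, the protein identifiers and the EC number combinations in `annotations` are placeholders chosen for the example, and partial EC numbers (with `-` fields) are not handled.

```python
from collections import Counter

# Top-level EC categories as listed in the text (first field, 1 to 6).
EC_MAIN_CLASSES = {
    1: "oxidoreductase",
    2: "transferase",
    3: "hydrolase",
    4: "lyase",
    5: "isomerase",
    6: "ligase",
}


def parse_ec(ec_number):
    """Split an EC number 'n1.n2.n3.n4' into a tuple of four integers."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four fields in {ec_number!r}")
    fields = tuple(int(p) for p in parts)
    if fields[0] not in EC_MAIN_CLASSES:
        raise ValueError(f"unknown main class in {ec_number!r}")
    return fields


def main_class(ec_number):
    """Name of the general reaction type given by the first field."""
    return EC_MAIN_CLASSES[parse_ec(ec_number)[0]]


def composite_label(ec_numbers):
    """Turn the set of EC numbers of one protein into a single class label.

    Proteins with the same combination of activities get the same label,
    so a multi-label data set becomes an ordinary multi-class data set.
    """
    unique = sorted(set(ec_numbers), key=parse_ec)
    return "+".join(unique)


# Hypothetical annotations (placeholder proteins and EC combinations).
annotations = {
    "protA": ["1.1.1.1"],
    "protB": ["1.1.1.1", "2.7.1.2"],
    "protC": ["2.7.1.2", "1.1.1.1"],
}
labels = {name: composite_label(ecs) for name, ecs in annotations.items()}
class_sizes = Counter(labels.values())
```

Here `protB` and `protC` end up in the same composite class regardless of the order in which their activities are listed, while `protA` stays in the single-activity class.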
3 Methods

We propose to use the motif composition of a protein to define a similarity measure, or kernel function, that can be used with kernel-based classification methods such as Support Vector Machines (SVMs). In a recent study we found that a bag-of-motifs representation of a protein sequence provides state-of-the-art performance in detecting remote homologs [16]. The motif database we plan to use contains over a million discrete sequence motifs [12]. The motif composition vector of a protein sequence is therefore high dimensional and very sparse, much like the bag-of-words representation of text documents. However, unlike words in textual data, motifs are highly predictive features: we found that using methods such as Recursive Feature Elimination (RFE) [17] we could reduce the number of motifs to a few tens at most while keeping the same level of classification accuracy. A classifier based on a handful of features has the advantage of interpretability, and can provide biological insight into what differentiates classes of enzymes from each other. During the development of the system we plan to benchmark various classifiers and feature selection methods.

Recall that protein function prediction is a classification problem with a large number of classes (hundreds of classes in the EC classification alone). This poses a computational challenge when using a two-class classifier such as an SVM: the one-against-one method, which appears to perform better than the one-against-rest method of multi-class classification [18], can be too computationally intensive for this application. However, because the motif representation is sparse, only a small number of classes are likely to show any degree of similarity to a given query. In practice, therefore, only a small number of candidate classes will need to be discriminated against each other.
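A bag-of-motifs vector, a kernel on such vectors, and the selection of candidate classes for a query might look roughly like the sketch below. It makes several assumptions that are not part of the proposal: the four motifs are invented regular expressions standing in for the database's discrete motifs, the sequences and class names are toy placeholders, and the kernel is a plain normalized dot product rather than whatever kernel the final system uses.

```python
import math
import re

# Hypothetical discrete motifs written as regular expressions; a real motif
# database would supply a far larger set.
MOTIFS = {
    "m1": r"G.GK[ST]",
    "m2": r"C..C",
    "m3": r"H[ED]..H",
    "m4": r"DE.D",
}
COMPILED = {name: re.compile(pattern) for name, pattern in MOTIFS.items()}


def motif_vector(sequence):
    """Sparse bag-of-motifs vector: motif id -> count of non-overlapping hits.

    Only motifs that occur at least once are stored, which keeps the
    representation sparse.
    """
    counts = {}
    for name, pattern in COMPILED.items():
        hits = len(pattern.findall(sequence))
        if hits:
            counts[name] = hits
    return counts


def dot(u, v):
    """Dot product of two sparse vectors stored as dictionaries."""
    if len(u) > len(v):
        u, v = v, u
    return sum(count * v[motif] for motif, count in u.items() if motif in v)


def kernel(u, v):
    """Normalized linear kernel (cosine similarity) on motif vectors."""
    norm = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def candidate_classes(query_vec, training):
    """Classes with at least one training protein sharing a motif with the query."""
    return sorted({label for label, vec in training if dot(query_vec, vec) > 0})


# Toy training set: (class label, sequence); labels and sequences are placeholders.
training_sequences = [
    ("classA", "MAGSGKSTLLC"),
    ("classB", "MCPQCHEAAH"),
    ("classC", "MDEKDLL"),
]
training = [(label, motif_vector(seq)) for label, seq in training_sequences]

query = motif_vector("AAGTGKTCAACWW")
candidates = candidate_classes(query, training)
similarities = {label: kernel(query, vec) for label, vec in training}
```

In this toy setting the query shares no motif with the third class, so only two classes would need to be discriminated against each other; restricting the pairwise classifiers to such candidates is the saving the text refers to.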
The motifs in the database we use can be highly redundant in their pattern of occurrence in a group of proteins. Because of this redundancy, simple feature selection methods that rank individual features cannot be used to find a small subset of motifs. Wrapper methods such as RFE were found to be effective in reducing the dimensionality while maintaining the accuracy of the resulting classifier. Preliminary results of our experiments will be presented at the workshop on Feature Selection at NIPS.

Deliverables

A suite of tools for classification of protein sequences will be provided as modules for the PyML machine learning environment (current version available at asab/pyml.html).

A web-based interface for performing protein classification. For a given query the tool will show the component motifs in the query, the motifs that contributed to the classification of the query, and the pattern of occurrence of those motifs across different enzyme classes.
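The recursive feature elimination (RFE) wrapper discussed in the Methods section can be sketched in a few lines. RFE as described in [17] retrains a linear SVM after each elimination and drops the feature with the smallest squared weight; the sketch below keeps that loop but, as a loudly stated simplification, swaps in a plain perceptron for the SVM so that it needs only the standard library. The data matrix is a toy example with hypothetical motif-presence values.

```python
def train_perceptron(X, y, features, epochs=20):
    """Train a perceptron restricted to the given feature indices.

    Stand-in for the linear SVM used in RFE; returns (weights, bias), where
    weights maps a feature index to its weight.
    """
    w = {f: 0.0 for f in features}
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(X, y):
            score = b + sum(w[f] * x[f] for f in features)
            if target * score <= 0:  # misclassified or on the boundary
                for f in features:
                    w[f] += target * x[f]
                b += target
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def rfe(X, y, n_keep):
    """Recursive feature elimination: drop one feature per retraining round.

    At every round the classifier is retrained on the surviving features and
    the feature with the smallest squared weight is removed (ties go to the
    lowest index).
    """
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        w, _ = train_perceptron(X, y, features)
        weakest = min(features, key=lambda f: w[f] ** 2)
        features.remove(weakest)
    return features


# Toy data: rows are proteins, columns are motif presence (1) or absence (0).
# Column 0 is present in every positive example and in no negative example.
X = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
y = [+1, +1, -1, -1]

selected = rfe(X, y, n_keep=1)
```

Because the classifier is retrained after every removal, the weight of a motif reflects its contribution given the motifs that remain, which is what distinguishes this wrapper from ranking each motif once in isolation.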
References

[1] F.S. Domingues and T. Lengauer. Protein function from sequence and structure. Applied Bioinformatics, 2(1):3-12.
[2] E.S. Lander, L.M. Linton, and B. Birren. Initial sequencing and analysis of the human genome. Nature, 409(6822).
[3] J.C. Venter, M.D. Adams, E.W. Myers, and P.W. Li. The sequence of the human genome. Science, 2901(16).
[4] S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D.J. Lipman. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25.
[5] M. Gribskov, A.D. McLachlan, and D. Eisenberg. Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. USA, 84.
[6] E.L. Sonnhammer, S.R. Eddy, and E. Birney. Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Research, 26(1).
[7] M. Ruvolo. Molecular phylogeny of the hominoids: inferences from multiple independent DNA sequence data sets. Mol. Biol. Evol., 14(3).
[8] T.K. Attwood, M. Blythe, D.R. Flower, A. Gaulton, J.E. Mabey, N. Maudling, L. McGregor, A. Mitchell, G. Moulton, K. Paine, and P. Scordis. PRINTS and PRINTS-S shed light on protein ancestry. Nucleic Acids Research, 30(1).
[9] L. Falquet, M. Pagni, P. Bucher, N. Hulo, C.J. Sigrist, K. Hofmann, and A. Bairoch. The PROSITE database, its status in Nucleic Acids Research, 30.
[10] S. Henikoff, J.G. Henikoff, and S. Pietrokovski. Blocks+: A non-redundant database of protein alignment blocks derived from multiple compilations. Bioinformatics, 15(6).
[11] J.Y. Huang and D.L. Brutlag. The emotif database. Nucleic Acids Research, 29(1).
[12] Q. Su, S. Saxonov, and D.L. Brutlag. eblocks: an automated database of protein conserved regions maximizing sensitivity and specificity.
[13] C. O'Donovan, M.J. Martin, A. Gattiker, E. Gasteiger, A. Bairoch, and R. Apweiler. High-quality protein knowledge resource: SWISS-PROT and TrEMBL. Brief. Bioinform., 3, 2002.
[14] Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). Enzyme Nomenclature. Recommendations. Academic Press.
[15] A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In Advances in Neural Information Processing Systems.
[16] A. Ben-Hur and D. Brutlag. Remote homology detection: A motif based approach. In Proceedings, eleventh international conference on intelligent systems for molecular biology, volume 19 suppl 1 of Bioinformatics, pages i26-i33.
[17] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46.
[18] C.W. Hsu and C.J. Lin. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13, 2002.
More informationBioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine
Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Overview This lecture will
More informationModeling gene expression data via positive Boolean functions
Modeling gene expression data via positive Boolean functions Francesca Ruffino 1, Marco Muselli 2, Giorgio Valentini 1 1 DSI, Dipartimento di Scienze dell Informazione, Università degli Studi di Milano,
More informationGA-SVM WRAPPER APPROACH FOR GENE RANKING AND CLASSIFICATION USING EXPRESSIONS OF VERY FEW GENES
GA-SVM WRAPPER APPROACH FOR GENE RANKING AND CLASSIFICATION USING EXPRESSIONS OF VERY FEW GENES N.REVATHY 1, Dr.R.BALASUBRAMANIAN 2 1 Assistant Professor, Department of Computer Applications, Karpagam
More informationFollowing text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005
Bioinformatics is the recording, annotation, storage, analysis, and searching/retrieval of nucleic acid sequence (genes and RNAs), protein sequence and structural information. This includes databases of
More informationBasic Local Alignment Search Tool
14.06.2010 Table of contents 1 History History 2 global local 3 Score functions Score matrices 4 5 Comparison to FASTA References of BLAST History the program was designed by Stephen W. Altschul, Warren
More informationIdentifying Splice Sites Of Messenger RNA Using Support Vector Machines
Identifying Splice Sites Of Messenger RNA Using Support Vector Machines Paige Diamond, Zachary Elkins, Kayla Huff, Lauren Naylor, Sarah Schoeberle, Shannon White, Timothy Urness, Matthew Zwier Drake University
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748
CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/8/07 CAP5510 1 Pattern Discovery 2/8/07 CAP5510 2 What we have
More informationCommunity-assisted genome annotation: The Pseudomonas example. Geoff Winsor, Simon Fraser University Burnaby (greater Vancouver), Canada
Community-assisted genome annotation: The Pseudomonas example Geoff Winsor, Simon Fraser University Burnaby (greater Vancouver), Canada Overview Pseudomonas Community Annotation Project (PseudoCAP) Past
More informationStatistical Machine Learning Methods for Bioinformatics VI. Support Vector Machine Applications in Bioinformatics
Statistical Machine Learning Methods for Bioinformatics VI. Support Vector Machine Applications in Bioinformatics Jianlin Cheng, PhD Computer Science Department and Informatics Institute University of
More informationTranscriptional Regulatory Code of a Eukaryotic Genome
Supplementary Information for Transcriptional Regulatory Code of a Eukaryotic Genome Christopher T. Harbison 1,2*, D. Benjamin Gordon 1*, Tong Ihn Lee 1, Nicola J. Rinaldi 1,2, Kenzie Macisaac 3, Timothy
More informationArticle A Teaching Approach From the Exhaustive Search Method to the Needleman Wunsch Algorithm
Article A Teaching Approach From the Exhaustive Search Method to the Needleman Wunsch Algorithm Zhongneng Xu * Yayun Yang Beibei Huang, From the Department of Ecology, Jinan University, Guangzhou 510632,
More informationA Hybrid Approach for Gene Selection and Classification using Support Vector Machine
The International Arab Journal of Information Technology, Vol. 1, No. 6A, 015 695 A Hybrid Approach for Gene Selection and Classification using Support Vector Machine Jaison Bennet 1, Chilambuchelvan Ganaprakasam
More informationHiSP: A Probabilistic Data Mining Technique for Protein Classification
HiSP: A Probabilistic Data Mining Technique for Protein Classification Luiz Merschmann and Alexandre Plastino Departamento de Ciência da Computação Universidade Federal Fluminense Niterói, Brazil {lmerschmann,
More informationFinding Regularity in Protein Secondary Structures using a Cluster-based Genetic Algorithm
Finding Regularity in Protein Secondary Structures using a Cluster-based Genetic Algorithm Yen-Wei Chu 1,3, Chuen-Tsai Sun 3, Chung-Yuan Huang 2,3 1) Department of Information Management 2) Department
More informationReport on DIMACS Workshop on Machine Learning Techniques in Bioinformatics
Report on DIMACS Workshop on Machine Learning Techniques in Bioinformatics Dates: July 11-12, 2006 Location: DIMACS Center, CoRE Building, Rutgers University Organizers: Dechang Chen, Uniformed Services
More informationFunction Prediction of Proteins from their Sequences with BAR 3.0
Open Access Annals of Proteomics and Bioinformatics Short Communication Function Prediction of Proteins from their Sequences with BAR 3.0 Giuseppe Profiti 1,2, Pier Luigi Martelli 2 and Rita Casadio 2
More informationLearning Methods for DNA Binding in Computational Biology
Learning Methods for DNA Binding in Computational Biology Mark Kon Dustin Holloway Yue Fan Chaitanya Sai Charles DeLisi Boston University IJCNN Orlando August 16, 2007 Outline Background on Transcription
More informationOutline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases
Chapter 7: Similarity searches on sequence databases All science is either physics or stamp collection. Ernest Rutherford Outline Why is similarity important BLAST Protein and DNA Interpreting BLAST Individualizing
More informationBIOINFORMATICS THE MACHINE LEARNING APPROACH
88 Proceedings of the 4 th International Conference on Informatics and Information Technology BIOINFORMATICS THE MACHINE LEARNING APPROACH A. Madevska-Bogdanova Inst, Informatics, Fac. Natural Sc. and
More informationComparative Genomics. Page 1. REMINDER: BMI 214 Industry Night. We ve already done some comparative genomics. Loose Definition. Human vs.
Page 1 REMINDER: BMI 214 Industry Night Comparative Genomics Russ B. Altman BMI 214 CS 274 Location: Here (Thornton 102), on TV too. Time: 7:30-9:00 PM (May 21, 2002) Speakers: Francisco De La Vega, Applied
More informationWavelet Transform for Detection of Conserved Motifs in Protein Sequences with Ten Bit Physico-Chemical Properties
Wavelet Transform for Detection of Conserved Motifs in Protein Sequences with Ten Bit Physico-Chemical Properties J. K. Meher, M. K. Raval, P. K. Meher, and G. N. Dash Abstract Detection of common motifs
More informationBioinformatics with basic local alignment search tool (BLAST) and fast alignment (FASTA)
Vol. 6(1), pp. 1-6, April 2014 DOI: 10.5897/IJBC2013.0086 Article Number: 093849744377 ISSN 2141-2464 Copyright 2014 Author(s) retain the copyright of this article http://www.academicjournals.org/jbsa
More information