Software review. Bioinformatics software resources

Similar documents
Types of Databases - By Scope

bioinformatica 6EF2F181AA1830ABC10ABAC56EA5E191 Bioinformatica 1 / 5

Introduction to Bioinformatics for Medical Research. Gideon Greenspan TA: Oleg Rokhlenko. Lecture 1

The Gene Ontology Annotation (GOA) project application of GO in SWISS-PROT, TrEMBL and InterPro

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

State strategies for bioinformatics in BRICs and the UK. Professor Brian Salter

Access to Information from Molecular Biology and Genome Research

Two Mark question and Answers

Proteomics: New Discipline, New Resources. Fred Stoss, University at Buffalo, NERM 2004, Rochester, NY

Practical Bioinformatics for Biologists (BIOS 441/641)

Bioinformatics 2. Lecture 1

Lecture 1. Bioinformatics 2. About me... The class (2009) Course Outcomes. What do I think you know?

Open Access in EWI Focus Workshop 14 th October Johanna Kuhn BioMed Central Ltd

Supporting Information Tasks with User-Centred System Design: The development of an interface supporting bioinformatics analysis

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries.

Introduction to BIOINFORMATICS

Examination Assignments

Bioinformatics Specialist

Introduction to Bioinformatics

Introduc)on to Databases and Resources Biological Databases and Resources

Since 2002 a merger and collaboration of three databases: Swiss-Prot & TrEMBL

Minimum Information About a Microarray Experiment (MIAME) Successes, Failures, Challenges

The Need for Scientific. Data Annotation. Alick K Law, Ph.D., M.B.A. Marketing Manager IBM Life Sciences.

Genetics and Bioinformatics

BIOL 274 Introduction to Bioinformatics Fall 2016

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015

The Open2Dprot Proteomics Project for n-dimensional Protein Expression Data Analysis. The Open2Dprot Project. Introduction

1 st transplant user training workshop Versailles, 12th-13th November 2012

Practical Bioinformatics for Biologists (BIOS493/700)

Arshan Nasir Ph:

Product Applications for the Sequence Analysis Collection

Introduction to Bioinformatics

Genome Informatics. Systems Biology and the Omics Cascade (Course 2143) Day 3, June 11 th, Kiyoko F. Aoki-Kinoshita

BIORISE D6.1 SYLLABUS OF THE CSMM BIOINFORMATICS COURSE

Media Information 2010

2018 Media Kit. The world s leading online food safety community!

Customer Service Portal Overview

I nternet Resources for Bioinformatics Data and Tools

G4120: Introduction to Computational Biology

Plant genome annotation using bioinformatics

NETTAB On the Use of Agents in a Bioinformatics Grid. Luc Moreau, University of Southampton

Bioinformatics. Joshua Gilkerson Albert Kalim Ka-him Leung David Owen

Computational Genomics ( )

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Overview of Health Informatics. ITI BMI-Dept

Perspectives on the Priorities for Bioinformatics Education in the 21 st Century

ELE4120 Bioinformatics. Tutorial 5

Increase throughput: spend less time cutting and. any CDS into Agilent's ELN

DIGITAL ADVERTISING. OPPORTUNITIES NEWSLETTER n WEBSITE n JOB BOARDS n PRODUCT DIRECTORY. CLINICAL INFORMATICS NEWS Clinical Trials to the Clinic

ECS 234: Introduction to Computational Functional Genomics ECS 234

Introduction to Bioinformatics

MARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305

European Genome phenome Archive at the European Bioinformatics Institute. Helen Parkinson Head of Molecular Archives

Media Kit Incorporated in 2003 the IFSQN is the worlds leading independent food safety networking website

SEM (Search Engine Marketing)

Studying the Human Genome. Lesson Overview. Lesson Overview Studying the Human Genome

The EMBL-Bioinformatics and Data-Intensive Informatics

Computational methods in bioinformatics: Lecture 1

NCBI web resources I: databases and Entrez

Karmann Mills 1, Anthony Hickey 1, Alexander Tropsha 2

BIMM 143: Introduction to Bioinformatics (Winter 2018)

Introduction and Public Sequence Databases. BME 110/BIOL 181 CompBio Tools

What is Bioinformatics?

Functional analysis using EBI Metagenomics

Flare Specialised Modules

OMICS Journals are welcoming Submissions

Genome Sequence Assembly

Lesson Overview. Studying the Human Genome. Lesson Overview Studying the Human Genome

Cost Effective Legal Research

Advertise with BIO to reach more than 100,000 leaders in the biotech and pharmaceutical industry.

Content Management and The Research Librarian

Global Service Providers Guide 2019 Features

Food Safety (Bio-)Informatics

Introduction. Highlights. Prepare Library Sequence Analyze Data

ACCELERATING GENOMIC ANALYSIS ON THE CLOUD. Enabling the PanCancer Analysis of Whole Genomes (PCAWG) consortia to analyze thousands of genomes

Springer Protocols. springer.com. Introducing the world s most comprehensive collection of peer-reviewed life sciences protocols

An Introduction to the Association of Biomolecular Resource Facilities (ABRF) and Recent Results of a Laboratory Funding Survey

On-Demand Solution Planning Guide

B I O I N F O R M A T I C S

Grand Challenges in Computational Biology

2015 Media Kit ENTOMOLOGY 2013 ESA 61ST ANNUAL MEETING NOVEMBER 10 13, AUSTIN, TEXAS AUSTIN CONVENTION CENTER. Program Book

Protein Bioinformatics Part I: Access to information

Product Portfolio. CASU SW Marketing February 19, 2009

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

High peformance computing infrastructure for bioinformatics

Biological databases an introduction

Case for Support: A Workflow-based High Throughput Toolkit for Primer Design

ECS 234: Introduction to Computational Functional Genomics ECS 234

ANVIL MARKETING SERVICES

Authors: Steven Jewell Assistant Director IT and e-government Tel: ; Paul Fleming Systems Architect

Bioinformatics Sequence And Genome Analysis David W Mount

U.S. Navy Hunters Point Shipyard (HPS) Web Portal

Applications in Bio-informatics and Biomedical Engineering

Success you can measure

M a x i m i z in g Value from NGS Analytics in t h e E n terprise

Free, open access to research. Membership Director. Becky Fishman. papers.

The Ultimate News Brief

OPEN ACCESS A Commercial Publisher s View. Karen Hunter Senior Vice President, Strategy, Elsevier ACS Meeting, August 23, 2004 Philadelphia

Bioinformatics, in general, deals with the following important biological data:

Certified Digital Marketing Professional VS-1217

Transcription:

Bioinformatics software resources Keywords: bioinformatics software archives, web hyperlink catalogues, bioinformatics news Abstract This review looks at internet archives, repositories and lists for obtaining popular and useful biology and bioinformatics software. Resources include collections of free software, services for the collaborative development of new programs, software news media and catalogues of links to bioinformatics software and web tools. Problems with such resources arise from needs for continued curator effort to collect and update these, combined with less than optimal community support, funding and collaboration. Despite some problems, the available software repositories provide needed public access to many tools that are a foundation for analyses in bioscience research efforts. INTRODUCTION When the time comes to analyse results of a new experiment, bioscientists now supplement their tool set of spreadsheet, commercial bioinformatics and statistics programs, and word processor with bioinformatics tools located through the web. This is essential for many biology fields including sequence analysis, biodata management, phylogenetics, gene expression and proteomics. One can save time and reach more reliable conclusions by choosing the most appropriate analysis tools. Free bioinformatics tools are widely available, but it is not always easy to find the relevant ones, or even those that were available a few years ago. Web searches via Google, Yahoo and similar general search systems can miss the best of biology s unique and focused resources. This review highlights some of biology s current clearing-houses of informatics tools. Internet resources that one can use to find suitable biology software fall into four general groupings: resource sites (archives, bioinformatics service organisations); web lists and catalogues pointing to other sources; news and discussion groups where information on new and updated software are announced; and publications. Table 1 summarises a selection of these resources, which is by no means comprehensive. These were selected to represent some of the more comprehensive and/or actively managed resources for biology software. Archives or repositories maintain collections of software from several authors. These serve both as long-term libraries of useful tools, and a reference for new software authors who want to avoid reinventing wheels. Bioinformatics programs, including older ones especially with source code, are widely read and used as reference works and building blocks by other bioinformaticians in development of new software. Long-term archives provide continuing access when existence of authors web or ftp sites is often limited to a few years. 1 RESOURCES Archives and web tools When one wants to run software on a workstation, stock a bioinformatics service centre with tools, or refer to related programs when developing new software, one needs to browse the web for programs. Large collections of 300 & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 3. 300 304. SEPTEMBER 2004

Table 1: Sources for biology software Source Web URL Notes Selected bioinformatics resource sites Bioinformatics.org http://bioinformatics.org/ Bioinformatics developer repository BioPortal, Weizmann Institute http://bioportal.weizmann.ac.il/ Software and data archive, web tools BioWeb, Pasteur Institute http://bioweb.pasteur.fr/ Software and data archive, web tools Canadian Bioinformatics Resource (CBR) http://cbr-rbc.nrc-cnrc.gc.ca/ Web tools European Bioinformatics Institute http://www.ebi.ac.uk/ Web tools, data and software archive, catalogue IUBio Archive, Indiana University http://iubio.bio.indiana.edu/ Software and data archive, catalogue, web tools RFSB, Bioinformatics Centre, IMT http://imtech.res.in/pdsb/ Software archive SourceForge.net http://sourceforge.net/ General developer repository with bioinformatics section Selected web lists and catalogues BioHunt, ExPASy http://www.expasy.org/biohunt/ Automatic robot updated list Bioinformatics.ca http://www.bioinformatics.ca/links_directory Curated list Bioinformatics.net http://www.bioinformatics.net/ Curated list Bioinformatik.de http://www.bioinformatik.de/ Curated list BioNetbook, Pasteur Institute http://www.pasteur.fr/recherche/bnb/bnb-en.html Semi-automatic updates CSM Molecular Biology Resource, SDSC http://restools.sdsc.edu/ Curated list GenomeWeb http://www.hgmp.mrc.ac.uk/genomeweb/ Curated list Open Directory Project http://dmoz.org/science/biology/bioinformatics/ Submitted links SouthWest Biotechnology and Informatics Centerhttp://www.swbic.org/ Curated list News and discussion groups BIOSCI/Bionet http://www.bio.net/ Biologist and bioinformatics focus Bioinformatics.org http://bioinformatics.org/ Bioinformatics focus Bioinformatics.net http://bioinformatics.net/ Biologist focus Selected bioinformatics publications for software tools BioInform http://www.bioinform.com/ Bioinformatics news briefs Bioinformatics http://bioinformatics.oupjournals.org/ Bioinformatics focus, see application notes for biologists BMC Bioinformatics http://www.biomedcentral.com/bmcbioinformatics/ Bioinformatics focus Briefings in Bioinformatics http://www.henrystewart.com/journals/bib/ Biologist and bioinformatics focus programs are available through BioWeb at Pasteur, BioPortal at Weizmann Institute, European Bioinformatics Institute (EBI), IUBio Archive at Indiana University and RFSB at IMT India. The project repositories at Bioinformatics.org and SourceForge.net also offer bio-software with source code, many in active development. IUBio Archive maintained by this author is one such long-term archive, serving public biology and bioinformatics software since 1989. It houses over 500 software titles, many of which have been added in recent years. The addition of EPrints.org 2 self-archiving web database for promoting author-contributed software will help expand this archive with minimal support. The Repository of Free Software in Biology (RFSB) at the Institute of Microbial Technology, India, collects and archives free, academic biology software. 3 This resource also offers long-term access to many popular biology programs. Among those resources offering bioinformatics web tools, some of the more comprehensive include EBI, BioWeb Pasteur and Canadian Bioinformatics Resource. There are a number of common bioinformatics analyses one can perform at these sites, including BLAST and sequence analyses, primer tools and phylogenetics tree construction. EMBOSS sequence analysis package and SRS bio-database access are among the widely useful web tools available at these and other resource sites. Web lists and catalogues There are numerous web lists of bioinformatics resources, with many aimed at the biologist looking for software. These are successors to last & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 3. 300 304. SEPTEMBER 2004 301

decade s popular bioinformatics lists by Keith Robison 4 and Pedro Coutinho. 5 Some of these, such as Bioinformatics.net, include discussion forums on the use of biology software. These are useful for biologists, as well as bioinformatics engineers looking for tools related to their work, or to be used at service centres. Many of these share a similar organisation by functional categories, with many of the same links. It is useful to compare these for their different editorial perspectives, eg genomics/molecular biology or proteomics/biochemistry, as well as effort to update and remove obsolete links. General resources such as Google, Amazon s Alexa and Open Directory Project at Mozilla.org include biology and bioinformatics categories in their directories. These directories are populated by robots or from submissions; they tend to lack the comprehensiveness of biologist-maintained lists. Bioinformatics.ca provides a curated list of links that are well organised in categories, with main sections that include human genome and model organisms, sequences, gene expression, education and computer-related resources. Most or all of these include useful editorial comments on the content and value of the linked resources, making this list especially useful in learning about resources. The GenomeWeb at MRC, UK, offers a similar very useful catalogue of links with editorial abstracts. An interesting function at Bioinformatics.ca is provided by an XML standard for web news called RSS, for sharing bioinformatics links. This allows customers and other web sites to have computable access to this catalogue. For instance, you can use an RSS program to notify you of additions and changes to this catalogue. The BioNetbook project at Pasteur Institute provides an example of resource lists that are searchable by several bioinformatics criteria: Biological Domain, eg sequence analysis or structural biology, Resource type, eg database or online analysis tools, and Organism. This biology-focused search engine proves especially useful in finding that tool or resource most relevant to one s research. This project also has implemented link maintenance by using semi-automatic scanning of internet news and resources (robot-like) to update the catalogue. A similar project is BioHunt, which uses internet robot technology to search and update molecular biology resources. BioHunt maintains current entries (it shows update times of this review month for several searches), making it especially useful to find new or updated tools that one has heard of, but lacks curated cataloguing of these to make it easy to find by subject matter. Bioinformatics.net is a catalogue of online biology resources, specialising in bioinformatics tools. Its focus is towards the needs of molecular biologists and life science professionals, more than for bioinformaticians, and includes discussion and help forums on the use of software and bioscience topics. Jonathan Rees, who developed this resource, also curates biology lists in the Open Directory Project. This service is supported in part by advertising, as are others reviewed here, one of the limited options available to maintain such services. Bioinformatik.de offers a similar directory style collection of curated bioinformatics and biology resource links. The CMS molecular biology resource is an extensive catalogue of biology resources, including software tools. The SouthWest Biotechnology Center also maintains a useful catalogue covering a broad range of biology resources. Bioinformatics.org and SourceForge.net are resources that support software developers and bioinformatics engineers, but are also useful to biologists looking for tools. Open-source software development in bioinformatics and other fields is being invigorated through agencies such as these. The number of active, widely used and valuable bioinformatics projects at these services is growing, including Generic Model Organism Database, Gene 302 & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 3. 300 304. SEPTEMBER 2004

Ontology, GeneX Gene Expression Database and Staden Package for sequence analysis. These agencies allow for software archiving, but the primary attractions to software developers are infrastructure and tools that enable collaborative software development. A historical archive or catalogue service of bioinformatics software is limited, and maintenance of software releases is left to developers using this service. News and publications Sources for announcements of software include the 15-year-old BIOSCI/Bionet public forums of bionet.software, bionet.software.www, bionet.biology.computational and bionet.announce news groups. The Bionet groups have been overshadowed in recent years by web discussion forums, such as those at Bioinformatics.org and Bioinformatics.net. They have a core value by carrying openly accessible and widely distributed biology software news and discussion, including through web portals such as www.bio.net and Google groups. Other sources include the growing number of bioinformatics paper and electronic publications, and web list/ forum sites devoted to biology and bioinformatics. Distinctions made between biologist and bioinformatics focus are fuzzy, but suggest the tendency of resources to address needs of those with one or the other field of primary training. This journal Briefings in Bioinformatics covers a range of new software and bioinformatics projects, generally with a focus that spans the interests of bioscientists and bioinformaticians. The journal Bioinformatics, originally CABios in the 1980s, is the longest-running publication for original papers in this field. It includes a useful section of short application notes, where many new programs are announced. The weekly newsletter BioInform has a short section of recent new releases and updates to software and database projects in biology, along with in-depth news articles, often focused on biotechnology and pharmaceutical industry issues. BMC Bioinformatics is an electronic journal that requires no subscriber fees. It publishes a range of original reports on new bioinformatics software, with open access for their redistribution. DISCUSSION There are numerous, well-developed bioinformatics resources that bioscientists can use to find the best available research tools. All of those discussed here have particular values; the interested scientist will find time well spent browsing through some of these to find a particular tool. The most useful are those that have a high level of editorial comment and curation by biologists; these also tend to become out of date as funding for maintenance declines. The BioHunt and BioNetbook projects show that robot automation can add significant currency and reduce curation effort to maintaining bioinformatics catalogues and search services. Aside from the publications listed in Table 1, these resources are noncommercial, usually maintained by bioscientists and bioinformaticians, and are generally supported by an institution. A mixture of government support and growing biotechnology industry advertising supplements these. Although several of these have been in existence for many years, they share with a larger group of such resources an uncertain future. As Wren 1 noted in his survey of the longevity of access to URLs published in Medline abstracts (many of these for bioinformatics-related publications), 20 per cent or more disappear in less than a decade. With a yearly loss in access, older software URLs are likely to be unavailable. Often bioinformatics tools remain useful beyond the support an individual author or group can provide, especially those designed for specific problems that do not attract new development. Many of these are used by other bioinformaticians as reference works for development of new software. One major problem in maintaining & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 3. 300 304. SEPTEMBER 2004 303

software archives is collecting the software. All of the noted biology collections have been the effort of a few archivists, rather than author contributions. For several years, the journal Bioinformatics recommended IUBio Archive as a repository for published software, but this did not lead to any notable increase in author contributions. The issue of long-term software archiving is one that might be better handled with an institutional library paradigm such as maintains books, journals and other science reference material. The current archives are dependent on enthusiasm and dedication of those individuals who see value in contributing their time in the face of limited institutional and agency support. Bioinformatics is maturing as an academic discipline, and the field s journals are catching up to other electronic media for timely release of news and software announcements, in addition to becoming the preferred route of authors for such announcements. Merging into one venue these functions of software archiving, news and discussion, along with author publication reports, would make much sense for future bioinformatics software publication. This may be forthcoming, though a growing influence of traditional academic publishing modes, dispersion of these functions among several institutions, and limited funding, all must be addressed by any project to improve biology software publication and access. Acknowledgments This work is supported in part by NSF grant 0090782 and NIH grant 1R01HG002733-01 to D. Gilbert. References Don Gilbert Biology Department, Indiana University, Bloomington, Indiana 47405 USA Tel: +1 812 855 0587 E-mail: gilbertd@indiana.edu 1. Wren, J. D. (2004), 404 not found: The stability and persistence of URLs published in MEDLINE, Bioinformatics, Vol. 20, pp. 668 672 (DOI: 10.1093/bioinformatics/btg465). 2. EPrints.org, Self-Archiving and Open Archives organization (URL: http://www.eprints.org/). 3. Raghava, G. P. S. (2001), PDSB: Public domain software in biology, Biotech Software Internet Rep., Vol. 2, pp. 154 156. 4. Robison, K. (1993), WWW Virtual Library for Biosciences (URL: http://web.archive.org/ web/*/http://golgi.harvard.edu/biopages.html; see also http://mcb.harvard.edu/biolinks.html, http://vlib.org/biosciences.html). 5. Coutinho, P. (1995), Pedro s Biomolecular Research Tools (URL: http:// www.public.iastate.edu/pedro/ research_tools.html). 304 & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 3. 300 304. SEPTEMBER 2004