Applications in Bio-informatics and Biomedical Engineering

I. Rojas¹, H. Pomares¹, O. Valenzuela², and J.L. Bernier¹

¹ Department of Computer Architecture and Computer Technology, CITIC-UGR
² Department of Applied Mathematics
University of Granada, Spain
{irojas,hector,ovalenzuela,jbernier}@ugr.es

Abstract. This paper gives an overview of the main topics presented in the special session on bioinformatics and biomedical engineering. Bioinformatics consists of two subfields: the development of computational tools and databases, and the application of these tools and databases to generate biological knowledge for a better understanding of living systems. Its main subjects are genomics and proteomics. A closely related field covers the problems in medicine and biomedical engineering that require computer technologies and intelligent systems. The evolution of both disciplines is analyzed through the number of publications that appeared in the literature over the last twenty years.

Keywords: Bioinformatics, proteomics, genomics, biomedical engineering.

1 Introduction: Genesis of an Important Bioinformatics Project

On June 26th, 2000, the sciences of biology, medicine and bioinformatics were altered by an important event: the Prime Minister of the United Kingdom and the President of the United States held a joint press conference, linked via satellite, to announce the completion of the working draft of the human genome by the Human Genome Project (HGP). The HGP was completed in 2003. It was a 13-year project coordinated by the U.S. Department of Energy and the National Institutes of Health. During the early years of the HGP, the Wellcome Trust (U.K.) became a major partner, and additional contributions came from Japan, France, Germany, China and others.
The sequence of three billion bases was the culmination of over a decade of work. Throughout that decade the goal was always clearly in sight, and the only questions were how fast the technology could progress and how generously the funding would flow (Table 1 shows the main landmarks in the Human Genome Project). Bioinformatics, the intersection of molecular biology and computer science, is a fascinating and challenging area in which to work, hybridizing different paradigms and technologies [1]-[4]. Bioinformatics could be defined as the science and technique of organizing and analyzing biological data.

J. Cabestany et al. (Eds.): IWANN 2009, Part I, LNCS 5517, pp. 820-828, 2009. Springer-Verlag Berlin Heidelberg 2009

Table 1. Main landmarks in the Human Genome Project

Year      Event
1953      Watson-Crick structure of DNA published.
1975      F. Sanger, and independently A. Maxam and W. Gilbert, develop methods for sequencing DNA.
1977      Bacteriophage ΦX174 sequenced: first "complete genome".
1980      US Supreme Court holds that genetically modified bacteria are patentable. This decision was the original basis for the patenting of genes.
1981      Human mitochondrial DNA sequenced: 16 569 base pairs.
1984      Epstein-Barr virus genome sequenced: 172 281 base pairs.
1990      International Human Genome Project launched; target horizon 15 years.
1991      J. Craig Venter and colleagues identify active genes via Expressed Sequence Tags (sequences of initial portions of DNA complementary to messenger RNA).
1992      Complete low-resolution linkage map of the human genome.
1992      Wellcome Trust and United Kingdom Medical Research Council establish The Sanger Centre for large-scale genomic sequencing, directed by J. Sulston.
1992      J. Craig Venter forms The Institute for Genomic Research (TIGR), associated with plans to exploit sequencing commercially through gene identification and drug discovery.
1995      First complete sequence of a bacterial genome, Haemophilus influenzae, by TIGR.
1996      High-resolution map of the human genome: markers spaced by nearly 600 000 base pairs.
1996      Completion of the yeast genome, the first eukaryotic genome sequence.
1998      Celera claims to be able to finish the human genome by 2001. Wellcome responds by increasing funding to the Sanger Centre.
1998      Caenorhabditis elegans genome sequence published.
1999      Human Genome Project states goal: working draft of the human genome by 2001 (90% of genes sequenced to >95% accuracy).
12/1999   Sequence of the first complete human chromosome published.
06/2000   Joint announcement of the complete draft sequence of the human genome.
2003      Fiftieth anniversary of the discovery of the structure of DNA. Completion of the high-quality human genome sequence by the public consortium.

The advent of various omics tools has driven the rapid generation of enormous amounts of data, such as DNA and protein sequences, gene expression profiles and protein-protein interactions. Related to this topic, biomedical engineering is also a field of great interest: theoretical advances and applications of information systems, artificial intelligence, signal processing, electronics and other engineering tools in knowledge areas related to biology and medicine have a large impact on the multidisciplinary research community. A large part of the information supporting biological and biomedical research is available in an increasingly wide variety of rapidly growing, decentralized databases, and advanced computing plays an important role in this field. For example, the new so-called high-throughput measurement and sequencing techniques have made genome-wide studies of gene function possible. The use of computer technology for storing DNA sequence information and for constructing the correct DNA sequences from fragments identified by restriction enzymes was one of the first applications arising from the Human Genome Project in which intelligent systems and computer applications had an important impact.

2 Application of Intelligent Systems in Bioinformatics and Biomedicine

Bioinformatics, understood as the development and use of computational techniques and tools for the analysis of biological information, is a discipline born to meet the need to manage the enormous quantities of information coming from projects aimed at determining the composition of the molecules of life (DNA and proteins). In recent years, the development of Computer Science, together with the establishment of the Internet and of molecular databases, has made it possible to study genetic information by means of computer programs that accelerate the analysis of a large number of variables. In any case, the studies that Bioinformatics tries to carry out require the collaboration of a number of heterogeneous disciplines, some of them not so close to biological problems. Computer science, as an indispensable complement, provides and develops systems, tools and algorithms for the study of hereditary information. Statistics is essential for handling large numbers and for developing models with a large number of parameters. Physics and Chemistry are used in the development of models and in the analysis of molecular-level processes. Furthermore, many disciplines deal in some way with the study of complex systems. Since biological systems are among the most complex in the known universe, many of these disciplines can be involved in this new scientific field, including, for example, the seemingly remote Information Theory.
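As a small illustration of how Information Theory applies to sequence data, the following minimal Python sketch computes the Shannon entropy of the base composition of a DNA sequence, H = -Σ p_i log2 p_i, where p_i is the observed frequency of base i. With four bases the maximum is 2 bits per base, reached when A, C, G and T are equally frequent. The function name `shannon_entropy` is a hypothetical choice, not taken from any tool cited here, and the sketch only considers single-base frequencies, ignoring the order of the bases.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol composition of seq.

    Only single-symbol frequencies are used; the order of symbols is ignored.
    """
    counts = Counter(seq.upper())
    n = sum(counts.values())
    if n == 0:
        return 0.0  # an empty sequence carries no information
    # H = sum_i p_i * log2(1 / p_i), with p_i = c / n for each observed symbol
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy("ACGTACGT"))  # 2.0: uniform composition, maximal
    print(shannon_entropy("AACC"))      # 1.0: only two equally frequent bases
    print(shannon_entropy("AAAA"))      # 0.0: a single repeated base
```

Writing the sum with log2(n / c) instead of -log2(c / n) keeps the result non-negative (no "-0.0") for a single repeated base.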
The increasing computational demand is a real problem in bioinformatics, and it appears on at least two fronts: the uncontrolled growth of biological data, which results in higher execution times (seldom in a linear relation), and the new and more complex problems that the availability of these data makes it possible to pose. Sequential computing cannot always meet this demand. Parallel computing, multiprocessor systems and the new approaches from Grid computing are emerging as the best options to tackle these problems successfully.

A closely related field covers the problems in medicine that require computer technologies. An area in which very successful results are being reported is the use of intelligent systems built on soft-computing paradigms (neural networks [7], fuzzy logic [8], support vector machines or evolutionary algorithms). Research on biomedical applications of soft-computing techniques is an emerging field, with a growing number of contributions and publications aimed at solving different problems related to medicine. The MEDLINE database (http://www.nlm.nih.gov/) is considered one of the most important in the field, and the number of papers in it that make use of soft-computing tools has grown impressively in recent years. For example, more than 2,500 papers related to fuzzy logic, neural networks and evolutionary algorithms can be found for 1999-2003 alone, and the number has kept growing since. Although the future is difficult to predict, according to [5] and considering the classic evolution of a scientific discipline, the field of medical applications tackled with intelligent systems is in a phase of huge growth at the international level. These systems can handle complex problems involving a high volume of data and a high computational cost.

3 Main Topics of Bioinformatics and Biomedicine

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. A variety of definitions exist in the literature and on the World Wide Web. Luscombe et al. [6] define bioinformatics as a union of biology and informatics. In this view, bioinformatics involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins. Within bioinformatics, two main topics are defined: genomics and proteomics.

Genomics: The focus of genomics is the genome, that is, all the DNA in an organism, including its genes. Genes carry the information for making all the proteins an organism requires. These proteins determine, among other things, how the organism looks, how well its body metabolizes food or fights infections, and sometimes even how it behaves. DNA is made up of four similar chemicals (called bases and abbreviated A, T, C and G) that are repeated millions or billions of times throughout a genome. The human genome, for example, has 3 billion base pairs.
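Because each position holds one of only four bases, a base can be stored in 2 bits instead of the 8 bits of a text character. One strand is enough to store, since the other is its complement. The minimal Python sketch below packs a sequence four bases per byte. The particular A/C/G/T-to-bit mapping and the names `pack` and `unpack` are illustrative assumptions, and the code assumes the input contains only A, C, G and T (ambiguity codes such as N would need a larger alphabet). At 2 bits per base, about 3 billion bases fit in roughly 750 MB, compared with about 3 GB at one byte per character.

```python
# Illustrative 2-bit code for the four bases (an assumption: any bijection works).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # BASES[CODE[b]] == b for every base b


def pack(seq: str) -> bytes:
    """Pack a DNA string (A/C/G/T only) into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)  # round up for a partial last byte
    for i, base in enumerate(seq.upper()):
        # Base i goes into byte i // 4, at bit offset 2 * (i % 4).
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Invert pack(); length is needed because the last byte may be partial."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


# Rough storage estimate for ~3 billion bases (one strand).
GENOME_BASES = 3_000_000_000
packed_mb = GENOME_BASES * 2 / 8 / 1e6  # 2 bits per base, in megabytes
text_mb = GENOME_BASES / 1e6            # one byte per base, in megabytes

if __name__ == "__main__":
    packed = pack("GATTACA")
    print(len(packed), unpack(packed, 7))  # 2 GATTACA
    print(packed_mb, text_mb)              # 750.0 3000.0
```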
The particular order of As, Ts, Cs and Gs is extremely important. It underlies all of life's diversity, even dictating whether an organism is human or another species such as yeast, rice or fruit fly, all of which have their own genomes and are themselves the focus of genome projects. Because all organisms are related through similarities in DNA sequences, insights gained from nonhuman genomes often lead to new knowledge about human biology [3].

Proteomics: Proteomics focuses on identifying the three-dimensional structure of proteins, their interactions and their function. Papers on protein structure prediction and protein-protein interaction are presented in this session. Functional analyses include gene expression profiling, protein-protein interaction prediction, protein subcellular localization prediction, metabolic pathway reconstruction and simulation [9].

The main aspects of bioinformatics analysis are not isolated; they often interact to give a better understanding of biological mechanisms. For example, protein structure prediction depends on sequence alignment data, and clustering of gene expression profiles requires the phylogenetic tree construction methods developed in sequence analysis (Figure 1).

[Figure 1: diagram of the main topics in bioinformatics. Structure analysis: protein structure prediction, classification and comparison. Sequence analysis: phylogeny, sequence alignment, motif discovery, genome comparison, gene and promoter prediction. Function analysis: gene expression profiling, protein interaction prediction, metabolic pathway modeling. Also shown: database construction and curation, intelligent/advanced software, powerful computer platforms.]
Fig. 1. Main topics in bioinformatics and their intrinsic connections with software development

The advent of various omics tools allows the rapid generation of enormous amounts of data, such as DNA and protein sequences, gene expression profiles and protein-protein interactions [6]. This new paradigm raises several questions. For example: How do we deal with these massive amounts of data and make sense of them? How could computer science and intelligent systems (such as neural networks) deal with and manage all the information contained in such large databases? To give an idea of the volume of data used in bioinformatics, Table 2 shows the orders of magnitude involved in genomics and proteomics.

Table 2. Sources of data used in bioinformatics and subject areas that use these data when the draft of the Human Genome was announced (August 2000, table from [6])

The use of intelligent systems and powerful platforms (such as computer clusters) is where bioinformatics becomes extremely useful and essential for the next generation of biologists. The emphasis is on computers because most tasks in genomic data analysis are highly repetitive or mathematically complex. Computers, and intelligent/advanced systems in particular, are absolutely indispensable in mining genomes for information gathering and knowledge building.

A closely related field covers medical and biomedical applications of intelligent systems [10]. This is a very active field, in which multidisciplinary research teams work together in areas such as biomedical imaging, image processing and visualization, rehabilitation and clinical engineering, health monitoring and wearable systems, biosignal processing and analysis, biometrics and bio-measurement, medical robotics, micro/nano biomedical devices and systems, neural and advanced systems engineering, and telemedicine and healthcare.

4 Future Trend and Conclusion

The number of papers per year of publication in journals and conference proceedings, obtained from the ISI Web of Knowledge, is shown in Figure 2.a for bioinformatics applications using intelligent systems (searched with keywords such as neural networks, fuzzy systems and genetic algorithms). Figure 2.b shows the same for biomedical applications that also use intelligent systems (searched with medical, medicine or biomedicine together with some of the intelligent-system keywords above). Of course, these are only examples of the keywords that could be used for the search.

[Figure 2: for each field, panels show the evolution of the number of publications in the ISI Web of Knowledge (1990-2009), the percentage of publications per year during 1990-2009, and the type of publication (%), such as proceedings papers, articles, reviews, editorials, letters, meeting abstracts and notes.]
Fig. 2. A) Left: analysis of the publications in bioinformatics and intelligent systems; B) Right: analysis of the publications in biomedical engineering and intelligent systems
Nevertheless, these searches let us explore how publications combining bioinformatics or biomedicine with soft-computing techniques have evolved. The future is difficult to predict, but consider the classic shape describing the life of scientific disciplines (Figure 3). By that measure, the field of bioinformatics and biomedical applications using soft-computing techniques has left behind its initial period of slow, moderate growth and is now in a phase of explosive growth. One can expect this phase to last until about 2010, after which the field will reach a plateau for about 10-15 years (depending on new discoveries).

The aim of the special session is to bring together researchers, professionals and industrial practitioners from all over the world to interact and exchange knowledge and ideas in all areas of bioinformatics, computational biology and biomedical engineering. The session welcomes research, development or applications of advanced computational tools and approaches for expanding the use of biological, medical, behavioral or health data, following the NIH definition of bioinformatics. This includes tools to acquire, represent, describe, store, analyze or visualize such data.
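The S-shaped life cycle of a scientific discipline mentioned above has three stages: slow initial growth, a phase of strong growth, then a plateau. A standard way to model this shape is the logistic curve N(t) = K / (1 + e^(-r (t - t0))). Here K is the saturation level, r the growth rate and t0 the midpoint at which growth is fastest. The minimal Python sketch below is only illustrative and is not fitted to the publication counts of Figure 2. The values of K, r and t0 and the 10%/90% thresholds used to label the phases are hypothetical choices.

```python
import math


def logistic(t: float, K: float, r: float, t0: float) -> float:
    """S-shaped (logistic) curve: K / (1 + exp(-r * (t - t0)))."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def growth_rate(t: float, K: float, r: float, t0: float) -> float:
    """Slope dN/dt = r * N * (1 - N / K); it peaks at t = t0, where N = K / 2."""
    n = logistic(t, K, r, t0)
    return r * n * (1.0 - n / K)


def phase(t: float, K: float, r: float, t0: float,
          low: float = 0.1, high: float = 0.9) -> str:
    """Label the life-cycle phase by the fraction of saturation reached.

    The thresholds low and high are arbitrary, illustrative choices.
    """
    frac = logistic(t, K, r, t0) / K
    if frac < low:
        return "slow growth"
    if frac < high:
        return "strong growth"
    return "plateau"


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary time units, not fitted to any data.
    K, r, t0 = 1.0, 1.0, 0.0
    for t in (-5, 0, 5):
        print(t, round(logistic(t, K, r, t0), 3), phase(t, K, r, t0))
```

Labelling phases by the fraction K reached, rather than by calendar years, keeps the sketch independent of any particular dataset.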

Fig. 3. Predictable development of the applications of intelligent systems in bioinformatics and biomedical engineering

Acknowledgement

This work has been partially supported by the CICYT Spanish Project TIN2007-60587.

References

1. Ramamurthi, K.S., Lecuyer, S., Stone, H.A., et al.: Geometric Cue for Protein Localization in a Bacterium. Science 323(5919), 1354-1357 (2009)
2. Venteicher, A.S., Abreu, E.B., Meng, Z.J., et al.: A Human Telomerase Holoenzyme Protein Required for Cajal Body Localization and Telomere Synthesis. Science 323(5914), 644-648 (2009)
3. Marass, F., Upton, C.: Sequence Searcher: A Java tool to perform regular expression and fuzzy searches of multiple DNA and protein sequences. BMC Res. Notes 2, 14 (2009)
4. Palidwor, G.A., Shcherbinin, S., Huska, M.R., et al.: Detection of alpha-rod protein repeats using a neural network and application to huntingtin. PLoS Comput. Biol. 5(3) (2009)
5. Teodorescu, H., Kandel, A., Jain, L.C.: Fuzzy and neuro-fuzzy systems in medicine. CRC Press, Boca Raton (1999)
6. Luscombe, N.M., Greenbaum, D., Gerstein, M.: What is bioinformatics? An introduction and overview. Yearbook of Medical Informatics 2001 (2001)
7. Rojas, I., Pomares, H., Gonzales, J., et al.: Analysis of the functional block involved in the design of radial basis function networks. Neural Processing Letters 12(1), 1-17 (2000)

8. Rojas, I., Ortega, J., Pelayo, F.J., et al.: Statistical analysis of the main parameters in the fuzzy inference process. Fuzzy Sets and Systems 102(2), 157-173 (1999)
9. Malekpour, S.A., Naghizaideh, S., Pezeshk, H., et al.: Protein secondary structure prediction using three neural networks and a segmental semi Markov model. Mathematical Biosciences 217(2), 145-150 (2009)
10. Ubeyli, E.D.: Decision support systems for time-varying biomedical signals: EEG signals classification. Expert Systems with Applications 36(2), 2275-2284 (2009)