Article: A Teaching Approach From the Exhaustive Search Method to the Needleman-Wunsch Algorithm


Zhongneng Xu*, Yayun Yang, Beibei Huang

From the Department of Ecology, Jinan University, Guangzhou 510632, China; Departament d'Enginyeria Quimica, Universitat Rovira i Virgili, 26 Av. dels Paisos Catalans, 43007 Tarragona, Spain; and Department of Experimental Therapeutics, The University of Texas M. D. Anderson Cancer Center, Unit 36, Houston, TX 77030, USA

*Address for correspondence to: Department of Ecology, Jinan University, Guangzhou 510632, China. E-mail: txuzn@jnu.edu.cn

Biochemistry and Molecular Biology Education, Volume 45, Number 3, May/June 2017, Pages 194-204. Received 4 April 2016; Revised 12 August 2016; Accepted 6 September 2016. DOI 10.1002/bmb.21027. Published online 14 October 2016 in Wiley Online Library (wileyonlinelibrary.com).

Abstract

The Needleman-Wunsch algorithm has become one of the core algorithms in bioinformatics; however, it requires more suitable explanations for students with different major backgrounds. Using a pair of sample sequences and a simple storage system, the connection between the exhaustive search method and the Needleman-Wunsch algorithm was analyzed to explain this algorithm more thoroughly. The present study could benefit the teaching and learning of the Needleman-Wunsch algorithm. © 2016 by The International Union of Biochemistry and Molecular Biology, 45(3):194-204, 2017.

Keywords: Needleman-Wunsch algorithm; exhaustive search; sequence alignment; teaching

Introduction

The Needleman-Wunsch algorithm (NW) has been widely used in global sequence alignment even though running it has substantial time and space requirements [1-3]. In addition to its use in global alignment, NW helped in the development of other algorithms, such as the Smith-Waterman algorithm, the BLAST programs, and the CLUSTAL series [4-9]. Understanding this algorithm is therefore a requirement for learners of bioinformatics. Since the development of NW, efforts have been made to make it more widely understood and more easily used. Smith and Waterman (1981a; 1981b) [10,11] proved the equation mathematically and added an improved scoring system, thereby providing the equation currently used to calculate the scoring matrix for the alignment between two sequences:

$$
S_{i,j} =
\begin{cases}
0, & i = 1 \text{ and } j = 1 \\
S_{i,j-1} + g, & i = 1 \text{ and } j > 1 \\
S_{i-1,j} + g, & i > 1 \text{ and } j = 1 \\
\max\left(S_{i-1,j-1} + M_{i,j},\; S_{i,j-1} + g,\; S_{i-1,j} + g\right), & i > 1 \text{ and } j > 1
\end{cases}
\tag{1}
$$

where $S_{i,j}$ is the score at the position in row $i$ and column $j$ of the matrix, $g$ is the penalty for a gap (added as a negative value), and $M_{i,j}$ is the score for aligning the characters at the position in row $i$ and column $j$ of the matrix. The efficient application of affine gap penalties was reported in work on improving NW [12,13]. Setubal and Meidanis (1997) [14] used a simple scoring system (three scores, for a match, a mismatch, and a gap, respectively) and two DNA sequences to explain how NW calculates the values in the scoring matrix.

As core content of the bioinformatics course, which has recently become part of the foundation of biology, NW is taught to students with different major backgrounds. Despite the available publications on this algorithm, it remains difficult to teach NW to some students or new learners, especially those who lack knowledge of dynamic programming. New methods for teaching NW are therefore sometimes necessary. Exhaustive search methods might be easy for students who do not have a strong mathematical background, even though the number of candidate alignments grows exponentially with sequence length, which makes exhaustive search impractical for comparing long pairs of sequences.
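As an illustration of Eq. (1), the following minimal Python sketch fills the score matrix for two short sequences. The function name `nw_score_matrix`, its parameter names, and the default scores (+1 for a match, -1 for a mismatch, -2 for a gap) are assumptions chosen for this example, not part of the original publication. The code uses 0-based indices, so the box for "no characters from either sequence" is `S[0][0]` rather than the $S_{1,1}$ of Eq. (1); Sequence M runs along the columns and Sequence N along the rows.

```python
def nw_score_matrix(seq_m, seq_n, match=1, mismatch=-1, gap=-2):
    """Fill the score matrix of Eq. (1) using 0-based indices.

    Rows follow seq_n and columns follow seq_m. The scores are
    illustrative defaults, not values fixed by the algorithm.
    """
    rows, cols = len(seq_n) + 1, len(seq_m) + 1
    S = [[0] * cols for _ in range(rows)]

    # First row: only extension from the left box is possible.
    for j in range(1, cols):
        S[0][j] = S[0][j - 1] + gap
    # First column: only extension from the upper box is possible.
    for i in range(1, rows):
        S[i][0] = S[i - 1][0] + gap

    # Remaining boxes: best of diagonal, left, and upper extensions.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_n[i - 1] == seq_m[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + pair,
                          S[i][j - 1] + gap,
                          S[i - 1][j] + gap)
    return S


if __name__ == "__main__":
    for row in nw_score_matrix("ac", "at"):
        print(row)
```

The bottom-right entry of the returned matrix is the score of the best global alignment of the two full sequences.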
If a teaching scenario could derive a dynamic programming algorithm from an exhaustive search method, NW could be easily understood. Furthermore, a new perspective on NW could help in raising new questions or in putting forward better alignment algorithms. To facilitate the above, the present study developed a teaching method.

Methods

Suppose a Pair of Sample Sequences

As a simple example, suppose a pair of DNA sequences, ac (Sequence M) and at (Sequence N), is to be globally aligned.

Alignment Extension

To easily explain alignment extension, the positions of the alignment are shown in Table 1. When a position is extended, there are three possible situations: (1) Sequence M provides a character and Sequence N provides a gap, (2) Sequence M provides a character and Sequence N provides a character, and (3) Sequence M provides a gap and Sequence N provides a character.

TABLE 1 The positions of the pair alignment

              1st position   2nd position   3rd position   4th position
Sequence M:   a              c
Sequence N:   a              t

Note: this table shows only one of the results of the global alignment of the sample sequences.

Scoring System

A scoring system must be fixed before the alignment is carried out. To explain the method more easily, three simple scores for each position of a two-sequence alignment [15] were selected:

$$
\text{score} =
\begin{cases}
+1, & \text{matching characters} \\
-1, & \text{mismatching characters} \\
-2, & \text{a character aligning a gap}
\end{cases}
$$

Results

In the exhaustive search method, there were 13 complete results for the example (Fig. 1). When at least one character was left in both sequences, the alignment extended in three directions; when characters were left in only one of the sequences, the alignment extended in only one direction; and when no characters were left in either sequence, the extension stopped. There are nine combinations of characters from Sequence M and Sequence N (Fig. 2). Based on the simple scoring system noted, there is a maximum alignment score, representing the best alignment, in each combination of characters.
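The exhaustive search described above can be sketched in Python as a recursive enumeration of the three extension situations. This is only an illustrative sketch: the names `enumerate_alignments` and `score`, and the use of "-" for a gap, are assumptions made for this example.

```python
def enumerate_alignments(seq_m, seq_n):
    """Yield every global alignment as a (top, bottom) pair of gapped strings."""
    if not seq_m and not seq_n:
        # No characters left in either sequence: the extension stops.
        yield "", ""
        return
    if seq_m:
        # Situation 1: Sequence M provides a character, Sequence N a gap.
        for top, bottom in enumerate_alignments(seq_m[1:], seq_n):
            yield seq_m[0] + top, "-" + bottom
    if seq_m and seq_n:
        # Situation 2: both sequences provide a character.
        for top, bottom in enumerate_alignments(seq_m[1:], seq_n[1:]):
            yield seq_m[0] + top, seq_n[0] + bottom
    if seq_n:
        # Situation 3: Sequence M provides a gap, Sequence N a character.
        for top, bottom in enumerate_alignments(seq_m, seq_n[1:]):
            yield "-" + top, seq_n[0] + bottom


def score(top, bottom, match=1, mismatch=-1, gap=-2):
    """Sum the per-position scores of one complete alignment."""
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


alignments = list(enumerate_alignments("ac", "at"))
best = max(alignments, key=lambda pair: score(*pair))
print(len(alignments), best, score(*best))  # 13 ('ac', 'at') 0
```

For the sample sequences the enumeration yields 13 complete alignments, and the highest-scoring one is ac vs. at with a score of 0, in agreement with Fig. 1.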
As noted above, among the 13 complete alignments of all characters from both sequences, the maximum of the thirteen scores is 0, indicating the best global alignment, ac vs. at, of the two sequences.

Some of the derived alignments from the same combination of characters can be deleted. For example, there were three forms of alignment in the combination of the character a from each sequence, and each alignment extended to three derived alignments, thereby producing nine resulting alignments (Fig. 3). In each derived group of three alignments, the characters added in the next position were the same, as shown below:

Sequence M: c      Sequence M: c      Sequence M: -
Sequence N: -      Sequence N: t      Sequence N: t

The characters added in all extended positions in each derived group of three alignments were also the same, as follows:

Sequence M: c-     Sequence M: c      Sequence M: -c
Sequence N: -t     Sequence N: t      Sequence N: t-

That is, if the combination of the front characters (no matter how many gaps) in the sequences was the same, the characters added to the extended positions of the derived alignments were identical. Thus, a comparison of the front characters in the sequences allows the deletion of the alignments that were derived from the lower-scoring alignments of the front characters.

Nine boxes containing alignments, from zero characters aligning zero characters to two characters aligning two characters, were set in a 3 × 3 matrix (Fig. 4). In the matrix, the positions of alignments could extend from the left boxes to the right boxes, from the upper boxes to the lower boxes, or from the upper-left boxes to the lower-right boxes. If the position of an alignment extended from left to right by one box, one character from Sequence M was added to the top sequence in the extended position and one gap was added to the bottom sequence; if the position of an alignment extended from top to bottom by one box, one gap was added to the top sequence in the extended position and one character from Sequence N was added to the bottom sequence; if the position of an alignment extended from upper left to lower right by one box, one character from Sequence M and one character from Sequence N were added to the top sequence and the bottom sequence, respectively, in the extended position. With this topological change, all alignments of the example could be filled into these boxes (Fig. 5). Different alignments in the same combination of characters result in different scores. Only the alignment with the highest score in each box was kept, and the others were omitted. The comparison of the front characters provides sufficient information for deciding which alignments to keep; thus, when the positions of the alignments extend from left to right, from top to bottom, and from upper left to lower right, box by box, we can select the alignment with the highest score in each box (Fig. 6). This method saves calculation time and storage space compared with the exhaustive search method (Fig. 1 or Fig. 5). When the scores replace the character alignments in each box (Fig. 7), NW emerges.

FIG 1 The global alignment of the two DNA sequences, ac and at, using an exhaustive search method. The last row shows the scores of the complete alignments under a simple scoring system: +1 for a match, -1 for a mismatch, and -2 for a gap in each position of an alignment.

FIG 2 Nine combinations of characters from Sequence M (ac) and Sequence N (at) during the global alignment process. I, Sequence M provides zero characters and Sequence N provides zero characters. II, Sequence M provides one character and Sequence N provides zero characters. III, Sequence M provides two characters and Sequence N provides zero characters. IV, Sequence M provides zero characters and Sequence N provides one character. V, Sequence M provides one character and Sequence N provides one character. VI, Sequence M provides two characters and Sequence N provides one character. VII, Sequence M provides zero characters and Sequence N provides two characters. VIII, Sequence M provides one character and Sequence N provides two characters. IX, Sequence M provides two characters and Sequence N provides two characters.

FIG 3 Comparison of the alignments derived from the same combination of characters (a from Sequence M and a from Sequence N) in an example. [Color figure can be viewed at wileyonlinelibrary.com]

FIG 4 A 3 × 3 matrix that can contain alignments from zero characters to two characters in Sequence M (ac) and Sequence N (at). The numbers in the boxes represent the numbers of characters provided by the respective sequences.

Discussion

Suggestion for Teaching Practices

When we taught NW in classes, we first gave the students the example pair of sequences, ac (Sequence M) and at (Sequence N), and three simple scores for each position of a two-sequence alignment: +1 for matching characters, -1 for mismatching characters, and -2 for a character aligning a gap. The students were then asked to find by themselves how many alignments of the sequences were possible and which one had the highest score.
According to the responses of the students, we taught them how to find all the results and introduced the exhaustive search tree. The students were also told that if the sequences were long enough, the number of alignments would grow so quickly that neither we nor computers could find the answer by exhaustive search in a practical amount of time. Thus, NW was introduced in the way described in the present study. To help the students understand this deduction clearly, we suggest writing all the steps of the deduction on the blackboard or on paper, step by step, and sometimes asking the students to complete some of the procedures. A sample lesson plan is included in Table 2. Based on their practice, the students could understand how to find all the results by the exhaustive search method, but also that this method was time- and energy-consuming. By observing the step-by-step deduction from the exhaustive search method to NW, and sometimes joining in the deductive procedures, they could understand this algorithm, which is time-saving and suitable for computer programming, within the scope of their knowledge even if they did not have enough mathematical background. Together with homework on a more complicated example, this could leave a lasting impression of NW. After the students grasped a complex dynamic programming algorithm deduced from a simpler exhaustive search method, they could easily understand why NW is a dynamic programming algorithm and why the highest score is obtained by using Eq. (1); they could more easily acquire the knowledge of other chapters partly based on NW, such as multiple sequence alignment, phylogenetic analysis, and database searching for similar sequences; and they would have greater confidence in learning bioinformatics. Because they understood the details of this deduction, some students in our classes came up with new ideas for sequence comparison in the course or in their later research projects. The suggested activity therefore fits into the learning and teaching of a bioinformatics course.
Sequence alignment is the basis of a bioinformatics course, and NW is the basis of sequence alignment. Understanding NW helps the students grasp other useful algorithms for sequence comparison more easily. When sequence alignment is taught, we suggest that the deduction from the exhaustive search method to NW be included in the teaching sequence (Fig. 8). In the curriculum of a bioinformatics major, Sequence Alignment should be an independent course after the basic mathematics course, and the deduction in the present study could be taught and discussed in the earlier chapters. If a bioinformatics course is open to students with different major backgrounds, sequence alignment is a separate chapter, and in such limited time NW, taught from the exhaustive search method, may be the only algorithm that needs a detailed deduction.

The Understanding of the Needleman-Wunsch Algorithm From the Viewpoint of Research

In the 1970s, although NW was widely used by the biological community, it had not been mathematically proved, lacked a widely useful storage matrix, and was sometimes not sensitive enough to find local similarity [11]. Smith and Waterman provided the proof of NW and suggested a suitable storage system (1981b) for this algorithm. In fact, the format of NW described above is not the original one published in 1970. The improved NW, to which Waterman and other scientists contributed, is easier to express and deduce. NW first provided an iterative matrix method to find sequence homology, but it focused on the similarity of the whole sequences and on alignment over the whole length, so more meaningful local homology, such as a gene with a short sequence within a longer DNA sequence, may be neglected. The Smith-Waterman algorithm, an algorithm revised from NW, and other algorithms for local alignment can be more agile in finding homologous segments in long sequences [10].

FIG 5 Alignments of the two DNA sequences, ac and at, as filled in the 3 × 3 matrix of boxes shown in Fig. 4.

FIG 6 The selection of the optimal alignment with the highest score in each box in a 3 × 3 matrix of the alignment of the two DNA sequences, ac and at, from Figs. 4 and 5. The alignments of each box were deduced from three directions: an additional character in the top sequence provided from Sequence M and an additional gap in the bottom sequence were added to the alignment from the left box; an additional gap in the top sequence and an additional character in the bottom sequence provided from Sequence N were added to the alignment from the upper box; and an additional character in the top sequence provided from Sequence M and an additional character in the bottom sequence provided from Sequence N were added to the alignment from the upper-left box. The alignments in the boxes in the first row of the matrix were deduced from only the left boxes, and those in the first column were deduced from only the upper boxes. The optimal alignment with the maximum score in a simple scoring system, e.g., +1 for a match, -1 for a mismatch, and -2 for a gap in each position of an alignment, was selected.
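The box-by-box selection described in the Fig. 6 caption can be sketched as follows. Each box keeps only its highest-scoring sub-alignment, extended from the left, upper, or upper-left box. The function name `best_alignment_boxes`, the tuple layout `(score, top, bottom)`, and the rule of keeping the first candidate when scores tie are assumptions made for this illustration.

```python
def best_alignment_boxes(seq_m, seq_n, match=1, mismatch=-1, gap=-2):
    """Return a matrix of boxes, each holding (score, top, bottom).

    Columns count the characters taken from seq_m and rows count the
    characters taken from seq_n, as in Fig. 4.
    """
    rows, cols = len(seq_n) + 1, len(seq_m) + 1
    boxes = [[None] * cols for _ in range(rows)]
    boxes[0][0] = (0, "", "")  # zero characters aligning zero characters

    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            candidates = []
            if j > 0:
                # From the left box: character of M over a gap.
                s, top, bottom = boxes[i][j - 1]
                candidates.append((s + gap, top + seq_m[j - 1], bottom + "-"))
            if i > 0:
                # From the upper box: gap over a character of N.
                s, top, bottom = boxes[i - 1][j]
                candidates.append((s + gap, top + "-", bottom + seq_n[i - 1]))
            if i > 0 and j > 0:
                # From the upper-left box: character over character.
                s, top, bottom = boxes[i - 1][j - 1]
                pair = match if seq_m[j - 1] == seq_n[i - 1] else mismatch
                candidates.append(
                    (s + pair, top + seq_m[j - 1], bottom + seq_n[i - 1]))
            # Keep only the highest-scoring alignment in this box.
            boxes[i][j] = max(candidates, key=lambda c: c[0])
    return boxes


boxes = best_alignment_boxes("ac", "at")
print(boxes[-1][-1])  # (0, 'ac', 'at')
```

Replacing each stored tuple by its first element gives the score matrix of Fig. 7, which is how the character-based boxes turn into the numeric NW matrix.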
As an earlier homology algorithm for sequence comparison and one of the basic algorithms in bioinformatics, NW is used for global alignment but is not suitable as a search program, whereas its improved variants, like the Smith-Waterman algorithm, are better at searching for meaningful sequences, as in the BLAST programs, for instance. Previous works have attempted to explain how to calculate the score matrix of NW to make it more useful [1,10,16]. Although the computation is easily described [Eq. (1)], the previous deductions of NW, e.g., the inspiration partly from the dot matrix or the later mathematical proofs, have not been clearly presented for new learners. Not thoroughly understanding the foundational algorithm may affect users' understanding of the significance of the results obtained using programs that involve NW and may inhibit the development of further dynamic algorithms for obtaining optimal alignment results, especially in multiple alignment. Rather than listing the alignment scores, the present study extended the alignment positions one by one from an exhaustive search method to obtain the optimal alignment. The exhaustive search method for comparing two sequences is time-consuming but more easily understood by new users and learners than the previous algorithms. Thus, the gap between the basic concept and the algorithm of sequence alignment is bridged in the present study. Benefiting from an easy understanding of NW and the information from the above deduction, teaching approaches for related algorithms could also be developed.

The number of global pair alignments was previously estimated [17,18], but calculating it still represented a mathematical question for some researchers. For Sequence I (with $i$ characters) and Sequence J (with $j$ characters), the number of all possible results for global alignment ($R_{i,j}$) is bounded above by $3^{i+j}$ and, for two sequences of equal length $n$, below by $3^{n}$. $R_{i,j}$ could be calculated in detail; however, some methods were complicated, as in the following:

$$
R_{i,j} =
\begin{cases}
1, & j = 0 \\
2i + 1, & j = 1 \\
2i^{2} + 2i + 1, & j = 2 \\
\dfrac{4i^{3} + 6i^{2} + 8i + 3}{3}, & j = 3
\end{cases}
\tag{2}
$$

Based on the deduction in this study, the values of $R_{i,j}$ can be listed in a matrix as they are calculated (Figs. 4 and 5), as follows:

         i=0    i=1    i=2    i=3    i=4    i=5
j=0        1      1      1      1      1      1
j=1        1      3      5      7      9     11
j=2        1      5     13     25     41     61
j=3        1      7     25     63    129    231
j=4        1      9     41    129    321    681
j=5        1     11     61    231    681   1683

From the above matrix, an iterative formula to calculate $R_{i,j}$ was obtained:

$$
R_{i,j} = R_{i-1,j-1} + R_{i-1,j} + R_{i,j-1}
\tag{3}
$$

With Eq. (3), the total number of alignments of two sequences of any length is easily calculated. This research also helps to reconsider time-saving methods.

TABLE 2 A sample lesson plan for teaching the Needleman-Wunsch algorithm

Items: Description
Course: An introduction to bioinformatics
Chapter: Alignment of pairs of sequences
Lesson title: From the exhaustive search method to the Needleman-Wunsch algorithm
Lesson objectives: Let students who may not have a strong mathematical background easily understand the Needleman-Wunsch algorithm by using the deduction beginning with the exhaustive search method.
Summary of actions: The students are first given the example of two sequences and asked to find the alignment result with the highest score. The teacher then introduces the exhaustive search tree to find all the results, from which the one with the highest score is found. To save time, the Needleman-Wunsch algorithm is introduced in the way described in the present study.
Materials/Equipment: Markers and a whiteboard, or chalk and a blackboard
References: [1] The present study. [2] Mount, D. W. (2004) Bioinformatics: Sequence and Genome Analysis, 2nd ed., Cold Spring Harbor Laboratory Press, New York. [3] Setubal, J. C., and Meidanis, J. (1997) Introduction to Computational Molecular Biology, PWS Publishing Company, Boston.
Homework: 1. Use the Needleman-Wunsch algorithm to align a pair of amino acid sequences, MTP and MSRDETHTP, with three simple scores: +1 for matching characters, -1 for mismatching characters, and -2 for a character aligning a gap. 2. Align the same pair of amino acid sequences, MTP and MSRDETHTP, with the BLOSUM62 scoring system, -10 for the gap opening penalty, and -0.5 for the gap extension penalty.

FIG 7 The scores replaced the character alignments in the boxes of Fig. 6. The scores of the alignments were calculated with a simple scoring system: +1 for a match, -1 for a mismatch, and -2 for a gap in each position of an alignment. [Color figure can be viewed at wileyonlinelibrary.com]

FIG 8 The suggestion for teaching sequence alignment in a bioinformatics course.
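The recurrence in Eq. (3) and the tabulated values of R can be checked with a short script. This is a minimal sketch; the function name `count_alignments` and the assumption that R equals 1 whenever one of the two sequences is empty (the first row and column of the matrix of R values) are taken from the listed values rather than from a stated definition.

```python
def count_alignments(i, j):
    """Number of global alignments of sequences with i and j characters,
    computed with the recurrence R[a][b] = R[a-1][b-1] + R[a-1][b] + R[a][b-1].
    """
    # R is 1 when either sequence is empty: only gaps can be added.
    R = [[1] * (j + 1) for _ in range(i + 1)]
    for a in range(1, i + 1):
        for b in range(1, j + 1):
            R[a][b] = R[a - 1][b - 1] + R[a - 1][b] + R[a][b - 1]
    return R[i][j]


print(count_alignments(2, 2))  # 13
print(count_alignments(5, 5))  # 1683
```

The value 13 for two sequences of two characters each matches the 13 complete alignments found by the exhaustive search, and the closed forms of Eq. (2) can be compared against the same function for j = 1, 2, and 3.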
Because the trace-back is performed after the computation of the score matrix is complete [1,2,10,14], every score needs to be stored at the same time. Moreover, memory-efficient algorithms improving on NW have been created [4,12,13,19]. In the present study, when the sub-alignment matrix was used and the alignments in the boxes were processed from left to right and from top to bottom, computer space was required only for the sub-alignments in the row or column of boxes currently being processed (together with the adjacent one from which it is derived), and the memory used for earlier rows or columns of boxes was no longer required. However, the larger space needed to store the characters presents a trade-off in some situations. The understanding of NW presented herein represents a new approach that might lead to new applications, new methods, or new algorithms. These warrant further study.
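The row-by-row storage idea can be illustrated with a sketch that keeps the best sub-alignments of only two rows of boxes at a time, the previous row and the current one. The function name `nw_two_rows` and the tie-breaking rule (the first candidate wins when scores are equal) are assumptions for this example, and the sketch is not one of the published linear-space algorithms cited above.

```python
def nw_two_rows(seq_m, seq_n, match=1, mismatch=-1, gap=-2):
    """Return (score, top, bottom) of a best global alignment while
    storing sub-alignments for only two rows of boxes at a time."""
    # First row of boxes: characters of seq_m aligned against gaps.
    previous = [(j * gap, seq_m[:j], "-" * j) for j in range(len(seq_m) + 1)]

    for i in range(1, len(seq_n) + 1):
        ch_n = seq_n[i - 1]
        # First box of the row: only the upper box is available.
        s, top, bottom = previous[0]
        current = [(s + gap, top + "-", bottom + ch_n)]
        for j in range(1, len(seq_m) + 1):
            ch_m = seq_m[j - 1]
            pair = match if ch_m == ch_n else mismatch
            ds, dt, db = previous[j - 1]   # upper-left box
            us, ut, ub = previous[j]       # upper box
            ls, lt, lb = current[j - 1]    # left box
            current.append(max(
                (ds + pair, dt + ch_m, db + ch_n),
                (us + gap, ut + "-", ub + ch_n),
                (ls + gap, lt + ch_m, lb + "-"),
                key=lambda c: c[0],
            ))
        # Earlier rows are no longer needed once the new row is complete.
        previous = current
    return previous[-1]


print(nw_two_rows("ac", "at"))  # (0, 'ac', 'at')
```

Because each box carries its own sub-alignment, no separate trace-back over the full score matrix is needed, at the cost of storing the aligned characters alongside the scores, which is the trade-off mentioned above.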

Conclusion

An exhaustive search for pair alignment, although impractical for long sequences because the number of alignments grows exponentially, is easily understood by learners. NW requires limited time and space to be run by computers, and a deduction of NW from its exhaustive-search origin was reported here to enrich the methods for teaching NW. The topological transformation (Fig. 5) was a bridge connecting the exhaustive search (Fig. 1) with NW. The success of the deduction from an exhaustive method to NW in the present study facilitates an understanding of the foundational dynamic programming algorithm for global pair alignment and encourages new thinking in the exploration and application of alignment algorithms.

References

[1] Needleman, S. B., and Wunsch, C. D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443-453.
[2] Mount, D. W. (2004) Bioinformatics: Sequence and Genome Analysis, 2nd ed., Cold Spring Harbor Laboratory Press, New York.
[3] Chakraborty, A., and Bandyopadhyay, S. (2013) FOGSAA: Fast optimal global sequence alignment algorithm. Sci. Rep. 3, 1746.
[4] Chao, K.-M., Hardison, R. C., and Miller, W. (1994) Recent developments in linear-space alignment methods: A survey. J. Comp. Biol. 1, 271-291.
[5] Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673-4680.
[6] Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.
[7] Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T. J., Higgins, D. G., and Thompson, J. D. (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31, 3497-3500.
[8] Huang, X., and Chao, K. (2003) A generalized global alignment algorithm. Bioinformatics 19, 228-233.
[9] Huang, W., Umbach, D. M., and Li, L. (2006) Accurate anchoring alignment of divergent sequences. Bioinformatics 22, 29-34.
[10] Smith, T. F., and Waterman, M. S. (1981a) Identification of common molecular subsequences. J. Mol. Biol. 147, 195-197.
[11] Smith, T. F., and Waterman, M. S. (1981b) Comparison of biosequences. Adv. Appl. Math. 2, 482-489.
[12] Gotoh, O. (1982) An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705-708.
[13] Gotoh, O. (1990) Optimal sequence alignment allowing for long gaps. Bull. Math. Biol. 52, 359-373.
[14] Setubal, J. C., and Meidanis, J. (1997) Introduction to Computational Molecular Biology, PWS Publishing Company, Boston.
[15] Xu, Z. N., Ed. (2008) Bioinformatics, Tsinghua University Press, Beijing.
[16] Waterman, M. S., Smith, T. F., and Beyer, W. A. (1976) Some biological sequence metrics. Adv. Math. 20, 367-387.
[17] Waterman, M. S. (1994) Parametric and ensemble sequence alignment algorithms. Bull. Math. Biol. 56, 743-767.
[18] Waterman, M. S., Eggert, M., and Lander, E. (1992) Parametric sequence comparisons. Proc. Natl. Acad. Sci. U. S. A. 89, 6090-6093.
[19] Hirschberg, D. S. (1975) A linear space algorithm for computing maximal common subsequences. Commun. ACM 18, 341-343.