CHAPTER 4 PATTERN CLASSIFICATION, SEARCHING AND SEQUENCE ALIGNMENT


4.1 INTRODUCTION

This chapter discusses three major tasks: classifying patterns in a given DNA sample, searching for a query pattern in a target database, and globally aligning given protein sequences. Each task is carried out with both the existing and the proposed system, and the performance of each is evaluated.

4.2 PATTERN CLASSIFICATION FOR GENERATING UNIQUE IDENTIFICATION NUMBER

The proposed system automates the generation of a unique identification number from a given human DNA sample. It classifies the sample into valid and invalid sequences and generates an identification number for each. The identification numbers of the valid sequences are then checked for repetition, and the unique identification number of the individual is obtained from them. The representations of the nucleotides present in DNA and RNA are shown in Table 4.1. Figure 4.1 shows the set of valid sequences (VS) that the proposed system identifies for the given DNA sample (base pairs = 32, sequences = 25).

Table 4.1 Representation of nucleotides present in DNA and RNA

S.No.  Nucleotide  Presence  Character  Fuzzy & Color Equivalent Representation
1      Adenine     DNA/RNA   A
2      Thymine     DNA       T
3      Guanine     DNA/RNA   G
4      Cytosine    DNA/RNA   C
5      Uracil      RNA       U          0.5

DNA SAMPLE: HUMAN-1 [BASE PAIR = 32, SEQUENCE = 25]
AATGTGTTGTGTGACCCCTCAAAATCTCTCAAATGTGTTTTTACAC
TCCGTTGGTAATATGGAATGTGTTAAAGTTGCTACCCGGGGTTTT
TTAATGTGTCTCT

Figure 4.1 Identification of valid sequences from the sample

VS[1] = , VS[2] = , VS[3] = , VS[4] = , VS[5] = , VS[6] = , VS[7] = , VS[8] = , VS[9] = , VS[10] = , VS[11] = , VS[12] = , VS[13] =
The Unique Identification Number is:

PATTERN SEARCHING TECHNIQUES

Pattern Searching in DNA Sequence using Hash Coding Technique

Hash coding is particularly suited to fast, ungapped searches for a short sequence, sequence pattern or motif in a large database. The word hash comes from the hash table, which computer science commonly uses as a look-up table constructed from a database. Such a table acts as an abstract or index of the information in the database and is often used for fast searches. A hash table is also called an associative array, in which a specific name, or key, is associated with each piece of data, or value, stored in it. In sequence searching, hashing breaks a sequence into small words, or k-tuples, of a specified size and builds a hash table with those words keyed to their position numbers. The values associated with each key are stored in buckets. The number of buckets equals the number of keys, which depends on the k-tuple size and the alphabet of the database (4^k possible words for DNA). The words of the query are then looked up in the table built from the database, the matching positions are retrieved, and the offset between the matching positions is saved. In general, hashing reduces the complexity of the search problem to the order of the total length of all the sequences in the database.
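The hash coding steps described in this section can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work. The names `build_hash_table` and `offset_counts` are hypothetical. Positions are assumed to be 1-based, and the offset is taken as the target position minus the query position, following the offset rule used in this section. The sketch does not attempt to reproduce the base-operation counts reported in the analysis.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def build_hash_table(seq, k):
    """Map every possible k-tuple (4**k buckets for DNA) to its 1-based positions in seq."""
    table = {"".join(word): [] for word in product(ALPHABET, repeat=k)}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in table:  # skip words that contain non-ACGT characters
            table[word].append(i + 1)
    return table


def offset_counts(target, query, k):
    """Count shared k-tuples per offset (target position minus query position)."""
    target_table = build_hash_table(target, k)
    query_table = build_hash_table(query, k)
    counts = Counter()
    for word, query_positions in query_table.items():
        for q_pos in query_positions:
            for t_pos in target_table[word]:
                counts[t_pos - q_pos] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy example: the query TAC occurs at target position 4.
    offsets = offset_counts("ACGTACGT", "TAC", k=2)
    print(offsets.most_common(1))  # [(3, 2)]: offset 3 shares the most 2-tuples
```

The offset with the largest count marks the diagonal on which the query most likely occurs in the target. A larger k gives fewer, more specific hits, which matches the speed/sensitivity trade-off discussed in the analysis.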

Pattern searching for the sample shown in Figure 4.2 is done with the hash coding technique. For key size k = 1 there are 4 buckets, and for key size k = 2 there are 16. The values stored are the positions of each key in the target sequence. A similar hash table is built for the query sequence with k = 1 and k = 2, giving the position of each key in the query sequence. The offset, which yields the number of matches, is obtained by subtracting the query positions from the target positions. Finally, the performance of the algorithm is analysed.

Figure 4.2 Sample of pattern searching

Analysis of Hash Coding Algorithm in Pattern Searching

In the hash coding method, a large k-tuple gives high speed and high specificity (the ability to pick up accurate and meaningful matches), but low sensitivity (the ability to find approximate or distant matches). Conversely, a small k-tuple gives low speed and low specificity, but high sensitivity. Hash coding is used in the widely used sequence matching and search programs BLAST and FASTA; BLAST is available through the National Center for Biotechnology Information (NCBI). Because the comparison is executed in each repetition of the loop, it is taken as the algorithm's base operation.

Target Database
Let c1 and c2 be the counts of base operations performed in the hash table for k-tuple k = 1 (A, T, G and C) and for k-tuple k = 2 (AA, AT, AG, AC, ..., CC), respectively. For a target input of size 43:

$$c_1(43) = 172, \qquad c_2(43) = 172$$
$$c_t(43) = c_1(43) + c_2(43) = 344$$

Query Pattern
Let c3 and c4 be the counts of base operations performed in the hash table for k-tuple k = 1 (A, T, G and C) and for k-tuple k = 2 (AA, AT, AG, AC, ..., CC), respectively. For a query pattern of size 7:

$$c_3(7) = 28, \qquad c_4(7) = 28$$
$$c_q(7) = c_3(7) + c_4(7) = 56$$

The total count of base operations in the hash coding algorithm for an input of size 50, $C_{hash}(50)$, is given by

$$C_{hash}(50) = c_t(43) + c_q(7) = 344 + 56 = 400$$

The run-time efficiency of the hash coding algorithm for an input of size 50, $T_{hash}(50)$, where one base operation takes 1/100th of a second, is given by

$$T_{hash}(50) \approx C_{hash}(50) \times \frac{1}{100} = \frac{400}{100} = 4 \text{ seconds}$$

Pattern Searching in DNA Sequence using NFPR Technique

The NFPR algorithm has two objectives: to generate the unique identification number of a given human DNA sample, and to check whether a query pattern is present in the given target database.

Step 1 The training inputs to the Neural-Fuzzy processor are normalized using the conditions given in Table 3.1.
Step 2 The Weight for Inference and the Category for Inference are generated using the Ignition Function (IGF) and the Tracking Function (TRF), as in Table 3.3.
Step 3 The category of each nucleotide pair in the target database is inferred using the Category Inference Function (CIF), as in Table 3.7.
Step 4 A unique identification number is generated for every valid sequence in the target database using Equation (3.8).
Step 5 A unique identification number is generated for the query pattern using Equation (3.8).

Step 6 The identification number of the query pattern is compared with every identification number in the target database. A match confirms that the pattern is present in the database; no match confirms that it is absent.

The separator output of the NFPR processor for the target database is shown in Figure 4.3.

Figure 4.3 Separator output of NFPR processor

Valid sequences of the target database:
Valid sequence [VS1] =
Valid sequence [VS2] =
Valid sequence [VS3] =
Valid sequence [VS4] =
Valid sequence [VS5] =
Valid sequence [VS6] =

Output for query pattern:
Query pattern =

Classification Performed in NFPR System

The proposed NFPR system classifies each pair of bases in the given human DNA sample as either valid (AA, AT, AG, AC, TA, TG, TC, GA, GT, GC, CA, CT, CG) or invalid (TT, GG, CC). When a pair is valid, the pair and the next five nucleotide bases (seven bases in all) form one complete sequence. When a pair is invalid, the system pairs its second base with the next base in the sample and checks whether this new pair is valid or invalid. If the new pair is also invalid, the system again pairs its second base with the next base, and so on. A short final sequence is padded with X. For example, consider the following DNA sample:

AATGTGTTGTGTGACCCCTCAAAATCTCTCAAATGTGTT
TTTACACTCCGTTGGTAATATGGAATGTGTTAAAGTTGCTACCCG
GGGTTTTTTAATGTGTCTCT

Legend: sequence pair (valid); sequence pair (invalid)

AATGTGT TGTGTGA CCC CTCAAAA TCTCTCA AATGTGT TTT
TACACTC CGTTGGT AATATGG AATGTGT TAAAGTT GCTACCC G
GG GTTTTTT AATGTGT CTCTXXX

Legend: one complete sequence; invalid sequence pairs or valid sequence
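The classification rule can be expressed as a short Python sketch. It makes three assumptions:

- every pair other than TT, GG and CC is valid (the thirteen valid pairs listed above);
- consecutive invalid bases are grouped into one invalid sequence, matching the IS1, IS2 and IS3 labels used in the mutation cases of this chapter;
- a short final sequence is padded with X.

The names `classify` and `SAMPLE` are hypothetical. The sketch produces only the sequences, not the identification numbers of Equation (3.8).

```python
from collections import Counter

INVALID_PAIRS = {"TT", "GG", "CC"}  # every other nucleotide pair is valid
SEQUENCE_LENGTH = 7                 # a valid pair plus the next five bases


def classify(sample, pad="X"):
    """Split a DNA sample into valid 7-base sequences and runs of invalid bases."""
    valid, invalid = [], []
    i, previous_invalid = 0, False
    while i < len(sample):
        if sample[i:i + 2] in INVALID_PAIRS:
            # Invalid pair: set its first base aside and re-pair from the second base.
            if previous_invalid:
                invalid[-1] += sample[i]
            else:
                invalid.append(sample[i])
            previous_invalid = True
            i += 1
        else:
            # Valid pair: it and the next five bases form one complete sequence;
            # a short final sequence is padded with X.
            valid.append(sample[i:i + SEQUENCE_LENGTH].ljust(SEQUENCE_LENGTH, pad))
            previous_invalid = False
            i += SEQUENCE_LENGTH
    return valid, invalid


SAMPLE = ("AATGTGTTGTGTGACCCCTCAAAATCTCTCAAATGTGTT"
          "TTTACACTCCGTTGGTAATATGGAATGTGTTAAAGTTGCTACCCG"
          "GGGTTTTTTAATGTGTCTCT")

if __name__ == "__main__":
    valid, invalid = classify(SAMPLE)
    print(" ".join(valid))
    print(invalid)   # ['CCC', 'TTT', 'GGG']
    repeated = {s: n for s, n in Counter(valid).items() if n > 1}
    print(repeated)  # {'AATGTGT': 4}
```

Running the sketch on the example reproduces the valid sequences shown above, ending with CTCTXXX. It also shows that AATGTGT occurs four times; these are the sequences marked RS (repeated sequence) in the mutation examples.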

Analysis of Proposed NFPR Technique in Pattern Searching

Let c1 be the count of base operations performed in training and inference, and c2 the count performed in comparing identification numbers in the proposed NFPR system. For a target input of size 50:

$$c_1(50) = 109, \qquad c_2(50) = 6$$

The total count of base operations in the NFPR algorithm for an input of size 50, $C_{nfpr}(50)$, is given by

$$C_{nfpr}(50) = c_1(50) + c_2(50) = 109 + 6 = 115$$

The run-time efficiency of the NFPR algorithm for an input of size 50, $T_{nfpr}(50)$, where one base operation takes 1/100th of a second, is given by

$$T_{nfpr}(50) \approx C_{nfpr}(50) \times \frac{1}{100} = \frac{115}{100} = 1.15 \text{ seconds}$$

The base operations of the NFPR and hash coding algorithms are shown in Figure 4.4.

Figure 4.4 Base operation of NFPR and Hash coding algorithm
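The base-operation arithmetic for the hash coding and NFPR analyses can be checked with a few lines of Python. The counts are taken as reported in the text. The only thing computed is the conversion T(n) ≈ C(n)/100 seconds, which follows from the stated assumption that one base operation takes 1/100th of a second. The name `run_time_seconds` is hypothetical.

```python
OPERATIONS_PER_SECOND = 100  # one base operation = 1/100th of a second, as assumed in the text


def run_time_seconds(base_operations):
    """Approximate run time T(n) from the base-operation count C(n)."""
    return base_operations / OPERATIONS_PER_SECOND


# Hash coding: target (k = 1 and k = 2) plus query (k = 1 and k = 2), values as reported
c_t = 172 + 172
c_q = 28 + 28
c_hash = c_t + c_q

# NFPR: training and inference plus comparison of identification numbers
c_nfpr = 109 + 6

if __name__ == "__main__":
    print(c_hash, run_time_seconds(c_hash))  # 400 4.0
    print(c_nfpr, run_time_seconds(c_nfpr))  # 115 1.15
```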

SEQUENCE ALIGNMENT TECHNIQUES

Sequence alignment arranges two or more DNA, RNA or protein sequences so that identical or similar residues line up. This lets the order of nucleotides or amino acids in one sequence be compared directly with that in another. Alignments are the basis of sequence analysis methods and are used to pinpoint conserved motifs. A motif is a consecutive string of amino acids in a protein sequence whose general character is repeated, or conserved, at a particular position in all sequences of a multiple alignment. Sequence similarity measures are classified as either global or local.

There are two mathematical aspects to sequence alignment. The first is the algorithm used to find sequence similarities; the second is the method used to decide which similarities are interesting and important. Similarities between sequences can be studied with methods such as the dot plot and with dynamic programming algorithms such as the Needleman-Wunsch and Smith-Waterman algorithms. The FASTA and BLAST programs use faster heuristics built on these ideas. The objective is to determine the optimum match of two protein sequences.

Matching procedures such as the Needleman-Wunsch algorithm usually allow gaps to be inserted in one sequence or the other to improve the degree of matching, and the scoring scheme imposes a gap penalty each time a gap is introduced. The matching itself may score only identities. Alternatively, it may use a weighted scale that gives partial credit to aligned amino acids that are structurally, genetically or evolutionarily similar. With all these factors taken into account (matched identities or similar residues counting positively, gaps counting negatively), an alignment score can be computed.

Dynamic Programming using Needleman-Wunsch Algorithm

Dynamic programming is a reliable computational method for aligning two protein or nucleic acid sequences, and it is central to sequence analysis because it produces the alignment between the sequences. The method compares every pair of characters in the two sequences. It starts at the ends of the two sequences, attempts to match all possible pairs of characters between them, and follows a scoring scheme for matches, mismatches and gaps. The resulting alignment contains matched and mismatched characters and gaps, positioned so that the number of matches between identical or related characters is the maximum possible.

The dynamic programming method usually used for global alignment is the Needleman-Wunsch algorithm, which maximizes the number of matches between the sequences along their entire length. For protein sequences, the simplest system of comparison is based on identity: a match is scored only if the two aligned amino acids are identical. The procedure generates a matrix of numbers that represents all possible alignments between the sequences, and the highest-scoring path through the matrix defines an optimal alignment. The method is guaranteed, in a mathematical sense, to provide the optimal alignment for a given set of user-defined variables, including the choice of scoring matrix and gap penalties. Gaps may also appear at the ends of the sequences when sequence is left over after the alignment. These end gaps are often, but not always, given a gap penalty.
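The Needleman-Wunsch recurrence can be illustrated with a minimal Python sketch. It assumes a simple scoring scheme: match +1, mismatch -1 and a linear gap penalty of -1. It returns one optimal global alignment, found by traceback from the bottom-right cell of the matrix. NCBI's Needleman-Wunsch tool scores with a substitution matrix and its own gap costs, so this sketch is not expected to reproduce the NW score of 23 reported in Table 4.3. The function name `needleman_wunsch` is hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a linear gap penalty; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Replacing `pair_score` with a lookup in a substitution matrix gives the weighted, partial-credit scoring described above. The matrix-filling and traceback loops stay the same.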

Global Alignment of Protein Sequences using National Center for Biotechnology Information (NCBI) Tool

This section describes a global alignment of two protein sequences with the NCBI-BLAST tool, which implements the Needleman-Wunsch technique. The character and color representations of the amino acids present in protein sequences are given in Table 4.2.

Table 4.2 Representation of amino acids present in protein sequences

S.No.  Amino acid     Type           Character & Color Representation  Fuzzy Equivalent
1      Alanine        Hydrophobic    A
2      Cysteine       Hydrophilic    C
3      Aspartic acid  Charged (-ve)  D
4      Glycine        Hydrophobic    G
5      Lysine         Charged (+ve)  K
6      Leucine        Hydrophobic    L
7      Methionine     Hydrophobic    M
8      Asparagine     Hydrophilic    N
9      Proline        Hydrophobic    P
10     Glutamine      Hydrophilic    Q
11     Arginine       Charged (+ve)  R
12     Tyrosine       Hydrophilic    Y
13     Space                         S                                 0.50

The subject sequence and the query sequence to be globally aligned with the NCBI-BLAST tool are given in Figure 4.5.

Figure 4.5 Sample of protein sequences for aligning

The output that NCBI-BLAST generates for this sample of protein sequences is given in Table 4.3, and the aligned sequences are shown in Figure 4.6.

Table 4.3 Generated output from NCBI-BLAST

Query ID: lcl (unnamed protein product)
Description: None
Length: 3
Molecule type: amino acid
NW Score: 23
Query Length: 12
Identities: 6/13 (46%)
Subject ID:
Gaps: 1/13 (8%)
Description: None
Mismatches: 6
Molecule type: amino acid

Query  1   AGC-GNRCKCRYP  12
           A C G +C CR
Sbjct  1   ADCNGRQCLCRPM  13
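The alignment statistics can be recomputed directly from the aligned rows, using a hypothetical `alignment_stats` helper. The Python sketch below classifies each column as follows:

- a gap, if either row has '-';
- an identity, if the two residues are equal;
- a mismatch, otherwise.

Applied to the rows in Table 4.3, it reproduces the reported 6 identities, 6 mismatches and 1 gap (6/13, about 46%).

```python
def alignment_stats(row_1, row_2, gap="-"):
    """Count identities, mismatches and gap columns in a pairwise alignment."""
    if len(row_1) != len(row_2):
        raise ValueError("aligned rows must have the same length")
    identities = mismatches = gaps = 0
    for x, y in zip(row_1, row_2):
        if x == gap or y == gap:
            gaps += 1
        elif x == y:
            identities += 1
        else:
            mismatches += 1
    return {
        "identities": identities,
        "mismatches": mismatches,
        "gaps": gaps,
        "identity_percent": round(100 * identities / len(row_1)),
    }


if __name__ == "__main__":
    # Aligned query and subject rows as reported in Table 4.3
    print(alignment_stats("AGC-GNRCKCRYP", "ADCNGRQCLCRPM"))
    # {'identities': 6, 'mismatches': 6, 'gaps': 1, 'identity_percent': 46}
```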

Figure 4.6 Aligned sequences generated by NCBI-BLAST

Number of matches / identities = 6/13 (46%)
Number of mismatches = 6
Number of gaps = 1
Alignment efficiency = 46%

Analysis of Needleman-Wunsch Technique in Sequence Alignment

Let c1 be the count of base operations performed in building the matrix, and c2 the count performed in the traceback of the Needleman-Wunsch algorithm. For a sequence input of size 25:

$$c_1(25) = 132, \qquad c_2(25) = 156$$

The total count of base operations in the Needleman-Wunsch algorithm for an input of size 25, $C_{needl}(25)$, is given by

$$C_{needl}(25) = c_1(25) + c_2(25) = 132 + 156 = 288$$

The run-time efficiency of the Needleman-Wunsch algorithm for an input of size 25, $T_{needl}(25)$, where one base operation takes 1/100th of a second, is given by

$$T_{needl}(25) \approx C_{needl}(25) \times \frac{1}{100} = \frac{288}{100} = 2.88 \text{ seconds}$$

Classification Performed in NFSA Algorithm

The proposed NFSA system classifies each pair in the given normalized amino acid sample as either match (AA, CC, DD, GG, KK, LL, MM, NN, PP, QQ, RR) or no-match (AC, CD, ...). When a pair is a match, the location of its first amino acid is stored, and the next two consecutive amino acids after the pair become the new pair for classification. When a pair is no-match, the system pairs its second amino acid with the next amino acid in the sample and checks whether this new pair is match or no-match. If it is also no-match, the system again pairs its second amino acid with the next amino acid, and so on. For example, consider the following normalized sample:

AADGCCNGGNRRQCCKLCCRRYPPM
Amino acid pair (match); locations 1, 5, 8, 11, ... are stored

AADGCCNGGNRRQCCKLCCRRYPPM
Amino acid pair (no-match)
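The pair-scanning rule of the NFSA classification can be sketched in Python as follows. This is an illustration, not the Neural-Fuzzy processor itself. It assumes that the match category is exactly the set of identical pairs listed above and that every other pair is no-match. The name `match_locations` is hypothetical. For the example sample, it returns the locations 1, 5, 8, 11, 14, 18, 20 and 23 that the associator steps of this chapter use.

```python
MATCH_PAIRS = {"AA", "CC", "DD", "GG", "KK", "LL", "MM", "NN", "PP", "QQ", "RR"}


def match_locations(sample):
    """Return the 1-based location of the first residue of each match pair."""
    locations = []
    i = 0  # 0-based index of the first residue of the current pair
    while i + 1 < len(sample):
        if sample[i:i + 2] in MATCH_PAIRS:
            locations.append(i + 1)
            i += 2  # match: the next pair is the two residues after this pair
        else:
            i += 1  # no-match: pair the second residue with the next one
    return locations


if __name__ == "__main__":
    print(match_locations("AADGCCNGGNRRQCCKLCCRRYPPM"))
    # [1, 5, 8, 11, 14, 18, 20, 23]
```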

Figure 4.7 Screenshot of the existing NCBI tool used in sequence alignment

Global Alignment of Protein Sequences using Proposed NFSA Technique

The proposed NFSA algorithm performs global alignment of protein sequences in order to identify the number of matches, gaps and mismatches. The Neural-Fuzzy processor is first trained with the fuzzy equivalents of all the amino acids present in the protein sequences, so that each pair falls into either the match or the no-match category. The normalized form of sequence 1 and sequence 2 is then generated as shown in Figure 4.8.

Figure 4.8 Normalized sequences

The associator performs the following processes:
a) Using the Weights for Inference and Categories for Inference generated by the Neural-Fuzzy processor, the associator identifies the locations of the match sequences, marked in Figure 4.9. The set {1, 5, 8, 11, 14, 18, 20, 23} indicates that there are 8 matches between the sequences.

b) Generate the set of value pairs for the matched locations identified above:
{(1, 2), (5, 6), (8, 9), (11, 12), (14, 15), (18, 19), (20, 21), (23, 24)}
that is, {(1, 1+1), (5, 5+1), ..., (23, 23+1)}.

Figure 4.9 Location of match sequences

c) Create a single set A from the above pairs:
A = {1, 2, 5, 6, 8, 9, 11, 12, 14, 15, 18, 19, 20, 21, 23, 24}
d) Identify the elements missing from A and create a set B:
B = {3, 4, 7, 10, 13, 16, 17, 22, 25}
e) Check the first element of each pair in A for even or odd. If it is even, interchange it with the element that follows it:
A = {(1, 2), (5, 6), (9, 8), (11, 12), (15, 14), (19, 18), (21, 20), (23, 24)}
f) Check whether the elements of set B are continuous. For each element that is not continuous, add an element S before it if it is even, or after it if it is odd:
B = {(3, 4), (7, S), (S, 10), (13, S), (17, 16), (S, 22), (25, S)}
g) Map the elements of set A and set B one after another, as in Figure 4.10.

h) Check for other possible matches using the pairs from process (b):
{(1, 2), (5, 6), (8, 9), (11, 12), (14, 15), (18, 19), (20, 21), (23, 24)}
For each pair (a, b) with a > 1, check whether the (a-1)th and (b+1)th elements are the same; if they are, another possible match is found. The pairs with a > 1 are {(5, 6), (8, 9), (11, 12), (14, 15), (18, 19), (20, 21), (23, 24)}.
- For the pair (5, 6), the 4th and 7th elements are not the same, so there is no other match.
- For the pair (8, 9), the 7th and 10th elements are the same, so another match is found between the 7th and 10th elements, (7, 10), as shown in Figure 4.10.

Figure 4.10 Mapped sequences of proposed NFSA system

No other match is found for the remaining pairs (11, 12), (14, 15), (18, 19), (20, 21) and (23, 24), so the proposed system finds 9 matches in total.
i) Append set A and set B to create set C:
C = {(1, 2), (3, 4), (5, 6), (7, S), (9, 8), (S, 10), (11, 12), (13, S), (15, 14), (17, 16), (19, 18), (21, 20), (S, 22), (23, 24), (25, S)}
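Steps (b) to (f), (h) and (i) of the associator can be traced with the Python sketch below. It starts from the match locations reported in step (a), and all function names are hypothetical. Two details are assumptions inferred from the sets shown in the text rather than stated rules:

- a continuous pair in set B is oriented with the same even/odd rule as step (e), which turns (16, 17) into (17, 16);
- set C is ordered by the smallest position in each pair.

Step (g), the mapping shown in Figure 4.10, is not modelled.

```python
GAP = "S"


def matched_pairs(locations):
    """Step (b): a match at location a covers positions a and a + 1."""
    return [(a, a + 1) for a in locations]


def missing_positions(pairs, length):
    """Steps (c) and (d): flatten the pairs into set A and list positions absent from it."""
    a_set = {x for pair in pairs for x in pair}
    return [p for p in range(1, length + 1) if p not in a_set]


def orient(pair):
    """Step (e): interchange a pair whose first element is even."""
    first, second = pair
    return (second, first) if first % 2 == 0 else pair


def gap_pairs(b_list):
    """Step (f): pair continuous positions; pad an isolated position with S."""
    result, i = [], 0
    while i < len(b_list):
        x = b_list[i]
        if i + 1 < len(b_list) and b_list[i + 1] == x + 1:
            result.append(orient((x, x + 1)))
            i += 2
        else:
            result.append((GAP, x) if x % 2 == 0 else (x, GAP))
            i += 1
    return result


def extra_matches(pairs, sample):
    """Step (h): for (a, b) with a > 1, compare the (a-1)th and (b+1)th residues."""
    return [(a - 1, b + 1) for a, b in pairs
            if a > 1 and b < len(sample) and sample[a - 2] == sample[b]]


def merged(a_pairs, b_pairs):
    """Step (i): combine both sets, ordered by the smallest position in each pair."""
    return sorted(a_pairs + b_pairs, key=lambda p: min(x for x in p if x != GAP))


SAMPLE = "AADGCCNGGNRRQCCKLCCRRYPPM"
LOCATIONS = [1, 5, 8, 11, 14, 18, 20, 23]  # step (a) output reported in the text

if __name__ == "__main__":
    pairs = matched_pairs(LOCATIONS)
    a_pairs = [orient(p) for p in pairs]
    b_pairs = gap_pairs(missing_positions(pairs, len(SAMPLE)))
    extra = extra_matches(pairs, SAMPLE)
    print(extra, len(LOCATIONS) + len(extra))  # [(7, 10)] 9
    print(merged(a_pairs, b_pairs))
```

With these inputs, the sketch finds the extra match (7, 10), giving 9 matches in total, and the merged list has the same order as set C in step (i).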

Figure 4.11 Aligned sequences of proposed NFSA system

Number of matches / identities = 9/13 (70%)
Number of mismatches = 2
Number of gaps = 5
Algorithm efficiency: number of comparisons used in alignment (training + inference + aligning) = 132 comparisons
Alignment efficiency = 70%

The alignment efficiency and the base operations of Needleman-Wunsch and NFSA are shown in Figure 4.12 and Figure 4.13.

Figure 4.12 Alignment efficiency of Needleman-Wunsch versus NFSA

Analysis of Proposed NFSA Technique in Sequence Alignment

Let c1 be the count of base operations performed in training and inference, and c2 the count performed in aligning the sequence in the proposed NFSA system. For a sequence input of size 25:

$$c_1(25) = 102, \qquad c_2(25) = 30$$

The total count of base operations for the proposed NFSA algorithm for an input sequence of size 25, $C_{nfsa}(25)$, is given by

$$C_{nfsa}(25) = c_1(25) + c_2(25) = 102 + 30 = 132$$

The run-time efficiency of the proposed NFSA algorithm for an input sequence of size 25, $T_{nfsa}(25)$, where one base operation takes 1/100th of a second, is given by

$$T_{nfsa}(25) \approx C_{nfsa}(25) \times \frac{1}{100} = \frac{132}{100} = 1.32 \text{ seconds}$$

Figure 4.13 Base operation of NFSA and Needleman-Wunsch
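As in the earlier analyses, the run times follow from the reported base-operation counts under the assumption that one base operation takes 1/100th of a second. The short Python check below recomputes them for the Needleman-Wunsch and NFSA algorithms; the variable names are hypothetical.

```python
OPERATIONS_PER_SECOND = 100  # one base operation = 1/100th of a second, as assumed in the text

# Base-operation counts for an input of size 25, as reported in the text
c_needl = 132 + 156  # matrix construction + traceback
c_nfsa = 102 + 30    # training and inference + aligning

t_needl = c_needl / OPERATIONS_PER_SECOND
t_nfsa = c_nfsa / OPERATIONS_PER_SECOND

if __name__ == "__main__":
    print(c_needl, t_needl)  # 288 2.88
    print(c_nfsa, t_nfsa)    # 132 1.32
```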

IDENTIFICATION OF MUTATION IN HUMAN DNA

The NFPR algorithm is also designed to identify precisely where a mutation occurs in a given human DNA sample. It does this by classifying the sample into valid and invalid sequences and comparing their identification numbers.

Case 1
Before mutation:
VS1/RS   VS2      IS1 IS1 IS1  VS3      VS4      VS5/RS
AATGTGT  TGTGTGA  C   C   C    CTCAAAA  TCTCTCA  AATGTGT
IS2 IS2 IS2  VS6      VS7      VS8      VS9/RS   VS10
T   T   T    TACACTC  CGTTGGT  AATATGG  AATGTGT  TAAAGTT
VS11     IS3 IS3 IS3  VS12     VS13/RS  VS14
GCTACCC  G   G   G    GTTTTTT  AATGTGT  CTCTXXX

After mutation in valid sequence [point mutation]:
VS1/RS   VS2      IS1 IS1 IS1  VS3      VS4      VS5/RS
AATGTGT  TGTGTGA  C   C   C    CTCACAA  TCTCTCA  AATGTGT
IS2 IS2 IS2  VS6      VS7      VS8      VS9/RS   VS10
T   T   T    TACACTC  CGTTGGT  AATATGG  AATGTGT  TAAAGTT
VS11     IS3 IS3 IS3  VS12     VS13/RS  VS14
GCTACCC  G   G   G    GTTTTTT  AATGTGT  CTCTXXX

In case 1, a point mutation with the mutant C occurs in valid sequence VS1,3. It can be identified by the change in the identification number of VS1,3, as shown in Table 4.4, while the invalid sequences are unaltered, as shown in Table 4.5. The mutation changes the polypeptide sequence, which might alter the shape or function of the protein depending on where in the sequence it occurs. The process of point mutation is shown in Figure 4.14.
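The comparison behind the mutation cases can be sketched in Python. The sketch segments the sample before and after the mutation and reports which valid and invalid sequences changed. Because Equation (3.8) is not reproduced here, the sequences themselves stand in for their identification numbers. The function names are hypothetical. The segmentation follows the classification rules described in this chapter:

- TT, GG and CC are invalid pairs;
- valid sequences are seven bases long, padded with X at the end;
- runs of invalid bases are grouped.

```python
INVALID_PAIRS = {"TT", "GG", "CC"}


def classify(sample, width=7, pad="X"):
    """Split a sample into valid sequences (VS) and runs of invalid bases (IS)."""
    valid, invalid = [], []
    i, previous_invalid = 0, False
    while i < len(sample):
        if sample[i:i + 2] in INVALID_PAIRS:
            if previous_invalid:
                invalid[-1] += sample[i]
            else:
                invalid.append(sample[i])
            previous_invalid = True
            i += 1
        else:
            valid.append(sample[i:i + width].ljust(width, pad))
            previous_invalid = False
            i += width
    return valid, invalid


def changed_indices(before, after):
    """1-based indices whose segment differs, or exists on only one side."""
    return [k + 1 for k in range(max(len(before), len(after)))
            if k >= len(before) or k >= len(after) or before[k] != after[k]]


def locate_mutation(original, mutated):
    """Return (changed VS indices, changed IS indices) between two samples."""
    vs_before, is_before = classify(original)
    vs_after, is_after = classify(mutated)
    return changed_indices(vs_before, vs_after), changed_indices(is_before, is_after)


HUMAN_1 = ("AATGTGTTGTGTGACCCCTCAAAATCTCTCAAATGTGTT"
           "TTTACACTCCGTTGGTAATATGGAATGTGTTAAAGTTGCTACCCG"
           "GGGTTTTTTAATGTGTCTCT")

if __name__ == "__main__":
    point = HUMAN_1.replace("CTCAAAA", "CTCACAA", 1)     # case 1: point mutation in VS3
    print(locate_mutation(HUMAN_1, point))              # ([3], [])
    silent = HUMAN_1.replace("CCCGGGG", "CCCGGGGG", 1)  # case 3: extra G in IS3
    print(locate_mutation(HUMAN_1, silent))             # ([], [3])
```

For the point mutation of case 1, the sketch reports a change only in valid sequence 3. For the silent mutation of case 3, it reports a change only in invalid sequence 3. Both results agree with the descriptions of those cases.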

Table 4.4 Discriminator-D1 outputs for valid sequence of HUMAN-1 before and after point mutation
Valid Sequence (VS) | Identification Number Before Mutation | Repeated Sequence (RS) | Identification Number After Mutation | Repeated Sequence (RS)

Table 4.5 Discriminator-D2 outputs for invalid sequence of HUMAN-1 before and after point mutation
Invalid Sequence (IS) | Identification Number Before Mutation | Identification Number After Mutation

Figure 4.14 Point mutation

Case 2
After mutation in valid sequence [frame-shift mutation, insertion]:
VS1/RS   VS2      IS1 IS1 IS1  VS3      VS4      VS5/RS
AATGTGT  TGTGTGA  C   C   C    CTCAAAA  TCTCTCA  AATGTGT
IS2 IS2 IS2  VS6      VS7      VS8      VS9/RS   VS10
T   T   T    TACACTC  CGTTGGT  AATATGG  AATGTGT  TAAAGTT
VS11     IS3  IS3      IS3  VS12  VS13/RS  VS14
GCTACCC  G    CGGGTTT  T    T     TAATGTG  TCTCTXX

In case 2, a frame-shift mutation (insertion) with the mutant C occurs in invalid sequence IS1,3. It alters valid sequence VS1,12, as shown in Table 4.6, and invalid sequence IS1,3, as shown in Table 4.7. The mutation can be identified by the change in the identification numbers of both the valid and the invalid sequences. The frame-shift insertion changes the polypeptide sequence, which might alter the shape or function of the protein depending on where in the sequence it occurs. The process of frame-shift insertion is shown in Figure 4.15.

Table 4.6 Discriminator-D1 outputs for valid sequence of HUMAN-1 before and after frame-shift [insertion] mutation
Valid Sequence (VS) | Identification Number Before Mutation | Repeated Sequence (RS) | Identification Number After Mutation | Repeated Sequence (RS)

Table 4.7 Discriminator-D2 outputs for invalid sequence of HUMAN-1 before and after frame-shift [insertion] mutation
Invalid Sequence (IS) | Identification Number Before Mutation | Identification Number After Mutation

Figure 4.15 Frame shift mutation-insertion

Case 3
After mutation in invalid sequence [point mutation (neutral or silent)]:
VS1/RS   VS2      IS1 IS1 IS1  VS3      VS4      VS5/RS
AATGTGT  TGTGTGA  C   C   C    CTCAAAA  TCTCTCA  AATGTGT
IS2 IS2 IS2  VS6      VS7      VS8      VS9/RS   VS10
T   T   T    TACACTC  CGTTGGT  AATATGG  AATGTGT  TAAAGTT
VS11     IS3 IS3 IS3 IS3  VS12     VS13/RS  VS14
GCTACCC  G   G   G   G    GTTTTTT  AATGTGT  CTCTXXX

In case 3, a point mutation with the mutant G occurs in the same invalid sequence IS1,3 as in case 2. It alters only IS1,3 and can be identified only by the change in the identification number of IS1,3, as shown in Table 4.8. The identification numbers of the valid sequences remain unaltered, as shown in Table 4.9. This point mutation causes no change in the polypeptide sequence and has no consequence for the organism. The process of point mutation (neutral or silent) is shown in Figure 4.16.

Table 4.8 Discriminator-D2 outputs for invalid sequence of HUMAN-1 before and after point [neutral or silent] mutation
Invalid Sequence (IS) | Identification Number Before Mutation | Identification Number After Mutation

Table 4.9 Discriminator-D1 outputs for valid sequence of HUMAN-1 before and after point [neutral or silent] mutation
Valid Sequence (VS) | Identification Number Before Mutation | Repeated Sequence (RS) | Identification Number After Mutation | Repeated Sequence (RS)

Figure 4.16 Point mutation-neutral or silent

Case 4
After mutation in valid sequence [frame-shift mutation, deletion]:
VS1/RS   VS2      IS1 IS1 IS1  VS3      VS4      VS5/RS
AATGTGT  TGTGTGA  C   C   C    CTCAAAA  TCTCTCA  AATGTGT
IS2 IS2 IS2  VS6      VS7      VS8      VS9/RS   VS10
T   T   T    TACACTC  CGTTGGT  AATATGG  AATGTGT  TAAAGTT
VS11     IS3 IS3 IS3  VS12     VS13/RS  VS14
GCTACCC  G   G   G    GTTTTTA  ATGTGTC  TCTXXX

In case 4, a frame-shift mutation (deletion) occurs in valid sequence VS1,12 through the removal of the mutant T. It can be identified by the change in the identification number of VS1,12, as shown in Table 4.10, while none of the invalid sequences are altered, as shown in Table 4.11. The deletion changes the polypeptide sequence, which might alter the shape or function of the protein depending on where in the sequence it occurs. The process of frame-shift deletion is shown in Figure 4.17.

Table 4.10 Discriminator-D1 outputs for valid sequence of HUMAN-1 before and after frame-shift [deletion] mutation
Valid Sequence (VS) | Identification Number Before Mutation | Repeated Sequence (RS) | Identification Number After Mutation | Repeated Sequence (RS)

Table 4.11 Discriminator-D2 outputs for invalid sequence of HUMAN-1 before and after frame-shift [deletion] mutation
Invalid Sequence (IS) | Identification Number Before Mutation | Identification Number After Mutation

Figure 4.17 Frame shift mutation-deletion

Case 5
After mutation in valid sequence [inversion mutation]:
VS1/RS   VS2      IS1 IS1 IS1  VS3      VS4      VS5/RS
AATGTGT  TGTGTGA  C   C   C    CTCAAAA  TCTCTCA  AATGTGT
IS2 IS2 IS2  VS6      VS7      VS8      VS9/RS   VS10
T   T   T    TACACTC  CGTTGGT  AATATGG  AATGTGT  TTGAAAT
VS11     IS3 IS3 IS3  VS12     VS13/RS  VS14
GCTACCC  G   G   G    GTTTTTT  AATGTGT  CTCTXXX

Table 4.12 Discriminator-D1 outputs for valid sequence of HUMAN-1 before and after inversion mutation
Valid Sequence (VS) | Identification Number Before Mutation | Repeated Sequence (RS) | Identification Number After Mutation | Repeated Sequence (RS)

In case 5, an inversion mutation occurs in valid sequence VS1,10: TAAAGTT is replaced by its inverted form, the mutant TTGAAAT. The mutation can be identified by the change in the identification number of VS1,10, as shown in Table 4.12, while none of the invalid sequences are altered, as shown in Table 4.13. The process of inversion mutation is shown in Figure 4.18. The inversion changes the polypeptide sequence, which might alter the shape or function of the protein depending on where in the sequence it occurs.

Table 4.13 Discriminator-D2 outputs for invalid sequence of HUMAN-1 before and after inversion mutation
Invalid Sequence (IS) | Identification Number Before Mutation | Identification Number After Mutation

Figure 4.18 Inversion mutation


More information

ENGR 213 Bioengineering Fundamentals April 25, A very coarse introduction to bioinformatics

ENGR 213 Bioengineering Fundamentals April 25, A very coarse introduction to bioinformatics A very coarse introduction to bioinformatics In this exercise, you will get a quick primer on how DNA is used to manufacture proteins. You will learn a little bit about how the building blocks of these

More information

EE550 Computational Biology

EE550 Computational Biology EE550 Computational Biology Week 1 Course Notes Instructor: Bilge Karaçalı, PhD Syllabus Schedule : Thursday 13:30, 14:30, 15:30 Text : Paul G. Higgs, Teresa K. Attwood, Bioinformatics and Molecular Evolution,

More information

Bioinformatics. ONE Introduction to Biology. Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012

Bioinformatics. ONE Introduction to Biology. Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012 Bioinformatics ONE Introduction to Biology Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012 Biology Review DNA RNA Proteins Central Dogma Transcription Translation

More information

Dynamic Programming Algorithms

Dynamic Programming Algorithms Dynamic Programming Algorithms Sequence alignments, scores, and significance Lucy Skrabanek ICB, WMC February 7, 212 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

APPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources

APPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources Appendix Table of Contents A2 A3 A4 A5 A6 A7 A9 Ethics Background Creating Discussion Ground Rules Amino Acid Abbreviations and Chemistry Resources Codons and Amino Acid Chemistry Behind the Scenes with

More information

What is necessary for life?

What is necessary for life? Life What is necessary for life? Most life familiar to us: Eukaryotes FREE LIVING Or Parasites First appeared ~ 1.5-2 10 9 years ago Requirements: DNA, proteins, lipids, carbohydrates, complex structure,

More information

Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous

Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous. Two proteins sharing a common ancestor are said to be homologs. Homologyoften implies structural

More information

DNA. translation. base pairing rules for DNA Replication. thymine. cytosine. amino acids. The building blocks of proteins are?

DNA. translation. base pairing rules for DNA Replication. thymine. cytosine. amino acids. The building blocks of proteins are? 2 strands, has the 5-carbon sugar deoxyribose, and has the nitrogen base Thymine. The actual process of assembling the proteins on the ribosome is called? DNA translation Adenine pairs with Thymine, Thymine

More information

Neurospora mutants. Beadle & Tatum: Neurospora molds. Mutant A: Mutant B: HOW? Neurospora mutants

Neurospora mutants. Beadle & Tatum: Neurospora molds. Mutant A: Mutant B: HOW? Neurospora mutants Chapter 10: Central Dogma Gene Expression and Regulation Mutant A: Neurospora mutants Mutant B: Not made Not made Fact 1: DNA contains information but is unable to carry out actions Fact 2: Proteins are

More information

Chapter 14: From DNA to Protein

Chapter 14: From DNA to Protein Chapter 14: From DNA to Protein Steps from DNA to Proteins Same two steps produce all proteins: 1) DNA is transcribed to form RNA Occurs in the nucleus RNA moves into cytoplasm 2) RNA is translated in

More information

Question 2: There are 5 retroelements (2 LINEs and 3 LTRs), 6 unclassified elements (XDMR and XDMR_DM), and 7 satellite sequences.

Question 2: There are 5 retroelements (2 LINEs and 3 LTRs), 6 unclassified elements (XDMR and XDMR_DM), and 7 satellite sequences. Bio4342 Exercise 1 Answers: Detecting and Interpreting Genetic Homology (Answers prepared by Wilson Leung) Question 1: Low complexity DNA can be described as sequences that consist primarily of one or

More information

DNA Structure and Replication, and Virus Structure and Replication Test Review

DNA Structure and Replication, and Virus Structure and Replication Test Review DNA Structure and Replication, and Virus Structure and Replication Test Review What does DNA stand for? Deoxyribonucleic Acid DNA is what type of macromolecule? DNA is a nucleic acid The building blocks

More information

DNA Sequence Alignment based on Bioinformatics

DNA Sequence Alignment based on Bioinformatics DNA Sequence Alignment based on Bioinformatics Shivani Sharma, Amardeep singh Computer Engineering,Punjabi University,Patiala,India Email: Shivanisharma89@hotmail.com Abstract: DNA Sequence alignmentis

More information

Genomics and Database Mining (HCS 604.3) April 2005

Genomics and Database Mining (HCS 604.3) April 2005 Genomics and Database Mining (HCS 604.3) April 2005 David M. Francis OARDC 1680 Madison Ave Wooster, OH 44691 e-mail: francis.77@osu.edu Introduction: Computers have changed the way biologists go about

More information

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools CAP 5510: Introduction to Bioinformatics : Bioinformatics Tools ECS 254A / EC 2474; Phone x3748; Email: giri@cis.fiu.edu My Homepage: http://www.cs.fiu.edu/~giri http://www.cs.fiu.edu/~giri/teach/bioinfs15.html

More information

BIOSTAT516 Statistical Methods in Genetic Epidemiology Autumn 2005 Handout1, prepared by Kathleen Kerr and Stephanie Monks

BIOSTAT516 Statistical Methods in Genetic Epidemiology Autumn 2005 Handout1, prepared by Kathleen Kerr and Stephanie Monks Rationale of Genetic Studies Some goals of genetic studies include: to identify the genetic causes of phenotypic variation develop genetic tests o benefits to individuals and to society are still uncertain

More information

What happens after DNA Replication??? Transcription, translation, gene expression/protein synthesis!!!!

What happens after DNA Replication??? Transcription, translation, gene expression/protein synthesis!!!! What happens after DNA Replication??? Transcription, translation, gene expression/protein synthesis!!!! Protein Synthesis/Gene Expression Why do we need to make proteins? To build parts for our body as

More information

What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases.

What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases. What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases. Bioinformatics is the marriage of molecular biology with computer

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool 14.06.2010 Table of contents 1 History History 2 global local 3 Score functions Score matrices 4 5 Comparison to FASTA References of BLAST History the program was designed by Stephen W. Altschul, Warren

More information

3. Use the codon chart to translate the mrna sequence into an amino acid sequence.

3. Use the codon chart to translate the mrna sequence into an amino acid sequence. Honors Biology 317 The Beads of Translation: Using Beads to translate DNA into a Polypeptide Bracelet Objectives: 1. Using plastic beads, construct a representation of "standard" sequence of amino acids

More information

(Very) Basic Molecular Biology

(Very) Basic Molecular Biology (Very) Basic Molecular Biology (Very) Basic Molecular Biology Each human cell has 46 chromosomes --double-helix DNA molecule (Very) Basic Molecular Biology Each human cell has 46 chromosomes --double-helix

More information

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling Molecular Modeling 2018 -- Lecture 8 Local structure Database search Multiple alignment Automated homology modeling An exception to the no-insertions-in-helix rule Actual structures (myosin)! prolines

More information

13.1 RNA Lesson Objectives Contrast RNA and DNA. Explain the process of transcription.

13.1 RNA Lesson Objectives Contrast RNA and DNA. Explain the process of transcription. 13.1 RNA Lesson Objectives Contrast RNA and DNA. Explain the process of transcription. The Role of RNA 1. Complete the table to contrast the structures of DNA and RNA. DNA Sugar Number of Strands Bases

More information

Deoxyribonucleic Acid DNA. Structure of DNA. Structure of DNA. Nucleotide. Nucleotides 5/13/2013

Deoxyribonucleic Acid DNA. Structure of DNA. Structure of DNA. Nucleotide. Nucleotides 5/13/2013 Deoxyribonucleic Acid DNA The Secret of Life DNA is the molecule responsible for controlling the activities of the cell It is the hereditary molecule DNA directs the production of protein In 1953, Watson

More information

What is necessary for life?

What is necessary for life? Life What is necessary for life? Most life familiar to us: Eukaryotes FREE LIVING Or Parasites First appeared ~ 1.5-2 10 9 years ago Requirements: DNA, proteins, lipids, carbohydrates, complex structure,

More information

Jay McTighe and Grant Wiggins,

Jay McTighe and Grant Wiggins, Course: Integrated Science 3/4 Unit #3: (DNA & RNA) Instructions for Life Stage 1: Identify Desired Results Enduring Understandings: Students will understand that Nearly all human traits, even many diseases,

More information

Chapter 13: RNA and Protein Synthesis. Dr. Bertolotti

Chapter 13: RNA and Protein Synthesis. Dr. Bertolotti Chapter 13: RNA and Protein Synthesis Dr. Bertolotti Essential Question How does information flow from DNA to RNA to direct the synthesis of proteins? How does RNA differ from DNA? RNA and protein synthesis

More information

DNA & PROTEIN SYNTHESIS REVIEW

DNA & PROTEIN SYNTHESIS REVIEW Name: Block: DNA & PROTEIN SYNTHESIS REVIEW 1. Give the purpose of each of the following steps in the process of protein synthesis. a) Ribosome moving along a mrna: (1 mark) b) Adenine bonding to thymine:

More information

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1 BSCI348S Fall 2003 Midterm 1 Multiple Choice: select the single best answer to the question or completion of the phrase. (5 points each) 1. The field of bioinformatics a. uses biomimetic algorithms to

More information

Figure 1: Genetic Mosaicism

Figure 1: Genetic Mosaicism I. Gene Mutations a) Germinal Mutations: occur w/in the DNA of stem cells that ultimately form gametes. These are the only mutations that can be transmitted to future generations. b) Somatic Mutations:

More information

Transcription and Translation

Transcription and Translation Biology Name: Morales Date: Period: Transcription and Translation Directions: Read the following and answer the questions in complete sentences. DNA is the molecule of heredity it determines an organism

More information

A New Approach of Protein Sequence Compression using Repeat Reduction and ASCII Replacement

A New Approach of Protein Sequence Compression using Repeat Reduction and ASCII Replacement IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 10, Issue 5 (Mar. - Apr. 2013), PP 46-51 A New Approach of Protein Sequence Compression using Repeat Reduction

More information

Biology Celebration of Learning (100 points possible)

Biology Celebration of Learning (100 points possible) Name Date Block Biology Celebration of Learning (100 points possible) Matching (1 point each) 1. Codon a. process of copying DNA and forming mrna 2. Genes b. section of DNA coding for a specific protein

More information

Evolutionary Genetics: Part 1 Polymorphism in DNA

Evolutionary Genetics: Part 1 Polymorphism in DNA Evolutionary Genetics: Part 1 Polymorphism in DNA S. chilense S. peruvianum Winter Semester 2012-2013 Prof Aurélien Tellier FG Populationsgenetik Color code Color code: Red = Important result or definition

More information

CHapter 14. From DNA to Protein

CHapter 14. From DNA to Protein CHapter 14 From DNA to Protein How? DNA to RNA to Protein to Trait Types of RNA 1. Messenger RNA: carries protein code or transcript 2. Ribosomal RNA: part of ribosomes 3. Transfer RNA: delivers amino

More information

COMPUTER RESOURCES II:

COMPUTER RESOURCES II: COMPUTER RESOURCES II: Using the computer to analyze data, using the internet, and accessing online databases Bio 210, Fall 2006 Linda S. Huang, Ph.D. University of Massachusetts Boston In the first computer

More information

Lecture 2: Central Dogma of Molecular Biology & Intro to Programming

Lecture 2: Central Dogma of Molecular Biology & Intro to Programming Lecture 2: Central Dogma of Molecular Biology & Intro to Programming Central Dogma of Molecular Biology Proteins: workhorse molecules of biological systems Proteins are synthesized from the genetic blueprints

More information

ALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG

ALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG Chapman & Hall/CRC Mathematical and Computational Biology Series ALGORITHMS IN BIO INFORMATICS A PRACTICAL INTRODUCTION WING-KIN SUNG CRC Press Taylor & Francis Group Boca Raton London New York CRC Press

More information

Challenging algorithms in bioinformatics

Challenging algorithms in bioinformatics Challenging algorithms in bioinformatics 11 October 2018 Torbjørn Rognes Department of Informatics, UiO torognes@ifi.uio.no What is bioinformatics? Definition: Bioinformatics is the development and use

More information

Textbook Reading Guidelines

Textbook Reading Guidelines Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum Last updated: May 1, 2009 Textbook Reading Guidelines Preface: Read the whole preface, and especially: For the students with Life Science

More information

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem.

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem. Dec-82 Oct-84 Aug-86 Jun-88 Apr-90 Feb-92 Nov-93 Sep-95 Jul-97 May-99 Mar-01 Jan-03 Nov-04 Sep-06 Jul-08 May-10 Mar-12 Growth of GenBank 160,000,000,000 180,000,000 Introduction to Bioinformatics Iosif

More information

1. The diagram below shows an error in the transcription of a DNA template to messenger RNA (mrna).

1. The diagram below shows an error in the transcription of a DNA template to messenger RNA (mrna). 1. The diagram below shows an error in the transcription of a DNA template to messenger RNA (mrna). Which statement best describes the error shown in the diagram? (A) The mrna strand contains the uracil

More information

Materials Protein synthesis kit. This kit consists of 24 amino acids, 24 transfer RNAs, four messenger RNAs and one ribosome (see below).

Materials Protein synthesis kit. This kit consists of 24 amino acids, 24 transfer RNAs, four messenger RNAs and one ribosome (see below). Protein Synthesis Instructions The purpose of today s lab is to: Understand how a cell manufactures proteins from amino acids, using information stored in the genetic code. Assemble models of four very

More information

Problem: The GC base pairs are more stable than AT base pairs. Why? 5. Triple-stranded DNA was first observed in 1957. Scientists later discovered that the formation of triplestranded DNA involves a type

More information

Genes are coded DNA instructions that control the production of proteins within a cell. The first step in decoding genetic messages is to copy a part

Genes are coded DNA instructions that control the production of proteins within a cell. The first step in decoding genetic messages is to copy a part Genes are coded DNA instructions that control the production of proteins within a cell. The first step in decoding genetic messages is to copy a part of the nucleotide sequence of the DNA into RNA. RNA

More information

Helps DNA put genetic code into action RNA Structure

Helps DNA put genetic code into action RNA Structure 13.1 RNA Helps DNA put genetic code into action RNA Structure Single Stranded Nucleotides building blocks to RNA Ribose (5C sugar) Phosphate Group Nitrogenous base: Adenine, Uracil Guanine, Cytosine Disposable

More information

Analysis of Biological Sequences SPH

Analysis of Biological Sequences SPH Analysis of Biological Sequences SPH 140.638 swheelan@jhmi.edu nuts and bolts meet Tuesdays & Thursdays, 3:30-4:50 no exam; grade derived from 3-4 homework assignments plus a final project (open book,

More information

Name: Class: Date: ID: A

Name: Class: Date: ID: A Class: _ Date: _ CH 12 Review Multiple Choice Identify the choice that best completes the statement or answers the question. 1. How many codons are needed to specify three amino acids? a. 6 c. 3 b. 12

More information

Basic Bioinformatics: Homology, Sequence Alignment,

Basic Bioinformatics: Homology, Sequence Alignment, Basic Bioinformatics: Homology, Sequence Alignment, and BLAST William S. Sanders Institute for Genomics, Biocomputing, and Biotechnology (IGBB) High Performance Computing Collaboratory (HPC 2 ) Mississippi

More information

Warm-Up: Check your Answers

Warm-Up: Check your Answers Warm-Up 1. What are the 3 components of a nucleotide? 2. What are the 4 nitrogen bases that are found in DNA? 3. What type of bonds are found between 2 nitrogen bases? 4. During DNA replication, what breaks

More information

A self- created drawing, collage, or computer generated picture of your creature.

A self- created drawing, collage, or computer generated picture of your creature. Protein Synthesis Project 40 pts Part I Due Date Part II Due Date Objectives: I can demonstrate how the genes in DNA are transcribed through RNA polymerase into mrna and translated by ribosomes, mrna,

More information

Name: Family: Date: Monday/Tuesday, March 9,

Name: Family: Date: Monday/Tuesday, March 9, Name: Family: Date: Monday/Tuesday, March 9,10 2015 Select the best answer for each question: Part 1: Multiple Choice (2 points each) 1. Protein Synthesis involves which two processes? a. DNA Replication

More information

DNA life s code. Importance of DNA. DNA Structure. DNA Structure - nucleotide. DNA Structure nitrogen bases. Linking Nucleotides

DNA life s code. Importance of DNA. DNA Structure. DNA Structure - nucleotide. DNA Structure nitrogen bases. Linking Nucleotides Importance of life s code molecule that makes up genes and determines the traits of all living things Controls by: producing proteins Proteins are important because All structures are made of protein Skin

More information

The combination of a phosphate, sugar and a base forms a compound called a nucleotide.

The combination of a phosphate, sugar and a base forms a compound called a nucleotide. History Rosalin Franklin: Female scientist (x-ray crystallographer) who took the picture of DNA James Watson and Francis Crick: Solved the structure of DNA from information obtained by other scientist.

More information

Protein Synthesis: Transcription and Translation

Protein Synthesis: Transcription and Translation Review Protein Synthesis: Transcription and Translation Central Dogma of Molecular Biology Protein synthesis requires two steps: transcription and translation. DNA contains codes Three bases in DNA code

More information

March 26, 2012 NUCLEIC ACIDS AND PROTEIN SYNTHESIS

March 26, 2012 NUCLEIC ACIDS AND PROTEIN SYNTHESIS NUCLEIC ACIDS AND PROTEIN SYNTHESIS MAIN MAIN TOPICS TOPICS TO TO BE BE COVERED COVERED THIS THIS UNIT: UNIT: I. I. EVIDENCE EVIDENCE OF OF DNA DNA AS AS THE THE GENETIC GENETIC CODE CODE II. II. DNA DNA

More information

Just one nucleotide! Exploring the effects of random single nucleotide mutations

Just one nucleotide! Exploring the effects of random single nucleotide mutations Dr. Beatriz Gonzalez In-Class Worksheet Name: Learning Objectives: Just one nucleotide! Exploring the effects of random single nucleotide mutations Given a coding DNA sequence, determine the mrna Based

More information

Sections 12.3, 13.1, 13.2

Sections 12.3, 13.1, 13.2 Sections 12.3, 13.1, 13.2 Background: Watson & Crick recognized that base pairing in the double helix allows DNA to be copied, or replicated Each strand in the double helix has all the information to remake

More information

Write: Unit 5 Review at the top.

Write: Unit 5 Review at the top. Warm-up Take out a sheet of paper: Write: Unit 5 Review at the top. As each question goes on the board, write that question down and answer it. When answers come up, either write correct next to what you

More information

Bundle 5 Test Review

Bundle 5 Test Review Bundle 5 Test Review DNA vs. RNA DNA Replication Gene Mutations- Protein Synthesis 1. Label the different components and complete the complimentary base pairing. What is this molecule called? _Nucleic

More information

Chapter 12-3 RNA & Protein Synthesis Notes From DNA to Protein (DNA RNA Protein)

Chapter 12-3 RNA & Protein Synthesis Notes From DNA to Protein (DNA RNA Protein) Chapter 12-3 RNA & Protein Synthesis Notes From DNA to Protein (DNA RNA Protein) I. Review A. Cells copy their DNA (in S phase of Interphase)-Why? Prepare for Cell Division (Mitosis & Cytokinesis) Genes

More information

Student Exploration: RNA and Protein Synthesis Due Wednesday 11/27/13

Student Exploration: RNA and Protein Synthesis Due Wednesday 11/27/13 http://www.explorelearning.com Name: Period : Student Exploration: RNA and Protein Synthesis Due Wednesday 11/27/13 Vocabulary: Define these terms in complete sentences on a separate piece of paper: amino

More information

Key Concept Translation converts an mrna message into a polypeptide, or protein.

Key Concept Translation converts an mrna message into a polypeptide, or protein. 8.5 Translation VOBLRY translation codon stop codon start codon anticodon Key oncept Translation converts an mrn message into a polypeptide, or protein. MIN IDES mino acids are coded by mrn base sequences.

More information

produces an RNA copy of the coding region of a gene

produces an RNA copy of the coding region of a gene 1. Transcription Gene Expression The expression of a gene into a protein occurs by: 1) Transcription of a gene into RNA produces an RNA copy of the coding region of a gene the RNA transcript may be the

More information

Understanding Sources of Variation. Part 1: Variation Overview (

Understanding Sources of Variation. Part 1: Variation Overview ( Name: Per. Date: Understanding Sources of Variation Part 1: Variation Overview (http://learn.genetics.utah.edu/content/variation/sources/) After watching the variation presentation, answer the following

More information

Ribonucleic Acid (RNA) and Protein Synthesis

Ribonucleic Acid (RNA) and Protein Synthesis Ribonucleic Acid (RNA) and Protein Synthesis Section 12-3 Summary The /RNA connection and RNA are partners in the business of making proteins. is a specialist. It provides stable, permanent storage of

More information

What Are the Chemical Structures and Functions of Nucleic Acids?

What Are the Chemical Structures and Functions of Nucleic Acids? THE NUCLEIC ACIDS What Are the Chemical Structures and Functions of Nucleic Acids? Nucleic acids are polymers specialized for the storage, transmission, and use of genetic information. DNA = deoxyribonucleic

More information

Review? - What are the four macromolecules?

Review? - What are the four macromolecules? Review? - What are the four macromolecules? Lipids Carbohydrates Protein Nucleic Acids What is the monomer of nucleic acids and what do nucleic acids make up? Nucleotides; DNA and RNA 12-1 DNA DNA Stands

More information

Homework 4. Due in class, Wednesday, November 10, 2004

Homework 4. Due in class, Wednesday, November 10, 2004 1 GCB 535 / CIS 535 Fall 2004 Homework 4 Due in class, Wednesday, November 10, 2004 Comparative genomics 1. (6 pts) In Loots s paper (http://www.seas.upenn.edu/~cis535/lab/sciences-loots.pdf), the authors

More information

DNA & Genetics. Chapter Introduction DNA 6/12/2012. How are traits passed from parents to offspring?

DNA & Genetics. Chapter Introduction DNA 6/12/2012. How are traits passed from parents to offspring? Section 5.3 DNA & Genetics Chapter Introduction How are traits passed from parents to offspring? Chromatin- DNA in the nucleus loose strands Chromosome- When DNA gets organized before cell division Gene-

More information

Scoring Alignments. Genome 373 Genomic Informatics Elhanan Borenstein

Scoring Alignments. Genome 373 Genomic Informatics Elhanan Borenstein Scoring Alignments Genome 373 Genomic Informatics Elhanan Borenstein A quick review Course logistics Genomes (so many genomes) The computational bottleneck Python: Programs, input and output Number and

More information

Gene Expression REVIEW Packet

Gene Expression REVIEW Packet Name Pd. # Gene Expression REVIEW Packet 1. Fill-in-the-blank General Summary Transcription & the Big picture Like, ribonucleic acid (RNA) is a acid a molecule made of nucleotides linked together. RNA

More information

Central Dogma. 1. Human genetic material is represented in the diagram below.

Central Dogma. 1. Human genetic material is represented in the diagram below. Central Dogma 1. Human genetic material is represented in the diagram below. 4. If 15% of a DNA sample is made up of thymine, T, what percentage of the sample is made up of cytosine, C? A) 15% B) 35% C)

More information

10/19/2015 UNIT 6: GENETICS (CH 7) & BIOTECHNOLOGY (CH 8) GENETIC PROCESSES: MUTATIONS GENETIC PROCESSES: HEREDITY

10/19/2015 UNIT 6: GENETICS (CH 7) & BIOTECHNOLOGY (CH 8) GENETIC PROCESSES: MUTATIONS GENETIC PROCESSES: HEREDITY GENETIC PROCESSES: HEREDITY Heredity definition: The passing on of information from an organism to its offspring (through genes) Chromosome definition: Typically a circular (in prokaryotes) or linear (in

More information

6. Which nucleotide part(s) make up the rungs of the DNA ladder? Sugar Phosphate Base

6. Which nucleotide part(s) make up the rungs of the DNA ladder? Sugar Phosphate Base DNA Unit Review Worksheet KEY Directions: Correct your worksheet using a non blue or black pen so your corrections can be clearly seen. DNA Basics 1. Label EVERY sugar (S), phosphate (P), and nitrogen

More information

DNA REPLICATION REVIEW

DNA REPLICATION REVIEW Biology Ms. Ye DNA REPLICATION REVIEW 1. Number the steps of DNA replication the correct order (1, 2, 3): Name Date Block Daughter strands are formed using complementary base pairing DNA unwinds The DNA

More information

Protein Architecture: Conserved Functional Domains

Protein Architecture: Conserved Functional Domains PROTOCOL Protein Motif Analysis compiled by John R. Finnerty Protein Architecture: Conserved Functional Domains Proteins are like machines in that different parts of the protein perform different sub-functions,

More information

Tutorial for Stop codon reassignment in the wild

Tutorial for Stop codon reassignment in the wild Tutorial for Stop codon reassignment in the wild Learning Objectives This tutorial has two learning objectives: 1. Finding evidence of stop codon reassignment on DNA fragments. 2. Detecting and confirming

More information

Lecture 19A. DNA computing

Lecture 19A. DNA computing Lecture 19A. DNA computing What exactly is DNA (deoxyribonucleic acid)? DNA is the material that contains codes for the many physical characteristics of every living creature. Your cells use different

More information

MATH 5610, Computational Biology

MATH 5610, Computational Biology MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class

More information

April Hussey, Andreas Zwick & Jerome C. Regier University of Maryland (AH), USA University of Maryland Biotechnology Institute (AZ, JCR), USA

April Hussey, Andreas Zwick & Jerome C. Regier University of Maryland (AH), USA University of Maryland Biotechnology Institute (AZ, JCR), USA Degen1 v1.2 April Hussey, Andreas Zwick & Jerome C. Regier University of Maryland (AH), USA University of Maryland Biotechnology Institute (AZ, JCR), USA Comments or questions about this script should

More information