THE GENETIC CODE AND ERROR TRANSMISSION

BY C. ALFF-STEINBERGER

LABORATOIRE DE BIOPHYSIQUE DE L'INSTITUT DE BIOLOGIE MOLÉCULAIRE, UNIVERSITÉ DE GENÈVE, SWITZERLAND

Communicated by V. Prelog, July 14, 1969

PROC. N. A. S., VOL. 64, 1969 (GENETICS)

Abstract.-The amino acid substitutions resulting from single-base substitution in the natural genetic code have been compared with those resulting from single-base substitutions in computer-generated random codes. Considering the amino acid properties of molecular weight, polar requirement, number of dissociating groups, pK', isoelectric point, and α-helix forming ability, it is concluded that, for the natural code, single-base substitution in the first position of the codon tends to result in the substitution of an amino acid more similar to the original amino acid than would be expected from a random code. In the natural code, the second position of the codon plays the largest role in determining the properties of the amino acid.

Introduction.-The genetic code is shown in Table 1. The code, viewed abstractly, is a language in which the 64 possible combinations of the four RNA bases, taken any three at a time, each specify either a single amino acid or punctuation, such as peptide chain termination. Since there is so much degeneracy in the third position, it is clear that a random base substitution in the third position frequently results in no change in the amino acid sequence. If a base change occurs in the first or second base position of the codon, then an amino acid different from the original one is generally substituted.

In this paper, we consider the following questions: Is the structure of the code such that single-base substitution in either position I or position II tends to result in the substitution of an amino acid similar to the original one? Or, in the terminology of information theory, is the genetic code an error-minimizing code? Has the evolution of the genetic code been determined, in part, by selection for this type of error minimization?
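The degeneracy argument of the Introduction can be checked by direct counting. The following is a minimal sketch, not part of the original paper: it encodes the natural code of Table 1 and counts, for each codon position, the single-base substitutions that leave the amino acid unchanged. The names `NATURAL_CODE` and `synonymous_counts` are illustrative, and the three nonsense triplets are lumped together as `STOP` and excluded as starting codons (an assumption of this sketch).

```python
from itertools import product

BASES = "UCAG"

# Natural code in the order of Table 1: first base, then second base,
# with the third base varying fastest. STOP marks UAA, UAG and UGA.
AMINO_ACIDS = (
    "PHE PHE LEU LEU SER SER SER SER TYR TYR STOP STOP CYS CYS STOP TRY "
    "LEU LEU LEU LEU PRO PRO PRO PRO HIS HIS GLN GLN ARG ARG ARG ARG "
    "ILU ILU ILU MET THR THR THR THR ASN ASN LYS LYS SER SER ARG ARG "
    "VAL VAL VAL VAL ALA ALA ALA ALA ASP ASP GLU GLU GLY GLY GLY GLY"
).split()
NATURAL_CODE = dict(zip(product(BASES, repeat=3), AMINO_ACIDS))


def synonymous_counts(code):
    """Per codon position, count substitutions that keep the amino acid."""
    counts = [0, 0, 0]
    for codon, amino_acid in code.items():
        if amino_acid == "STOP":
            continue  # only substitutions starting from a sense codon
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue  # not a substitution
                neighbour = codon[:position] + (base,) + codon[position + 1:]
                if code[neighbour] == amino_acid:
                    counts[position] += 1
    return counts


print(synonymous_counts(NATURAL_CODE))  # [8, 0, 126]
```

Each position offers 61 × 3 = 183 substitutions from sense codons. Of these, 126 in the third position are silent, against 8 in the first position (all between LEU or ARG codons) and none in the second.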
We have pursued this problem theoretically, comparing the error-transmitting property of the natural code with that of computer-generated random codes.

Description of Method.-Each amino acid, A, is characterized quantitatively by a certain property, Q(A). For example, if the property of the acids under consideration is the number of hydroxyl groups, then Q(SER) = 1, Q(GLU) = 0, etc. Two amino acids A and A' are "similar" to the extent that |Q(A) - Q(A')| is a small number, for those properties which are important. We defer, until the next section, the discussion of which specific properties of the amino acids are considered. In this section, we describe the calculation of an "error transmission index" for a general property Q.

Since we wish to consider separately single-base substitution in the first, second, and third positions, we will calculate three error-transmission indices. These indices are denoted T_I, T_II, and T_III, the roman numeral subscript indicating that the base substitution considered occurs in the first, second, or third base position, respectively. Let us, for convenience in describing the sums performed, refer to the four bases by number: U = 1, C = 2, A = 3, and G = 4. Let the amino acid, A, specified by the base triplet (L,M,N), where L, M, and N may each have values 1, 2, 3, or 4, be denoted
A(L,M,N). For example, for the natural code, A(1,1,1) = PHE, A(2,1,3) = LEU, A(3,1,4) = MET, A(3,2,1) = THR. We define

$$
\begin{aligned}
T_{\mathrm{I}} &= \sum_{L=1}^{4}\sum_{M=1}^{4}\sum_{N=1}^{4}\sum_{L'=1}^{4} \bigl|\,Q[A(L,M,N)] - Q[A(L',M,N)]\,\bigr| \\
T_{\mathrm{II}} &= \sum_{L=1}^{4}\sum_{M=1}^{4}\sum_{N=1}^{4}\sum_{M'=1}^{4} \bigl|\,Q[A(L,M,N)] - Q[A(L,M',N)]\,\bigr| \\
T_{\mathrm{III}} &= \sum_{L=1}^{4}\sum_{M=1}^{4}\sum_{N=1}^{4}\sum_{N'=1}^{4} \bigl|\,Q[A(L,M,N)] - Q[A(L,M,N')]\,\bigr|
\end{aligned}
\tag{1}
$$

In performing the above sums, the three codons which do not specify an amino acid raise a problem. These codons have been treated in two ways, either (a) by skipping the terms in which they occur, and dividing each inner sum in equation 1, i.e., the one over the primed index, by the number of terms actually in the sum, or (b) by substituting the amino acid inserted by the suppressor strain1,2 in the case of the ochre and amber triplets, skipping the terms in which the remaining UGA triplet occurs, and normalizing each inner sum as in (a). Both approaches give similar results.

For the natural code, A(L,M,N) is frequently identical to A(L,M,N'), and T_III is thus seen to be small. Similarly, to the extent that single-base substitutions in the first position result in the substitution of an amino acid with a value of Q similar to the original one, T_I will be found to be "small." However, we had no a priori expectation of the magnitudes of T_I and T_II for the natural code, and so, for comparison, we have constructed random codes and calculated their error transmission indices for several amino acid properties.

The computer-generated random codes have all been constrained to have the same number of codons per acid as the natural code, and also to have three codons which do not specify an amino acid. Two sets of random codes have been constructed. One set of codes is completely random, within the above constraints. The other set of codes is further constrained to have the same degeneracy in the third base as does the natural code, e.g., the three codons for ILU always have the same first and second base as the codon for MET. Two hundred codes of each type were generated. We refer to the first set as "random codes without third base degeneracy" and to the second set as "random codes with third base degeneracy."
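Equation 1 with treatment (a) of the three nonsense codons, together with the two kinds of random codes, might be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the program used in the paper. All function names are hypothetical. The unchanged codon (primed index equal to the unprimed one) is counted as a zero term when each inner sum is normalized. The codes with third base degeneracy are built by permuting whole four-codon blocks, which is only one way to satisfy the stated constraint. The illustrative property Q is the number of hydroxyl groups, with THR and TYR assumed to count 1 like SER.

```python
import random
from itertools import product

BASES = "UCAG"  # the paper's numbering: U = 1, C = 2, A = 3, G = 4

# Natural code in Table 1 order (third base varies fastest); STOP marks
# the ochre (UAA), amber (UAG) and UGA triplets.
AMINO_ACIDS = (
    "PHE PHE LEU LEU SER SER SER SER TYR TYR STOP STOP CYS CYS STOP TRY "
    "LEU LEU LEU LEU PRO PRO PRO PRO HIS HIS GLN GLN ARG ARG ARG ARG "
    "ILU ILU ILU MET THR THR THR THR ASN ASN LYS LYS SER SER ARG ARG "
    "VAL VAL VAL VAL ALA ALA ALA ALA ASP ASP GLU GLU GLY GLY GLY GLY"
).split()
CODONS = list(product(BASES, repeat=3))
NATURAL_CODE = dict(zip(CODONS, AMINO_ACIDS))


def error_transmission_index(code, q, position):
    """Equation 1 for substitutions at `position` (0, 1, 2 for I, II, III).

    Treatment (a): terms involving a STOP codon are skipped and each inner
    sum is divided by the number of terms left in it. Assumption: the
    unchanged codon itself counts as one (zero-valued) term.
    """
    total = 0.0
    for codon, amino_acid in code.items():
        if amino_acid == "STOP":
            continue
        terms = []
        for base in BASES:
            neighbour = codon[:position] + (base,) + codon[position + 1:]
            substituted = code[neighbour]
            if substituted != "STOP":
                terms.append(abs(q[amino_acid] - q[substituted]))
        total += sum(terms) / len(terms)
    return total


def random_code(rng):
    """Completely random code: same codons per amino acid, three STOPs."""
    shuffled = list(AMINO_ACIDS)
    rng.shuffle(shuffled)
    return dict(zip(CODONS, shuffled))


def random_code_with_third_base_degeneracy(rng):
    """Assumed construction: permute the 16 blocks that share the first
    and second base, leaving each block's third-base pattern intact."""
    blocks = [AMINO_ACIDS[i:i + 4] for i in range(0, 64, 4)]
    rng.shuffle(blocks)
    return dict(zip(CODONS, (aa for block in blocks for aa in block)))


def hydroxyl_count():
    """Illustrative Q: 1 for SER, THR, TYR (assumed), 0 otherwise."""
    hydroxylated = ("SER", "THR", "TYR")
    return {aa: (1.0 if aa in hydroxylated else 0.0) for aa in set(AMINO_ACIDS)}


if __name__ == "__main__":
    rng = random.Random(0)
    q = hydroxyl_count()
    natural = [error_transmission_index(NATURAL_CODE, q, p) for p in range(3)]
    print("natural code T_I, T_II, T_III:", natural)
    sample = [error_transmission_index(random_code(rng), q, 0) for _ in range(200)]
    print("random codes with T_I below natural:", sum(t < natural[0] for t in sample))
```

Because the third-position sums never leave a four-codon block, any code produced by `random_code_with_third_base_degeneracy` has the same T_III as the natural code, so only its first and second indices carry new information.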
TABLE 1. Genetic code.*

First base    Second base                          Third base
              U       C       A        G
U             PHE     SER     TYR      CYS         U
              PHE     SER     TYR      CYS         C
              LEU     SER     OCHRE    ...         A
              LEU     SER     AMBER    TRY         G
C             LEU     PRO     HIS      ARG         U
              LEU     PRO     HIS      ARG         C
              LEU     PRO     GLN      ARG         A
              LEU     PRO     GLN      ARG         G
A             ILU     THR     ASN      SER         U
              ILU     THR     ASN      SER         C
              ILU     THR     LYS      ARG         A
              MET     THR     LYS      ARG         G
G             VAL     ALA     ASP      GLY         U
              VAL     ALA     ASP      GLY         C
              VAL     ALA     GLU      GLY         A
              VAL     ALA     GLU      GLY         G

* The genetic code shown is taken from Crick.10

The second set of random codes was constructed because it is likely that there are other mechanisms which make the third
position degeneracy desirable. In view of the probable simultaneous evolution of the various parts of the protein-synthesizing apparatus, the structure of the selected code might be in part determined by developments in other structures and processes. Two examples of other mechanisms which might affect the structure of the code are (a) the "wobble," proposed by Crick,3 in which alternate hydrogen bonding is possible for the third base, and (b) the fact that errors in translation are most frequent in the third position.4

Results.-Each of the six amino acid properties listed in Table 2 has been taken to be the Q of equation 1. The error-transmission indices obtained for the natural genetic code are shown in Table 3. As is expected, T_III is consistently much smaller than T_I or T_II. It is interesting to note that in five out of six examples, T_I is smaller than T_II. This feature of the natural code is discussed in more detail below.

TABLE 2. Amino acid properties.*
Columns: Amino acid; Molecular weight; Polar requirement; Number of dissociating groups; pK'; Isoelectric point; α-Helix forming ability.
Rows: ALA, ARG, ASP, ASN, CYS, GLU, GLN, GLY, HIS, ILU, LEU, LYS, MET, PHE, PRO, SER, THR, TRY, TYR, VAL.
[Numerical entries not preserved.]
* The values for the polar requirements of the amino acids were taken from Woese.6 The values of pK' and the isoelectric point were taken from Edsall.11

TABLE 3. Error transmission indices of the natural genetic code.
Columns: Amino acid property, Q; T_I; T_II; T_III.
Rows: Molecular weight; Polar requirement; Number of dissociating groups; pK'; Isoelectric point; α-Helix forming ability.
[Numerical entries not preserved.]

We will now compare the error transmission indices of the natural code with those of the random codes for each of the various properties in turn.

(1) Q = Molecular weight: Figure 1 shows the distribution of the error-transmission indices of the random codes, obtained when the Q of equation 1 is taken to be the amino acid molecular weight. The three error-transmission indices of the natural code are indicated by arrows. For the natural code, the third base error-transmission index is very small, as is expected. The interesting
result is that the first base index for the natural code is also significantly smaller than the average of those obtained for the random codes. That is, in the natural code, single-base substitution in the first position leads to a far smaller change in the amino acid molecular weight than would be expected either from a completely random code or from a random code in which the third base degeneracy is maintained.

FIG. 1.-(a) shows the distribution of error transmission indices obtained for the random codes without third base degeneracy, when Q of equation 1 is taken to be the molecular weight. (b) shows the corresponding distribution for the random codes with third base degeneracy. In all cases (Figs. 1-6), the values of the natural code error transmission indices are indicated by arrows labeled T_I, T_II, T_III, for first, second, and third base substitution, respectively. Each random code without third base degeneracy provides three error transmission indices. Each random code with third base degeneracy provides two error transmission indices of interest; the third is, of course, always equal to T_III. [Axis label: Random Code Error Transmission Index.]

(2) Q = Polar requirement: The term "polar requirement" was defined by Woese.5 For pyridine solvents, Woese plots log [(1 - R_F)/R_F] as the ordinate, and log (mole fraction of H2O) as the abscissa. A series of straight lines is obtained, one for each amino acid. The polar requirement of any amino acid is defined to be the slope of its line. Woese5,6 has noted that amino acids with the same second base have similar polar requirements. Figure 2 shows the distribution of the error-transmission indices for the random codes, again with the three natural code indices indicated by arrows. The error-transmission index for the first base of the natural code is significantly smaller than the average of the indices obtained for the random codes.
Thus, single-base substitution in the first position of the natural code leads to the substitution of an amino acid whose polar requirement is more nearly equal to that of the original amino acid than would be expected from a random code.

(3) Q = Number of dissociating groups: "Number of dissociating groups" refers to the number of pK' values, or number of inflections in the titration curve of the amino acid. Figure 3 shows the distribution of error-transmission indices obtained for the random codes. Again, the index for the first position of the natural code is smaller than most indices obtained for the random codes. It is thus concluded that single-base substitution in the first position of the natural
code leads to the substitution of an amino acid whose number of dissociating groups is more similar to that of the original amino acid than would be expected from a random code.

FIG. 2.-(a) shows the distribution of error transmission indices, obtained for the random codes without third base degeneracy, when Q of equation 1 is taken to be the polar requirement. (b) shows the corresponding distribution for the random codes with third base degeneracy. [Axis label: Random Code Error Transmission Index.]

FIG. 3.-(a) shows the distribution of error transmission indices, obtained for the random codes without third base degeneracy, when Q of equation 1 is taken to be the number of dissociating groups. (b) shows the corresponding distribution for the random codes with third base degeneracy. [Axis label: Random Code Error Transmission Index.]

(4) Q = pK': Figure 4 shows the distribution of the error-transmission indices for the random codes. Again, for the natural code, the index obtained for single-base substitution in position I is much smaller than the average of the indices obtained from the random codes. Similarly, it is concluded that single-base substitution in the first position of the natural code generally leads to the substitution of an amino acid whose pK' value is more similar to that of the original amino acid than would be expected from a random code.

(5) Q = Isoelectric point: Figure 5 shows the distributions of the error transmission indices of the random codes. In this case, for the natural code,
the indices for both the second and first base positions are smaller than most of the random code indices. Here, the conclusion is that the natural code is such that single-base substitution in either the first or the second position leads, in general, to the substitution of an amino acid with an isoelectric point more similar to that of the original amino acid than would be expected from a random code.

FIG. 4.-(a) shows the distribution of error transmission indices obtained for the random codes without third base degeneracy, when Q of equation 1 is taken to be pK'. (b) shows the corresponding distribution for the random codes with third base degeneracy. [Axis label: Random Code Error Transmission Index.]

FIG. 5.-(a) shows the distribution of error transmission indices obtained for the random codes without third base degeneracy, when Q of equation 1 is taken to be the isoelectric point. (b) shows the corresponding distribution for the random codes with third base degeneracy. [Axis label: Random Code Error Transmission Index.]

(6) Q = α-Helix-forming ability: This last parameter considered is the most ill-defined and speculative, but it is included here since the effect of the amino acid substitution on the peptide chain configuration is clearly important. Schellman and Schellman7 refer to the work of Blout et al.,8 who have made a theoretical division of the amino acids into "α-helix forming" and "non-α-helix forming." The amino acids which are taken to be "non-α-helix forming" are (a) glycine, (b) proline, (c) those with two groups attached to the β carbon, i.e., valine and isoleucine, and (d) those with a heteroatom on the β carbon, i.e., serine, threonine, and cysteine. These are assigned Q = -1. The 13 other "α-helix-forming" amino acids are assigned Q = +1. Figure 6 shows the distribution of the random code error-transmission indices. The natural code error-transmission index for the first base position is smaller than most random code indices. Therefore, the natural code structure is such that single-base substitution in the first base position generally leads to the substitution of an amino acid with more similar α-helix forming properties than would be expected from a random code.

A separate series of calculations has been performed in which it is assumed that only pyrimidine ↔ pyrimidine and purine ↔ purine substitutions occur. The results obtained for this group of error-transmission indices are very similar: for each of the six amino acid properties considered above, single-base substitution (in this case restricted to the substitutions U ↔ C and A ↔ G) in the first position of the natural code leads to the substitution of a more similar amino acid than would be expected from a random code.

FIG. 6.-(a) shows the distribution of error transmission indices obtained for the random codes without third base degeneracy, when Q of equation 1 is taken to be the "α-helix forming ability." (b) shows the corresponding distribution for the random codes with third base degeneracy. [Axis label: Random Code Error Transmission Index.]

Conclusion.-Comparing the natural code with computer-generated random codes, the main conclusion is that, for each of the six properties discussed above
and listed in Table 2, the structure of the natural code is such that single-base substitution in the first position of the codon tends to result in the substitution of an amino acid similar to the original one. A related result is that the second position of the codon carries the most information, i.e., plays the largest role in determining the properties of the amino acid.

In general, the six amino acid properties are only weakly correlated with each other. For example, it is not probable that a random code having a small error-transmission index for molecular weight will also have a small error-transmission index for polar requirement. This lack of correlation implies that the natural code has evolved so as to minimize error transmission in the first position simultaneously for several amino acid properties.

It would seem that the structure of the code has been in part determined by its role of information transmission. Sonneborn8 has suggested this, stressing the avoidance of "lethal mutations." However, his theory makes no distinction among the base positions within a codon. It is difficult to conceive of a mechanism by which the DNA-replicating enzyme, or a mutagenic agent, could make a distinction between various bases depending on which position they might occupy within a codon. The probability of single-base substitution during replication is more likely to depend on the base itself, rather than on its position. A base-dependent pattern, rather than a position-dependent pattern, of single-base substitution during replication might affect the code structure. For example, a tendency toward purine-purine substitutions might influence the code structure such that similar amino acids would tend to have codons with purines in corresponding positions. Such a pattern would not necessarily lead to discrepancies in the values of T_I and T_II. A distinction between bases, depending on their position within the codon, is more likely to be made during transcription or translation.
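A base-dependent substitution pattern of the kind discussed here, in which only pyrimidine-pyrimidine (U ↔ C) and purine-purine (A ↔ G) exchanges occur, can be explored with a restricted index. The following is a minimal sketch, not the calculation performed in the paper: `transition_index` is a hypothetical name, terms involving a nonsense codon are simply skipped, and no normalization is applied because each inner sum then has at most one term. The property Q is the α-helix forming assignment given in the Results (Q = -1 for GLY, PRO, VAL, ILU, SER, THR, CYS; Q = +1 otherwise).

```python
from itertools import product

BASES = "UCAG"
# Pyrimidine <-> pyrimidine and purine <-> purine partners.
TRANSITION = {"U": "C", "C": "U", "A": "G", "G": "A"}

# Natural code in Table 1 order (third base varies fastest).
AMINO_ACIDS = (
    "PHE PHE LEU LEU SER SER SER SER TYR TYR STOP STOP CYS CYS STOP TRY "
    "LEU LEU LEU LEU PRO PRO PRO PRO HIS HIS GLN GLN ARG ARG ARG ARG "
    "ILU ILU ILU MET THR THR THR THR ASN ASN LYS LYS SER SER ARG ARG "
    "VAL VAL VAL VAL ALA ALA ALA ALA ASP ASP GLU GLU GLY GLY GLY GLY"
).split()
NATURAL_CODE = dict(zip(product(BASES, repeat=3), AMINO_ACIDS))

# Helix-forming assignment as stated in the text.
NON_HELIX = {"GLY", "PRO", "VAL", "ILU", "SER", "THR", "CYS"}
HELIX_Q = {
    aa: (-1.0 if aa in NON_HELIX else 1.0)
    for aa in set(AMINO_ACIDS)
    if aa != "STOP"
}


def transition_index(code, q, position):
    """Sum |Q(A) - Q(A')| over sense codons, where A' is reached by the
    single allowed substitution at `position` (0, 1 or 2)."""
    total = 0.0
    for codon, amino_acid in code.items():
        swapped = TRANSITION[codon[position]]
        partner = codon[:position] + (swapped,) + codon[position + 1:]
        if amino_acid == "STOP" or code[partner] == "STOP":
            continue  # skip terms involving a nonsense codon
        total += abs(q[amino_acid] - q[code[partner]])
    return total


if __name__ == "__main__":
    for position, label in enumerate(("I", "II", "III")):
        print("T_" + label, transition_index(NATURAL_CODE, HELIX_Q, position))
```

With this Q, the only third-position contribution comes from the ILU/MET pair AUA ↔ AUG, since every other U ↔ C or A ↔ G exchange in the third position is either silent or involves UGA.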
Woese4,9 has suggested that the relative frequency of translation errors might be reflected in the code structure.

The author wishes to thank R. Epstein, E. Kellenberger, and M. Yanagida for valuable discussions. The hospitality of the Laboratoire de Biophysique was very great and is appreciated.

1 Kaplan, S., A. O. W. Stretton, and S. Brenner, J. Mol. Biol., 14, 528 (1965).
2 Gorini, L., and J. R. Beckwith, Ann. Rev. Microbiol., 20, 401 (1966).
3 Crick, F. H. C., J. Mol. Biol., 19, 548 (1966).
4 Woese, C. R., The Genetic Code (New York: Harper and Row, 1967).
5 Woese, C. R., D. H. Dugre, W. C. Saxinger, and S. A. Dugre, these PROCEEDINGS, 55, 966 (1966).
6 Woese, C. R., D. H. Dugre, S. A. Dugre, M. Kondo, and W. C. Saxinger, Cold Spring Harbor Symp. Quant. Biol., 31, 723 (1966).
7 Schellman, J. A., and C. G. Schellman, in The Proteins, 2nd edition, ed. H. Neurath (New York: Academic Press, 1964), vol. 2, pp.
8 Sonneborn, T. M., in Evolving Genes and Proteins, ed. Bryson and Vogel (New York: Academic Press, 1964), pp.
9 Woese, C. R., these PROCEEDINGS, 54, 1546 (1965).
10 Crick, F. H. C., Sci. Am., 215, 55 (1966).
11 Edsall, J. T., in Proteins, Amino Acids and Peptides as Ions and Dipolar Ions, ed. Cohn and Edsall (New York: Reinhold, 1943), p. 75.