2124 Corrections Correction. n the article "Nucleotide sequence and the encoded amino acids of human serum albumin mrna" by Achilles Dugaiczyk, Simon W. Law, and Olivia E. Dennison, which appeared in the January 1982 issue ofproc. NatL. Acad. Sci. USA (79, 71-75), the first paragraph of Discussion was garbled by a printer's error. The correct paragraph is printed below. Determining the complete nucleotide sequence ofthe cdna has permitted us to identify the pre- and the propeptides of human serum albumin. Actually, the amino acid sequence of the propeptide has been reported for carriers of an abnormal albumin, Christchurch (18), which is longer than normal human albumin by six amino acids at the NH2 terminus. This additional hexapeptide has the sequence Arg-Gly-Val-Phe-Arg-Gln (18) and differs only in the terminal position from the sequence we are presently reporting for the apparently normal protein, Arg- Gly-Val-Phe-Arg-Arg-albumin (Fig. 3). Thus, carriers of albumin Christchurch must be carriers of a CGA to CAA mutation, which changes the codon for Arg to Gln in the last position of the propeptide. The altered protein consequently ceases to be a substrate for the specific protease that removes propeptides from secretory proteins. t is interesting to note, however, that failure to remove such a propeptide does not prevent the protein from being secreted; at least this is true about albumin Christchurch, which has reached the bloodstream of its carriers (18). Proc. NatL Acad. Sci. USA 79 (1982) Correction. n the article "A proton gradient controls a calcium-release channel in sarcoplasmic reticulum" by Varda Shoshan, David H. MacLennan, and Donald S. Wood, which appeared in the August 1981 issue ofproc. NatL Acad. Sci. USA (78, 4828-4832), the authors request that the following correction be noted. n the experiment reported in Fig. 5 and discussed on pages 4830 and 4831, the concentration of Ca2+ used to inhibit Ca + release was, in fact, 100,uM not 3.3 jum as reported. Consequently, although Ca2' release was measurably reduced by Ca2' in the experiments of Fig. 5, the data neither support nor preclude the possibility that physiological Ca2+ levels ('10,uM) can inhibit Ca2' release.
Proc. Nati Acad. Sci. USA Vol. 79, pp. 71-75, January 1982 Biochemistry Nucleotide sequence and the encoded amino acids of human serum albumin mrna (cdna clones/codon usage/prepropeptide/triple-domain structure) ACHLLES DUGACZYK, SMON W. LAW*, AND OLVA E. DENNSON Department of Cell Biology, Baylor College of Medicine, Texas Medical Center, Houston, Texas 77030 Communicated by James F. Bonner, Septemnber 3, 1981 ABSTRACT The complete nucleotide sequence of human serum albumin mrna has been determined from recombinant cdna clones and from a primer-extended cdna synthesis on the mrna template. The sequence is composed of 2078 nucleotides, starting upstream from a potential ribosome binding site in the 5' untranslated region. t contains all the translated codons and extends into the poly(a) at the 3' terminus. Part of the translated sequence codes for a hydrophobic prepeptide, Met--Trp-Val- Thr-Phe-le-Ser-Leu-Leu-Phe-Leu-Phe-Ser-Ser-Ala-Tyr-Ser, followed by a basic propeptide, Arg-Gly-Val-Phe-Arg-Arg. These signal peptides are absent from mature normal serum albumin and, so far, have not been identified in their nascent state in humans. A remaining 1755 nucleotides of the translated mrna sequence code for 585 amino acids, which are in agreement, with few exceptions, with the published amino acid sequence for human serum albumin. The mrna sequence verifies and refines the repeating homology in the triple-domain structure of the serum albumin molecule. The gene for serum albumin, which codes for the major plasma protein, is ofparticular interest to study because it is regulated in development. n mammals, serum albumin is synthesized by the adult liver, and its synthesis increases from low levels early in development to a high plateau in adulthood. The embryonic liver and yolk sac, on the other hand, produce predominantly a-fetoprotein, but the synthesis decreases drastically after birth (1-3). This inverse relationship between the expression of the two genes makes it an attractive problem in developmental biology, particularly because the two genes are related. We have recently determined the complete sequence of mouse a-fetoprotein mrna (4). The deduced a-fetoprotein structure revealed extensive homology to mammalian serum albumin, indicating that the two proteins are encoded in the same gene family. Similar conclusions have been reached by others from studies on rat (5) and mouse (6) a-fetoprotein genes. n the present effort we extend our studies to the human genome and report the cloning and sequence determination of DNA complementary to the human serum albumin mrna. We deduce the unknown amino acid sequence of the signal peptide and verify a repeating homology in the triple-domain structure of the human serum albumin molecule. METHODS mrna. Human liver mrna was obtained by the procedure of Chirgwin et al (7) from fetal livers removed at 17-20 weeks of gestation. mmunoprecipitation of albumin-containing polysomes was performed according to Taylor and Tse (8). n vitro translation of mrna was carried out in a cell-free reticulocyte The publication costs ofthis article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. 1734 solely to indicate this fact. 71 A B C am- -~ FG. 1. Autoradiogram of proteins, labeled with [assimethionine in an in vitro reticulocyte translation system, and separated electrophoretically in a sodium dodecyl sulfate/acrylamide gel (9). Lane A, no mrna added to the translation system. Lane B, translation products of total human liver mrna. Lane C, translation products of human liver mrna obtained from immunoprecipitated albumin-producing polysomes. Lane D, part of the translation sample that is separated in lane B was immunoprecipitated with antibody prior to its electrophoretic separation on the gel. system, following the instruction of the supplier (New England Nuclear). The translation products were separated electrophoretically according to Laemmli (9). Cloning and Sequence Determination of DNA. Doublestranded cdna has been cloned in the Pst site of the plasmid pbr322, as described in detail previously (4, 10). DNA sequence was determined according to the procedure of Maxam and Gilbert (11). RESULTS Enriched mrna. The products of in vitro translation of human liver mrna are shown in Fig. 1. On the basis of this translation, mrna that was enriched for serum albumin sequences was estimated to be over 50% pure (Fig. 1, lane C). This enriched mrna was used to obtain a cdna probe to screen the recombinant clones for serum albumin sequences. * Present address: National Heart, Lung and Blood nstitute, National nstitutes of Health, Bethesda, MD 20205. D
72 Biochemistry: Dugaiczyk et al. Recombinant Plasmids pha36 and pha206. A restriction endonuclease map of the largest positive clone, pha36, is shown in Fig. 2, together with a restriction map of the primerextended plasmid clone pha206. The latter was obtained in a second transformation experiment after initiating the cdna synthesis from an internal primer. This primer was a 91-basepair-long DNA fragment, Msp (152)-Taq (182/3), isolated from pha36. The two plasmids, pha36 and pha206, share 0.15 kilobase of homologous DNA. Together, they encode the entire sequence for human serum albumin, starting with the CTT codon for Leu at position - 10 of the prepeptide and extending into the 3' untranslated region of poly(a). Sequence of the Albumin cdna. The entire nucleotide sequence of the serum albumin mrna, as determined from the cloned DNA in pha36 and pha206 and from the primer-extended cdna at the 5' terminus of the message, is shown in Fig. 3. The inferred amino acid sequence is also indicated. The mrna length is 2078 nucleotides, of which 38 represent the 5' untranslated region, 54 identify a prepeptide of 18 amino acids, 18 identify a propeptide of 6 amino acids, 1755 code for the known 585 amino acids of serum albumin, 189 make up the 3' untranslated region, and 24 are the poly(a) sequence. Nucleotides 5 to 15 (-34 to -24) in the 5' untranslated region (Fig. 3) are complementary to a 3'-terminal region of eukaryotic 18S RNA (13) and thus could represent a ribosome binding site: (5')... T-T-C C-T-T-C-T-G-T... albumin mrna (3')... G-A-G-G-A-A-G-G-C-G-U-C-C-m62A-m6A... 18S RNA The translated portion of the mrna sequence codes for the signal peptide and the main body of the albumin polypeptide chain. Because prepeptides are removed from nascent secretory proteins (such as albumin) in the endoplasmic reticulum, and the conversion ofproalbumin to albumin takes place in the Golgi vesicles, the presence and the sequence of the signal peptide for normal human serum albumin have not been reported previously. At the 3' end of the message, the putative polyadenylylation signal sequence A-A-T-A-A-A, is located 16 nucleotides upstream from the beginning of the poly(a) sequence. Another characteristic sequence located near the polyadenylylation site has been identified by Benoist et al. (14); the consensus sequence from several mrnas was T-T-T-T-C-A-C-T-G-C. A Proc. Nad Acad. Sci. USA 79 (1982) similar sequence, T-T-T-T-C-T-C-T-G-T, is located 19 nucleotides upstream from the A-A-T-A-A-A hexanucleotide in the human albumin mrna molecule (Fig. 3). Primary Structure of Human Serum Albumin. Amino acid sequences for human serum albumin have been published by Behrens et al. (15) and by Meloun et al (16). (See Fig. 4.) There are a few discrepancies between the two reports, often involving sequences surrounding cysteine residues. Perhaps the most serious in terms of the structure of the albumin molecule is the sequence around the 17th and 18th cysteines, because the presence (15) or absence (16) of other amino acids between the two cysteines would affect the polypeptide loops generated by the disulfide bonds ofthese two cysteines. Our results in this critical region are shown by the sequencing gel in Fig. 5, which permits an identification of residues 261 to 290. The 17th and 18th cysteines are residues 278 and 279, and there are clearly no other amino acids between them. Consequently, such a sequence arrangement establishes the near-perfect homology in the triple-domain structure of the albumin molecule (Fig. 4). Our sequence results are in better agmeement with those of Meloun et al (16), although several discrepancies remain. Amino acid positions 94 (Gln), 95 (), 97 (Gly), 170 (Gln), 464 (His), 465 (), and 501 () are specified (16) as, 94 (), 95 (Gln), 97 (), 170 (), 464 (), 465 (His), and 501 (Gln). Residues 364-370 (Fig. 3), Ala-Asp-Pro-His---Tyr, are specified (16) as His-Asp-Pro-Tyr---Ala. There are several.possible explanations for these differences, but we do not think the differences are due to erroneous DNA sequence determinations. DSCUSSON Determining the complete nucleotide sequence of the cdna has permitted us to identify the pre- and the propeptides of human serum albumin. Actually, the amino acid sequence of the propeptide. The altered protein consequently ceases to be a substrate for the specific'protease that removes propeptides from secretory proteins. t is interesting to note, however, that failure to remove such a propeptide does not prevent the protein from being secreted; at least this is 'true about albumin are presently reporting for the apparently normal protein, Arg- Gly-Val-Phe-Arg-Arg-albumin (Fig. 3). Thus, carriers of albumin Christchurch must be carriers of a CGA to CAA mutation, which changes the codon for Arg to Gln in the last position of HpaU (3658) J 1 Taq 1 5' Psol bombo (3611) 16/7 31 Hpa 1 (3548) Pat 182/3 (3611) Taq, H1%~~~~~Hn 1 57 131 144/5 162 107/8 Hinf Mbo Mbo Mu 1-1 [3.Pst s0 (3611) Hp&( A2 (3646) 270 325/6 37416 382 419/0 46011 Taq Mbo Taq Mlo HknU Mbo 1L11 H 26910 Kiobases l - 493/4 Taq Hha sat1 364 406/7 HM H*n41 531/2 472/3 479/0 pha36 O.1.2.3 A.5.6.7.8.9 1.0 1.1 12 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 FG. 2. Restriction endonuclease cleavage map of the overlapping human albumin cdna inserts in the recombinant plasmids pha36 and pha26.' The map shows the inserted albumin DNA (opent bar) and its orientation in-the pbr322 plasmid DNA (black bar). The mrna sequence is presented by the upper DNA strands, with the 5' and 3' termini as indicated. Numbers on restriction sites within the human-derived'dna correspond to amino- acid positions in the albumin molecule. Numbers such as 182/3 indicate that the restriction sequence overlaps two amino acids. The Taq (270) site is actually not cleaved in our system'by Taq, due to methylation of this sequence to T-C-G-m6A. The inserted sequence in pha206 starts with the'ctt codon for Leu at position -10 of the prepeptide. Numbers on restriction sites within the pbr322 sequence are taken from available sequence data from Sutcliffe (12). One of the two Pst sites in each recombinant plasmid, indicated by *, has not been reconstituted due to loss of a terminal A in the Pst'-linearized plasmid DNA. 666 term TAA Hhdlll * Pot (3611) 34 Hhfl (3668)
Biochemistry: Dugaiczyk et al. Proc. Nati. Acad. Sci. USA 79 (1982) 73-18 p r e -10 Met lys trp val thr phe ile ser leu leu phe leu phe ser (A)GCTTTTCTCTTCTGTCAACCCCACACGCCTTTGGCACA ATG AAG TGG GTA ACC TTT ATT TCC CTT CTT m CTC m AGC (80) -1-6 p r o -1 1 10 20 ser ala tyr ser arg gly val phe arg arg asp ala his lys ser glu val ala his arg phe lys asp leu gly glu glu asn phe lys TCG GCT TAT TCC AGG GGT GTG m CGT CGA GAT GCA CAC AAG AGT GAG GTT GCT CAT CGGM AAA GAT TTG GGA GAA GAA AAT TTC AAA (170) 21 30 34 40 50 ala leu val leu ile ala phe ala gin tyr leu gin gin cys pro phe glu asp his val lys leu val am glu val thr glu phe ala GCC TTG GTG TTG ATT GCC TTT GCT CAG TAT CTT CAG CAG TGT CCA m GAA GAT CAT GTA AAA TTA GTG AAT GAA GTA ACT GAA TTT GCA (260) 51 53 60 62 70 75 80 lys thr cys val ala asp glu ser ala glu asn cys asp lys ser leu his thr leu phe gly asp lys leu cys thr val ala thr leu AAA ACA TGT GTT GCT GAT GAG TCA GCT GAA AAT TGT GAC AAA TCA CTT CAT ACC CTTm GGA GAC AAA TTA TGC ACA GTT GCA ACT CTT (350) 81 90 91 100 101 110 arg glu thr tyr gly glu met ala asp cys cys ala lys gin glu pro gly arg asn glu cys phe leu gin his lys asp asp asn pro CGT GAA ACC TAT GGT GAA ATG GCT GAC TGC TGT GCA AAA CM GAA CCT GGG AGA MT GM TGC TTC TTG CM CAC MA GAT GAC AAC CCA (440) 111 120 124 130 140 asn leu pro arg leu val arg pro glu val asp val met cys thr ala phe his asp asn glu glu thr phe leu lys lys tyr leu tyr MC CTC CCC CGA TTG GTG AGA CCA GAG GTT GAT GTG ATG TGC ACT GCTm CAT GAC MT GM GAG ACA TTT TTG AAA MA TAC TTA TAT (530) 141 150 160 168 169 170 glu ile ala arg arg his pro tyr phe tyr ala pro glu leu leu phe pbe ala lys arg tyr lys ala ala phe thr glu cys cys gin GAA ATT GCC AGA AGA CAT CCT TAC TTT TAT GCC CCG GAA CTC CTT TTCm GCT MA AGG TAT MA GCT GCT m ACA GAA TGT TGC CM (620) 171 177 180 190 200 ala ala asp lys ala ala cys leu leu pro lysleu asp glu lie arg asp glu gly lys ala ser ser ala lys gin arg leu lys cys GCT GCT GAT MA GCT GCC TGC CTG TTG CCA MG CTC GAT GM CTT CGG GAT GM GGG AAG GCT TCG TCT GCC MA CAG AGA CTC AAG TGT (710) 201 210 220 230 ala ser leu gin lys pbe gly glu arg ala phe lys ala trp ala val ala arg leu ser gin arg phe pro lys ala glu phe ala glu GCC AGT CTC CAA MA TTT GGA GM AGA GCT TTC MA GCA TGG GCA GTA GCT CGC CTG AGC CAG AGA TTT CCC AAA GCT GAG TTT GCA GAA (800) 231 240 245 246 250 253 260 val ser lys leu val thr asp leu thr lys val his thr glu cys cys his gly asp leu leu glu cys ala asp asp arg ala asp leu GTT TCC AAG TTA GTG ACA GAT CTT ACC MA GTC CAC ACG GAA TGC TGC CAT GGA GAT CTG CTT GM TGT GCT GAT GAC AGG GCG GAC CTT (890) 261 265 270 278 279 280 289 290 ala lys tyr se cys glu asn gin asp ser ile ser ser lys leu lys glu cys cys glu lys pro leu leu glu lys ser his cys ile GCC AAG TAT ATC TGT GAA MT CAA GAT TCG ATC TCC AGT AAA CTG MG GM TGC TGT GAA MA CCT CTG TTG GM AAA TCC CAC TGC ATT (980) 291 300 310 316 320 ala glu val glu asn asp glu met pro ala asp leu pro ser leu ala ala asp phe val glu ser lys asp val cys lys asn tyr ala GCC GAA GTG GM MT GAT GAG ATG CCT GCT GAC TTG CCT TCA TTA GCT GCT GAT TTT GTT GAA AGT AAG GAT GTT TGC AM MC TAT GCT(1070) 321 330 340 350 glu ala lys asp val phe leu gly met phe leu tyr glu tyr ala arg arg his pro asp tyr ser val val leu leu leu arg leu ala GAG GCA AAG GAT GTC TTC TTG GGC ATG m TTG TAT GAA TAT GCA AGA AGG CAT CCT GAT TAC TCT GTC GTG CTG CTG CTG AGA CTT GCC(1160) 351 360 361 369 370 380 lys thr tyr glu thr thr leu glu lys eys cys ala ala ala asp pro his glu cys tyr ala lys val phe asp glu phe lys pro leu MG ACA TAT GM ACC ACT CTA GAG MG TGC TGT GCC GCT GCA GAT CCT CAT GAA TGC TAT GCC MA GTG TTC GAT GM U AA CCT CTT(1250) 381 390 392 400 410 val glu glu pro gin asn leu ile lys gin asn cys glu leu phe glu gin leu gly glu tyr lys phe gin asn ala leu leu val arg GTG GAA GAG CCT CAG AAT TTA ATC MA CAA MT TGT GAG CTT m GAG CAG CTT GGA GAG TAC MA TTC CAG MT GCG CTG TTA GTT CGT(1340) 411 420 430 437 438 440 tyr thr lys lys val pro gin val ser thr pro thr leu val glu val ser arg asn leu gly lys val gly ser lys eys cys lys his TAC ACC MG MA GTA CCC CAA GTG TCA ACT CCA ACT CTT GTA GAG GTC TCA AGA MC CTA GGA MA GTG GGC AGC AM TGT TGT AM CAT(1430) 441 448 450 460 461 470 pro glu ala lys arg met pro eys ala glu asp tyr leu ser val val leu asn gin leu cys val leu his glu lys thr pro val ser CCT GM GCA AAA AGA ATG CCC TGT GCA GM GAC TAT CTA TCC GTG GTC CTG AAC CAG TTA TGT GTG TTG CAT GAG AAA ACG CCA GTA AGT(1520) 471 476 477 480 490 500 asp arg val thr lys cys cys thr glu ser leu val asn arg arg pro cys phe ser ala leu glu val asp glu thr tyr val pro lys GAC AGA GTC ACC AAA TGC TGC ACA GM TCC TTG GTG MC AGG CGA CCA TGC TTT TCA GCT CTG GM GTC GAT GM ACA TAC GTT CCC AAA(1610) 501 510 514 520 530 glu pbe asn ala glu thr phe thr phe his ala asp ile cys thr leu ser glu lys glu arg gin ile lys lys gin thr ala leu val GAG mu MT GCT GAA ACA TTC ACC TTC CAT GCA GAT ATA TGC ACA CTT TCT GAG MG GAG AGA CAA ATC MG AM CM ACT GCA CTT GTT(1700) 531 540 550 558 559 560 glu leu val lys his lys pro lys ala thr lys glu gin leu lys ala val met asp asp phe ala ala phe val glu lys cys cys lys GAG CTC GTG MA CAC MG CCC MG GCA ACA AM GAG CM CTG MA GCT GTT ATG GAT GAT TTC GCT GCT TTT GTA GAG MG TGC TGC MG(1790) 561 567 570 580 ala asp asp lys glu thr cys phe ala glu glu gly lys lys leu val ala ala ser gln ala ala leu gly leu ter ter GCT GAC GAT MG GAG ACC TGC m GCC GAG GAG GGT AM MA CTT GTT GCT GCA AGT CAA GCT GCC TTA GGC TTA TM CATCACAUMAAMG(1883) ter ter CATCTCAGCCTACCATGAGAATAAGAGAAAGAAAATGAAGATCAAAAGCTTATTCATCTGTTTTTCTTTTTCGTTGGTGTAAAGCCAACACCCTGTCTAAAAAACATAAATT CTTT (2oo2) TCATTTTGCCTCTTTTCTCTGTGCTTCMTMTAAMAATGGAAAGMTCTM... 20. M (2078) FG. 3. Nucleotide sequence of human serum albumin mrna as determined from cloned cdna. The sequence 5'-upstream of the Leu at -10, which is not contained in pha206, was determined as follows. A short DNA fragment, containing the Taq site (Arg at -1 in the propeptide), was obtained from the 5' end of the albumin DNA in pha206 (Fig. 2). The fragment was terminated by the Alu sequence (Ser at -5 in the prepeptide) and the Mnl sequence (-6), and it was 32P-labeled at the 5' terminus of the latter. This fragment was annealed with human liver mrna, and the resulting RNADNA template/primer was used in the reverse transcriptase reaction to synthesize a cdna copy extended into the 5' terminus of the mrna template. This cdna was separated from the deoxyribonucleoside triphosphates on a Sephadex G-50 column and used directly for sequence determination. The first A residue was determined from genomic DNA. The amino acid sequence shown is deduced from the nucleotide sequence. ter, Termination. the propeptide. The altered protein consequently ceases to be Christchurch, which has reached the bloodstream of its carriers a substrate for the specific protease that removes propeptides (18). from secretory proteins. t is interesting to note, however, that The nucleotide sequence of the albumin mrna has also perfailure to remove such a propeptide does not prevent the pro- mitted us to refine the repeated homology in the triple-domain tein from being secreted; at least this is true about albumin structure of the human albumin molecule. The fact that cys-
74 Biochemistry: Dugaiczyk et alp ji~~~~~~~h Proc. Natl. Acad. Sci. USA 79 (1982) - _ mma Domain A)~~~~~~~~~~~~~~~~1 Doai 1 ~~~~~~~~~~~ k Domain11E { FG. 4.A ioai _iae eune n iufd odn atr fhmnsrmabmn rpedmi tutr fitra oooyi n varw.tesqec ersnsordeetdt:telvu s codn obon(7.nt h erdretsmer ftedslie bridges throughout the molecule.
Biochemistry: Dugaiczyk et al 290 lle His Ser Leu Leu Pro 276 281 4 X r cys O k 2 Leu k Ser Ser 'le ;Ser Asp Gn Asn lle Tyr Ala 261 FG. 5. Autoradiogram of a DNA sequencing gel showing the region coding for the 17th and 18th cysteines of human serum albumin; the two cysteines are residues 278 and 279 and there are no other amino acids between them. teines 278 and 279 are not separated by other amino acids, as was originally thought (15), brings the structure of domain closer to the structures of the remaining two domains (Fig. 4). As nucleotide sequences of genes or mrnas become available, they prompt renewed speculations about codon usage, because some insight might be gained into rates of mutation or other evolutionary forces, from any nonrandom pattern of codons used in a given gene, or a given species. For example, analyzing published nucleotide sequence data of human a- and,b3globin genes, Modiano et al (19) noticed that when several degenerate codons specify one amino acid, the codon that gives rise to a termination codon in a one-step mutation is never used. These "dispensable pretermination codons" are avoided, it was argued (19), in order to reduce the risk to mutate to termination. Such an argument implicitly postulates that evolution is anticipatory. This argument, however, receives no support from the distribution of codons utilized on the serum albumin gene. Table 1 summarizes the prevalence ofcodons utilized for each of the amino acids in human serum albumin, including the presequence and prosequence. There is no evidence for discrimination against the seven dispensable pretermination codons: UUA, UUG, UCA, UCG, CGA, AGA, and GGA. We thank Dr. Paul C. MacDonald and Dr. Evan R. Simpson from the University of Texas Southwest Medical Center, Dallas, for kindly providing human fetal liver samples. We also thank Dr. Brian J. McCarthy and Dr. Arthur D. Riggs for critical reading of the manu- c A Proc. Natd Acad. Sci. USA 79 (1982) 75 Table 1. Codon usage in human serum albumin mrna U C A G Phe 25 Ser 3 Tyr 13 15 U u Phe 10 Ser 7 Tyr 6 20 C Leu 10 Ser 6 Ter 1 Ter 0 A Leu 13 Ser 3 Ter 0 Trp 2 G Leu 19 Pro 10 His 11 Leu 7 Pro 6 His 5 Leu 3 Pro 7 Gln 11 Leu: 12 Pro 1 Gln 9 ile fle ile Met 4 Thr 7 Asn 11 4 Thr 9 Asn 6 1 Thr 11 40 7 Thr 2 20 Val 12 Ala 31 G Val 7 Ala 14 Val 8 Ala 16 Val 16 Ala 2 Asp 25 Asp 11 38 23 Arg 3 Arg 1 Arg 3 Arg 2 Ser 6 Ser 3 Arg 13 Arg 5 U C A G U C A G Gly 3 U Gly 3 C Gly 6 A Gly 2 G The numbers indicate the number of times the individual codons are used in the coding region of the mrna. Ter, termination. script. The artwork is by David Scarff. The work was supported by the Baylor Center for Population Research and Reproductive Biology. 1. Abelev, G.. (1971) Adv. Cancer Res. 14, 295-358. 2. Gitlin, D., Perricelli, A. & Gitlin, G. M. (1972) Cancer Res. 32, 979-982. 3. Sell, S. & Gord, D. R. (1973) mmunochemistry 10, 439-442. 4. Law, S. W. & Dugaiczyk, A. (1981) Nature (London) 291, 201-205. 5. Jagodzinski, L. L., Sargent, T. D., Yang, M., Glackin, C. & Bonner, J. (1981) Proc. NatL Acad. Sci. USA 78, 3521-3525. 6. Gorin, M. B., Cooper, D. L., Eiferman, F., van de Rijn, P. & Tilghman, S. M. (1981)J. Biol Chem. 256, 1954-1959. 7. Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J. & Rutter, W. J. (1979) Biochemistry 18, 5294-5299. 8. Taylor, J. M. & Tse, T. P. H. (1976) J. Biol Chem. 251, 7461-7467. 9. Laemmli, U. K. (1970) Nature (London) 227, 680-685. 10. Law, S., Tamaoki, T., Kreuzaler, F. & Dugaiczyk, A. (1980) Gene 10, 53-61. 11. Maxam, A. & Gilbert, W. (1980) Methods Enzymot 65, 499-560. 12. Sutcliffe, J. G. (1978) Nucleic Acids Res. 5, 2721-2728. 13. Azad, A. A. & Deacon, N. J. (1980) Nucleic Acids Res. 8, 4365-4376. 14. Benoist, C., O'Hare, K., Breathnach, R. & Chambon, P. (1980) Nucleic Acids Res. 8, 127-142. 15. Behrens, P. O., Spiekerman, A. M. & Brown, J. R. (1975) Fed. Proc. Fed. Am. Soc. Exp. Biol 34, 591 (abstr.). 16. Meloun, B., Moravek, L. & Kostka, V. (1975) FEBS Lett. 58, 134-137. 17. Brown, J. R. (1976) Fed. Proc. Fed. Am. Soc. Exp. Biol 35, 2141-2144. 18. Brennan, S. 0. & Carrell, R. W. (1978) Nature (London) 274, 908-909. 19. Modiano, G., Battistuzzi, G. & Motulsky, A. G. (1981) Proc. Natl Acad. Sci. USA 78, 1110-1114.