Shoshan, David H. MacLennan, and Donald S. Wood, which appeared in the August 1981 issue ofproc. NatL Acad. Sci. USA

Similar documents
Materials Protein synthesis kit. This kit consists of 24 amino acids, 24 transfer RNAs, four messenger RNAs and one ribosome (see below).

Disease and selection in the human genome 3

Lecture 10, 20/2/2002: The process of solution development - The CODEHOP strategy for automatic design of consensus-degenerate primers for PCR

ORFs and genes. Please sit in row K or forward

Project 07/111 Final Report October 31, Project Title: Cloning and expression of porcine complement C3d for enhanced vaccines

NAME:... MODEL ANSWER... STUDENT NUMBER:... Maximum marks: 50. Internal Examiner: Hugh Murrell, Computer Science, UKZN

G+C content. 1 Introduction. 2 Chromosomes Topology & Counts. 3 Genome size. 4 Replichores and gene orientation. 5 Chirochores.

Lecture 19A. DNA computing

Lecture 11: Gene Prediction

Codon Bias with PRISM. 2IM24/25, Fall 2007

Electronic Supplementary Information

Det matematisk-naturvitenskapelige fakultet

Primer Design Workshop. École d'été en géné-que des champignons 2012 Dr. Will Hintz University of Victoria

Homework. A bit about the nature of the atoms of interest. Project. The role of electronega<vity

Hes6. PPARα. PPARγ HNF4 CD36

Figure S1. Characterization of the irx9l-1 mutant. (A) Diagram of the Arabidopsis IRX9L gene drawn based on information from TAIR (the Arabidopsis

Protein Structure Analysis

Supporting information for Biochemistry, 1995, 34(34), , DOI: /bi00034a013

PGRP negatively regulates NOD-mediated cytokine production in rainbow trout liver cells

Supplementary Figure 1A A404 Cells +/- Retinoic Acid

Supplementary. Table 1: Oligonucleotides and Plasmids. complementary to positions from 77 of the SRα '- GCT CTA GAG AAC TTG AAG TAC AGA CTG C

Arabidopsis actin depolymerizing factor AtADF4 mediates defense signal transduction triggered by the Pseudomonas syringae effector AvrPphB

PROTEIN SYNTHESIS Study Guide

Supplemental Data Supplemental Figure 1.

for Programmed Chemo-enzymatic Synthesis of Antigenic Oligosaccharides


Lezione 10. Bioinformatica. Mauro Ceccanti e Alberto Paoluzzi

Supplement 1: Sequences of Capture Probes. Capture probes were /5AmMC6/CTG TAG GTG CGG GTG GAC GTA GTC

SAY IT WITH DNA: Protein Synthesis Activity by Larry Flammer

Thr Gly Tyr. Gly Lys Asn

Supplemental Data. mir156-regulated SPL Transcription. Factors Define an Endogenous Flowering. Pathway in Arabidopsis thaliana

strain devoid of the aox1 gene [1]. Thus, the identification of AOX1 in the intracellular

Multiplexing Genome-scale Engineering

Expression of Recombinant Proteins

Dierks Supplementary Fig. S1

National PHL TB DST Reference Center PSQ Reporting Language Table of Contents

Supplementary Materials for

Add 5µl of 3N NaOH to DNA sample (final concentration 0.3N NaOH).

Supporting Information

Creation of A Caspese-3 Sensing System Using A Combination of Split- GFP and Split-Intein

Protein Synthesis. Application Based Questions

Supporting Online Information

Supporting Information. Copyright Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2006

Overexpression Normal expression Overexpression Normal expression. 26 (21.1%) N (%) P-value a N (%)

Supplemental material

Supplementary Information. Construction of Lasso Peptide Fusion Proteins

Supplemental Table 1. Mutant ADAMTS3 alleles detected in HEK293T clone 4C2. WT CCTGTCACTTTGGTTGATAGC MVLLSLWLIAAALVEVR

Supporting Information

Table S1. Bacterial strains (Related to Results and Experimental Procedures)

Genomic Sequence Analysis using Electron-Ion Interaction

Supplemental Data. Bennett et al. (2010). Plant Cell /tpc

INTRODUCTION TO THE MOLECULAR GENETICS OF THE COLOR MUTATIONS IN ROCK POCKET MICE

Cat. # Product Size DS130 DynaExpress TA PCR Cloning Kit (ptakn-2) 20 reactions Box 1 (-20 ) ptakn-2 Vector, linearized 20 µl (50 ng/µl) 1

1. DNA, RNA structure. 2. DNA replication. 3. Transcription, translation

Introduction to Bioinformatics Dr. Robert Moss

FROM DNA TO GENETIC GENEALOGY Stephen P. Morse

DNA sentences. How are proteins coded for by DNA? Materials. Teacher instructions. Student instructions. Reflection

Converting rabbit hybridoma into recombinant antibodies with effective transient production in an optimized human expression system

Level 2 Biology, 2017

RPA-AB RPA-C Supplemental Figure S1: SDS-PAGE stained with Coomassie Blue after protein purification.

Evolution of protein coding sequences

Gene synthesis by circular assembly amplification

PCR analysis was performed to show the presence and the integrity of the var1csa and var-

Y-chromosomal haplogroup typing Using SBE reaction

Degenerate Code. Translation. trna. The Code is Degenerate trna / Proofreading Ribosomes Translation Mechanism

evaluated with UAS CLB eliciting UAS CIT -N Libraries increase in the

Biomolecules: lecture 6

ΔPDD1 x ΔPDD1. ΔPDD1 x wild type. 70 kd Pdd1. Pdd3

Sampling Random Bioinformatics Puzzles using Adaptive Probability Distributions

ANCIENT BACTERIA? 250 million years later, scientists revive life forms

Supplementary Figure 1. Localization of MST1 in RPE cells. Proliferating or ciliated HA- MST1 expressing RPE cells (see Fig. 5b for establishment of

2

Supplemental Information. Human Senataxin Resolves RNA/DNA Hybrids. Formed at Transcriptional Pause Sites. to Promote Xrn2-Dependent Termination

CONVERGENT EVOLUTION. Def n acquisition of some biological trait but different lineages

Quantitative reverse-transcription PCR. Transcript levels of flgs, flgr, flia and flha were

Legends for supplementary figures 1-3

SUPPORTING INFORMATION

Nucleic Acids Research

SUPPLEMENTARY INFORMATION

SUPPORTING INFORMATION FILE

Genomics and Gene Recognition Genes and Blue Genes

7.016 Problem Set 3. 1 st Pedigree

A Circular Code in the Protein Coding Genes of Mitochondria

Molecular Level of Genetics

Biomolecules: lecture 6

BIOSTAT516 Statistical Methods in Genetic Epidemiology Autumn 2005 Handout1, prepared by Kathleen Kerr and Stephanie Monks

hcd1tg/hj1tg/ ApoE-/- hcd1tg/hj1tg/ ApoE+/+

Complexity of the Ruminococcus flavefaciens FD-1 cellulosome reflects an expansion of family-related protein-protein interactions

Bioinformatics CSM17 Week 6: DNA, RNA and Proteins

Supplemental Table 1. Primers used for PCR.

Supporting Information

iclicker Question #28B - after lecture Shown below is a diagram of a typical eukaryotic gene which encodes a protein: start codon stop codon 2 3

TRANSCRIPTION. Renáta Schipp

Additional Table A1. Accession numbers of resource records for all rhodopsin sequences downloaded from NCBI. Species common name

(a) Which enzyme(s) make 5' - 3' phosphodiester bonds? (c) Which enzyme(s) make single-strand breaks in DNA backbones?

II 0.95 DM2 (RPP1) DM3 (At3g61540) b

Molecular cloning and nucleotide sequence of full-length cdna for sweet potato catalase mrna

The combination of a phosphate, sugar and a base forms a compound called a nucleotide.

Supplementary Figures

Transcription:

2124 Corrections Correction. n the article "Nucleotide sequence and the encoded amino acids of human serum albumin mrna" by Achilles Dugaiczyk, Simon W. Law, and Olivia E. Dennison, which appeared in the January 1982 issue ofproc. NatL. Acad. Sci. USA (79, 71-75), the first paragraph of Discussion was garbled by a printer's error. The correct paragraph is printed below. Determining the complete nucleotide sequence ofthe cdna has permitted us to identify the pre- and the propeptides of human serum albumin. Actually, the amino acid sequence of the propeptide has been reported for carriers of an abnormal albumin, Christchurch (18), which is longer than normal human albumin by six amino acids at the NH2 terminus. This additional hexapeptide has the sequence Arg-Gly-Val-Phe-Arg-Gln (18) and differs only in the terminal position from the sequence we are presently reporting for the apparently normal protein, Arg- Gly-Val-Phe-Arg-Arg-albumin (Fig. 3). Thus, carriers of albumin Christchurch must be carriers of a CGA to CAA mutation, which changes the codon for Arg to Gln in the last position of the propeptide. The altered protein consequently ceases to be a substrate for the specific protease that removes propeptides from secretory proteins. t is interesting to note, however, that failure to remove such a propeptide does not prevent the protein from being secreted; at least this is true about albumin Christchurch, which has reached the bloodstream of its carriers (18). Proc. NatL Acad. Sci. USA 79 (1982) Correction. n the article "A proton gradient controls a calcium-release channel in sarcoplasmic reticulum" by Varda Shoshan, David H. MacLennan, and Donald S. Wood, which appeared in the August 1981 issue ofproc. NatL Acad. Sci. USA (78, 4828-4832), the authors request that the following correction be noted. n the experiment reported in Fig. 5 and discussed on pages 4830 and 4831, the concentration of Ca2+ used to inhibit Ca + release was, in fact, 100,uM not 3.3 jum as reported. Consequently, although Ca2' release was measurably reduced by Ca2' in the experiments of Fig. 5, the data neither support nor preclude the possibility that physiological Ca2+ levels ('10,uM) can inhibit Ca2' release.

Proc. Nati Acad. Sci. USA Vol. 79, pp. 71-75, January 1982 Biochemistry Nucleotide sequence and the encoded amino acids of human serum albumin mrna (cdna clones/codon usage/prepropeptide/triple-domain structure) ACHLLES DUGACZYK, SMON W. LAW*, AND OLVA E. DENNSON Department of Cell Biology, Baylor College of Medicine, Texas Medical Center, Houston, Texas 77030 Communicated by James F. Bonner, Septemnber 3, 1981 ABSTRACT The complete nucleotide sequence of human serum albumin mrna has been determined from recombinant cdna clones and from a primer-extended cdna synthesis on the mrna template. The sequence is composed of 2078 nucleotides, starting upstream from a potential ribosome binding site in the 5' untranslated region. t contains all the translated codons and extends into the poly(a) at the 3' terminus. Part of the translated sequence codes for a hydrophobic prepeptide, Met--Trp-Val- Thr-Phe-le-Ser-Leu-Leu-Phe-Leu-Phe-Ser-Ser-Ala-Tyr-Ser, followed by a basic propeptide, Arg-Gly-Val-Phe-Arg-Arg. These signal peptides are absent from mature normal serum albumin and, so far, have not been identified in their nascent state in humans. A remaining 1755 nucleotides of the translated mrna sequence code for 585 amino acids, which are in agreement, with few exceptions, with the published amino acid sequence for human serum albumin. The mrna sequence verifies and refines the repeating homology in the triple-domain structure of the serum albumin molecule. The gene for serum albumin, which codes for the major plasma protein, is ofparticular interest to study because it is regulated in development. n mammals, serum albumin is synthesized by the adult liver, and its synthesis increases from low levels early in development to a high plateau in adulthood. The embryonic liver and yolk sac, on the other hand, produce predominantly a-fetoprotein, but the synthesis decreases drastically after birth (1-3). This inverse relationship between the expression of the two genes makes it an attractive problem in developmental biology, particularly because the two genes are related. We have recently determined the complete sequence of mouse a-fetoprotein mrna (4). The deduced a-fetoprotein structure revealed extensive homology to mammalian serum albumin, indicating that the two proteins are encoded in the same gene family. Similar conclusions have been reached by others from studies on rat (5) and mouse (6) a-fetoprotein genes. n the present effort we extend our studies to the human genome and report the cloning and sequence determination of DNA complementary to the human serum albumin mrna. We deduce the unknown amino acid sequence of the signal peptide and verify a repeating homology in the triple-domain structure of the human serum albumin molecule. METHODS mrna. Human liver mrna was obtained by the procedure of Chirgwin et al (7) from fetal livers removed at 17-20 weeks of gestation. mmunoprecipitation of albumin-containing polysomes was performed according to Taylor and Tse (8). n vitro translation of mrna was carried out in a cell-free reticulocyte The publication costs ofthis article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. 1734 solely to indicate this fact. 71 A B C am- -~ FG. 1. Autoradiogram of proteins, labeled with [assimethionine in an in vitro reticulocyte translation system, and separated electrophoretically in a sodium dodecyl sulfate/acrylamide gel (9). Lane A, no mrna added to the translation system. Lane B, translation products of total human liver mrna. Lane C, translation products of human liver mrna obtained from immunoprecipitated albumin-producing polysomes. Lane D, part of the translation sample that is separated in lane B was immunoprecipitated with antibody prior to its electrophoretic separation on the gel. system, following the instruction of the supplier (New England Nuclear). The translation products were separated electrophoretically according to Laemmli (9). Cloning and Sequence Determination of DNA. Doublestranded cdna has been cloned in the Pst site of the plasmid pbr322, as described in detail previously (4, 10). DNA sequence was determined according to the procedure of Maxam and Gilbert (11). RESULTS Enriched mrna. The products of in vitro translation of human liver mrna are shown in Fig. 1. On the basis of this translation, mrna that was enriched for serum albumin sequences was estimated to be over 50% pure (Fig. 1, lane C). This enriched mrna was used to obtain a cdna probe to screen the recombinant clones for serum albumin sequences. * Present address: National Heart, Lung and Blood nstitute, National nstitutes of Health, Bethesda, MD 20205. D

72 Biochemistry: Dugaiczyk et al. Recombinant Plasmids pha36 and pha206. A restriction endonuclease map of the largest positive clone, pha36, is shown in Fig. 2, together with a restriction map of the primerextended plasmid clone pha206. The latter was obtained in a second transformation experiment after initiating the cdna synthesis from an internal primer. This primer was a 91-basepair-long DNA fragment, Msp (152)-Taq (182/3), isolated from pha36. The two plasmids, pha36 and pha206, share 0.15 kilobase of homologous DNA. Together, they encode the entire sequence for human serum albumin, starting with the CTT codon for Leu at position - 10 of the prepeptide and extending into the 3' untranslated region of poly(a). Sequence of the Albumin cdna. The entire nucleotide sequence of the serum albumin mrna, as determined from the cloned DNA in pha36 and pha206 and from the primer-extended cdna at the 5' terminus of the message, is shown in Fig. 3. The inferred amino acid sequence is also indicated. The mrna length is 2078 nucleotides, of which 38 represent the 5' untranslated region, 54 identify a prepeptide of 18 amino acids, 18 identify a propeptide of 6 amino acids, 1755 code for the known 585 amino acids of serum albumin, 189 make up the 3' untranslated region, and 24 are the poly(a) sequence. Nucleotides 5 to 15 (-34 to -24) in the 5' untranslated region (Fig. 3) are complementary to a 3'-terminal region of eukaryotic 18S RNA (13) and thus could represent a ribosome binding site: (5')... T-T-C C-T-T-C-T-G-T... albumin mrna (3')... G-A-G-G-A-A-G-G-C-G-U-C-C-m62A-m6A... 18S RNA The translated portion of the mrna sequence codes for the signal peptide and the main body of the albumin polypeptide chain. Because prepeptides are removed from nascent secretory proteins (such as albumin) in the endoplasmic reticulum, and the conversion ofproalbumin to albumin takes place in the Golgi vesicles, the presence and the sequence of the signal peptide for normal human serum albumin have not been reported previously. At the 3' end of the message, the putative polyadenylylation signal sequence A-A-T-A-A-A, is located 16 nucleotides upstream from the beginning of the poly(a) sequence. Another characteristic sequence located near the polyadenylylation site has been identified by Benoist et al. (14); the consensus sequence from several mrnas was T-T-T-T-C-A-C-T-G-C. A Proc. Nad Acad. Sci. USA 79 (1982) similar sequence, T-T-T-T-C-T-C-T-G-T, is located 19 nucleotides upstream from the A-A-T-A-A-A hexanucleotide in the human albumin mrna molecule (Fig. 3). Primary Structure of Human Serum Albumin. Amino acid sequences for human serum albumin have been published by Behrens et al. (15) and by Meloun et al (16). (See Fig. 4.) There are a few discrepancies between the two reports, often involving sequences surrounding cysteine residues. Perhaps the most serious in terms of the structure of the albumin molecule is the sequence around the 17th and 18th cysteines, because the presence (15) or absence (16) of other amino acids between the two cysteines would affect the polypeptide loops generated by the disulfide bonds ofthese two cysteines. Our results in this critical region are shown by the sequencing gel in Fig. 5, which permits an identification of residues 261 to 290. The 17th and 18th cysteines are residues 278 and 279, and there are clearly no other amino acids between them. Consequently, such a sequence arrangement establishes the near-perfect homology in the triple-domain structure of the albumin molecule (Fig. 4). Our sequence results are in better agmeement with those of Meloun et al (16), although several discrepancies remain. Amino acid positions 94 (Gln), 95 (), 97 (Gly), 170 (Gln), 464 (His), 465 (), and 501 () are specified (16) as, 94 (), 95 (Gln), 97 (), 170 (), 464 (), 465 (His), and 501 (Gln). Residues 364-370 (Fig. 3), Ala-Asp-Pro-His---Tyr, are specified (16) as His-Asp-Pro-Tyr---Ala. There are several.possible explanations for these differences, but we do not think the differences are due to erroneous DNA sequence determinations. DSCUSSON Determining the complete nucleotide sequence of the cdna has permitted us to identify the pre- and the propeptides of human serum albumin. Actually, the amino acid sequence of the propeptide. The altered protein consequently ceases to be a substrate for the specific'protease that removes propeptides from secretory proteins. t is interesting to note, however, that failure to remove such a propeptide does not prevent the protein from being secreted; at least this is 'true about albumin are presently reporting for the apparently normal protein, Arg- Gly-Val-Phe-Arg-Arg-albumin (Fig. 3). Thus, carriers of albumin Christchurch must be carriers of a CGA to CAA mutation, which changes the codon for Arg to Gln in the last position of HpaU (3658) J 1 Taq 1 5' Psol bombo (3611) 16/7 31 Hpa 1 (3548) Pat 182/3 (3611) Taq, H1%~~~~~Hn 1 57 131 144/5 162 107/8 Hinf Mbo Mbo Mu 1-1 [3.Pst s0 (3611) Hp&( A2 (3646) 270 325/6 37416 382 419/0 46011 Taq Mbo Taq Mlo HknU Mbo 1L11 H 26910 Kiobases l - 493/4 Taq Hha sat1 364 406/7 HM H*n41 531/2 472/3 479/0 pha36 O.1.2.3 A.5.6.7.8.9 1.0 1.1 12 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 FG. 2. Restriction endonuclease cleavage map of the overlapping human albumin cdna inserts in the recombinant plasmids pha36 and pha26.' The map shows the inserted albumin DNA (opent bar) and its orientation in-the pbr322 plasmid DNA (black bar). The mrna sequence is presented by the upper DNA strands, with the 5' and 3' termini as indicated. Numbers on restriction sites within the human-derived'dna correspond to amino- acid positions in the albumin molecule. Numbers such as 182/3 indicate that the restriction sequence overlaps two amino acids. The Taq (270) site is actually not cleaved in our system'by Taq, due to methylation of this sequence to T-C-G-m6A. The inserted sequence in pha206 starts with the'ctt codon for Leu at position -10 of the prepeptide. Numbers on restriction sites within the pbr322 sequence are taken from available sequence data from Sutcliffe (12). One of the two Pst sites in each recombinant plasmid, indicated by *, has not been reconstituted due to loss of a terminal A in the Pst'-linearized plasmid DNA. 666 term TAA Hhdlll * Pot (3611) 34 Hhfl (3668)

Biochemistry: Dugaiczyk et al. Proc. Nati. Acad. Sci. USA 79 (1982) 73-18 p r e -10 Met lys trp val thr phe ile ser leu leu phe leu phe ser (A)GCTTTTCTCTTCTGTCAACCCCACACGCCTTTGGCACA ATG AAG TGG GTA ACC TTT ATT TCC CTT CTT m CTC m AGC (80) -1-6 p r o -1 1 10 20 ser ala tyr ser arg gly val phe arg arg asp ala his lys ser glu val ala his arg phe lys asp leu gly glu glu asn phe lys TCG GCT TAT TCC AGG GGT GTG m CGT CGA GAT GCA CAC AAG AGT GAG GTT GCT CAT CGGM AAA GAT TTG GGA GAA GAA AAT TTC AAA (170) 21 30 34 40 50 ala leu val leu ile ala phe ala gin tyr leu gin gin cys pro phe glu asp his val lys leu val am glu val thr glu phe ala GCC TTG GTG TTG ATT GCC TTT GCT CAG TAT CTT CAG CAG TGT CCA m GAA GAT CAT GTA AAA TTA GTG AAT GAA GTA ACT GAA TTT GCA (260) 51 53 60 62 70 75 80 lys thr cys val ala asp glu ser ala glu asn cys asp lys ser leu his thr leu phe gly asp lys leu cys thr val ala thr leu AAA ACA TGT GTT GCT GAT GAG TCA GCT GAA AAT TGT GAC AAA TCA CTT CAT ACC CTTm GGA GAC AAA TTA TGC ACA GTT GCA ACT CTT (350) 81 90 91 100 101 110 arg glu thr tyr gly glu met ala asp cys cys ala lys gin glu pro gly arg asn glu cys phe leu gin his lys asp asp asn pro CGT GAA ACC TAT GGT GAA ATG GCT GAC TGC TGT GCA AAA CM GAA CCT GGG AGA MT GM TGC TTC TTG CM CAC MA GAT GAC AAC CCA (440) 111 120 124 130 140 asn leu pro arg leu val arg pro glu val asp val met cys thr ala phe his asp asn glu glu thr phe leu lys lys tyr leu tyr MC CTC CCC CGA TTG GTG AGA CCA GAG GTT GAT GTG ATG TGC ACT GCTm CAT GAC MT GM GAG ACA TTT TTG AAA MA TAC TTA TAT (530) 141 150 160 168 169 170 glu ile ala arg arg his pro tyr phe tyr ala pro glu leu leu phe pbe ala lys arg tyr lys ala ala phe thr glu cys cys gin GAA ATT GCC AGA AGA CAT CCT TAC TTT TAT GCC CCG GAA CTC CTT TTCm GCT MA AGG TAT MA GCT GCT m ACA GAA TGT TGC CM (620) 171 177 180 190 200 ala ala asp lys ala ala cys leu leu pro lysleu asp glu lie arg asp glu gly lys ala ser ser ala lys gin arg leu lys cys GCT GCT GAT MA GCT GCC TGC CTG TTG CCA MG CTC GAT GM CTT CGG GAT GM GGG AAG GCT TCG TCT GCC MA CAG AGA CTC AAG TGT (710) 201 210 220 230 ala ser leu gin lys pbe gly glu arg ala phe lys ala trp ala val ala arg leu ser gin arg phe pro lys ala glu phe ala glu GCC AGT CTC CAA MA TTT GGA GM AGA GCT TTC MA GCA TGG GCA GTA GCT CGC CTG AGC CAG AGA TTT CCC AAA GCT GAG TTT GCA GAA (800) 231 240 245 246 250 253 260 val ser lys leu val thr asp leu thr lys val his thr glu cys cys his gly asp leu leu glu cys ala asp asp arg ala asp leu GTT TCC AAG TTA GTG ACA GAT CTT ACC MA GTC CAC ACG GAA TGC TGC CAT GGA GAT CTG CTT GM TGT GCT GAT GAC AGG GCG GAC CTT (890) 261 265 270 278 279 280 289 290 ala lys tyr se cys glu asn gin asp ser ile ser ser lys leu lys glu cys cys glu lys pro leu leu glu lys ser his cys ile GCC AAG TAT ATC TGT GAA MT CAA GAT TCG ATC TCC AGT AAA CTG MG GM TGC TGT GAA MA CCT CTG TTG GM AAA TCC CAC TGC ATT (980) 291 300 310 316 320 ala glu val glu asn asp glu met pro ala asp leu pro ser leu ala ala asp phe val glu ser lys asp val cys lys asn tyr ala GCC GAA GTG GM MT GAT GAG ATG CCT GCT GAC TTG CCT TCA TTA GCT GCT GAT TTT GTT GAA AGT AAG GAT GTT TGC AM MC TAT GCT(1070) 321 330 340 350 glu ala lys asp val phe leu gly met phe leu tyr glu tyr ala arg arg his pro asp tyr ser val val leu leu leu arg leu ala GAG GCA AAG GAT GTC TTC TTG GGC ATG m TTG TAT GAA TAT GCA AGA AGG CAT CCT GAT TAC TCT GTC GTG CTG CTG CTG AGA CTT GCC(1160) 351 360 361 369 370 380 lys thr tyr glu thr thr leu glu lys eys cys ala ala ala asp pro his glu cys tyr ala lys val phe asp glu phe lys pro leu MG ACA TAT GM ACC ACT CTA GAG MG TGC TGT GCC GCT GCA GAT CCT CAT GAA TGC TAT GCC MA GTG TTC GAT GM U AA CCT CTT(1250) 381 390 392 400 410 val glu glu pro gin asn leu ile lys gin asn cys glu leu phe glu gin leu gly glu tyr lys phe gin asn ala leu leu val arg GTG GAA GAG CCT CAG AAT TTA ATC MA CAA MT TGT GAG CTT m GAG CAG CTT GGA GAG TAC MA TTC CAG MT GCG CTG TTA GTT CGT(1340) 411 420 430 437 438 440 tyr thr lys lys val pro gin val ser thr pro thr leu val glu val ser arg asn leu gly lys val gly ser lys eys cys lys his TAC ACC MG MA GTA CCC CAA GTG TCA ACT CCA ACT CTT GTA GAG GTC TCA AGA MC CTA GGA MA GTG GGC AGC AM TGT TGT AM CAT(1430) 441 448 450 460 461 470 pro glu ala lys arg met pro eys ala glu asp tyr leu ser val val leu asn gin leu cys val leu his glu lys thr pro val ser CCT GM GCA AAA AGA ATG CCC TGT GCA GM GAC TAT CTA TCC GTG GTC CTG AAC CAG TTA TGT GTG TTG CAT GAG AAA ACG CCA GTA AGT(1520) 471 476 477 480 490 500 asp arg val thr lys cys cys thr glu ser leu val asn arg arg pro cys phe ser ala leu glu val asp glu thr tyr val pro lys GAC AGA GTC ACC AAA TGC TGC ACA GM TCC TTG GTG MC AGG CGA CCA TGC TTT TCA GCT CTG GM GTC GAT GM ACA TAC GTT CCC AAA(1610) 501 510 514 520 530 glu pbe asn ala glu thr phe thr phe his ala asp ile cys thr leu ser glu lys glu arg gin ile lys lys gin thr ala leu val GAG mu MT GCT GAA ACA TTC ACC TTC CAT GCA GAT ATA TGC ACA CTT TCT GAG MG GAG AGA CAA ATC MG AM CM ACT GCA CTT GTT(1700) 531 540 550 558 559 560 glu leu val lys his lys pro lys ala thr lys glu gin leu lys ala val met asp asp phe ala ala phe val glu lys cys cys lys GAG CTC GTG MA CAC MG CCC MG GCA ACA AM GAG CM CTG MA GCT GTT ATG GAT GAT TTC GCT GCT TTT GTA GAG MG TGC TGC MG(1790) 561 567 570 580 ala asp asp lys glu thr cys phe ala glu glu gly lys lys leu val ala ala ser gln ala ala leu gly leu ter ter GCT GAC GAT MG GAG ACC TGC m GCC GAG GAG GGT AM MA CTT GTT GCT GCA AGT CAA GCT GCC TTA GGC TTA TM CATCACAUMAAMG(1883) ter ter CATCTCAGCCTACCATGAGAATAAGAGAAAGAAAATGAAGATCAAAAGCTTATTCATCTGTTTTTCTTTTTCGTTGGTGTAAAGCCAACACCCTGTCTAAAAAACATAAATT CTTT (2oo2) TCATTTTGCCTCTTTTCTCTGTGCTTCMTMTAAMAATGGAAAGMTCTM... 20. M (2078) FG. 3. Nucleotide sequence of human serum albumin mrna as determined from cloned cdna. The sequence 5'-upstream of the Leu at -10, which is not contained in pha206, was determined as follows. A short DNA fragment, containing the Taq site (Arg at -1 in the propeptide), was obtained from the 5' end of the albumin DNA in pha206 (Fig. 2). The fragment was terminated by the Alu sequence (Ser at -5 in the prepeptide) and the Mnl sequence (-6), and it was 32P-labeled at the 5' terminus of the latter. This fragment was annealed with human liver mrna, and the resulting RNADNA template/primer was used in the reverse transcriptase reaction to synthesize a cdna copy extended into the 5' terminus of the mrna template. This cdna was separated from the deoxyribonucleoside triphosphates on a Sephadex G-50 column and used directly for sequence determination. The first A residue was determined from genomic DNA. The amino acid sequence shown is deduced from the nucleotide sequence. ter, Termination. the propeptide. The altered protein consequently ceases to be Christchurch, which has reached the bloodstream of its carriers a substrate for the specific protease that removes propeptides (18). from secretory proteins. t is interesting to note, however, that The nucleotide sequence of the albumin mrna has also perfailure to remove such a propeptide does not prevent the pro- mitted us to refine the repeated homology in the triple-domain tein from being secreted; at least this is true about albumin structure of the human albumin molecule. The fact that cys-

74 Biochemistry: Dugaiczyk et alp ji~~~~~~~h Proc. Natl. Acad. Sci. USA 79 (1982) - _ mma Domain A)~~~~~~~~~~~~~~~~1 Doai 1 ~~~~~~~~~~~ k Domain11E { FG. 4.A ioai _iae eune n iufd odn atr fhmnsrmabmn rpedmi tutr fitra oooyi n varw.tesqec ersnsordeetdt:telvu s codn obon(7.nt h erdretsmer ftedslie bridges throughout the molecule.

Biochemistry: Dugaiczyk et al 290 lle His Ser Leu Leu Pro 276 281 4 X r cys O k 2 Leu k Ser Ser 'le ;Ser Asp Gn Asn lle Tyr Ala 261 FG. 5. Autoradiogram of a DNA sequencing gel showing the region coding for the 17th and 18th cysteines of human serum albumin; the two cysteines are residues 278 and 279 and there are no other amino acids between them. teines 278 and 279 are not separated by other amino acids, as was originally thought (15), brings the structure of domain closer to the structures of the remaining two domains (Fig. 4). As nucleotide sequences of genes or mrnas become available, they prompt renewed speculations about codon usage, because some insight might be gained into rates of mutation or other evolutionary forces, from any nonrandom pattern of codons used in a given gene, or a given species. For example, analyzing published nucleotide sequence data of human a- and,b3globin genes, Modiano et al (19) noticed that when several degenerate codons specify one amino acid, the codon that gives rise to a termination codon in a one-step mutation is never used. These "dispensable pretermination codons" are avoided, it was argued (19), in order to reduce the risk to mutate to termination. Such an argument implicitly postulates that evolution is anticipatory. This argument, however, receives no support from the distribution of codons utilized on the serum albumin gene. Table 1 summarizes the prevalence ofcodons utilized for each of the amino acids in human serum albumin, including the presequence and prosequence. There is no evidence for discrimination against the seven dispensable pretermination codons: UUA, UUG, UCA, UCG, CGA, AGA, and GGA. We thank Dr. Paul C. MacDonald and Dr. Evan R. Simpson from the University of Texas Southwest Medical Center, Dallas, for kindly providing human fetal liver samples. We also thank Dr. Brian J. McCarthy and Dr. Arthur D. Riggs for critical reading of the manu- c A Proc. Natd Acad. Sci. USA 79 (1982) 75 Table 1. Codon usage in human serum albumin mrna U C A G Phe 25 Ser 3 Tyr 13 15 U u Phe 10 Ser 7 Tyr 6 20 C Leu 10 Ser 6 Ter 1 Ter 0 A Leu 13 Ser 3 Ter 0 Trp 2 G Leu 19 Pro 10 His 11 Leu 7 Pro 6 His 5 Leu 3 Pro 7 Gln 11 Leu: 12 Pro 1 Gln 9 ile fle ile Met 4 Thr 7 Asn 11 4 Thr 9 Asn 6 1 Thr 11 40 7 Thr 2 20 Val 12 Ala 31 G Val 7 Ala 14 Val 8 Ala 16 Val 16 Ala 2 Asp 25 Asp 11 38 23 Arg 3 Arg 1 Arg 3 Arg 2 Ser 6 Ser 3 Arg 13 Arg 5 U C A G U C A G Gly 3 U Gly 3 C Gly 6 A Gly 2 G The numbers indicate the number of times the individual codons are used in the coding region of the mrna. Ter, termination. script. The artwork is by David Scarff. The work was supported by the Baylor Center for Population Research and Reproductive Biology. 1. Abelev, G.. (1971) Adv. Cancer Res. 14, 295-358. 2. Gitlin, D., Perricelli, A. & Gitlin, G. M. (1972) Cancer Res. 32, 979-982. 3. Sell, S. & Gord, D. R. (1973) mmunochemistry 10, 439-442. 4. Law, S. W. & Dugaiczyk, A. (1981) Nature (London) 291, 201-205. 5. Jagodzinski, L. L., Sargent, T. D., Yang, M., Glackin, C. & Bonner, J. (1981) Proc. NatL Acad. Sci. USA 78, 3521-3525. 6. Gorin, M. B., Cooper, D. L., Eiferman, F., van de Rijn, P. & Tilghman, S. M. (1981)J. Biol Chem. 256, 1954-1959. 7. Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J. & Rutter, W. J. (1979) Biochemistry 18, 5294-5299. 8. Taylor, J. M. & Tse, T. P. H. (1976) J. Biol Chem. 251, 7461-7467. 9. Laemmli, U. K. (1970) Nature (London) 227, 680-685. 10. Law, S., Tamaoki, T., Kreuzaler, F. & Dugaiczyk, A. (1980) Gene 10, 53-61. 11. Maxam, A. & Gilbert, W. (1980) Methods Enzymot 65, 499-560. 12. Sutcliffe, J. G. (1978) Nucleic Acids Res. 5, 2721-2728. 13. Azad, A. A. & Deacon, N. J. (1980) Nucleic Acids Res. 8, 4365-4376. 14. Benoist, C., O'Hare, K., Breathnach, R. & Chambon, P. (1980) Nucleic Acids Res. 8, 127-142. 15. Behrens, P. O., Spiekerman, A. M. & Brown, J. R. (1975) Fed. Proc. Fed. Am. Soc. Exp. Biol 34, 591 (abstr.). 16. Meloun, B., Moravek, L. & Kostka, V. (1975) FEBS Lett. 58, 134-137. 17. Brown, J. R. (1976) Fed. Proc. Fed. Am. Soc. Exp. Biol 35, 2141-2144. 18. Brennan, S. 0. & Carrell, R. W. (1978) Nature (London) 274, 908-909. 19. Modiano, G., Battistuzzi, G. & Motulsky, A. G. (1981) Proc. Natl Acad. Sci. USA 78, 1110-1114.