Nucleic Acids Research

Similar documents
Materials Protein synthesis kit. This kit consists of 24 amino acids, 24 transfer RNAs, four messenger RNAs and one ribosome (see below).

Codon Bias with PRISM. 2IM24/25, Fall 2007

Lecture 19A. DNA computing

Protein Synthesis. Application Based Questions

1. DNA, RNA structure. 2. DNA replication. 3. Transcription, translation

PROTEIN SYNTHESIS Study Guide

PGRP negatively regulates NOD-mediated cytokine production in rainbow trout liver cells

Biomolecules: lecture 6

ORFs and genes. Please sit in row K or forward

Degenerate Code. Translation. trna. The Code is Degenerate trna / Proofreading Ribosomes Translation Mechanism

Level 2 Biology, 2017

Disease and selection in the human genome 3

DNA sentences. How are proteins coded for by DNA? Materials. Teacher instructions. Student instructions. Reflection

The combination of a phosphate, sugar and a base forms a compound called a nucleotide.

ANCIENT BACTERIA? 250 million years later, scientists revive life forms

Molecular Level of Genetics

How life. constructs itself.

Biomolecules: lecture 6

(a) Which enzyme(s) make 5' - 3' phosphodiester bonds? (c) Which enzyme(s) make single-strand breaks in DNA backbones?

NAME:... MODEL ANSWER... STUDENT NUMBER:... Maximum marks: 50. Internal Examiner: Hugh Murrell, Computer Science, UKZN

CONVERGENT EVOLUTION. Def n acquisition of some biological trait but different lineages

Lecture 10, 20/2/2002: The process of solution development - The CODEHOP strategy for automatic design of consensus-degenerate primers for PCR

UNIT I RNA AND TYPES R.KAVITHA,M.PHARM LECTURER DEPARTMENT OF PHARMACEUTICS SRM COLLEGE OF PHARMACY KATTANKULATUR

Protein Synthesis: Transcription and Translation

Chemistry 121 Winter 17

Chapter 3: Information Storage and Transfer in Life

A Zero-Knowledge Based Introduction to Biology

Just one nucleotide! Exploring the effects of random single nucleotide mutations

7.016 Problem Set 3. 1 st Pedigree

Project 07/111 Final Report October 31, Project Title: Cloning and expression of porcine complement C3d for enhanced vaccines

Human Gene,cs 06: Gene Expression. Diversity of cell types. How do cells become different? 9/19/11. neuron

G+C content. 1 Introduction. 2 Chromosomes Topology & Counts. 3 Genome size. 4 Replichores and gene orientation. 5 Chirochores.

Evolution of protein coding sequences

iclicker Question #28B - after lecture Shown below is a diagram of a typical eukaryotic gene which encodes a protein: start codon stop codon 2 3

Homework. A bit about the nature of the atoms of interest. Project. The role of electronega<vity

Bioinformatics CSM17 Week 6: DNA, RNA and Proteins

Det matematisk-naturvitenskapelige fakultet

Fishy Amino Acid Codon. UUU Phe UCU Ser UAU Tyr UGU Cys. UUC Phe UCC Ser UAC Tyr UGC Cys. UUA Leu UCA Ser UAA Stop UGA Stop

Important points from last time

Deoxyribonucleic Acid DNA. Structure of DNA. Structure of DNA. Nucleotide. Nucleotides 5/13/2013

Lecture 11: Gene Prediction

Describe the features of a gene which enable it to code for a particular protein.

Electronic Supplementary Information

Honors packet Instructions

Chapter 10. The Structure and Function of DNA. Lectures by Edward J. Zalisko

Folding simulation: self-organization of 4-helix bundle protein. yellow = helical turns

Lezione 10. Bioinformatica. Mauro Ceccanti e Alberto Paoluzzi

INTRODUCTION TO THE MOLECULAR GENETICS OF THE COLOR MUTATIONS IN ROCK POCKET MICE

Supporting information for Biochemistry, 1995, 34(34), , DOI: /bi00034a013

p-adic GENETIC CODE AND ULTRAMETRIC BIOINFORMATION

Primer Design Workshop. École d'été en géné-que des champignons 2012 Dr. Will Hintz University of Victoria

Supplemental Data Supplemental Figure 1.

Expression of Recombinant Proteins

Supplementary. Table 1: Oligonucleotides and Plasmids. complementary to positions from 77 of the SRα '- GCT CTA GAG AAC TTG AAG TAC AGA CTG C

Protein Structure Analysis

UNIT (12) MOLECULES OF LIFE: NUCLEIC ACIDS

SAY IT WITH DNA: Protein Synthesis Activity by Larry Flammer

TRANSCRIPTION. Renáta Schipp

Figure S1. Characterization of the irx9l-1 mutant. (A) Diagram of the Arabidopsis IRX9L gene drawn based on information from TAIR (the Arabidopsis

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

DNA Begins the Process

Arabidopsis actin depolymerizing factor AtADF4 mediates defense signal transduction triggered by the Pseudomonas syringae effector AvrPphB

Genomics and Gene Recognition Genes and Blue Genes


Supplementary Figure 1A A404 Cells +/- Retinoic Acid

National PHL TB DST Reference Center PSQ Reporting Language Table of Contents

Table S1. Bacterial strains (Related to Results and Experimental Procedures)

FROM DNA TO GENETIC GENEALOGY Stephen P. Morse

BIOSTAT516 Statistical Methods in Genetic Epidemiology Autumn 2005 Handout1, prepared by Kathleen Kerr and Stephanie Monks

Hes6. PPARα. PPARγ HNF4 CD36

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Phosphate and R2D2 Restrict the Substrate Specificity of Dicer-2, an ATP- Driven Ribonuclease

Supporting Information. Copyright Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2006

Supplementary Materials for

A 2D graphical representation of the sequences of DNA based on triplets and its application

Supplementary Information. Construction of Lasso Peptide Fusion Proteins

Genes and Proteins. Objectives

for Programmed Chemo-enzymatic Synthesis of Antigenic Oligosaccharides

RPA-AB RPA-C Supplemental Figure S1: SDS-PAGE stained with Coomassie Blue after protein purification.

Supplemental material

7.013 Problem Set

Cat. # Product Size DS130 DynaExpress TA PCR Cloning Kit (ptakn-2) 20 reactions Box 1 (-20 ) ptakn-2 Vector, linearized 20 µl (50 ng/µl) 1

Worksheet: Mutations Practice

Enduring Understanding

Supplement 1: Sequences of Capture Probes. Capture probes were /5AmMC6/CTG TAG GTG CGG GTG GAC GTA GTC

It has not escaped our notice that the specific paring we have postulated immediately suggest a possible copying mechanism for the genetic material

Thr Gly Tyr. Gly Lys Asn

Y-chromosomal haplogroup typing Using SBE reaction

Supporting Online Information

Multiplexing Genome-scale Engineering

Daily Agenda. Warm Up: Review. Translation Notes Protein Synthesis Practice. Redos

Dierks Supplementary Fig. S1

MacBlunt PCR Cloning Kit Manual

ΔPDD1 x ΔPDD1. ΔPDD1 x wild type. 70 kd Pdd1. Pdd3

Supplemental Data. mir156-regulated SPL Transcription. Factors Define an Endogenous Flowering. Pathway in Arabidopsis thaliana

Supplementary Figure 1. Ratiometric fluorescence visualization of DNA cleavage by

strain devoid of the aox1 gene [1]. Thus, the identification of AOX1 in the intracellular

Creation of A Caspese-3 Sensing System Using A Combination of Split- GFP and Split-Intein

The Monster Mash A lesson about transcription and translation By Michelle Kelly, Donald Huesing, & Heather Miller

Transcription:

Volume 14 Number 16 1986 Nucleic Acids Research Nucleotide sequence of the Eschenchia coli replication gene dnazx Kuo-Chang Yin, Aleksandra Blinkowa and James R.Walker Department of Microbiology, University of Texas, Austin, TX 78712-1095, USA Received 16 June 1986; Accepted 17 July 1986 ABSTRACT The Escherichia coli 2.2 kilobase dnazx region contains one 1929 nucleotide reading frame which directs the synthesis of two protein products involved in DNA polymerization. The larger consists of 643 amino acids in a deduced 71,114 dalton chain which could be the T subunit of DNA polymerase. The smaller, the DNA polymerase Y subunit, is encoded by the same reading frame as the larger. The dnazx sequence contains a region homologous to ATP binding sites, suggesting that these replication factors are adenine nucleotide binding proteins. NTRODUCTON The Escherichia coli dnaz and dnax genes were originally thought to be separate but closely linked on the chromosome (1). However, Mullin et al. (2) and Kodaira et al. (3) cloned Z and X complementing activities on a 2.2 kilobase pair (kb) fragment, which has a maximum coding capacity of 80,000 daltons of protein in one reading frame, and showed that this region directs the synthesis of two proteins, one of about 75,000 daltons, the other of 56,500. There are two clues that the larger protein, the X product, might be cleaved to generate the smaller, the Z protein. First, peptide mapping after limited proteolysis showed that the smaller is part of the larger (3). Second, a dnaz+x+ plasmid directs the synthesis of both in minicells but only the larger in an in vitro coupled transcription-translation system, suggesting that a processing enzyme is lacking in the in vitro system (2). The 75,000 dalton dnax protein migrates during electrophoresis with purified DNA polymerase holoenzyme component T and it is possible, therefore, that the dnax protein is the T subunit (2,3). The 56,500 dalton dnaz protein was identified as holoenzyme subunityby demonstrations that extracts of a temperature-sensitive (Ts) dnaz imutant could be complemented by purified holoenzyme component Y (4,5) and that cloned dnaz fragments stimulated overproduction of y activity (4,6,7). To determine how the dnazx region directs the synthesis of both Z and X proteins, the 2.2 kb region has been sequenced. A single open reading frame encodes a (deduced) polypeptide of 71,114 daltons. This result rules out the possibility that different reading frames direct the synthesis of the two products and supports models in which the smaller dnaz protein is identical to a portion of the larger dnax protein. R L Press Limited, Oxford, England. 6541

METHODS Phage and Plasmid M13mp9 phage (8) was obtained from New England Biolabs. The pjh16 plasmid (2) consists of a 3.0 kb dnaz+x+ EcoR-Pst fragment cloned in pbr322. DNA Purification Plasmids and M13 replicative form DNA were extracted by the rapid alkaline procedure (9). M13 single strand DNA was prepared by phenol extraction (10). Enzymes Restriction enzymes, T4 DNA ligase, Bal3l nuclease, and the Klenow fragment of DNA Polymerase were obtained from New England Biolabs or Bethesda Research Labs and used in accordance with the supplier's instructions, except that ligations were conducted overnight at 100C. Nucleotide Sequencing Fragments cloned in M13mp9 were sequenced by the chain terminator method (11). The strategy for sequencing the dnazx region was based on the generation of overlapping fragments by Bal3l nuclease digestion (12). DNA of the dnaz+x+ plasmid pjh16 (2) (Fig. 1) was opened with Hind, digested with Bal3l nuclease for various intervals, and treated with Pst to create a set of fragments with one blunt end and one Pst end. For sequencing, these fragments were cloned into phage M13mp9 DNA which had been cut with Hinc (blunt end) and Pst. The sequence of the opposite strand was determined by opening plasmid pjh16 with Pst, Bal3l digestion, cleavage with Hind, and cloning the resulting fragments into M13mp9 cut with Hinc and Hind. RESULTS The Sma-Pvu fragment which contains both dnaz+ and X+ complementing activities consists of 2218 base pairs which contain only one long open reading frame (Fig. 2). A potential initiating ATG at position 142 is followed by an open reading frame, which would encode 71,114 daltons of protein, and a TGA terminator. The other five possible reading frames all contain multiple stop codons; the next longest potential polypeptide could contain 167 amino acids with a molecular weight of 18,400. Presumably 71,114 daltons is the correct size of the dnax polypeptide previously reported as 75,000 (2) to 83,000 daltons (13). Thus, both the Z and X products are generated from one reading frame. The amino acid composition of the deduced dnax protein reflects a slightly acidic protein of p 6.72 (Table 1). The amino acid composition of a purified preparation of y (the Z product) has been reported (4); however the reported composition differs considerably from the deduced composition of Z product, irrespective of whether Z overlaps the N-, as reported (3), or C-terminal ends of the X product. Analysis of the deduced amino acid sequence revealed a region likely to be an adenine nucleotide binding site. Walker et al. (14) reported that proteins which bind adenine nucleotides contain sequence A which is Gly-X-X- 6542

EH Sm N Bs Bs Pv P ~ ~ ~ ~ 0......... Ị.. 1.. 0 0.5 1.0 1.5 2.0 kb Fig. 1. Strategy for sequencing the dnazx region. The heavy line represents chromosomal DNA present in plasmid pjh16 (2). pjh16 was opened with Hind, treated with Bal31 nuclease, and cut with Pst. The resulting fragments were cloned into the Pst-Hinc sites of M13mp9 for sequencing the regions indicated by the rightward arrows. For sequencing the complementary strand, pjh16 was opened with Pst, digested with Bal3l nuclease, and cut with Hind. The resulting fragments were cloned into Hind-Hinc cut M13mp9 for sequencing the regions indicated by the leftward arrows. E, EcoR; H, Hind; Sm, Sima; N, Nde; B, BstE; Pv, Pvu; Ps, Pst; Hc, Hinc; B, BamH; H, Hind. Gly-X-Gly-Lys-Thr-X-X-X-X-X-X-le/Val, in most cases preceeded five or six residues upstream by a basic amino acid. Residues 45 to 59 of the deduced dnax product are Gly-X-X-Gly-X-Gly-Lys-Thr-X-X-X-X-X-X-Ala. n addition, a histidine is located six residues upstream. Although most such sequences contain isoleucine or valine in the last position, the dnax protein has an alanine in position 59. Another exception to the consensus sequences occurs in the rho protein adenine nucleotide binding site, which also has an alanine, rather than isoleucine or valine, as the last amino acid (15). nasmuch as the X protein probably originates from the N-terminal portion of the ZX gene (3), both the X and Z products contain nucleotide binding sites. t has previously been reported (16) that the T and y subunits of holoenzyme bind ATP and datp. Adenine nucleotide binding proteins sometimes contain a sequence B (14) to which there was no homology in the dnax sequence. The proposed dnazx reading frame is preceded by a potential promoter consisting of a -35 sequence separated from the -10 sequence by 17 base pairs (Fig. 2). A potential Shine Dalgarno AGAG sequence is located seven base pairs upstream of the ATG initiating codon. The codon usage predicts that dnazx is a weakly expressed region. As reviewed by Grosjean and Fiers (17), strongly expressed genes have codon distributions different from those of weakly expressed genes. The codons NNC are favored over NNU for phenylalanine, tyrosine, asparagine, and isoleucine in genes for abundant proteins. n weakly expressed genes, NNC codons are favored over NNU for proline, arginine, glycine, and alanine (Table 2) (17). The dnazx open reading frame distribution of NNC and NNU codons is typical of weakly expressed genes except for the asparagine codons (Table 2). Another characteristic of genes with high expression efficiencies is the 6543

10 20 30 40 50 60 70 80 90 * * * _ * _ * * ** * * CCC GCG CCA TTA ATT ATC GCC ACT CTT GTG CTG CCC ACC CTA CGC ACA GCA CM GAT GTG CAT TCA GCC TCC CCC TTC TCA CGG GCC TCT Pro Gly His *** 100 110 120 130 140 150 160 170 180 GTT ACC ATT ACC CCT TCG TCA ATC CAC CTT CCA CCC mt CAG AGC CTC CCA ATC AGT TAT CAG CTC TTA GCC CGA MA TCC CGC CCA CAA Met Ser Tyr Gln Val Leu Al& Arg Lys Trp Arg Pro Gln 190 200 210 220 230 240 250 260 270 ACC TTT CCT GAC GTC GTC GCC CAC GM CAT CTC CTC ACC GCA CTC GCC MC GCC TTC TCG TTA CCC CGT ATr CAT CAT GCT TAT CTT TTT Thr Phe Ala Asp Val Vai Cly Gln Glu His Vai Leu Thr Ala Leu Ala Asn Gly Leu Ser Leu Gly Arg le His His Ala r Leu Ph 280 290 300 310 320 330 340 350 360 TCC CCC ACC CCT GCCC CTGCA AM ACC tct ATC GCC CCA CTC CG CCC MC CCC CTA AAC TCC GM ACC GCC AST ACC CCG ACC CCC TGC Ser Gly Thr Arg Gly Val Gly Lye Thr Ser le Ala Arg Leu Leou Ala Lys Gly Leu Asn Cys Glu Thr Gly le Thr Ala Thr Pro Cys A A A*aA / A a A aaa a\a 370 380 390 400 410 420 430 440 450 * * * * * * * a a GCC GTG TCC GAT AAC TCT CCT GM ATC GAG GAG CCC CGC TTT CTC CAT CTC ATr CM ATC GAC GCC GCC TOC CCC ACC AMA GTT CM CAT Gly Val Cym Asp Asn Cym Arg Glu le Clu Gln Gly Arg Phe Val Asp Leu le Glu le Asp Ala Ala Ser Arg Thr Lye Val Clu Asp 460 470 480 490 500 510 520 530 540 * * * * * a a * * ACC CGC GAC CTG CTC CAT MC GTC CAG TAC CCT CC CCC CGT GCC CCT TTC AMA GTC TAT CTC ATC GAC GM GTG CAT ATC CGG TCG CCC Thr Arg Asp Leu Leu Asp Asn Val Gln Tyr Ala Pro Ala Arg Gly Arg Phe Lys Val Tyr Leu le Asp Glu Val Die Met Leou Ser Arg 550 560 570 580 590 600 610 620 630 CAC AGC TTT MC GCA CTG TTA MA ACC CTT CAA GAG CCG CCC GAG CAC CTT AAC TT CGG CTG CCC ACC ACC CAT CCA CAG AAA TTG CCC His Ser Phe Asn Ala Leu Leu Lys Thr Leu Glu Glu Pro Pro Glu His Val Lye Phe Leou Leu Ala Thr Thr Asp Pro Gin Lys Lou Pro 640 650 660 670 680 690 700 710 720 * * * * a a a * * GTC ACG ATT TTG TCA CCC TGT CTG CAAmTT CAT CTC MC GCG CTG GAT GTC GAG CAA ATT CGC CAT CAG CTT GAG CAC ATC CTC MC CAA Val Thr le Leu Ser Arg Cys Leu Gln Phe His Leu Lye Ala Leu Asp Val Glu Gln le Arg Him Gln Leu Glu Hi le Leu As Clu 730 740 750 760 770 780 790 800 810 * * * * * * a * a GM CAT ATC GCT CAC GAG CCG CGG CCC CT CAA TTG CTG CCA CCC GCC GCT GM GGC AGC CTG CGA GAT CCC TTA AGT CTG ACC GAC CAG Glu His le Ala His Clu Pro Arg Ala Leu Gln Leu Leu Ala Arg Ala Ala Glu Gly Ser Leu Arg Aap Ala Leu Ser Leou Thr Amp Cln 820 830 840 850 860 870 880 890 900 GCG ATT CCC AGC GGT GAC GGC CAG CTT TCA ACC CAG GCG CGTC ACT GCC ATG CTG GGT ACC CTT GAG GAC CAT CAG GCC CTG TCG CTC CTT Ala le Ala Ser Gly Asp Gly Gln Val Ser Thr Gln Ala Val Ser Ala Met Leu Gly Thr Leu Asp Asp Amp Gin Ala Leu Ser Leu Val 910 920 930 940 950 960 970 980 990 * * * a * * * * * GM CCG ATG GTC GAG GCC MC GGC GAG CGC GTA ATG GCG CTG ATT MT GM CCC GCCT CC CCT GGT ATC GAG TGG GM CCG TTG CTG GTG Glu Ala Met Val Glu Ala Asn Gly Glu Arg Val Met Ala Leu le Asn Glu Ala Ala Ala Arg Gly le Glu Trp Glu Ala Leu Leu Val 1000 1010 1020 1030 1040 1050 1060 1070 1080 GM ATG CTC GGC CTG TTG CAT CCT ATT GCG ATG CTA CM CTT TCGCCCGT GCA CTT GGC MC GAC ATG CCG CCC ATC GAG CC CCC ATC Glu Met Leu Gly Leu Leu His Arg le Ala Met Val Gln Leu Ser Pro Ala Ala Leu Gly Asn Asp Met Ala Ala le Glu Leou Arg Met 1090 1100 1110 1120 1130 1140 1150 1160 1170 CGT GM CTG GCG CGC ACC ATA CCG CCG ACG GAT ATT CAG CTT TAC TAT CAG ACG CTC TTG ATT GGT CGC MA GM TTA CCC TAT GCC CCC Arg Glu Leu Ala Arg Thr lie Pro Pro Thr Asp le Gln Leu Tyr Tyr Gln Thr Leu Leu le Gly Arg Lys Clu Leu Pro Tyr Ala Pro 1180 1190 1200 1210 1220 1230 1240 1250 1260 * * a * a * a * * GAC CCT CGC ATG GGC GTT GAG ATG ACC CTG CTG CCC GCC CTC GCA TTC CAT CCG CCT ATC CQ_G CTG CCT GAG CCA C GTC CCA CCA CAG Asp Arg Arg Met Gly Val Glu Met Thr Leu Leu Arg Ala Leu Ala Phe Hie Pro Arg Net Pro Lou Pro Glu Pro Glu Val Pro Arg Cln 1270 1280 1290 1300 1310 1320 1330 1340 1350 * * * * * a a * * TCC TTT GCA CCC GTC CCG CCA ACC GCA CTA ATC ACG CCA ACC CAG GTG CCCGCCG Ser Phe Ala Pro Val Ala Pro Thr Ala Val Met Thr Pro Thr Gln Val Pro Pro Gl G CAA TCA CCC GCC GAGCAG CCA CcC ACT CTA Pro Gl Ser Ala Pro CGl Cia Ala Pro Thr Val 6544

1360 1370 1380 1390 1400 1410 1420 1430 1440 CCG CTC CCG GMA ACC ACC AGC CAG GTC CTG GCG GCG CGC CAG CAG TTG CAG CGC GTG CAG GGA GCA ACC AMA GCA AA MG ACT GM CCC Pro Leu Pro Glu Thr Thr Sgr Gln Val Leu Ala Ala Arg Gln Gln Lau Glo Arg Val Gln Gly Ala Thr Lys Ala Lys Lys Ser Glu Pro 1450 1460 1470 1480 1490 1500 1510 1520 1530 GCA CCC GCT ACC CCC GCG CGG CCC GTG MT MC GCT GCG CTG GM AGA CTG GCT TCG GTC ACC GAT CCC CTT CAG GCG COT CCG GTG CCA Ala Ala Ala Thr Arg Ala Arg Pro Val Aan Asm Ala Ala Lau Glu Arg Lau Ala Ser Val Thr Asp Arg Val Gln Ala Arg Pro Val Pro 1540 1550 1560 1570 1580 1590 1600 1610 1620 TCG GCG CTG GM AM GCG CCA CCC MA MA GM CCC TAT CGC TGG AAG GCG ACC ACT CCC GTG ATG CAG CM AM GAA CTG CTC GCC ACG Ser Ala Lau Glu Lys Ala Pro Ala Ly Lya Glu Ala Tyr Arg Trp Lys Ala Thr Thr Pro Val Met ClG Gln Lys Clu Val Val Ala Thr 1630 1640 1650 1660 1670 1680 1690 1700 1710 CCC MG CCC CTG A AMA CCC CTC GM CAT GM AM ACG CCC GM CTG GCG GCG AMG CTA GCCG CCA GM CCC ATT CAG CCC GAC CCC TGG Pro Lya Ala Leu Lys Lys Ala Leu Glu Hai Glu Lys Thr Pro Glu Leu Ala Ala Lys Leu Ala Ala Glu Ala le Glu Arg Asp Pro Trp 1720 1730 1740 1750 1760 1770 1780 1790 1800 CCC GCA CAG GTC ACC CAA CTT TCG CTA CCA AMA CTG GTC GM CAG CT GCC TTA MT CCC TGC AMA GAG GAG ACC GAC AAC GCA CTA TCT Ala Ala Gln Val Ser Gln Leu Ser Leu Pro Lya Leu Val Glu Gln Val Ala Leu Aan Ala Trp Lys Glu Clu Sr Asp Aan Ala Val Cyc 1810 1820 1830 1840 1850 1860 1870 1880 1890 CTG CAT TTG CCC TCC TCT CAG CCG CAT TTG MC MC CCC GGT GCA CAG CAAMAM CTG GCT CM GCC TTC ACC ATC TTA AM GGT TCA ACG Leu His Leu Arg Ser Ser Gln Arg Hi. Leu Aan Asn Arg Gly Ala Gln Gln Lys Leu Ala Clu Ala Leu Ser Met Leu Lys Gly 8er Thr 1900 1910 1920 1930 1940 1950 1960 1970 1980 GTT CM CTG ACT ATC CTT CM CAT CAT MT CCC GCC CTC CCT ACC CCC CTC GAG TGC CCT CAC GCC ATA TAC GM CAA AAA CTT GCC CAG Val Glu Leu Thr le Val Clu Asp Asp Aan Pro Ala Val Arg Thr Pro Leu Glu Trp Arg Gln Ala le Tyr Glu Glu Lya Leu Ala Gln 1990 2000 2010 2020 2030 2040 2050 2060 2070 GCC CCC GAG TCC ATT ATT CCC CAT MT MT ATT CAG ACC CTG CGT CCG TTC TTC CAT CCC GAG CTC CAT CM GM ACT ATC CGC CCC ATT Ala Arg Glu Ser le le Ala Aap Aen Aon le Gln Thr Leu Arg Arg Phe Phe Asp Ala Glu Leu Asp CGu Glu Ser le Arg Pro le 2080 2090 2100 2110 2120 2130 2140 2150 2160 TGA TCC TM GCA CAG CTT ACC TTC GTC ATC CTT MC CTG ATT GAG AGA CM ACC TAT GTT TGC TM AGG CGC TCT GCC TMA CCT GAT GM 2170 2180 2190 2200 2210 * * * * * GCA AGC CCA GCA CAT GCA AGA AM MT GCA CM MT GCA GCA AGA CAT CCC CCA GCT G Fig. 2. Nucleotide sequence of the dnazx region. The deduced amino acid sequence of a 71,114 dalton product is shown from nucleotide 142 to 2070. Brackets designate -35 and -10 regions of a potential promoter. The dashed line indicates a potential ribosome binding site. The triangles designate a potential adenine nucleotide binding site (14); the open symbols represent more variable residues, the closed symbols the consensus amino acids. Arrows mark regions of dyad symmetry, first discovered by Hershey and Taylor (personal communication) who sequenced the apt gene immediately upstream of dnazx. The (deduced) apt product terminates with a histidine at nucleotide 10. tendency not to contain codons which correspond to certain minor species of trnas (17,18). The codons AUA (isoleucine), CGA/CGG/AGA/AGG (arginine), CUA (leucine) and GGA (glycine) are not used at all or are very rarely used in genes for abundant proteins, but are more often used in weakly expressed genes (17). The dnazx gene contains all of these except AGG (Table 2). 6545

Table 1. Amino acid composition of the deduced dnal protein. DSCUSSON The 2.2 kb dnazx region consists of one long open reading frame which, if translation begins at ATG, would encode a protein of 71,114 daltons. The ATG codon is preceded seven base pairs upstream by a potential AGAG Shine- Dalgarno sequence. A potential promoter consists of a TCGCCG -35 region separated by 17 base pairs from a TAGCAT -10 sequence. The deduced product of 71,114 daltons is likely to be the dnax product, previously reported as 75,000 to 83,000 daltons (2,3,13). This open reading frame directs the synthesis also of the Z protein, reported as 52,000 to 56,500 daltons (2,3). One model to explain the appearance of two products is based on the possibility that the larger protein is proteolytically cleaved to generate the smaller. A second model proposes that the dnax mrna is processed, or its synthesis is terminated prematurely, to generate a shorter messenger, the 6546

Table 2. Codon Distribution of the dnazx Gene8. UUU Phe 7 1.1% UCU Ser 2.3% UAU Tyr 6.9% UGT Cys 3.5% UUC Phe 4.6% UCC Ser 4.6% UAC Tyr 3.5% UGC Cys 3.5% UUA Leu 7 1.1% UCA Ser 4.6% UAA -- UGA --- UUG Leu 11 1.7% UCG Ser 8 1.2% UAG - UGG Trp 6.9% CUU Leu 9 1.4% CUC Leu 4.6% >CUA Leu 3.5% rccu Pro 2.3% ULCC Pro 3.5S% CCA Pro 9 1.4% CAD His 12 1.9% CAC His 4.6% CM Gln 10 1.6% CGU Arg 14 2-.2X% ~~~~~~~~~~~~ LCECArL 23 3.6%l >CGA Arg 4.6% >CUG Leu 45 7.0% CCG Pro 27 4.2% CAG Gln 30 4.7% >CGG Arg 5.8% AUU le 15 2.3% ACU Thr 3.5% AAUJ Asn 6.9% AGU Ser 5.8% AUC le 10 1.6% ACC Thr 21 3.3% AAC Asn 12 1.9X AGC Ser 7 1.1% >AUA le 2.3% ACA Thr 0 0% AAA Lys 21 3.3% >AGA Arg 1.2% AUG Met 15 2.3% ACG Thr 13 2.0% MG Lys 7 1.1% >AGG Arg 0 0% GUU Val 9 GUC Val 13 GUA Val 5 GUG Val 16 1.4% 2.0%.8% 2.5% rgcu Gc _ Ala 11 1.7% _ LGCC Ala 17 2.6%J GCA Ala 15 2.3% GCG Ala 43 6.7% GAU Asp 15 GAC Asp 12 GM Glu 35 GAG Glu 18 2.3% 1.9% 5.4% 2.8% fgg_g_1y FGGUFGly 7 1.1%1 LGCSGly12 l.9%j >GGA Gly 2.3% >GGG Gly 3.5% aeach square contains four columns for codons, amino acids, number of times the codon appears in the dnazx open reading frame, and the percentage contribution of the codon to the total number of codons. Rectangles with bold lines indicate NNC codons which are preferred over NNU codons in strongly expressed genes. Rectangles with broken lines indicate NNC codons which are preferred over NNU codons in weakly expressed genes. Arrows indicate codons underrepresented in strongly expressed genes but often present in weakly expressed genes. translation of which synthesizes the Z protein. A third model assumes that two different promoters direct the synthesis of different length messengers. f this model is correct, it must be assumed that the promoter for the Z transcript was inactive in the coupled in vitro transcription-translation system which generated the larger, but not the smaller protein (2) or that the shorter messenger was synthesized but not translated. A fourth model assumes that an internal methionine codon, such as the ATG at 529-531 (Fig. 2) serves to initiate translation of the Z protein. Two possibilities which 6547

can be ruled out by the sequence analysis include a two reading-frame model and read-through of a nonsense codon (which terminates Z protein) to generate a longer X protein. The possibility that this one gene directs the synthesis of two DNA Polymerase subunits is very interesting. There is strong evidence that the dnaz protein is the ysubunit. Wickner and Hurwitz (5) isolated the dnaz protein assaying wild-type extract activity in a heated dnaz(ts) extract. The native molecular weight of purified protein was 125,000 daltons. Huebscher and Kornberg (4) also purified the dnaz product using a complementation assay and demonstrated that the denatured protein has a molecular weight of 52,000 and that the native form is a dimer. The evidence that the protein isolated by these approaches was the product of the Z gene was the observations that plasmids and transducing phages which carried the dnaz gene caused the overproduction of dnaz complementing activity (4,6,7). Huebscher and Kornberg (4) also showed that purified dnaz protein and the Y subunit of DNA polymerase had similar molecular weights and activities. The other product, the X protein, could be theu subunit (2,3). T complexes with the a, s, and e subunits to form DNA Polymerase ', a form intermediate in complexity and processivity between the core and holoenzymes (13,19). The X protein and U migrate similarly in two-dimensional gel electrophoresis (3). However, there is some doubt about the identity of X and T because the defect of a dnax(ts) mutant extract was complemented in vitro by the 32,000 dalton 6 subunit of DNA polymerase (20). Morever, there is no functional assay for T, which prohibits the testing of mutant extracts for T thermolability. n addition, there are other proteins known to interact with DNA and to be approximately the same size as X but not yet identified with genes. These include an 81,000 molecular weight DNAdependent ATPase (ATPase V) which also migrates with T (21) and a 74,000 dalton topoisomerase (22,23). The adenine nucleotide binding site deduced for both the X and Z proteins tend to corroborate the report (16) that both thet andy subunits bind ATP and datp. The apt gene immediately upstream of dnazx encodes adenine phosphoribosyltransferase, an enzyme involved in adenine salvage (24,25; Hershey and Taylor, personal communication). Hershey and Taylor (personal communication) have sequenced the apt gene and the 3' region which extends toward dnazx. The sequence overlaps nucleotides 1 through 113 of Fig. 2 (with perfect agreesent). The (deduced) apt protein terminates with a histidine (nucleotides 8-10 of Fig. 2) meaning that both are transcribed from left to right (in Fig. 2). Only 131 nucleotides separate the apt and dnazx reading frames. Hershey and Taylor (personal communication) have determined that apt transcripts in vivo end at positions 100, 105, and 117 of Fig. 2, although it is unknown whether these represent sites of transcription termination or sites of srna cleavage. Hershey and Taylor also have pointed out two regions of dyad symmetry between apt and dnazx (nucleotides 24-56 and 65-89 of Fig. 2). The function of these symmetrical regions is unknown, 6548

although they are thought not to be involved in rho-independent termination (Hershey and Taylor, personal communcation). t is interesting that the second stem and loop region overlaps the potential promoter of dnazx (Fig. 2). ACKNOWLEDGEMENTS We thank Dr. Howard V. Hershey for communication of results before publication. This work was supported by American Cancer Society Grant NP169 and, in part, by PHS Grant GM34471 and Welch Foundation Grant F949. REFERENCES 1. Henson, J. M., Chu, H., rwin, C. A. and Walker, J. R. (1979) Genetics 92, 1041-1059. 2. Mullin, D. A., Woldringh, C. L., Henson, J. M. and Walker, J. R. (1983) Mol. Gen. Genet. 192, 73-79. 3. Kodaira, M., Biswas, S. B. and Kornberg, A. (1983) Mol. Gen. Genet. 192, 80-86. 4. Huebscher, U. and Kornberg, A. (1980) J. Biol. Chem. 255, 11698-11703. 5. Wickner, S. and Hurwitz, J. (1976) Proc. Natl. Acad. Sci. U.S.A. 73, 1053-1057. 6. Wickner, S. H., Wickner, R. B. and Raetz, C.R.H. (1976) Biochem. Biophys. Res. Commun. 70, 389-396. 7. Yasuda, S. and Takagi, T. (1983) J. Bacteriol. 154, 1153-1161. 8. Messing, J. and Vieira, J. (1982) Gene 19, 269-276. 9. Birnboim, H. C. and Doly, J. (1979) Nucleic Acids Res 7, 1513-1523. 10. Schreier, P. H. and Cortese, R. (1979) J. Mol. Biol. 129, 169-172. 11. Sanger, F. Nicklen, S. and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 5463-5467. 12. Poncz, M., Solowiejczyk, D., Ballantine, M., Schwartz, E. and Surrey, S. (1982) Proc. Natl. Acad. Sci. U.S.A. 79, 4298-4302. 13. McHenry, C. S. (1982) J. Biol. Chem. 257, 2657-2663. 14. Walker, J. E., Saraste, M., Runswick, M. J. and Gay, N. J. (1982) EMBO J. 1, 945-951. 15. Pinkham, J. L. and Platt, T. (1983) Nucl. Acids Res. 11, 3531-3545. 16. Biswas, S. B. and Kornberg, A. (1984) J. Biol. Chem. 259, 7990-7993. 17. kemura, T. (1981) J. Mol. Biol. 151, 389-409. 18. Fay, P. J., Johanson, K. O., McHenry, C. S. and Bambara, R. A. (1982) J. Biol. Chem. 257, 5692-5699. 19. Huebscher, U. and Kornberg, A. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 6284-6288. 20. Meyer, R. R., Brown, C. L. and Rein, D. C. (1984) J. Biol. Chem. 259, 5093-5099. 21. Dean, F., Krasnow, M. A., Otter, R., Matzuk, M. M., Spengler, S. J. and Cozzarelli, N. R. (1982) Cold Spring Harbor Symp. Quant. Biol. 47, 769-777. 22. Srivenugopal, K. S., Lockshon, D. and Morris, D. R. (1984) Biochemistry 23, 1899-1906. 23. Kocharyan, S. M., Livshits, V. A. and Sukhodolets, V. V. (1975) Genetika 11, 1417-1425. 24. Taylor, M. W., Simon, A. E. and Kothari, R. M. (1985) The APRT system in Molecular Cell Genetics. ed Michael Gottesman. John Wiley, p. 311-332. 6549