MOLECULAR BIOLOGY. Transcription


U. N. Dwivedi
Department of Biochemistry, University of Lucknow, Lucknow
and
Smita Rastogi
Department of Biotechnology, Integral University, Lucknow

20-Jul-2006 (Revised 25-Jan-2008)

CONTENTS

Introduction
Transcription in prokaryotes (Synthesis of mRNA/rRNA/tRNA)
  Prokaryotic transcription apparatus
    RNA polymerase (RNA Pol) or DNA dependent RNA polymerase
      Structure of RNA polymerase
      Synthesis of RNA in 5'→3' direction
      Requirement of Mg++
      Significance of σ subunit of RNA Pol
      Functions of RNA polymerase
      Fidelity of RNA synthesis
    Promoters
  Overall process of prokaryotic transcription
    Initiation
    Elongation
    Termination
Transcription in eukaryotes
  Eukaryotic transcription apparatus
    RNA polymerase or DNA dependent RNA polymerase (RNA Pol)
    Eukaryotic promoters
    Enhancers
    Transcription factors
    Elongation factors
  Overall process of eukaryotic transcription
Post transcriptional processing
  Post transcriptional processing of mRNA (maturation of mRNA)
    Post transcriptional processing of mRNA in prokaryotes
    Post transcriptional processing of mRNA in eukaryotes
    Alternative mRNA processing
  Post transcriptional processing of tRNA and rRNA (maturation of tRNA and rRNA)
    Post transcriptional processing of tRNA
    Post transcriptional processing of rRNA
Inhibitors of transcription
  RNA Pol binding inhibitors
  DNA specific inhibitors
Reverse transcriptase (RT) (RNA directed DNA polymerase)

Key words
Synthesis of mRNA, rRNA and tRNA; Prokaryotic and eukaryotic RNA polymerases; Promoters; Transcription factors; Enhancers; Post transcriptional RNA processing: Capping; Splicing; Polyadenylation; Inhibition of transcription; Reverse transcriptase

Introduction

DNA stores genetic information in a stable form that can be readily replicated. Expression of this information, however, requires its flow from DNA to RNA to protein. The first step, the synthesis of RNA according to the instructions of a DNA template, is called transcription. A few points need mention before the details of transcription are studied:

The two strands of double stranded DNA are the coding strand and the template strand. The coding strand has the same sequence as the RNA transcript, except that it carries thymine (T) where the RNA carries uracil (U). The coding strand is also called the sense or (+) strand. The template strand is also called the antisense or (-) strand. The sequence of the template strand is the complement of the RNA transcript (Fig. 1).

Coding, plus (+), sense strand           5' ---[Promoter]------------- 3'
Template, minus (-), antisense strand    3' --------------------------- 5'
Fig. 1: Coding and non-coding (template) strands in double stranded DNA

The first nucleotide of a transcribed DNA sequence is denoted +1 and is called the start site. Sequences towards the 5' side of the start site are referred to as upstream sequences and are denoted with a minus sign. Sequences towards the 3' side are downstream sequences and are denoted with a plus sign. Thus, the nucleotide immediately downstream of the +1 site is +2, and so on. The nucleotide preceding the start site is denoted -1, and so on. There is no 0 (zero) nucleotide. These designations refer to the coding strand of DNA.

The coding strand for a particular gene may be located in either strand of a given DNA.

Different parts of the genome can be transcribed to different extents. The choice of which part to transcribe, and how extensively, is controlled by regulatory elements.

RNA synthesis occurs in the 5'→3' direction.
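The strand and numbering conventions above can be illustrated with a minimal Python sketch. The function names and the six-base example sequence are hypothetical and chosen only for illustration; the sketch assumes sequences are written 5'→3' using the letters A, C, G and T.

```python
# Illustrative helpers only; names and the example sequence are hypothetical.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base


def template_from_coding(coding_5to3: str) -> str:
    """Return the template (antisense) strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(coding_5to3))


def transcript_from_coding(coding_5to3: str) -> str:
    """The RNA has the coding-strand sequence with U in place of T."""
    return coding_5to3.replace("T", "U")


def transcript_from_template(template_5to3: str) -> str:
    """Read the template 3'->5' and build the RNA 5'->3' by base pairing."""
    return "".join(RNA_PAIR[b] for b in reversed(template_5to3))


def position_label(offset_from_start: int) -> str:
    """Map a 0-based offset from the start site to the +1/-1 convention.

    Offset 0 is the start site (+1); offset -1 is the nucleotide just
    upstream (-1). There is no position 0.
    """
    n = offset_from_start + 1 if offset_from_start >= 0 else offset_from_start
    return f"{n:+d}"


coding = "ATGGCT"
template = template_from_coding(coding)
rna = transcript_from_template(template)
```

Both routes give the same transcript: copying the coding strand with U for T, or base pairing against the template strand read 3'→5'.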
Transcription in prokaryotes (Synthesis of mRNA/rRNA/tRNA)

RNA synthesis in prokaryotes, like all biological polymerization reactions, takes place in three stages: initiation, elongation and termination. Transcription is initiated when RNA polymerase binds, in a defined orientation, to a specific DNA sequence called the promoter; this orientation ensures that the same strand is always transcribed from a given promoter. A study of the transcription process therefore requires detailed information about RNA polymerase and the promoter. The following sections deal with the properties and functions of RNA Pol and the prokaryotic promoter.

Prokaryotic transcription apparatus

(1) RNA polymerase (RNA Pol) or DNA dependent RNA polymerase

RNA Pol is present in all prokaryotic cells and was first discovered in 1960 by Samuel Weiss and Jerard Hurwitz. In E. coli (a eubacterium), a single type of RNA Pol appears to be responsible for the synthesis of almost all RNA, including mRNA, rRNA and tRNA. Various bacteriophages also encode RNA Pols that synthesize only phage-specific RNAs. RNA Pol moves along the template, synthesizing RNA from the promoter (described below) until it reaches a sequence called the terminator. This action defines a transcription unit that extends from the promoter to the terminator. The immediate product of transcription is called the primary transcript. The primary transcript is, however, almost always unstable and is either degraded or cleaved to give the mature products, viz. mRNA, rRNA or tRNA.

(A) Structure of RNA polymerase

E. coli RNA Pol is a large multisubunit enzyme with a molecular weight of ~500 kd. It is one of the largest enzymes in the bacterial cell. The dimensions of the enzyme are 90 × 95 × 160 Å. The core RNA Pol of E. coli contains four types of subunits, with the composition α₂ββ'ω. The properties and functions of the subunits of RNA Pol are summarized in Table 1. Another subunit, the σ subunit, binds only transiently to the core enzyme, forming the holoenzyme α₂ββ'ωσ. E. coli has several σ factors, which are summarized in Table 2. σ70 is used for general transcription, while other σ factors are activated by specific environmental conditions. Thus σ32, σ54, σ28 (σF) and σH are associated with heat shock, nitrogen starvation, flagellar gene expression and the functions listed for σH in Table 2, respectively.

E. coli RNA Pol has the overall shape of a crab claw, in which the two pincers are made up predominantly of the two large subunits, β and β' (Fig. 2).
Further structural analysis shows that RNA Pol has channels (grooves) that allow DNA, RNA and ribonucleotides into and out of the enzyme's active center cleft (Fig. 3). The channel for DNA lies at the interface of the β and β' subunits. The NTP-uptake channel allows ribonucleotides to enter the active center. The RNA exit channel allows the growing RNA chain to leave the enzyme as it is synthesized during elongation. The downstream DNA (i.e. DNA ahead of the enzyme, yet to be transcribed) enters the active center cleft in double stranded form through the downstream DNA channel (between the pincers). Within the active center cleft, the DNA strands separate from position +3. The non-template strand exits the active center cleft through the non-template strand (NT) channel and travels across the surface of the enzyme. The template strand, in contrast, follows a path through the active center cleft and exits through the template strand (T) channel. RNA Pol thus surrounds the DNA. The groove is long enough to hold 16 bp in the bacterial enzyme and ~25 bp in the eukaryotic enzyme.

Table 1: Properties and functions of subunits of E. coli RNA Pol

1. Gene: rpoA. Product: α subunit. Function: uncertain; probably involved in enzyme assembly, promoter recognition at the UP element and binding of some activators. Phases of transcription during which the subunit is required: all stages.

2. Gene: rpoB. Product: β subunit. Function: catalytic center (phosphodiester bond formation). Phases required: all stages.

3. Gene: rpoC. Product: β' subunit. Function: catalytic center (DNA template binding). Phases required: all stages.

4. Gene: rpoZ. Product: ω subunit. Function: unknown. Phases required: all stages.

5. Gene: rpoD. Product: σ70 subunit. Function: promoter specificity, recognition and binding; initiation of RNA synthesis (increases binding efficiency at the promoter, decreases non-specific binding, converts the closed promoter complex to the open promoter complex). Phases required: only during initiation.

Table 2: Types of E. coli σ factors

S. No.  Gene  σ factor   -35 sequence  Distance between -35 and -10 regions (bp)  -10 sequence  Functions
1       rpoD  σ70        TTGACA                                                   TATAAT        General
2       rpoH  σ32        CCCTTGAA                                                 CCCGATNT      Heat shock
3       rpoN  σ54        CTGGNA        6                                          TTGCA         Nitrogen starvation
4       fliA  σ28 or σF  CTAAA         15                                         GCCGATAA      Flagellar
5       sigH  σH         AGGANPuPu                                                GCTGAATCA     Cytochrome biogenesis; generation of potential nutrient sources; transport; cell wall metabolism important for competence and sporulation initiation

Fig. 2: Crab claw structure of RNA Pol (labels: RNA exit, upstream DNA, rudder, direction of RNA Pol movement, DNA entering the jaws, wall, bridge, nucleotides)

Fig. 3: Channel structure of RNA Pol (labels: RNA exit channel, upstream DNA, active site, -35, β flap, β pincer, β' pincer, downstream DNA, T channel, NT channel)

The map of the E. coli σ70 factor identifies four conserved regions, numbered 1 to 4, which are further subdivided into sub-regions (Fig. 4). These sub-regions have different functions. Sub-region 2.4 (also called the -10 region or unwinding domain) confers specificity by recognizing the -10 region of the promoter, while sub-region 4.2 (also called the -35 region or recognition domain) provides binding energy by recognizing the -35 region of the promoter. The other sub-regions are described in Table 3.

Fig. 4: Regions of σ factor and their functions (from N-terminus to C-terminus; labels: responsible for melting, recognizes -10 region, recognizes 'extended -10' region, recognizes -35 region)

Table 3: Functions of regions and sub-regions of σ factor

1. Region 1: Region 1, comprising sub-regions 1.1 and 1.2, lies at the N-terminal end of σ factor. This region is negatively charged and has a regulatory function. In free σ factor, sub-region 1.1 plays an autoinhibitory role by occluding the DNA binding domains (i.e., 2.4 and 4.2). Association of σ factor with the core enzyme changes the conformation of σ factor and releases the autoinhibition. In the holoenzyme, the negatively charged sub-region 1.1 occupies the positively charged region in the active center cleft of RNA Pol and so acts as a DNA mimic. Upon melting of DNA, sub-region 1.1 shifts away and clears the DNA entry channel, allowing DNA to enter.

2. Region 2: Sub-regions 2.1 and 2.2 are a highly conserved part of σ factor and are involved in interaction with the core enzyme. Sub-region 2.3 resembles proteins that bind single stranded nucleic acids and is involved in the melting reaction. Sub-region 2.4 has an α-helical structure that specifically recognizes the -10 region (i.e., it determines specificity); it is also called the -10 region or unwinding domain of σ factor.

3. Region 3: Sub-region 3.1 binds the intervening DNA sequence (the distance between the -10 and -35 regions, i.e. ~75 Å). When σ factor binds the core enzyme, the N-terminal domain of sub-region 3.2 blocks the RNA exit channel and thus acts as a molecular mimic of RNA. It must be removed from the RNA exit channel for elongation to occur. Ejection of this sub-region from the RNA exit channel takes several attempts and leads to abortive initiations.

4. Region 4: Sub-region 4.2 has an α-helical structure that specifically recognizes the -35 region of the promoter; it is also called the -35 region or recognition domain of σ factor.

(B) Synthesis of RNA in 5'→3' direction

Labeling experiments with γ-32P substrates confirmed that RNA chains, like DNA chains, grow in the 5'→3' direction. This involves movement of RNA Pol in the 3'→5' direction along the antisense (template) DNA strand. The template DNA strand is thus copied in the 3'→5' direction, and the 3'-OH group of the growing RNA chain attacks the α-phosphate of the incoming rNTP.

For transcription, RNA Pol requires a DNA template, ribonucleoside triphosphates (rNTPs, viz. rATP, rGTP, rCTP and rUTP) and Mg++. No primer is required. The enzyme is most active when bound to double stranded DNA, but only one of the two strands serves as a template. The 3'-OH group of the growing RNA chain attacks the α-phosphate of the incoming NTP and releases pyrophosphate. This reaction is thermodynamically favorable, and the subsequent degradation of pyrophosphate to orthophosphate locks the reaction in the direction of RNA synthesis. The 5' triphosphate group of the first residue in a nascent (newly formed) RNA molecule is not cleaved to release PPᵢ, but remains intact throughout the transcription process. Thus, the reaction is driven by the release and subsequent hydrolysis of PPᵢ, as summarized in Scheme 1.

(NMP)ₙ + NTP → (NMP)ₙ₊₁ + PPᵢ
RNA + incoming ribonucleotide → lengthened RNA + pyrophosphate

where NMP signifies ribonucleoside monophosphate and n represents the number of NMPs.

PPᵢ → 2 Pᵢ (catalyzed by pyrophosphatase)
Pyrophosphate → orthophosphate

Scheme 1: Reaction catalyzed by RNA Pol

RNA Pol requires that the initiating NTP be brought into its active site and held stably on the template while the next NTP is presented with the correct geometry for the chemistry of polymerization to occur. This is particularly difficult because RNA Pol starts most transcripts with A, and that ribonucleotide binds the template nucleotide T with only two H-bonds. The enzyme therefore has to make specific interactions with the initiating NTP, holding it rigidly in the correct orientation to allow chemical attack on the incoming NTP. The requirement for such specific interactions between the enzyme and the initiating NTP probably explains why most transcripts start with the same nucleotide. The interactions are specific for that nucleotide (i.e., A), and thus only chains beginning with A are held in a manner suitable for efficient initiation.
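The bookkeeping of Scheme 1 can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of enzyme kinetics: the function name and the example template are hypothetical, the template is supplied in the 3'→5' order in which it is read, and every PPᵢ released is assumed to be hydrolyzed to two Pᵢ.

```python
# Toy model of Scheme 1; the function name and example template are hypothetical.

RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base


def polymerize(template_3to5: str):
    """Extend an RNA chain 5'->3' along a template read 3'->5'.

    Returns (rna_5to3, ppi_released, pi_after_hydrolysis).
    """
    rna = []
    ppi_released = 0
    for base in template_3to5:
        if rna:
            # The 3'-OH of the chain attacks the alpha-phosphate of the
            # incoming NTP, releasing one pyrophosphate per bond formed.
            ppi_released += 1
        # The first NTP keeps its 5'-triphosphate, so it releases no PPi.
        rna.append(RNA_PAIR[base])
    # Assumption: pyrophosphatase converts every PPi into two Pi.
    return "".join(rna), ppi_released, 2 * ppi_released


rna, ppi, pi = polymerize("TACGGT")
```

For a transcript of n nucleotides the sketch counts n - 1 pyrophosphates, because the initiating nucleotide retains its triphosphate.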
It is believed that these interactions are provided by various parts of the RNA Pol holoenzyme, including part of σ. Consistent with this, in experiments using an RNA Pol containing a σ70 derivative that lacks this part of σ, initiation requires much higher than normal concentrations of the initiating nucleotide.

(C) Requirement of Mg++

The active site of the core enzyme is made up of regions from the β and β' subunits. It is found at the base of the pincers, within a region called the active center cleft, and contains two metal ions (Mg++) in its active form. This is consistent with the two metal ion catalytic mechanism of nucleotide addition proposed for all types of polymerases. One metal ion remains bound to the enzyme, whereas the other appears to come in with the nucleoside triphosphate and leave with the pyrophosphate. The β and β' subunits interact extensively with one another, particularly at the base of the channel where the active site Mg++ ion is located. The β' subunit binds a Zn++ ion via four cysteine residues that are invariant in prokaryotes but not in eukaryotes. Three conserved aspartate (Asp) residues of the enzyme participate in binding these metal ions.

(D) Significance of σ subunit of RNA Pol

The transient binding of σ factor to the core enzyme is concerned specifically with promoter recognition. σ factor has domains that recognize the promoter sequence. σ alone does not bind to DNA, because the N-terminal region of σ behaves as an autoinhibition domain: in free σ it occludes the DNA binding domains and thereby suppresses their activity. When the σ subunit binds to the core enzyme (α₂ββ'ω), the conformation of σ changes so that the inhibition is released and the DNA binding domains can contact DNA.

Comparison of the crystal structures of the core enzyme and the holoenzyme shows that σ factor lies largely on the surface of the core enzyme. It has an elongated structure that extends past the DNA binding site.

The σ subunit binds transiently to the core enzyme and directs the RNA Pol holoenzyme to specific binding sites on DNA where transcription begins.

σ factor participates in the initiation of RNA synthesis through formation of the open complex. In contrast to the -35 region, which simply provides binding energy to secure the polymerase to the promoter, the -10 region has a more elaborate role in transcription initiation, because it is within that element that DNA melting is initiated in the transition from the closed to the open complex.
Thus sub-region 2.4 of σ (the unwinding domain or -10 domain), which interacts specifically with the -10 region of the promoter, does more than simply bind DNA, whereas the specific interactions of sub-region 4.2 (the recognition domain or -35 domain) with the -35 sequence of the promoter just provide binding energy. In keeping with this expectation, the α-helix involved in recognition of the -10 region contains several aromatic amino acids that can interact with bases on the non-template strand in a manner that stabilizes the melted DNA. Unwinding increases negative supercoiling of DNA.

When the holoenzyme forms an open complex on DNA, the N-terminal σ domain is displaced from the active site. It swings away, and the two DNA binding regions separate by 15 Å, presumably to acquire a more elongated conformation appropriate for contacting DNA.

The σ factor dissociates from the rest of the RNA Pol when the RNA chain reaches 8-9 nucleotides in length. It is not necessary for the elongation phase. When σ factor is released, the core enzyme reverts to a general affinity for all DNA, irrespective of sequence, which suits it to continue transcription. The released σ factor becomes immediately available for use by another core enzyme. A change in the association between σ and the core enzyme thus changes the binding affinity for DNA so that the core enzyme can move along the DNA.

RNA Pol encounters a dilemma in reconciling its needs for initiation with those for elongation. Initiation requires tight binding only to particular sequences (promoters), while elongation requires close association with all sequences that the enzyme encounters during transcription. This dilemma is solved by the reversible association between σ factor and core enzyme. σ factor is either released following initiation or changes its association with the core enzyme so that it no longer participates in DNA binding.

The cell contains only about 30% as much σ factor as core enzyme. Therefore only about one third of the polymerase complexes can exist as holoenzyme at any one time. Because there are fewer molecules of σ than of core enzyme, the utilization of core enzyme requires that σ recycles. Release occurs immediately after initiation in about one third of cases; presumably σ and core dissociate at some later point in the other cases. Irrespective of the exact timing of its release from the core enzyme, σ factor is involved only in initiation. After the release of σ factor from RNA Pol, the core enzyme moves along the DNA, synthesizing the growing RNA strand. The σ factor can then associate with another core enzyme and reinitiate transcription.

(E) Functions of RNA polymerase

RNA Pol performs multiple functions in the process of transcription:

Binding to DNA and recognition of promoters
All sequence specific contacts that the holoenzyme makes with DNA (with the -10 and -35 regions, as well as the so-called extended -10 region just upstream of the -10 region) are mediated by the σ subunit via conserved residues. The binding of σ causes the pincers of the core enzyme to come together, narrowing the channel between them by ~10 Å. The outer surface of the holoenzyme is almost uniformly negatively charged, whereas the surfaces presumed to interact with nucleic acids, particularly the inner walls of the main channel, are positively charged.

Melting of DNA
Melting occurs between positions -11 and +3, relative to the transcription start site. The double helix reforms at -11 in the upstream DNA behind the enzyme. The β and β' subunits contact DNA at many points downstream of the active site.
They make several contacts with the coding strand in the region of the transcription bubble, thus stabilizing the separated single strands. The RNA is contacted largely in the region of the transcription bubble. As the enzyme moves along DNA, the base in the template strand at the start of the turn is flipped to face the nucleotide entry site. The RNA-DNA hybrid is 9 bp long, and the 5' end of the RNA is forced to leave the DNA when it hits a protein structure called the rudder.

Once DNA has been melted, the individual strands have a flexible structure in the transcription bubble. This enables DNA to take its turn in the active site. Before transcription starts, however, the DNA double helix is a relatively rigid, straight structure. This straight structure can enter the polymerase without being blocked by the wall because of a conformational shift in the enzyme. Adjacent to the wall is a clamp. In the free form of RNA Pol, this clamp swings away from the wall to allow DNA to follow a straight path through the enzyme. After DNA has been melted to create the transcription bubble, the clamp must swing back into position against the wall.

Selection of correct ribonucleotides
RNA Pol selects the correct ribonucleoside triphosphate and catalyzes the formation of a phosphodiester bond. This process is repeated many times as the enzyme moves unidirectionally along the DNA template. RNA Pol is completely processive, i.e., a transcript is synthesized from start to end by a single RNA Pol molecule.

Stabilization of single stranded regions
RNA Pol itself stabilizes the single stranded regions.

Elongation
RNA Pol carries out elongation. When RNA Pol forms the initial elongation complex, after about the first 10 nucleotides have been synthesized, it may lose σ factor and lose its contacts with the region from -35 to -55. Subsequently the general elongation complex is formed. The elongating RNA Pol is a processive machine that synthesizes and proofreads RNA. DNA passes through the elongating enzyme in a manner very similar to its passage through the open complex. Double stranded DNA enters the front of the enzyme between the pincers. At the opening of the catalytic cleft, the strands separate and follow different paths through the enzyme before exiting via their respective channels and reforming a double helix behind the elongating polymerase. Ribonucleotides enter the active site through their own channel and are added to the growing RNA chain under the guidance of the template DNA strand. Only eight or nine nucleotides of the growing RNA chain remain base paired to the DNA template at any given time; the remainder of the RNA chain is peeled off and directed out of the enzyme through the RNA exit channel.

RNA chain elongation requires that the double stranded DNA template be opened up at the point of RNA synthesis so that the template strand can be transcribed into its complementary RNA strand. In doing so, the RNA chain only transiently forms a short length of RNA-DNA hybrid duplex, as indicated by the observation that transcription leaves the template duplex intact and yields single stranded RNA.
The unpaired bubble of DNA in the open initiation complex apparently travels along the DNA with RNA Pol. There are two ways in which this might occur:

(i) If RNA Pol followed the template strand in its helical path around the DNA, the DNA would build up little supercoiling, because the DNA duplex would never be unwound by more than about a turn. However, the RNA transcript would wrap around the DNA once per duplex turn. This model is implausible, since it is unlikely that the DNA and RNA could be readily untangled. The RNA would not spontaneously unwind from the long and often circular DNA in any reasonable time, and no known topoisomerase can accelerate this process.

(ii) If RNA Pol moves in a straight line while the DNA rotates, the RNA and DNA will not become entangled. Rather, the helical turns of the DNA are pushed ahead of the advancing transcription bubble so as to wind the DNA more tightly ahead of the bubble (which promotes positive supercoiling), while the DNA behind the bubble is correspondingly unwound (which promotes negative supercoiling); the linking number of the entire DNA remains unchanged. This model is supported by the observations that transcription of plasmids in E. coli causes their positive supercoiling in gyrase mutants (which cannot relax positive supercoils) and their negative supercoiling in topoisomerase I mutants (which cannot relax negative supercoils). In fact, by tethering RNA Pol to a glass surface and allowing it to transcribe DNA that had been fluorescently labeled at one end, Kazuhiko Kinosita demonstrated through fluorescence microscopy (using techniques similar to those showing that the F₁F₀ ATPase is a rotary engine) that single DNA molecules rotate in the expected direction during transcription.

Proofreading
RNA Pol also carries out two proofreading functions. The first is called pyrophosphorolytic editing. Here the enzyme uses its active site, in a simple back reaction, to catalyze the removal of an incorrectly inserted ribonucleotide by reincorporation of PPᵢ. The enzyme can then incorporate another ribonucleotide in its place in the growing RNA chain. Note that the enzyme can remove either correct or incorrect bases in this manner, but it spends longer hovering over mismatches than over matches and so removes the former more frequently. In the second proofreading mechanism, called hydrolytic editing, the polymerase backtracks by one or more nucleotides and cleaves the RNA product, removing the error-containing sequence. Hydrolytic editing is stimulated by Gre factors, which, as well as enhancing the hydrolytic editing function, also serve as elongation stimulating factors. That is, they ensure that the polymerase elongates efficiently and help it overcome arrest at sequences that are difficult to transcribe. This combination of functions is comparable to that conferred on eukaryotic RNA Pol II by the transcription factor TFIIS. Another group of proteins, the Nus proteins, joins the polymerase in the elongation phase and promotes, in ways that are still poorly defined, the processes of elongation and termination.

Termination
RNA Pol detects termination signals that specify where a transcript ends. The length of the RNA-DNA hybrid is determined by a structure within the enzyme that forces the hybrid to separate, allowing the RNA chain to exit from the enzyme and the DNA strand to rejoin its DNA partner. The RNA product does not remain base paired to the template DNA strand; rather, the enzyme displaces the growing chain only a few nucleotides behind the point where each ribonucleotide is added.
Because this release follows so closely behind the site of polymerization, multiple RNA Pol molecules can transcribe the same gene at the same time, each following closely behind another. A cell can therefore synthesize large numbers of transcripts from a single gene (or other DNA sequence) in a short time.

In summary, RNA Pol is able to unwind and rewind DNA, to hold the separated strands of DNA and the RNA product, to catalyze the addition of ribonucleotides to the growing RNA chain, and to overcome difficulties in progression by cleaving the RNA product and restarting RNA synthesis (with the assistance of some accessory factors).

(F) Fidelity of RNA synthesis

Unlike DNA Pol, RNA Pol lacks a separate proofreading 3'→5' exonuclease active site, and hence its error rate is high. In contrast with DNA Pol, RNA Pol therefore corrects the nascent polynucleotide chain far less effectively. Consequently, the fidelity of transcription is much lower than that of replication. The error rate of RNA synthesis is of the order of one mistake per 10^4 or 10^5 nucleotides, about 10^5 times as high as that of DNA synthesis. The much lower fidelity of RNA synthesis can be tolerated because mistakes are not transmitted to progeny. Moreover, for most genes, many RNA transcripts are synthesized from a single gene, and all RNAs are eventually degraded and replaced. A few defective transcripts are far less likely to be harmful to the cell than a mistake in the permanent information in DNA.
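The consequence of an error rate of one mistake per 10^4 to 10^5 nucleotides can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the 1000-nucleotide transcript length is an assumed example that does not come from the text, and errors are assumed to occur independently at each position.

```python
# Illustrative arithmetic; function names and the 1000 nt length are assumptions.


def expected_errors(length_nt: int, error_rate: float) -> float:
    """Mean number of misincorporated nucleotides per transcript."""
    return length_nt * error_rate


def fraction_error_free(length_nt: int, error_rate: float) -> float:
    """Fraction of transcripts with no error, assuming independent errors."""
    return (1.0 - error_rate) ** length_nt


TRANSCRIPT_LENGTH = 1000  # hypothetical transcript length in nucleotides

for rate in (1e-4, 1e-5):
    mean = expected_errors(TRANSCRIPT_LENGTH, rate)
    clean = fraction_error_free(TRANSCRIPT_LENGTH, rate)
    print(f"error rate {rate:g}: {mean:.2f} errors per transcript, "
          f"{clean:.1%} of transcripts error-free")
```

Under these assumptions, at least about nine in ten transcripts of this length carry no error at all, which fits the argument that a few defective, short-lived transcripts are tolerable.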

(2) Promoters

The promoter is the region of DNA where RNA polymerase binds to initiate transcription. The information for promoter function is provided directly by the DNA sequence; its structure is the signal for transcription. The promoter surrounds the first base pair that is transcribed into RNA, the start point. Because promoters are present on the same DNA molecule as the genes being transcribed or regulated, they are called cis-acting elements. E. coli has about 2000 promoter sites in its 4.6 × 10^6 bp genome. There are different types of promoters in E. coli, but the most prevalent is the σ70 promoter (standard promoter), which is dealt with in detail in the following discussion.

(A) Consensus sequences

A comparison of many prokaryotic promoter sequences reveals the RNA Pol binding sites. An essential nucleotide sequence, called a conserved sequence, should be present in all promoters. A conserved sequence need not, however, be conserved at every single position; some variation is permitted. Putative DNA recognition sites can be defined in terms of an idealized sequence that represents the base most often present at each position. A consensus sequence is defined by aligning all known examples so as to maximize their homology. For a sequence to be accepted as a consensus, each particular base must be reasonably predominant at its position, and most of the actual examples must be related to the consensus by rather few (1-2) substitutions.

Promoter sequences in E. coli lack any extensive conservation over the 60 bp associated with RNA Pol. The sequence of much of the binding site is irrelevant. Some short stretches within the promoter are conserved, however, and these are critical for its function. Bacterial promoters have the following features:

(i) Start point (+1 position): The initiating (+1) nucleotide is usually (>90% of the time) a purine nucleotide (A or G; A occurs more often than G).
It is common for the start point to be the central base in the poorly conserved sequence CAT or CGT, but the conservation of this triplet is not great enough to regard it as an obligatory signal.

(ii) -10 sequence or Pribnow box: The most conserved sequence, recognizable in almost all promoters, is a 6 bp long AT rich motif centered ~10 nucleotides upstream of the start site. Because of its position, it is named the -10 sequence. It is also known as the Pribnow box (after David Pribnow, who pointed out its existence in 1975). The center of the hexamer is generally close to 10 bp upstream of the start point; its position in known promoters varies from -18 to -9. Its consensus is 5' TATAAT 3', and its conservation can be summarized in the form

5' T80 A95 T45 A60 A50 T96 3'

where the number after each base denotes the % occurrence of the most frequently found base, which in this case varies from 45 to 96%. If the frequency of occurrence indicates likely importance in binding RNA Pol, the initial highly conserved TA and the final, almost completely conserved T in the -10 region would be expected to be the most important bases. The region is AT rich, and hence little energy is required for strand separation in this region. Mutations in this region have been found to affect the melting reaction.

(iii) -35 sequence: A 6 bp long sequence centered ~35 nucleotides upstream of the start site. The consensus is 5' TTGACA 3'. In more detailed form, the conservation is

5' T82 T84 G78 A65 C54 A45 3'

where the number after each base denotes the % occurrence of the most frequently found base, which in this case varies from 45 to 84%.

(iv) Distance between -10 and -35 sequences: The distance between these conserved sequences (the -10 and -35 regions) is also critical. It is between 16 and 18 bp in 90% of promoters (a separation of 17 nucleotides is optimal). In the exceptions it is as little as 15 nucleotides and as large as 20 nucleotides. The actual sequence of this intervening DNA is, however, unimportant. The distance places the two motifs (the -10 and -35 sequences) at an appropriate separation for simultaneous interaction with σ factor.

Promoters with the -10 and -35 sequences 5' TATAAT 3' and 5' TTGACA 3', respectively, are called standard promoters. They are recognized by the σ70 subunit of RNA Pol. Individual promoters usually differ from the consensus at one or more positions. A typical bacterial promoter is represented in Fig. 5.

5' ... TTGACA ....... 16-18 bp ....... TATAAT ....... Purine ... 3'
       -35 region                      -10 region     +1
       [Recognition domain]            [Pribnow box]  [Start site]
                                       [Unwinding domain]
Fig. 5: Constitution of a typical bacterial promoter

(v) Some other conserved sequences of σ70 promoters: The σ70 promoters of some genes have additional consensus sequences, such as:

(a) Upstream promoter elements or UP elements: Richard Gourse discovered that the promoters of certain highly expressed genes (e.g. the rrn genes, which encode rRNA) contain a third, AT rich recognition element, called the UP (upstream promoter) element, which occurs between positions -40 and -60. The UP element binds the C-terminal domain (CTD) of the RNA Pol α subunit. UP elements stimulate transcription at promoters that contain them by providing an additional specific interaction site between RNA Pol and DNA.
The efficiency with which RNA Pol binds to a promoter and initiates transcription is determined in large measure by these sequences, the spacing between them and their distance from the transcription start site. The sequence of the UP element is: 5'-NNAAA(A/T)(A/T)T(A/T)TTTTNNAAAANN-3'

(b) Extended -10 element: Another class of σ⁷⁰ promoters lacks a -35 region and instead has a so-called extended -10 element. This comprises a standard -10 region with an additional short sequence element at its upstream end. These elements are recognized by a region of the σ subunit of RNA Pol. Extra contacts made between the polymerase and this additional sequence element compensate for the absence of a -35 region; e.g. the gal genes of E. coli use such a promoter.
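The consensus figures quoted above can be turned into a rough promoter scan. The following is only a minimal sketch: the additive scoring scheme (summing the % occurrence of every position that matches the consensus) and the function names hexamer_score and best_promoter are assumptions made for illustration, not a method given in the text. The spacer range of 15-20 bp is the range quoted above.

```python
# Percent occurrence of the consensus base at each position, as quoted in the text.
MINUS_35 = [("T", 82), ("T", 84), ("G", 78), ("A", 65), ("C", 54), ("A", 45)]
MINUS_10 = [("T", 80), ("A", 95), ("T", 45), ("A", 60), ("A", 50), ("T", 96)]


def hexamer_score(hexamer, profile):
    """Sum the conservation percentages of positions that match the consensus.

    The additive score is an illustrative assumption, not a published metric.
    """
    return sum(pct for base, (cons, pct) in zip(hexamer, profile) if base == cons)


def best_promoter(seq, min_spacer=15, max_spacer=20):
    """Return (score, pos35, spacer, pos10) for the best -35/-10 pair, or None.

    Positions are 0-based indices into the coding strand sequence.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 12 - min_spacer + 1):
        score35 = hexamer_score(seq[i:i + 6], MINUS_35)
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + 6 + spacer          # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                break
            total = score35 + hexamer_score(seq[j:j + 6], MINUS_10)
            if best is None or total > best[0]:
                best = (total, i, spacer, j)
    return best
```

A sequence carrying both consensus hexamers separated by 17 bp receives the maximum score under this scheme; promoters that deviate from the consensus score lower, in line with the idea that closeness to the consensus reflects promoter strength.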

Various combinations of bacterial promoter elements are shown in Fig. 6.

[Fig. 6: Combinations of bacterial promoter elements. Labels: ~17 bp spacer; UP element; extended -10.]

(B) Promoter efficiency
Promoters differ markedly in their efficacy. Depending upon the relatedness of their -10 and -35 sequences to the consensus sequences, promoters are classified as strong promoters and weak promoters. Promoters with sequences closer to the consensus are generally stronger than those that match it less well. The strength of a promoter signifies the number of transcripts it can initiate in a given time. Genes with strong promoters are transcribed frequently, as often as every 2 seconds in E. coli. In contrast, genes with very weak promoters are transcribed about once in 10 minutes. Mutation of a single base in either the -10 or the -35 sequence can alter promoter activity. Mutations in the -35 region usually affect initial binding of RNA Pol and mutations in the -10 region usually affect the melting reaction.

(C) Supercoiling is an important feature regulating efficiency of promoters
The efficiency of some promoters is influenced by supercoiling. Negative supercoiling increases the efficiency of some promoters by assisting the melting reaction carried out by both prokaryotic and eukaryotic RNA Pol. As RNA Pol transcribes DNA, unwinding and rewinding occur. This requires that either the entire transcription complex rotates about the DNA or the DNA itself rotates about its helical axis. The twin domain model for transcription illustrates the consequences of the rotation of the DNA. As RNA Pol pushes forward along the double helix, it generates positive supercoils (more tightly wound DNA) ahead and leaves negative supercoils (partially unwound DNA) behind. For each helical turn traversed by RNA Pol, +1 turn is generated ahead and -1 turn behind. Transcription therefore has a significant effect on the (local) structure of DNA.
As a result, the enzyme gyrase, which introduces negative supercoils, and topoisomerase I, which removes negative supercoils, are required to rectify the situation in front of and behind the polymerase, respectively. Inappropriate superhelicity in the DNA being transcribed halts transcription. Quite possibly the torsional tension in the DNA generated by negative superhelicity behind the transcription bubble is required to help drive the transcriptional process, whereas too much such tension prevents the opening and maintenance of the transcription bubble. The dependence of a promoter on supercoiling is determined by its sequence. This would predict that some promoters have sequences that are easier to melt and are therefore less dependent on supercoiling, while others have more difficult sequences and have a greater need to be supercoiled. An alternative is that the location of the promoter might be important if different regions of the bacterial chromosome have different degrees of supercoiling.

(D) Functions of promoter regions
The function of the -35 sequence is to provide the signal for recognition by RNA polymerase, while the -10 sequence allows the promoter-polymerase complex to convert from the closed to the open form. Thus, the -35 sequence comprises a recognition domain while the -10 sequence comprises the unwinding domain of the promoter. The consensus sequence of the -10 site consists exclusively of AT base pairs, which assists the initial melting of DNA into single strands. The lower energy needed to disrupt AT base pairs as compared to GC base pairs means that a stretch of AT pairs demands the minimum amount of energy for strand separation. A typical promoter relies on its -35 and -10 sequences to be recognized by RNA Pol, but one or the other of these sequences can be absent from some (exceptional) promoters. In at least some of these cases, RNA Pol alone cannot recognize the promoter, and the reaction also requires ancillary proteins, which overcome the deficiency in intrinsic interaction between RNA Pol and the promoter.
(E) Alternative promoter sequences
There are several alternative promoter sequences that are recognized by different σ subunits. These promoters have sequences that differ from the consensus sequence of a conventional or standard promoter. Some examples are listed in Fig. 7.

Heat shock: 5'...CCCTTGAA ... bp ... CCCGATNT...3'
Nitrogen starvation: 5'...CTGGNA ... 6 bp ... TTGCA...3'
Flagella: 5'...CTAAA ... 15 bp ... GCCGATAA...3'
where N can be any nucleotide

Fig. 7: Alternative promoter sequences
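The motifs of Fig. 7 can be expressed as simple patterns in which N matches any nucleotide. The sketch below is illustrative only: the dictionary ALTERNATIVE_PROMOTERS and the function names are hypothetical, exact matching is a simplifying assumption (real promoters deviate from the listed sequences), and the heat shock entry is left out because its spacer length is not given above.

```python
import re

# (upstream motif, spacer length in bp, downstream motif); N = any nucleotide.
# Only the entries of Fig. 7 whose spacer length is stated are included.
ALTERNATIVE_PROMOTERS = {
    "nitrogen starvation": ("CTGGNA", 6, "TTGCA"),
    "flagella": ("CTAAA", 15, "GCCGATAA"),
}


def motif_regex(upstream, spacer, downstream):
    """Build a regular expression in which every N matches A, C, G or T."""
    pattern = upstream + "N" * spacer + downstream
    return re.compile(pattern.replace("N", "[ACGT]"))


def matching_promoter_classes(seq):
    """Return the names of the promoter classes whose pattern occurs in seq."""
    seq = seq.upper()
    return [name for name, parts in ALTERNATIVE_PROMOTERS.items()
            if motif_regex(*parts).search(seq)]
```

For example, a coding strand sequence containing CTGGCA, six arbitrary bases and then TTGCA is reported as a nitrogen starvation promoter candidate.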

(3) Overall process of prokaryotic transcription
The process of transcription can be divided into three steps, namely Initiation, Elongation and Termination (Fig. 8).

[Fig. 8: The overall process of transcription in prokaryotes. Steps shown: promoter recognition; promoter binding (closed complex); promoter melting (open complex); initial transcription; RNA elongation after abortive initiations and promoter clearance; RNA elongation; termination, with release of RNA and RNA Pol (DNA + RNA + RNA Pol).]

(A) Initiation
Transcription begins with the insertion of the first ribonucleotide (usually a purine). The end of initiation is signified by promoter clearance, where RNA Pol moves ahead (along the DNA template) from the promoter site without dissociating, freeing the promoter for further initiation events. Promoter clearance occurs only if the open promoter complex is stable, and it usually follows a number of abortive initiations in which short transcripts are generated. This is a general property of RNA Pol and appears to be required for de novo strand synthesis. Initiation is usually the rate-limiting step in transcription and is the primary level of gene regulation in both prokaryotes and eukaryotes. The pathway of transcription initiation consists of two major parts, binding and initiation, and each part has multiple steps, which are summarized below. RNA Pol recognizes the promoter region, locally unwinds the DNA at the site to which it is bound and carries out some abortive initiations. During this phase RNA Pol remains stationary at the site of binding (i.e. the promoter) and its conformation remains essentially the same. During this phase, the first ~8-9 nucleotides are added. The initiation phase ends when the enzyme succeeds in extending the RNA chain and clears the promoter. Regulatory proteins that bind to specific sequences near promoter sites and interact with RNA polymerase also markedly influence the frequency of transcription of many genes. The initiating reaction is simply the coupling of two NTPs in the reaction given below:

pppA + pppN → pppApN + PPi

Bacterial RNAs have 5'-triphosphate groups, as was demonstrated by the incorporation of radioactive label into RNA when it was synthesized with [γ-³²P]ATP. In such a case, only the 5' terminus of the RNA can retain the label because the internal phosphodiester groups of RNA are derived from the α-phosphate groups of NTPs.
Initiation in transcription is further divided into discrete phases of DNA binding and initiation of RNA synthesis, which are described below:

(i) Template and promoter recognition and formation of closed binary complex: The holoenzyme-promoter reaction starts by forming a closed binary complex. Closed means that the DNA remains duplex. Initially, the σ subunit of RNA Pol (the σ subunit is involved in promoter selection) binds loosely and reversibly to duplex DNA and searches for the promoter sequence. This is the closed binary complex, closed promoter complex or closed promoter-polymerase complex. In E. coli, RNA Pol binding occurs within a region stretching from ~50 bp before the transcription start site to ~20 bp beyond it. Because the formation of the closed binary complex is reversible, it is usually described by an equilibrium constant (K_B). There is a wide range in values of the equilibrium constant for forming the closed complex. Formation of the closed complex is readily reversible, and RNA Pol can as easily dissociate from the promoter as make the transition to the open complex.

(ii) Formation of open binary complex or isomerization: The transition from the closed promoter complex (in which DNA is double helical) to the open promoter complex (in which a DNA segment is unwound) is an essential event in transcription. In the bacterial enzyme bearing σ⁷⁰, this transition, often termed isomerization, does not require energy derived from ATP hydrolysis and is instead the result of a spontaneous conformational change in the DNA-enzyme complex to a more energetically favorable form. Isomerization is essentially irreversible and, once complete, typically guarantees that transcription will subsequently initiate (though regulation can still be imposed after this point in some cases). Although RNA Pol can search for promoter sites when bound to double helical DNA, a segment of the helix must be unwound before synthesis can begin. A region of duplex DNA must be unpaired so that the nucleotides on one of its strands become accessible for base pairing with incoming ribonucleotides. When the correct sequence is recognized by RNA Pol holoenzyme, the DNA at the promoter site is locally unwound (DNA melting). The series of events leading to formation of an open complex is called tight binding. Due to tight binding, the interaction between the RNA Pol holoenzyme and DNA becomes irreversible and the closed complex undergoes a transition to the open complex. Thus, the closed complex is converted into an open complex by melting of a short region of DNA within the sequence bound by the enzyme. This characterizes the open binary complex, open promoter complex or open promoter-polymerase complex. Here, DNA strands separate locally over a distance of ~17 bp of DNA (from within the -10 region to position +2 or +3), which corresponds to 1.6 turns of the B-DNA helix. This opening frees the template strand to be available for base pairing with ribonucleotides. Unwinding increases the negative supercoiling of DNA. Negative supercoiling of circular DNA favors transcription of genes because it facilitates unwinding.
For strong promoters, conversion into an open binary complex is irreversible, so this reaction is described by a rate constant (k_2). This reaction is fast. σ factor is involved in the DNA melting reaction.

(iii) Formation of ternary complex (unstable) and abortive initiations: The next step is to incorporate the first two nucleotides and to catalyze the formation of a phosphodiester bond between them. This generates a ternary complex that contains RNA as well as DNA and enzyme. The ribonucleotides are aligned on the template strand and joined together. The initiating ribonucleotide is usually a purine (A or G). RNA Pol makes specific interactions with the initiating purine, holding it rigidly in the correct orientation to allow chemical attack on the incoming NTP. The requirement for such specific interactions between the enzyme and the initiating NTP probably explains why most transcripts start with the same nucleotide (an A). The interactions are specific for that nucleotide, and thus only chains beginning with A are held in a manner suitable for efficient initiation. It is believed that the interactions are provided by various parts of the RNA Pol holoenzyme, including part of σ. Consistent with this, in experiments using an RNA Pol containing a σ⁷⁰ derivative lacking this part of σ, initiation requires much higher than normal concentrations of the initiating nucleotide. The region containing RNA Pol, DNA and nascent RNA is called a transcription bubble (so called because it contains a locally melted bubble of DNA) or transcription complex. Formation of the ternary complex is described by the rate constant k_i; this step is even faster than the open complex formation described by k_2. Further nucleotides can be added without any enzyme movement to generate an RNA chain of up to 9 bases. Thus, RNA Pol forms an unstable ternary complex comprising the DNA-RNA hybrid helix (i.e. DNA template and short RNA) and RNA Pol holoenzyme. This RNA-DNA helix is thus ~8 bp long, which corresponds to about one complete turn of the double helix. The RNA-DNA hybrid also rotates each time a nucleotide is added so that the 3'-OH end of the RNA stays at the catalytic site of RNA Pol. Incorporation of the first 9-10 ribonucleotides is a rather inefficient process. After each base is added, there is a certain probability that the enzyme will release the chain. At this stage the enzyme often releases short transcripts (each having fewer than ~10 ribonucleotides) and then starts synthesis of RNA again. Abortive initiations (i.e. synthesis of short RNA) probably involve synthesizing an RNA chain that fills the active site. If the RNA is released, the initiation is aborted and must start again. A cycle of abortive initiation usually occurs to generate a series of very short oligonucleotides. Initiation is accomplished when the enzyme manages to move along the template and bring the next region of the DNA into the active site. The occurrence of a cycle of abortive initiations before the enzyme moves to the next phase is a general property of RNA Pol and appears to be required for de novo strand synthesis.

(iv) Formation of ternary complex (stable) and promoter clearance: Once an RNA Pol holoenzyme succeeds in synthesizing a nascent RNA chain of ~9-10 bases, i.e. when initiation succeeds, σ is no longer necessary. The enzyme makes the transition to the elongation ternary complex of core polymerase, DNA and nascent RNA. This involves a conformational change in the polymerase that helps it to grip the template more firmly, converting the ternary complex to the elongation form. This conformational change is followed by movement of RNA Pol away from the promoter site, without dissociating, thereby freeing the promoter (i.e. promoter clearance) for further initiation events. Thus, promoter clearance occurs only if the open complex is stable (stable ternary complex) and usually follows a number of abortive initiations.
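The cycle of abortive initiations described above can be illustrated with a small Monte Carlo sketch. Everything numerical here is an assumption for illustration: the text says only that there is "a certain probability" of release after each added base, so the per-step release probability release_prob is hypothetical, as is the choice to treat a chain of escape_length = 10 nucleotides as promoter clearance.

```python
import random


def simulate_initiation(release_prob=0.3, escape_length=10, rng=None):
    """Return the lengths of the aborted transcripts made before clearance.

    release_prob  -- hypothetical chance of releasing the chain at each step
    escape_length -- chain length (nt) treated here as promoter clearance
    """
    rng = rng or random.Random()
    aborted = []
    while True:
        length = 2  # the first phosphodiester bond joins two NTPs
        while length < escape_length:
            if rng.random() < release_prob:
                aborted.append(length)  # short transcript released; start again
                break
            length += 1                 # one more ribonucleotide added
        else:
            return aborted              # chain reached escape_length: clearance
```

Calling simulate_initiation(0.3, rng=random.Random(1)) returns the list of short transcripts released before one successful escape; with a higher release probability the list tends to be longer, which mirrors the statement that promoter clearance usually follows a number of abortive initiations.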
This signifies the end of the initiation phase and the transition to the elongation phase, leading to the extension of the RNA chain beyond 10 bases. The efficiency of promoter clearance is modulated by the nature of the first fifty or so bases in the transcribed region. The minimum value of the promoter clearance time (i.e. the time taken by RNA Pol to leave the promoter so that another RNA Pol can initiate) is 1-2 sec, which establishes the maximum frequency of initiation as <1 event per sec.

(B) Elongation
When the first ~9 nucleotides have been added, the transcribed template strand is scrunched in the active site. The active site can hold a transcript of 6-9 nucleotides. The transcription bubble moves along DNA and the RNA chain is extended in the 5'→3' direction (Fig. 9). As the RNA Pol holoenzyme clears the initiation site and enters the elongation phase of transcription, the σ subunit may either dissociate or remain associated with the core enzyme. It was originally concluded that σ factor is released after initiation. However, this may not be strictly true. Direct measurements of elongating RNA Pol complexes show that ~70% of them retain σ factor. Since a third of elongating polymerases lack σ, the original conclusion that it is not necessary for elongation is certainly correct. In those cases where it remains associated with the core enzyme, the nature of the association has almost certainly changed. The core enzyme without σ binds more strongly to the DNA template. From this point onwards, the core enzyme undertakes RNA chain elongation beyond 10 bases. The core enzyme then moves along the template strand, opening (or unwinding) the DNA helix ahead of the site of polymerization (i.e. the front or leading edge) so as to expose a new segment of the template in single stranded condition. During this time, subsequent ribonucleotides are added to the 3' end of the growing RNA chain. Elongation involves the movement of the transcription bubble (a distance of 170 Å / second, corresponding to a rate of elongation of ~50 nucleotides / sec) by a disruption of DNA structure, in which the template strand of the transiently unwound region is paired with the nascent RNA at the growing point. As in the initiation phase, about 17 bp of DNA are unwound at a time throughout the elongation phase. It has been found that the lengths of the RNA-DNA hybrid and of the unwound region of DNA stay rather constant as RNA Pol moves along the DNA template, thereby indicating that the unwound DNA reseals (or rewinds) at the same rate behind RNA Pol (i.e. at the rear or trailing edge). The RNA-DNA hybrid must also rotate each time a nucleotide is added so that the 3'-OH end of the RNA stays at the catalytic site. When the RNA chain has been extended further, the enzyme makes a further transition to form the complex that undertakes elongation; the length of DNA that it then covers depends on the stage in the elongation cycle.

[Fig. 9: Transcription bubble. Labels: double helical DNA; unwound DNA (17 bp opened); coding strand; template strand; nascent RNA (5'ppp); RNA-DNA hybrid; RNA polymerase; 3' elongation site; unwinding; rewinding; movement of RNA polymerase.]

(C) Termination
Termination involves the following steps:
Cessation of formation of phosphodiester bonds
Dissociation of the RNA-DNA hybrid
Rewinding of the melted region of DNA
Release of RNA Pol from DNA

Sequences called terminators trigger the elongating polymerase to dissociate from the DNA and release the RNA chain it has made. E. coli has at least two classes of termination signals: one class relies on a protein factor called ρ (rho) and the other is ρ-independent.
Both ρ-dependent and ρ-independent terminators respond to a signal that lies within the newly synthesized RNA rather than in the template DNA. In both types of termination, pausing by RNA Pol is important in order to allow time for the actual termination event to occur.

(i) ρ-independent (intrinsic) termination: Many terminators require a hairpin to form in the secondary structure of the RNA being transcribed. This indicates that termination depends on the RNA product and is not determined simply by scrutiny of the DNA sequence during transcription. ρ-independent terminators (intrinsic terminators) have two structural features:

A hairpin in secondary structure: The first feature is a region that produces an RNA transcript with self-complementary sequences, permitting the formation of a hairpin structure centered a short distance before the projected end of the RNA strand. Formation of the hairpin structure in the RNA disrupts several AU base pairs in the RNA-DNA hybrid segment. The hairpin also pauses RNA Pol immediately after it has synthesized the stretch of RNA that folds into the hairpin, and it disrupts important interactions between the RNA and RNA Pol, thereby facilitating dissociation of the transcript. The hairpin usually contains a GC rich region near the base of the stem. The typical distance between the hairpin and the U rich region is 7-9 bases. There are ~1100 sequences in the E. coli genome that fit this criterion, suggesting that half of the genes have an intrinsic terminator.

A region that is rich in U residues at the very end of the unit: The hairpin only works as an efficient terminator when it is followed by a stretch of 4 or more AU base pairs. This is because under those circumstances, at the time the hairpin forms, the growing RNA chain is held on the template at the active site by only AU base pairs. Because AU base pairs are the weakest of all base pairs (weaker even than AT base pairs), they are more easily disrupted by the effects of the stem-loop on the transcribing polymerase, and so the RNA dissociates more readily (Fig. 10).
[Fig. 10: Rho (ρ) independent (intrinsic) termination. Labels: template strand and coding strand of the double stranded DNA; mRNA with a GC rich region; formation of a hairpin in the GC rich region, followed by a run of U residues (UUUUUUU).]
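The two structural features of an intrinsic terminator can be searched for in an RNA sequence with a short scan. This is a deliberately simplified sketch under stated assumptions: the stem is required to be a perfect inverted repeat of fixed length, the U run is required to start directly after the stem, and the parameter values (stem, loop_range, min_gc) are hypothetical choices rather than values from the text. Only the requirement of 4 or more U residues is taken from the description above.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string; unknown letters become N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(rna))


def find_intrinsic_terminators(rna, stem=5, loop_range=(3, 8), min_u=4, min_gc=0.6):
    """Return (start, stem_sequence, loop_length) for hairpin + U-run candidates.

    A candidate is a GC rich stem, a loop, the exact reverse complement of the
    stem, and then at least min_u consecutive U residues.
    """
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - 2 * stem - loop_range[0] - min_u + 1):
        left = rna[i:i + stem]
        gc_fraction = sum(base in "GC" for base in left) / stem
        if gc_fraction < min_gc:
            continue
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop                       # start of the 3' stem arm
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + min_u]     # bases right after the stem
            if (len(right) == stem and right == reverse_complement(left)
                    and tail == "U" * min_u):
                hits.append((i, left, loop))
    return hits
```

Applied to a transcript such as UUUGCCGCUUCGGCGGCUUUUUUA, the scan reports the GCCGC stem with a 4-nucleotide loop followed by the U run. Real terminators tolerate mismatches and variable stem lengths, so a practical tool would rely on RNA folding energies rather than exact matching.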

(ii) ρ-dependent termination: As already discussed, RNA Pol needs no help to terminate transcription at a hairpin followed by several U residues. At other sites, however, termination requires the participation of an additional factor. This discovery was prompted by the observation that some RNA molecules synthesized in vitro by RNA Pol acting alone are longer than those made in vivo. The missing factor, a protein that caused the correct termination, was isolated and named rho (ρ), also called rho transcription termination factor. Additional information about the action of rho was obtained by adding this termination factor to an incubation mixture at various times after the initiation of RNA synthesis. RNAs with sedimentation coefficients of 10S, 13S and 17S were obtained when rho was added at initiation, a few seconds after initiation and 2 minutes after initiation, respectively. If no rho was added, transcription yielded a 23S RNA product. It is evident that the template contains at least three termination sites that respond to rho (yielding 10S, 13S and 17S RNA) and one termination site that does not (yielding 23S RNA). Thus, specific termination at a site producing 23S RNA can occur in the absence of rho. However, ρ detects additional termination signals that are not recognized by RNA Pol alone (Fig. 11).

[Fig. 11: Effect of Rho protein on the size of the transcript. The DNA template is shown with the initiation site, the ρ (Rho) sites (indicated by arrows) and the site of termination in the absence of ρ. RNA transcripts: no Rho (23S species); Rho present at start of synthesis (10S species); Rho added 30 sec later (13S species); Rho added 2 min later (17S species).]

The ρ-dependent terminators lack the sequence of repeated A residues in the template strand but usually include a CA rich sequence called a rut (rho utilization) element. Optimally these sites consist of stretches of about 40 nucleotides that do not fold into a secondary structure, i.e. they remain largely single stranded. They are also C rich.
The second level of specificity is that rho fails to bind any transcript that is being translated, i.e. a transcript bound by ribosomes. In bacteria, transcription and translation are tightly coupled: translation initiates on growing RNA transcripts as soon as they start exiting the polymerase, while they are still being synthesized. Thus, rho typically terminates only those transcripts still being transcribed beyond the end of a gene or operon.

ρ is a homo-hexameric terminator protein with a size of ~275 kd (each subunit has 419 residues). The X-ray structure of ρ protein reveals that the six monomers form an open ring. The ring is not flat but slightly helical: its first and sixth subunits are separated by a gap of 12 Å and the helical pitch (rise along the helix axis) between them is 45 Å. The RNA transcript on which ρ acts is believed to bind along the bottom of each subunit and then thread through the middle of the ring. Each ρ subunit consists of two domains that can be separated by proteolysis: its N-terminal domain or RNA binding domain binds single stranded polynucleotides, and its C-terminal domain or ATP-hydrolysis domain, which is homologous to the α and β subunits of the F₁-ATPase, binds an NTP. It hydrolyzes ATP in the presence of single stranded RNA, probably through recognition of a specific structural feature rather than a consensus sequence. The RNA, which is only partially visible in the structure, binds to the so-called primary RNA binding sites on the N-terminal domains that face the interior of the helix and to the so-called secondary RNA binding sites on the C-terminal domains that have been implicated in mRNA translocation and unwinding. The ρ protein has an ATP-dependent RNA-DNA helicase activity. It binds to nascent RNA at specific binding sites or recognition sequences (Fig. 12). It then uses its RNA-dependent ATPase activity to provide the energy to translocate along the RNA in the 5'→3' direction to a sequence that is rich in C and poor in G residues preceding the actual termination site. C is by far the most common base (41%) and G is the least common base (14%). As a general rule, the efficiency of ρ-dependent terminators increases with the length of the C-rich or G-poor region.
[Fig. 12: Mechanism of Rho dependent termination (mechanism of the termination of transcription by rho protein). Labels: coding strand; template strand; RNA-DNA hybrid; nascent RNA (5'ppp); Rho protein; RNA polymerase; ATP + H2O → ADP + Pi.]
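The description of rut elements (stretches of about 40 nucleotides that are C rich and G poor) suggests a simple sliding-window scan. The sketch below is illustrative only: the function name rut_like_windows and the thresholds min_c and max_g are hypothetical, chosen loosely around the 41% C and 14% G figures quoted above, and the scan ignores the requirement that the region should not fold into a secondary structure.

```python
def rut_like_windows(rna, window=40, min_c=0.35, max_g=0.20):
    """Return (start, C fraction, G fraction) for C rich, G poor windows.

    window       -- window length in nucleotides (about 40 in the text)
    min_c, max_g -- hypothetical composition thresholds, not from the text
    """
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - window + 1):
        segment = rna[i:i + window]
        c_fraction = segment.count("C") / window
        g_fraction = segment.count("G") / window
        if c_fraction >= min_c and g_fraction <= max_g:
            hits.append((i, round(c_fraction, 2), round(g_fraction, 2)))
    return hits
```

Overlapping windows that satisfy the thresholds are all reported, so a long C-rich region shows up as a run of consecutive start positions; the length of that run gives a rough feel for the statement that terminator efficiency increases with the length of the C-rich or G-poor region.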

Proteins, in addition to ρ, mediate and modulate termination. For example, the NusA protein enables RNA Pol in E. coli to recognize a characteristic class of termination sites. In E. coli, specialized termination signals called attenuators are regulated to meet the nutritional needs of the cell.

Transcription in eukaryotes
Robert Roeder and William Rutter discovered that the eukaryotic transcription machinery is much more complex than that of prokaryotes, as a large number of polypeptides are associated with it. The mechanism of eukaryotic transcription is, however, similar to that in prokaryotes. Unlike in bacteria, the eukaryotic genome is packaged into chromatin (nucleosomal structure) and is therefore inaccessible to the transcription machinery. Prior to transcription of a specific gene, its chromatin structure is modified to become more accessible to the transcription apparatus. The two best understood mechanisms of chromatin modification are:
(i) Specific modifying complexes: Many eukaryotic gene activator proteins modify chromatin structure by recruiting histone acetyltransferases.
(ii) Nucleosome remodeling by chromatin remodeling complexes.
Acetylation and remodeling prepare the gene promoter for the assembly of RNA Pol, other accessory proteins and gene specific transcription factors to initiate the transcription process. In transcription, only some regions of the genome are transcribed, and the regions chosen vary in different cells or in the same cell at different times; from one to several thousand transcripts can be made of a given region in a single cell.

Eukaryotic transcription apparatus
The eukaryotic transcription machinery involves three RNA polymerases, a number of general transcription factors, several elongation factors and a large repertoire of gene specific transcription factors and activators.
Furthermore, the entire transcription machinery is coupled with an enormously complex signal transduction cascade that integrates external stimuli with the transcription machinery.

(1) RNA polymerase or DNA dependent RNA polymerase (RNA Pol)
Eukaryotic cells have three kinds of nuclear RNA polymerases, RNA Pol I, II and III. These are distinct complexes but have certain subunits in common. Each RNA Pol is large and has 12 or more different subunits. In S. cerevisiae, RNA Pol I, II and III have 14, 12 and 17 subunits respectively. While some of these subunits are exclusive to one RNA Pol, others are either identical or structurally related. Each polymerase has a specific function and is recruited to a specific promoter sequence. They differ in their template specificity and location in the nucleus. Although all eukaryotic RNA Pols are homologous to one another and to prokaryotic RNA Pol, RNA Pol II contains a unique carboxyl terminal domain called the tail. Another major distinction among the polymerases lies in their responses to the fungal toxin α-amanitin, a cyclic octapeptide that contains several modified amino acids. The activities of the different RNA Pols are distinguished by their different sensitivities to the toxin. Properties of the different eukaryotic RNA polymerases are summarized in Table 4. In addition to these three nuclear RNA Pols, eukaryotic cells contain separate polymerases in mitochondria and chloroplasts. These small (~100 kd) single subunit RNA Pols, which resemble those encoded by certain bacteriophages, are much simpler than the nuclear RNA Pols, although they catalyze the same reaction.

Table 4: Properties of different eukaryotic RNA polymerases

S. No. | Properties | RNA Pol I | RNA Pol II | RNA Pol III
1 | Location | Nucleoli | Nucleoplasm | Nucleoplasm
2 | Function (cellular transcripts) | Synthesis of precursors of most rRNA (5.8S, 18S and 28S) | Synthesis of precursors of mRNA and some small nuclear RNAs (snRNAs) | Synthesis of precursors of 5S rRNA and tRNA and small nuclear and cytosolic RNAs
3 | Sensitivity to α-amanitin fungal toxin (cyclic octapeptide) | Insensitive | Very sensitive (strongly inhibited); binds tightly and inhibits elongation phase | Moderately sensitive (inhibited by high concentrations)
4 | Number of subunits | … | … | …
5 | Polymerase activity / cell | …% | 20-40% | 10%
6 | Class of genes transcribed | Class I | Class II | Class III

(A) RNA Pol I (RNA Pol A)
RNA Pol I (Pol I or Pol A) is located in the nucleoli. It is responsible for continuous synthesis of rRNA during interphase. The continuous transcription of multiple gene copies of the rRNAs is essential for sufficient production of the processed rRNAs, which are packaged into ribosomes. Human cells contain 5 clusters of around 40 copies of the rRNA gene situated on different chromosomes. Each rRNA cluster is known as a nucleolar organizer region, since the nucleolus contains large loops of DNA corresponding to the gene clusters.
After a cell emerges from mitosis, rRNA synthesis restarts and tiny nucleoli appear at the chromosomal locations of the rRNA genes. Each rRNA gene produces a 45S rRNA transcript called the pre-transcript, preribosomal RNA or pre-rRNA, which is ~13000 nucleotides long. During active rRNA synthesis, the pre-rRNA transcripts are packed along the rRNA genes and may be visualized in the electron microscope as Christmas tree structures. In these structures, the RNA transcripts are densely packed along the DNA and stick out perpendicularly from the DNA. The 45S pre-transcript is cleaved to give one copy each of the 28S, 18S and 5.8S rRNAs, which are 5000, 2000 and 160 nucleotides long respectively.

(B) RNA Pol II (RNA Pol B)
Among the three RNA polymerases, RNA Pol II (Pol II or Pol B) is functionally the most versatile, as it transcribes the mRNAs and some specialized RNAs such as most of the small nuclear RNAs (snRNAs). RNA Pol II is central to eukaryotic gene expression and has been studied extensively. RNA Pol II is located in the nucleoplasm. This enzyme can recognize thousands of promoters that vary greatly in sequence. Although RNA Pol II is strikingly more complex than its bacterial counterpart, the complexity masks a remarkable conservation of structure, function and mechanism.

(i) Structure: RNA Pol II is somewhat larger than bacterial (e.g. Thermus aquaticus) RNA Pol and has several subunits that have no counterpart in the bacterial enzyme. Pol II is a huge enzyme with a molecular mass of up to 600 kd. The enzyme contains two nonidentical large (>120 kd) subunits, comprising ~65% of its mass, that are homologs of the prokaryotic RNA Pol β and β' subunits, and up to 12 additional small (<50 kd) subunits, two of which are homologs of the prokaryotic RNA Pol α subunits and one of which is a homolog of the prokaryotic RNA Pol ω subunit. Of these small subunits, five are identical in all three eukaryotic RNA Pols and two others (the RNA Pol α homologs) are identical in RNA Pol I and III. Thus, 10 of the 12 RNA Pol II subunits are either identical or closely similar to subunits of RNA Pol I and III. Moreover, the sequences of these subunits are highly conserved (~50% identical) across species from yeast to humans (and to a lesser extent between eukaryotes and bacteria).
In fact, in all ten cases tested, a human RNA Pol II subunit could replace its counterpart in yeast without loss of cell viability. Roger Kornberg determined the X-ray crystallographic structure of yeast RNA Pol II. Overall, the yeast RNA Pol II enzyme is shaped like a crab claw, similar to bacterial Taq RNA Pol. The subunits of the yeast enzyme have positions and core folds similar to those of their homologous subunits in bacterial RNA Pol. The two pincers of the crab claw (RNA Pol II) are made up predominantly of RPB1 and RPB2. The active site, which is made up of regions from both these subunits, is found at the base of the pincers within a region called the active center cleft. A highly conserved helical segment of RPB1 called the bridge spans the two pincers forming the enzyme's cleft. This helix is straight in all X-ray structures of RNA Pol II yet determined, but it is bent in that of Taq RNA Pol. A massive (~59 kd) portion of RPB1 and RPB2 named the clamp swings down over the DNA to trap it in the cleft. A portion of RPB2 called the wall directs the template strand out of the cleft in a ~90° turn. A loop called the rudder extends from the clamp. There are various channels that allow DNA, RNA and ribonucleotides into and out of the enzyme's active center cleft. Various subunits of RNA Pol II are summarized below (Table 5).

(a) RPB1 having C-terminal domain (CTD) and RPB2: RPB1 is the largest subunit and exhibits a high degree of homology to the β′ subunit of bacterial RNA Pol. It contains the active site of the enzyme RNA Pol II. It has an unusual feature, a long carboxyl terminal domain (CTD) called the tail. The tail consists of many highly conserved repeats of the heptad amino acid sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser (YSPTSPS). There are 27 repeats in the yeast enzyme (18 exactly matching the consensus), 52 (21 exact) in the mouse enzyme and 53 in the human enzyme. The CTD is separated from the main body of the enzyme by an unstructured linker sequence. These repeats are essential for viability. The CTD may be phosphorylated at Ser and Tyr. Five of the 7 residues in these particularly hydrophilic repeats bear OH groups and at least 50 of them, predominantly those on Ser residues, are subject to reversible phosphorylation by CTD kinases and CTD phosphatases. In vitro studies have shown that RNA Pol II initiates transcription only when the CTD is unphosphorylated. Phosphorylation of the CTD occurs during transcription elongation as RNA Pol leaves the promoter. Charge-charge repulsions between nearby phosphate groups probably cause a highly phosphorylated CTD to project as far as 500 Å from the globular portion of RNA Pol II. The phosphorylated CTD provides the binding sites for numerous auxiliary factors that have essential roles in the transcription process. The CTD has been shown to be an important target for differential activation of transcription elongation. No such tail is present in the bacterial enzyme. RPB2 is structurally similar to the bacterial β subunit.

(b) RPB3 and RPB11: These two subunits show some structural homology to the bacterial α subunits.

(c) RPB4 and RPB7: Genetic studies have demonstrated that some of the Pol II specific subunits are dispensable. Thus, two subunits, RPB4 and RPB7, are not essential for activity and are present in RNA Pol II in less than stoichiometric amounts. RPB7 has a 102-residue segment that is 30% identical to a portion of σ70 of E. coli.
These subunits are absent in yeast (Saccharomyces cerevisiae) RNA Pol II.

(d) RPB6: RPB6 is homologous to the ω subunit of bacterial RNA Pol.

Although Pol II has the smallest number of subunits, it transcribes the largest and most diverse array of promoters. A number of other proteins, which are not part of the Pol II complex, are used by RNA Pol II as subsidiary proteins, thereby contributing to its functional diversity.

(ii) Nucleotide addition and RNA Pol II translocation: RNA Pol II binds two Mg²⁺ ions at its active site in the vicinity of 5 conserved acidic residues, which suggests that RNA Pol catalyzes RNA elongation via a two-metal-ion catalytic mechanism for nucleotide addition, similar to that proposed for all types of polymerase. As is the case with Taq RNA Pol, the surface of RNA Pol II is almost entirely negatively charged except for the DNA binding cleft and the region around the active site, which are positively charged.

(C) RNA Pol III (also called RNA Pol C)

RNA Pol III occurs in the nucleoplasm and synthesizes the precursors of 5S rRNA, tRNA, U6 snRNA and a variety of other small nuclear and cytosolic RNAs. It has 16 or more subunits.

RNA Pol III transcribes the 5S rRNA component of the large ribosomal subunit. This is the only rRNA to be transcribed separately. Like the other rRNA genes, which are transcribed by RNA Pol I, the 5S rRNA genes are tandemly arranged in a gene cluster. In humans, there is a single cluster of around 2000 genes. Less is known about the signals and ancillary factors involved in termination for eukaryotic polymerases. Each class of polymerase uses a different mechanism. Genetic studies have demonstrated that, in contrast to Pol I and Pol II, all subunits of Pol III are essential. Table 5 summarizes various prokaryotic and eukaryotic RNA polymerase subunits.

Table 5: Comparison of prokaryotic and eukaryotic subunits of RNA polymerases

S. No. | Prokaryotic: Bacterial | Prokaryotic: Archaeal | Eukaryotic: RNA Pol I | Eukaryotic: RNA Pol II | Eukaryotic: RNA Pol III
1 | β′ | A′/A″ | RPA1 | RPB1 | RPC1
2 | β | B | RPA2 | RPB2 | RPC2
3 | α | D | RPC5 | RPB3 | RPC5
4 | α | L | RPC9 | RPB11 | RPC9
5 | ω | K | RPB6 | RPB6 | RPB6
  |   | [+6 others] | [+9 others] | [+7 others] | [+12 others]

Note: The subunits in each column are listed in order of decreasing molecular weight.

(2) Eukaryotic promoters

Unlike bacterial promoters, which have relatively simple structures, eukaryotic promoters are highly complex in nature. The various promoters are described in the following sections:

(A) RNA Pol I promoter

Since the numerous rRNA genes in a given eukaryotic cell have essentially identical sequences, its RNA Pol I recognizes only one promoter. Yet, in contrast to the case for RNA Pol II and III, RNA Pol I promoters are species-specific, i.e., an RNA Pol I recognizes only its own promoter and those of closely related species. Pol I promoters vary greatly in sequence from one species to another. Thus, for example, mammalian RNA Pol I has a bipartite promoter consisting of two transcription control regions:

(i) Core promoter element: It refers to the minimal set of sequence elements required for accurate transcription initiation. It spans positions -31 to +6.
It includes the transcription start site and hence overlaps the transcribed region. It has a short conserved sequence element, a short AT-rich sequence around the start point called the initiator sequence (Inr). This sequence is essential for transcription (Fig. 13).

(ii) Upstream control element (UCE) or Upstream promoter element (UPE): It is located between positions -187 and -107, upstream of the start site (Fig. 13). The element is GC rich. The UCEs are ~85% identical and ~50-80 bp long. The sequence is bound by specific transcription factors, which then recruit RNA Pol I to the transcription start site. The UCE is thus responsible for a 10- to 100-fold increase in the efficiency of transcription compared to that from the core element alone.

Fig. 13: RNA Pol I promoter (pre-rRNA gene with the upstream control element (UCE), ~50-80 bp long and GC rich, at about -100; the core promoter; and the transcription start site, +1)

(B) RNA Pol II promoter

The promoters recognized by RNA Pol II are considerably longer, more complex and more diverse than those of prokaryotic genes. Like the RNA Pol I promoter, the RNA Pol II promoter consists of a core promoter and regulatory regions, which are described below (Fig. 14).

(i) Core promoter (basal elements): The eukaryotic core promoter refers to the minimal set of sequence elements required for accurate transcription initiation by the Pol II machinery. A core promoter is ~40 nucleotides long, extending either upstream or downstream of the transcription start site. Four elements found in Pol II core promoters are the TATA box, BRE, Inr and DPE. Typically, a promoter includes only two or three of these four elements. Many Pol II promoters have a few sequence features in common, including a TATA box (eukaryotic consensus sequence TATAAA) near base pair -30 and an Inr sequence (initiator) near the RNA start site at +1. However, some Pol II promoters lack a TATA box or a consensus Inr element or both. The sequence elements summarized here are more variable among the Pol II promoters of eukaryotes than among E. coli promoters.

(a) TATA box or Hogness box: An A/T-rich sequence, TATA(A/T)A(A/T), called the TATA box is located 25 to 30 bp upstream of the transcription start site (positions -25 to -30).
The consensus sequence (homologous segment, TATA box) is T82 A97 T93 A85 (A63/T37) A83 (A50/T37), where the number after each base indicates the % occurrence of that base at that position. This TATA box resembles the -10 region of prokaryotic promoters (TATAAT), although they differ in their locations relative to the transcription start site (-27 vs -10). This conserved region was first discovered by Goldberg and Hogness and is also called the Goldberg-Hogness (GH) box or Hogness box.
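The percent occurrences quoted for the TATA box consensus can be turned into a crude site score. The following is a minimal sketch, not an established scoring method: the function names `tata_score` and `best_tata` are hypothetical, the score is simply the mean percent occurrence of the observed bases, and any base for which the text gives no percentage is treated as 0%, which is an assumption of this sketch.

```python
# Percent occurrence of each base at the seven TATA box positions,
# copied from the consensus quoted in the text. Bases that the text does
# not list are scored as 0 (a simplifying assumption of this sketch).
TATA_FREQ = [
    {"T": 82},
    {"A": 97},
    {"T": 93},
    {"A": 85},
    {"A": 63, "T": 37},
    {"A": 83},
    {"A": 50, "T": 37},
]


def tata_score(site):
    """Mean percent occurrence of the observed bases over the 7 positions."""
    site = site.upper()
    if len(site) != len(TATA_FREQ):
        raise ValueError("site must be exactly 7 nucleotides long")
    total = sum(column.get(base, 0) for column, base in zip(TATA_FREQ, site))
    return total / len(TATA_FREQ)


def best_tata(seq):
    """Return (score, offset, site) for the best-scoring 7-mer, or None."""
    seq = seq.upper()
    width = len(TATA_FREQ)
    windows = [
        (tata_score(seq[i:i + width]), i, seq[i:i + width])
        for i in range(len(seq) - width + 1)
    ]
    return max(windows) if windows else None


if __name__ == "__main__":
    print(tata_score("TATAAAA"))
    print(best_tata("GCGCTATAAAAGGC"))
```

A real promoter scan would use a full position weight matrix with background frequencies; the sketch only shows how the quoted percentages rank candidate sites.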

The TATA box is the major assembly point for the proteins of the preinitiation complexes of Pol II. The deletion of the TATA box does not necessarily eliminate transcription; rather, it generates heterogeneities in the transcriptional start site, thereby indicating that the TATA box participates in selecting this site.

(b) TFIIB recognition element (BRE): Immediately upstream of the TATA box is the TFIIB recognition element, which is targeted by TFIIB. The consensus sequence is (G/C)(G/C)(G/A)CGCCC.

(c) Initiator sequence (Inr): The initiator element (Inr) is located around the transcription start site (+1). The consensus sequence of Inr is (C/T)(C/T)AN(T/A)(C/T)(C/T). Many initiator elements have a C at position -1 and an A at +1. The DNA is unwound at the initiator sequence and the transcription start site is usually within or very near this sequence.

(d) Downstream promoter element (DPE): Further downstream, in the transcribed region, is the downstream promoter element, which has the consensus sequence (A/G)G(A/T)CGTG.

Element | Binding site for | Consensus sequence
BRE | TFIIB | (G/C)(G/C)(G/A)CGCCC
TATA | TBP of TFIID | TATA(A/T)A(A/T)
Inr | TFIID | YYAN(T/A)YY
DPE | TFIID | (A/G)G(A/T)CGTG

where N represents any nucleotide and Y is a pyrimidine nucleotide

Fig. 14: RNA Pol II promoter

Promoters contain different combinations of conserved elements. No element is common to all the promoters. The elements found in any individual promoter differ in number, location and orientation. Some eukaryotic genes contain an initiator element instead of a TATA box. Other promoters have neither a TATA box nor an initiator element. These genes are generally transcribed at low rates and initiation may occur at different start sites over a length of up to 200 bp. These genes often contain a GC rich bp region within the first bp upstream from start site (described below).

(ii) Upstream regulatory elements (URE): The basal elements primarily determine the location of the start point, but sponsor initiation only at a rather low level.
Thus, the basal elements are not sufficient for strong promoter activity. Additional elements called upstream regulatory elements, located between -40 and -200 bp upstream of the transcription start site (present on the template strand), are important in order to increase the low activity of basal promoters. These sequences are important in regulating Pol II promoters and vary greatly in type and number. They serve as binding sites for a wide variety of proteins that affect the activity of Pol II. These elements are found in many genes, which vary widely in their levels of expression in different tissues. Examples are:

(a) GC box: The structural genes expressed in all tissues, e.g. housekeeping genes or constitutive genes (genes that are continuously expressed rather than regulated), have one or more copies of the sequence 5′-GGGCGG-3′ located upstream from their transcription start sites. They are located at about position -90; however, the positions of these upstream sequences vary from one promoter to another. Often multiple copies are present in the promoter and they occur in either orientation. The structural genes that are selectively expressed in one or a few types of cells often lack these GC-rich sequences.

(b) CAAT box: The gene region extending between -50 and -110 also contains promoter elements. They can occur in either orientation. For instance, many eukaryotic structural genes, including those encoding the various globins, have a conserved sequence of consensus 5′-GGNCAATCT-3′ (the CAAT box) located between about -70 and -90, whose alteration greatly reduces the transcription rate of the gene. Globin genes have, in addition, a conserved CACCC box upstream from the CAAT box that has also been implicated in transcriptional initiation.

The CAAT and GC boxes in eukaryotes differ from the similar regions in prokaryotes.
The positions of these upstream sequences vary from one promoter to another, in contrast with the quite constant location of the -35 region in prokaryotes. The CAAT box and the GC box can be effective when present on the template strand, unlike the -35 region, which must be present on the coding strand. These differences between prokaryotes and eukaryotes reflect fundamentally different mechanisms for the recognition of cis-acting elements. The -10 and -35 sequences in prokaryotic promoters correspond to binding sites for RNA Pol and its associated σ factor. In contrast, the TATA, CAAT and GC boxes and other cis-acting elements in eukaryotic promoters are recognized by proteins other than RNA Pol itself.

Although the promoter conveys directional information (transcription proceeds only in the downstream direction), the GC and CAAT boxes seem to be able to function in either orientation. They can function at distances that vary considerably from the start point. This implies that the elements function solely as DNA binding sites to bring transcription factors into the vicinity of the start point; the structure of a factor must be flexible enough to allow it to make protein-protein contacts with the basal apparatus irrespective of the way in which its DNA-binding domain is oriented and its exact distance from the start point. GC and CAAT boxes thus play a strong role in determining the efficiency of the promoter, but do not influence its specificity.
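Because the GC and CAAT boxes work in either orientation, a search for them has to look for each consensus and for its reverse complement. The following is a minimal sketch of such a search, using the two consensus sequences quoted above; the names `BOXES`, `reverse_complement` and `find_boxes` are hypothetical, and N is assumed to mean any of A, C, G or T.

```python
import re

# Consensus sequences quoted in the text; N stands for any nucleotide.
BOXES = {"GC box": "GGGCGG", "CAAT box": "GGNCAATCT"}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def _to_regex(consensus):
    """Turn a consensus containing N into a regular expression."""
    return consensus.replace("N", "[ACGT]")


def find_boxes(coding_strand):
    """Return sorted (0-based offset, box name, orientation) hits.

    Orientation "+" means the consensus reads 5'->3' on the given strand;
    "-" means it reads 5'->3' on the complementary strand.
    """
    seq = coding_strand.upper()
    hits = []
    for name, consensus in BOXES.items():
        for orientation, pattern in (
            ("+", consensus),
            ("-", reverse_complement(consensus)),
        ):
            # A lookahead is used so that overlapping matches are reported.
            for match in re.finditer("(?=(%s))" % _to_regex(pattern), seq):
                hits.append((match.start(), name, orientation))
    return sorted(hits)


if __name__ == "__main__":
    print(find_boxes("AAGGGCGGAATTCCGCCCTTGGACAATCTAA"))
```

Offsets here are plain string indices; converting them to promoter coordinates (for example about -90 for a GC box) requires knowing where +1 lies in the input, which the sketch does not assume.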

(C) RNA Pol III promoter

The promoters recognized by RNA Pol III are well characterized. Interestingly, some of the sequences required for the regulated initiation of Pol III are located within the gene itself, whereas others are in more conventional locations upstream of the RNA start site (Fig. 15).

(i) 5S rRNA genes: The genes for 5S rRNA are organized in a tandem cluster. The promoters of genes transcribed by RNA Pol III can be located entirely within the transcribed region (i.e. internal) of the gene. These sequences are therefore conserved in both the 5S rRNA and the DNA. Donald Brown established this through the construction of a series of deletion mutants of a Xenopus borealis 5S RNA gene. The 5S rRNA promoter contains the following conserved sequences, which are depicted in Fig. 15.

(a) C box: It is located bases downstream from the transcription start site.

(b) A box: It is located at around bases downstream of the transcription start site. The sequence of Box A is: 5′-TGGCNNAGTGG-3′.

Fig. 15: RNA Pol III promoter for 5S rRNA (transcription start site, +1, followed by Box A, conserved sequence TGGCNNAGTGG, and Box C)

(ii) tRNA genes: RNA Pol III promoters of tRNA genes contain two highly conserved sequences within the DNA encoding the tRNA (internal transcription control regions), namely Box A and Box B. These regions lie downstream from the transcription start site, i.e. after the transcription start site and within the transcription unit (Fig. 16).

(a) Box A: It is located around bases downstream of the transcription start site. The sequence of Box A is: 5′-TGGCNNAGTGG-3′.

(b) Box B: It is located downstream of the transcription start site. The sequence of Box B is: 5′-GGTTCGANNCC-3′.

As both of these sequences lie within the gene, they are conserved in both the tRNA and the DNA. Thus, these sequences also encode important sequences in the tRNA itself, called the D-loop and the TψC loop.
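The internal tRNA promoter described above (Box A followed, further downstream, by Box B) can be located with a simple pattern search. This is a minimal sketch under stated assumptions: the function name `find_internal_promoter` is hypothetical, the input is assumed to begin at the transcription start site (so string index 0 is position +1), and N is taken to mean any of A, C, G or T.

```python
import re

# Conserved internal control sequences quoted in the text (N = any base).
BOX_A = "TGGCNNAGTGG"
BOX_B = "GGTTCGANNCC"


def _to_regex(consensus):
    """Turn a consensus containing N into a regular expression."""
    return consensus.replace("N", "[ACGT]")


def find_internal_promoter(transcribed):
    """Locate Box A and the first Box B downstream of it.

    `transcribed` is assumed to start at +1. Returns a dict with the
    1-based positions of both boxes, or None if either box is missing.
    """
    seq = transcribed.upper()
    box_a = re.search(_to_regex(BOX_A), seq)
    if box_a is None:
        return None
    # Box B is searched only downstream of the end of Box A.
    box_b = re.search(_to_regex(BOX_B), seq[box_a.end():])
    if box_b is None:
        return None
    return {
        "box_a": box_a.start() + 1,
        "box_b": box_a.end() + box_b.start() + 1,
    }


if __name__ == "__main__":
    # Synthetic test sequence, not a real tRNA gene.
    gene = "ACGTACGT" + "TGGCAAAGTGG" + "ACGTACGTAC" * 3 + "GGTTCGAATCC" + "TTTT"
    print(find_internal_promoter(gene))
```

The example sequence is made up purely to exercise the search; real tRNA genes deviate from the consensus, so a practical search would tolerate mismatches.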

Fig. 16: RNA Pol III promoter for tRNA (transcription start site, +1, followed by Box A, conserved sequence TGGCNNAGTGG, and Box B, conserved sequence GGTTCGANNCC; position +55 is marked)

(iii) Alternative RNA Pol III promoters: A number of RNA Pol III promoters are regulated by upstream as well as downstream promoter sequences. Further studies have shown, however, that the promoters of other RNA Pol III-transcribed genes lie entirely upstream of their start sites. These upstream sites also bind transcription factors that recruit RNA Pol III. These promoters require only upstream sequences, including the TATA box and other sequences found in RNA Pol II promoters. Some promoters, such as those of the U6 small nuclear RNA (U6 snRNA) gene and the small RNA genes of the Epstein-Barr virus, use only regulatory sequences upstream from their transcription start sites. The coding region of the U6 snRNA has a characteristic A box. However, this sequence is not required for transcription. The U6 snRNA upstream sequence contains sequences typical of RNA Pol II promoters, including a TATA box at bases -30 to -23. These promoters also share several other upstream transcription factor binding sequences with many U snRNA genes, which are transcribed by RNA Pol II. These observations suggest that common transcription factors can regulate both RNA Pol II and RNA Pol III genes.

(3) Enhancers

Promoters are not the only type of cis-acting sequence. Transcription from many eukaryotic promoters can be stimulated by control elements that are located many thousands of base pairs away from the transcription start site. This was first observed in the genome of the DNA virus SV40. A sequence of around 100 bp from SV40 DNA can significantly increase transcription from a basal promoter even when it is placed far upstream or downstream. Such distal sequences are called enhancers. The enhancer elements thus constitute the distal part of the promoter and can be located either upstream or downstream of the transcription start site.
Enhancers are common in eukaryotes and rare in prokaryotes (exception: present with σ54 factor). Enhancers have the following general characteristics and functions:

Enhancer sequences are short sequence elements. They are generally a few hundred base pairs long ( bp) and contain multiple sequence elements, which contribute to the total activity of the enhancer. They consist of sets of elements similar to those of the upstream promoter, but the density of sequences is greater, i.e. they are more compactly organized than the upstream promoter.

Like promoters, they are cis-acting regulatory elements.

They are able to function over long distances of more than 1000 bp, whether from an upstream or a downstream position relative to the start site. They are therefore also called long-range regulatory elements. In contrast, promoters are short-range elements.

They can modulate (activate) transcription of the cognate genes when placed in either orientation with respect to the linked genes. They are active even when placed in reverse orientation. They thus contain bidirectional elements and are orientation-independent (Fig. 17).

Fig. 17: Activation of transcription by enhancer is orientation and direction independent (an upstream enhancer, 5′ E-P, and a downstream enhancer, 5′ P-E, both activate transcription from the promoter)

Interestingly, the positions of enhancers relative to promoters are not fixed and can vary substantially. An enhancer can modulate (activate) transcription of the cognate gene even when moved away from its original location, either upstream or downstream of the coding sequence. Thus, in natural genomes, enhancers can also be located within genes. They are thus position-independent.

Enhancers contain the same sequence elements that are found at promoters. The density of sequence components is greater in the enhancer than in the promoter.

They may be ubiquitous or tissue / cell type-specific. They may be active in only certain cells. Enhancers play key roles in regulating gene expression in a specific tissue or developmental stage. A given enhancer binds regulators at a given time and place. Alternative enhancers bind different groups of regulators and control expression of the same gene at different times and places in response to different signals.
They exert strong activation of transcription of a linked gene from the correct start site. They exert preferential stimulation of the closest of two tandem promoters. These DNA sequences, although not promoters themselves, can enormously increase the effectiveness of promoters.

Enhancer sequences are targeted by a number of sequence-specific DNA binding proteins called gene specific transcription factors and activators. The assembly or clustered group of activators at the enhancer region is called an enhanceosome. It is believed that enhancers can regulate transcription of a specific gene from a distant location by bending or looping out the intervening DNA sequence (the interstitial DNA between the promoter and enhancer regions), so that the transcription factors bound to the enhancer can directly interact with the RNA Pol II machinery bound at the promoter and influence its action.

Activation at a distance raises a problem. When an activator binds at an enhancer, there may be several genes within its range, yet a given enhancer typically regulates only one gene. Other regulatory sequences called insulators or boundary elements are found between enhancers and some promoters. Insulators block activation of the promoter by activators bound at the enhancer. These elements, although still poorly understood, ensure that activators do not work indiscriminately.

Elements analogous to enhancers in yeast are called Upstream Activator Sequences (UASs). A UAS, however, works only upstream of the promoter and cannot function when located downstream.

(4) Transcription factors

RNA Pol II requires an array of other proteins, called transcription factors, in order to form the active transcription complex. In contrast to the somewhat smaller prokaryotic RNA Pol holoenzymes, eukaryotic RNA Pols do not independently bind their target DNAs. Rather, they are recruited to their target promoters through the mediation of very large and complicated complexes of transcription factors and their ancillary proteins. The eukaryotic system requires two types of transcription factors:

(A) General transcription factors
(B) Gene specific transcription factors

(A) General transcription factors (GTFs)

These are a set of proteins that bind to RNA Pol II promoters and together initiate transcription.
They are collectively known as general transcription factors. These multisubunit factors are named transcription factors TFIIA, TFIIB, TFIIC etc. (TF stands for transcription factor and II refers to RNA Pol II). The general transcription factors collectively perform functions similar to those performed by σ in bacterial transcription. However, these factors do not show any significant sequence homology to σ factor. They have been shown to assemble on basal promoters in a specific order and they may be subject to multiple levels of regulation. They help the polymerase to bind to the promoter. The binding of a transcription factor to its cognate DNA sequence enables the RNA Pol to locate the proper initiation site. Such a highly complex assembly of RNA Pols and associated proteins is absent in prokaryotes. The binding of the TFs to the promoter leads to the melting of DNA (comparable to the transition from closed to open complex in bacteria). They also help the polymerase escape from the promoter and embark on the elongation phase.

The general transcription factors, TFIIs, required at every Pol II promoter are highly conserved in all eukaryotes. The properties of various GTFs required by RNA Pols are summarized in Table 6.

Table 6: Properties of RNA Pol II (yeast) promoters associated general transcription factors

S. No. | Transcription protein | Number of subunits | Subunit(s) Mr (D) | Properties / Function(s)
1 | TBP (TFIID) | | | TBP (38 kd) is part of TFIID (700 kd); TFIID also contains TBP-associated factors (TAFs); TBP has a saddle-like structure and its concave surface recognizes the TATA box in the minor groove; TBP is regulated by TAFII230, which binds to its concave surface, thereby preventing the binding of TBP to DNA.
2 | TFIIA | | , 19000, | Stabilizes binding of TFIIB and enhances transcription; Allows binding of TBP (as TFIID) to the promoter; Prevents binding of DR1 and DR2 inhibitors to TFIID; Removes inhibition of TBP by TAFII230
3 | TFIIB | | | Binds to TBP; Interacts with DNA upstream of the TATA box in the major groove (at BRE) and downstream of the TATA box in the minor groove, and allows asymmetric assembly of the complex, thereby allowing unidirectional transcription; Recruits the Pol II-TFIIF complex
4 | TFIIE | | , | Heterotetramer of two subunits; Recruits TFIIH; Has ATPase and helicase activities; Stimulates kinase activity of TFIIH
5 | TFIIF | | ,74000 | Binds tightly to Pol II; Binds to TFIIB and prevents binding of Pol II to nonspecific DNA sequences; Acts as elongation factor later
6 | TFIIH | | | Largest; Two subunits have ATPase activity; One subunit has protein kinase activity; Unwinds DNA at promoter (helicase activity); Phosphorylates Pol II (within the CTD); Recruits nucleotide excision repair proteins for DNA repair
7 | TFIIJ | Not characterized | Not characterized | Required for transcription (at least in vitro); Probably plays role in promoter clearance and elongation

Many RNA Pol II promoters that do not contain a TATA box have an initiator element overlapping their start site. It seems that at these promoters, TBP is recruited to the promoter by a further DNA binding protein, which binds to the initiator element. TBP then recruits the other transcription factors and RNA Pol in a manner similar to that which occurs at TATA box promoters. Similarly, the transcription factors TFI and TFIII are required to stimulate transcription by RNA Pol I and III, respectively.

(B) Gene specific transcription factors

Although RNA Pol II and its associated factors (TFIIs) play a major role in the initiation of transcription of various mRNA encoding genes, the extent of their transcription is modulated by another set of transcription factors called gene specific transcription factors. The term gene specific transcription factors is used because the combination of such factors may actually direct the transcription of one gene as opposed to others. Gene specific transcription factors play a major role in tissue specific gene expression and in eliciting certain responses such as immune response, apoptosis, cell differentiation etc. Gene specific transcription factors are characterized by a DNA binding domain, which recognizes specific cis-regulatory sequences located in the proximal and the distal regions of the promoter. Following binding to the cognate sequence, gene specific transcription factors mediate their effect on RNA Pol II through another domain called the transactivation domain. The transactivation domain communicates with the Pol II machinery through a group of proteins called mediators or coactivators. Coactivators do not bind DNA directly but act as bridging molecules between Pol II and the gene specific transcription factors. Many eukaryotic gene specific transcription factors have been characterized to date and a few of them are listed in Table 7.

Table 7: Gene specific transcription factors and their functions

S. No. | Name | Species | Function
1 | MyoD | Human, Mouse etc. | Skeletal muscle specific gene expression
2 | NF kappa B | Human, Mouse etc. | Immune response, cytokine gene expression
3 | Glucocorticoid receptors | Human, Mouse etc. | Activation of glucocorticoid responsive genes

One of the characteristics of the gene specific transcription factors is that they possess distinct structural motifs essential for DNA recognition and transactivation function. They are often classified on the basis of structural features such as the homeodomain, helix turn helix, helix loop helix and Zn finger. Quite often, two gene specific transcription factors belonging to the same structural family dimerize and bind to the target sequence in a bipartite manner. One such example is the transcription factor AP-1, which is a dimer of the Jun (39 kd) and Fos (65 kd) proteins.

They belong to the leucine zipper family and target the sequence TGACTCA. Gene specific transcription factors are often targeted by various signal-transducing kinases such as MAP kinase, which phosphorylate them to induce their activities. Some gene specific transcription factors are also localized in the cytoplasm in an inactive form and upon activation are translocated to the nucleus for activity. For example, the transcription factor NF kappa B remains bound to an inhibitory protein called I kappa B, which retains it in the cytoplasm. Upon receiving an appropriate signal, I kappa B is ubiquitinated and degraded, resulting in the release of NF kappa B, which is then translocated to the nucleus.

(5) Elongation factors

During elongation, the activity of Pol II is greatly enhanced by proteins called elongation factors. The transition from the initiation to the elongation phase involves the shedding of most of the initiation factors and mediator. In their place, another set of factors is recruited. This exchange of initiation factors for the factors required for elongation and RNA processing involves phosphorylation of the CTD of RNA Pol II. Properties of various elongation factors are summarized in Table 8.

(6) Overall process of eukaryotic transcription

(A) Synthesis of precursor of mRNA by RNA Pol II

The process of transcription by Pol II can be described in terms of several phases - assembly and initiation, elongation and termination - each associated with characteristic proteins (Fig. 18).

(i) Assembly and Initiation: Eukaryotic transcription involves the assembly of RNA Pol II and transcription factors at a promoter. The step-by-step pathway described below leads to active transcription in vitro. In the cell, many of the proteins may be present in larger, preassembled complexes, simplifying the pathways for assembly on promoters.
Two major points of difference between the initiation phase of transcription in prokaryotes and in eukaryotes are that, in eukaryotes, melting requires ATP hydrolysis and promoter escape occurs after phosphorylation of the polymerase. The formation of the preinitiation complex or basal transcription apparatus involves the following steps:

Binding of TBP: In the first step, TBP, a component of the TFIID transcription factor, binds the TATA box 10⁵ times as tightly as noncognate sequences. Both the DPE and the initiator sequence are also targeted by TFIID. TBP bound to the TATA box is the center point of the initiation complex. This binding induces large conformational changes in the bound DNA. When TBP binds to the TATA box, it distorts the DNA using a β-sheet inserted into the minor groove. This distortion generates a binding site for TFIIB, which in turn provides a platform for the recruitment of Pol II and TFIIF. This complex is distinctly asymmetric. The asymmetry is crucial for specifying a unique start site and ensuring that transcription proceeds unidirectionally.

Table 8: Elongation factors involved in eukaryotic transcription

S. No. | Transcription protein | No. of subunits | Subunit(s) Mr | Properties / Function(s)

Elongation factors required for elongation stage
1 | Elongin (SIII) | | , 18000, | Involved in elongation; Enhances the elongation rate (2000 nucleotides per minute); Suppresses the pausing of RNA Pol II
2 | ELL | | | Name derived from Eleven-nineteen lysine rich leukemia; The gene for ELL is the site of chromosomal recombination events frequently associated with acute myeloid leukemia; Involved in elongation; Enhances the elongation rate (2000 nucleotides per minute); Suppresses the pausing of RNA Pol II
3 | TFIIS (SII) | | | Involved in elongation; Reduces the length of time for which Pol II pauses at sequences that could slow its progress (Pol II does not transcribe all regions at a constant rate); Stimulates proofreading activity of RNA Pol II
4 | P-TEFb | | , | Positive Transcription Elongation Factor b; Involved in elongation; Phosphorylates Pol II (within the CTD) at Ser 2; Contains CDK9 protein kinase, which also helps in phosphorylation; Recruits elongation factor TAT-SF1; Phosphorylates and activates elongation factor hSPT5; Also involved in RNA processing; Recruits capping enzyme and splicing machinery
5 | TFIIF | 4 (2 each type) | 30000, | Binds tightly to Pol II

Elongation factors used in processing
6 | hSPT5 | - | - | Involved in RNA processing; Recruits and stimulates 5′ capping enzyme
7 | TAT-SF1 | | | Involved in RNA processing; Recruits the components of the splicing machinery

Fig. 18: Transcription by eukaryotic RNA Pol II (schematic: promoter recognition, binding, melting and clearance at the TATA box and +1 site with TBP/TFIID, TFIIA, TFIIB, TFIIF, TFIIE and TFIIH; initiation; elongation by RNA Pol II with its phosphorylated CTD tail; termination)

Binding of TFIIA: In the next step, TFIIA binds directly to TBP, stabilizes its interaction with DNA and thereby enhances transcription. TFIIA binding, although not always essential, can be important at non-consensus promoters where TBP binding is relatively weak.

Binding of TFIIB: The formation of a closed complex begins when TBP is bound by the factor TFIIB, which also binds to DNA on either side of TBP.

Recruitment of TFIIF-Pol II: The TFIIB-TBP complex is next bound by another complex consisting of TFIIF and Pol II. TFIIF helps target Pol II to its promoters, both by interacting with TFIIB and by reducing the binding of the polymerase to nonspecific sites on the DNA.

Binding of TFIIE and TFIIH: Following recruitment of Pol II-TFIIF, two more transcription factors, TFIIE and TFIIH, are recruited to complete the assembly of the closed preinitiation complex. They bind upstream of Pol II. TFIIH is a complex factor with multiple enzymatic activities, including ATPase, helicase, kinase and DNA repair activities. The DNA helicase activity of TFIIH promotes the unwinding of DNA near the RNA start site (i.e. Inr), thereby creating an open complex; this process requires the hydrolysis of ATP. The DNA repair activity presumably couples transcription with DNA repair to avoid transcription of any faulty gene. TFIIH has an additional function during the initiation phase: a kinase activity in one of its subunits phosphorylates Pol II at many places in the CTD. Several other protein kinases, including CDK9, which is part of the complex P-TEFb, also phosphorylate the CTD. In the preinitiation complex, TFIIE stimulates the kinase activity of TFIIH, resulting in the hyperphosphorylation of the carboxyl terminal domain (CTD) of Pol II.
At some point during the formation of this complex, the carboxyl terminal domain of the polymerase is phosphorylated on serine and threonine residues, and Pol II then escapes the promoter to begin transcription. The importance of the CTD is highlighted by the finding that a yeast cell containing a mutant Pol II with fewer than 10 repeats is not viable. Phosphorylation of the CTD causes a conformational change in the overall complex that weakens the interaction of Pol II with TBP, thereby aiding the initiation of transcription. Most of the factors are released before the polymerase leaves the promoter and can then participate in another round of initiation.

Requirement of additional proteins including mediator complex, nucleosome modifiers and remodellers: One reason for the additional requirement of the mediator complex, nucleosome modifiers and remodellers is that the DNA template in vivo is packaged into nucleosomes and chromatin. This condition complicates the binding of the polymerase and its associated factors to the promoter. Transcription regulatory proteins called activators help recruit the polymerase to the promoter and stabilize its binding there. This recruitment is mediated through interactions between DNA-bound activators and parts of the transcription machinery. Often the interaction is with the mediator complex, which associates with the CTD tail of the large polymerase subunit through one surface while presenting other surfaces for interaction with DNA-bound activators. This explains the need for mediator to achieve significant transcription in vivo. Despite this central role in transcriptional activation, deletion of an individual subunit of mediator (which is made up of many subunits) often leads to loss of expression of only a small subset of genes, different for each subunit. This result likely reflects the fact that different activators are believed to interact with different mediator subunits to bring the polymerase to different genes. In addition, mediator aids initiation by regulating the CTD kinase in TFIIH. The need for nucleosome modifiers and remodellers also differs at different promoters, or even at the same promoter under different circumstances. When and where required, these complexes are also recruited by the DNA-bound activators. Nucleosome modifying enzymes include histone acetyltransferase, histone deacetylase and histone methylase.

Promoter melting, abortive initiation, synthesis of nascent RNA, phosphorylation of CTD and promoter clearance or escape: After promoter melting, synthesis of nascent RNA is initiated. Just as in the bacterial case, there is a period of abortive initiation before Pol II escapes the promoter and enters the elongation phase. During abortive initiation, Pol II synthesizes a series of short transcripts. As Pol II proceeds into elongation, TFIIB, TFIIE and TFIIH are released from the promoter in a step called promoter clearance. TFIIF, however, remains associated with Pol II and helps elongation by suppressing pausing. In contrast to the situation in bacteria, promoter melting in eukaryotes requires hydrolysis of ATP and is mediated by TFIIH. Also in contrast to bacteria, promoter escape in eukaryotes involves phosphorylation of the polymerase. The form of Pol II initially recruited to the promoter contains a largely unphosphorylated tail, but the species found in the elongation complex bears multiple phosphoryl groups on its tail. Addition of these phosphate groups helps the polymerase shed most of the general transcription factors used for initiation, which the enzyme leaves behind as it escapes the promoter. Indeed, in addition to TFIIH, a number of other kinases (e.g. P-TEFb) have been identified that act on the CTD, as well as phosphatases that remove the phosphates added by those kinases. Regulation of the phosphorylation state of the CTD of Pol II also controls later steps, including those involving processing of the RNA.
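The stepwise assembly described above can be pictured as an ordered checklist. The following Python fragment is only a toy sketch, not a biochemical model: the function and variable names are hypothetical, TFIIA is treated as optional because the text notes it is not always essential, and the relative order of TFIIE and TFIIH is an assumption (the text states only that both are recruited after TFIIF-Pol II).

```python
# Toy model of ordered assembly of the closed preinitiation complex.
# Names are illustrative; the TFIIE-before-TFIIH order is an assumption.
ASSEMBLY_ORDER = ["TBP", "TFIIA", "TFIIB", "TFIIF-Pol II", "TFIIE", "TFIIH"]
OPTIONAL_FACTORS = {"TFIIA"}  # "not always essential" according to the text


def assemble_closed_complex(available_factors):
    """Bind factors in order; stop at the first missing required factor."""
    bound = []
    for factor in ASSEMBLY_ORDER:
        if factor in available_factors:
            bound.append(factor)
        elif factor not in OPTIONAL_FACTORS:
            break  # a required factor is missing, so later factors cannot bind
    return bound


def is_complete(bound):
    """True when every required factor has been bound."""
    return all(f in bound for f in ASSEMBLY_ORDER if f not in OPTIONAL_FACTORS)
```

Calling assemble_closed_complex with every factor except TFIIB, for example, returns only TBP and TFIIA, reflecting the point that TFIIB provides the platform on which TFIIF-Pol II is recruited.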
(ii) Elongation: Once RNA Pol II has initiated transcription, it shifts into the elongation phase. This transition involves the enzyme shedding most of its initiation factors, e.g. the general transcription factors and mediator. During synthesis of the initial nucleotides of RNA, TFIIE is released. Subsequently, TFIIH is released. TFIIF, however, remains associated with Pol II throughout elongation. In place of the transcription factors and mediator, another set of factors is recruited. This new set of factors stimulates Pol II elongation and RNA proofreading. These proteins, which greatly enhance the activity of Pol II, are called elongation factors. Examples include TFIIS, P-TEFb, hSPT5, Elongin and ELL. The elongation factors suppress pausing or arrest of transcription by the Pol II-TFIIF complex and also coordinate interactions between protein complexes involved in post transcriptional processing of mRNAs. The enzymes involved in all these processes are, like several of the initiation factors, recruited to the C-terminal tail of the large subunit of Pol II, the CTD. In this case, however, the factors favor the phosphorylated form of the CTD. Thus, phosphorylation of the CTD leads to an exchange of initiation factors for the factors required for elongation and RNA processing. As is evident from the crystal structure of yeast Pol II, the polymerase CTD lies directly adjacent to the channel through which the newly synthesized RNA exits the enzyme. This, together with its length (it can extend some 800 Å from the body of the enzyme), allows the tail to bind several components of the elongation and processing machinery and to deliver them to the emerging RNA. Some other elongation factors are required for RNA processing.
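The exchange of factors driven by CTD phosphorylation can be summarized in a few lines of Python. This is a deliberately simplified sketch: it treats phosphorylation as a single on/off switch, whereas the text describes multiple phosphoryl groups, and the names pol2_partners, INITIATION_SET, ELONGATION_SET and RETAINED are hypothetical labels chosen for illustration.

```python
# Simplified on/off view of factor exchange at Pol II (illustrative only).
INITIATION_SET = {"other general transcription factors", "mediator"}
ELONGATION_SET = {"TFIIS", "P-TEFb", "hSPT5", "Elongin", "ELL"}
RETAINED = {"TFIIF"}  # stays associated with Pol II throughout elongation


def pol2_partners(ctd_phosphorylated):
    """Return the factors associated with Pol II for a given CTD state."""
    stage_factors = ELONGATION_SET if ctd_phosphorylated else INITIATION_SET
    return stage_factors | RETAINED
```

With ctd_phosphorylated set to False the function returns the initiation partners plus TFIIF; with True it returns the elongation and processing factors plus TFIIF.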

(iii) Termination and release: Once the RNA transcript is completed, transcription is terminated. RNA Pol II does not, however, terminate immediately. Rather, it continues to move along the template, generating a second RNA molecule that can become as long as several hundred nucleotides before termination occurs. Termination of mRNA synthesis is thus combined with polyadenylation (hence the details of the termination step are described after polyadenylation). Pol II is dephosphorylated, dissociates from the template, is recycled and is then ready to initiate another transcript. In the process, the new RNA is released; it may be degraded without ever leaving the nucleus.

(B) Synthesis of precursors of rRNA by RNA Pol I

The pre-rRNA transcription units contain three sequences that encode the 5.8S, 18S and 28S rRNAs (Fig. 19). Pre-rRNA transcription units are arranged in clusters in the genome as long tandem arrays separated by non-transcribed spacer sequences. RNA Pol I synthesizes pre-rRNA in the nucleolus. The arrays of rRNA genes loop together to form the nucleolus and are known as nucleolar organizer regions.

Fig. 19: Pre-rRNA transcript in eukaryotes (45S, ~13000 nt), containing the 18S, 5.8S and 28S rRNA sequences; the 5S rRNA is transcribed separately

The synthesis of rRNA (5.8S, 18S and 28S) involves transcription factors and complexes, e.g. the upstream binding factor (UBF) and a eukaryotic transcription complex called selectivity factor (SL-1) (similar complexes in different species are called TIF-IB and Rib1). UBF is a specific DNA binding protein that binds to the UCE. It greatly stimulates the transcription rate; in its absence, only a low rate of basal transcription is seen. SL-1 contains four subunits: one TBP (TATA binding protein) and three TAFIs (TBP associated factors for RNA Pol I). The process of transcription of rRNA (5.8S, 18S and 28S) is outlined below and depicted in Fig. 20.
UBF binding: One UBF molecule binds to the upstream control element (UCE) of the RNA Pol I promoter. Another UBF molecule binds to the upstream part of the core element (core promoter). The sequences in the two UBF binding sites have no obvious similarity. One molecule of UBF is thought to bind to each sequence element. The two UBF molecules then bind each other by protein-protein interaction, causing the intervening DNA to form a loop between the two binding sites. (Some are of the view that a single UBF binds to two different sites, viz. the UCE and the upstream part of the core element.)

Fig. 20: rRNA transcription initiation (schematic: UBF bound at the UCE and the core element upstream of +1; SL-1, consisting of TBP and TAFIs, joining the UBF-DNA complex; recruitment of RNA Pol I)

Selectivity factor binding: Selectivity factor (SL-1) binds to and stabilizes the UBF-DNA complex. It interacts with the free downstream part of the core element. Binding of UBF increases transcription initiation activity by SL-1. Acanthamoeba has a simpler transcription control system, with a single control element and a single factor, TIF-1, which are required for RNA Pol I binding and initiation at the rRNA promoter.

RNA Pol I binding: SL-1 binding allows RNA Pol I to bind the complex and initiate transcription, and is essential for rRNA transcription.

(C) Synthesis of precursors of tRNA and 5S rRNA by RNA Pol III

(i) tRNA: The promoter of tRNA genes has two consensus sequences downstream of the transcription start site, namely Box A and Box B, as described in an earlier section. Two complex DNA binding factors are required for transcription initiation by RNA Pol III: the transcription factors TFIIIC and TFIIIB (Fig. 21). TFIIIC is a large protein complex with six subunits and a size of >500 kDa. It is the assembly factor that positions TFIIIB at the right location. TFIIIB is the true initiation factor for Pol III. It has three subunits: TBP, B'' and BRF (TFIIB related factor; it has homology to TFIIB, the RNA Pol II initiation factor). B'' is comparable to the sigma factor of prokaryotes and functions to initiate the transcription bubble. TFIIIB has no sequence specificity, and its binding site therefore appears to be determined by the position at which TFIIIC binds to DNA. Once TFIIIB has bound, TFIIIC can be removed without affecting transcription. The process of transcription involves the steps listed below and is outlined in Fig. 21.

Fig. 21: Transcription initiation at eukaryotic tRNA promoter (schematic: TFIIIC bound to the A box and B box downstream of +1; TFIIIB, consisting of TBP, BRF and B'', bound upstream; recruitment of RNA Pol III)

TFIIIC binding: TFIIIC binds to both Box A and Box B of the tRNA promoter.

TFIIIB binding: TFIIIB binds the TFIIIC-DNA complex and interacts with DNA upstream of the TFIIIC binding site (TFIIIB binds 50 bp upstream of the A box).

RNA Pol III binding: TFIIIB helps in the recruitment of RNA Pol III. RNA Pol III then initiates transcription, presumably displacing TFIIIC from the DNA template as it goes.

Termination of transcription occurs without accessory factors. A cluster of dA residues is often sufficient for termination, and the termination efficiency depends on the surrounding sequence. An example of an efficient termination signal, found in the somatic 5S rRNA genes of Xenopus borealis, is 5'-GCAAAAGC-3'.

(ii) 5S rRNA: The promoter of 5S rRNA genes has two consensus sequences downstream of the transcription start site, namely Box A and Box C, as described in an earlier section. Transcription of 5S rRNA genes involves the transcription factors TFIIIA, TFIIIC and TFIIIB. TFIIIA is an assembly factor for positioning TFIIIB at the right location. TFIIIB is the true initiation factor for Pol III. TFIIIB has no sequence specificity, and its binding site therefore appears to be determined by the position at which TFIIIC binds to DNA. The process of transcription involves the following steps.

TFIIIA binding: TFIIIA binds strongly to the Box C promoter sequence.

TFIIIC binding: TFIIIC then binds to the TFIIIA-DNA complex, interacting also with the Box A sequence.

TFIIIB binding: Once TFIIIC has bound, TFIIIB can interact with the complex.

RNA Pol III binding: TFIIIB then recruits RNA Pol III to initiate transcription.

Fig. 22 depicts a schematic representation of the process of transcription of 5S rRNA.

Post transcriptional RNA processing

Transcription products of all three eukaryotic RNA polymerases undergo various alterations to yield the mature product. RNA processing is the collective term used to describe these alterations to the primary transcript.
The various post transcriptional processing events that occur on RNA are summarized in Table 9.

(A) Post transcriptional processing of mRNA

In prokaryotes, there is little or no processing of mRNA after its synthesis by RNA Pol. Indeed, many mRNA molecules are translated while they are being transcribed, i.e. before being completely synthesized. Prokaryotic mRNA is degraded rapidly from the 5' end, and the first cistron (protein-coding region) can therefore be translated only for a limited amount of time. In eukaryotes, RNA Pol II synthesizes mRNA as longer precursors (pre-mRNA); the population of different pre-mRNAs is called heterogeneous nuclear RNA (hnRNA). Once transcribed, eukaryotic precursor mRNA has to be processed in various ways before being exported from the nucleus to the cytoplasm, where it can be translated (Table 7).

Fig. 22: Transcription initiation at eukaryotic 5S rRNA promoter (schematic: TFIIIA bound to the C box downstream of +1; TFIIIC joining the TFIIIA-DNA complex; TFIIIB, consisting of TBP, BRF and B'', bound upstream; recruitment of RNA Pol III)

Table 9: Various post transcriptional processing events occurring to RNA

1. End modification: It occurs during the synthesis of eukaryotic and archaeal mRNAs. It involves addition of nucleotides to the 5' or 3' ends of the primary transcripts or their cleavage products. Such events do not occur in prokaryotes. These include:
(i) Capping of the 5' end of mRNA
(ii) Polyadenylation of the 3' end of mRNA

2. Splicing: It is the removal of introns (non-coding sequences in the genes) from the precursor RNAs (i.e. eukaryotic mRNAs, and some eukaryotic rRNAs and tRNAs). It leads to a physical change in the length of the transcript.

3. Cutting events: These involve cutting of primary transcripts (or removal of nucleotides) of rRNA and tRNA by endonucleases or exonucleases to produce mature transcripts in both prokaryotes and eukaryotes. They lead to a physical change in the length of the transcript.

4. Chemical modifications: These modifications are made within the rRNAs, tRNAs and mRNAs. The rRNAs and tRNAs of all organisms are modified by the addition of new chemical groups. These groups are added on either the base or the sugar moiety of specific nucleotides in the RNAs. Such modification occurs to a much lesser extent with pre-mRNA in eukaryotes. Equivalent events in archaea are poorly understood. Chemical modification of mRNA, called RNA editing, is seen in a diverse group of eukaryotes.
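The length-changing events in Table 9 (splicing and end modification) can be illustrated on a string that stands in for a pre-mRNA. The Python sketch below is a toy under stated assumptions: the sequence and intron coordinates are invented, the cap is represented by a text label rather than a real chemical structure, and the tail length is an arbitrary parameter. All function names are hypothetical.

```python
# Toy illustration of splicing, polyadenylation and capping on a string.
# Sequences, coordinates, the cap label and the tail length are made up.

def splice(pre_mrna, introns):
    """Remove introns given as (start, end) pairs: 0-based, end exclusive."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before the intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


def add_poly_a_tail(rna, tail_length=10):
    """Append a run of A residues to the 3' end."""
    return rna + "A" * tail_length


def add_cap(rna, cap_label="cap-"):
    """Mark the 5' end with a placeholder label for the cap."""
    return cap_label + rna


def process(pre_mrna, introns, tail_length=10):
    """Apply splicing, then polyadenylation, then capping (toy order)."""
    return add_cap(add_poly_a_tail(splice(pre_mrna, introns), tail_length))
```

For a transcript built as exon "AUGGCU", intron "GUAAGUCAG" and exon "UGGUAA", calling splice with the intron at (6, 15) joins the two exons, so the spliced transcript is shorter than the precursor, while the tail and cap extend its ends.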