The gene

The gene is the basic unit of heredity and carries the genetic information for a given protein and/or RNA molecule. In biochemical terms, a gene is a fragment of deoxyribonucleic acid (DNA), which in turn forms part of a much larger genetic unit, the chromosome. Even the simplest unicellular organisms (e.g. bacteria) require several thousand different genes and their respective gene products, while complex multicellular organisms such as ourselves may require as many as different genes. The human genome is estimated to contain to genes.

The term "gene" was first used in 1909 by the Danish geneticist Wilhelm Johannsen as a convenient term for Mendel's particulate factors. The realization that genes are carried on chromosomes began to emerge at about the same time, principally from the studies of Thomas Hunt Morgan. In 1910 he provided the first definitive evidence for the so-called Chromosomal Theory of Inheritance, a theory first proposed in 1902 by Walter Sutton and Theodor Boveri. By the early 1930s it had become accepted dogma that genes carry the genetic information that determines a specific phenotype, and that genes are carried as linear arrays on chromosomes, which in turn are located in the nucleus. It was not until 1944 that the chemical nature of genes was confirmed. The three-dimensional structure of DNA, and how it relates to the role of DNA as the genetic material, was established in 1953 by Francis Crick and James Watson in collaboration with Maurice Wilkins and Rosalind Franklin.

Not all genes are made of DNA; some animal and plant viruses and some bacteriophages have genetic systems based on RNA rather than DNA. Genes are the blueprints for all RNA and protein molecules found in the cell. Some genes encode RNA molecules as the final gene product (genes for ribosomal RNA, transfer RNA and other small RNAs), whereas others encode polypeptide chains, which are synthesized by way of the intermediate messenger RNA.
Studies by Archibald Garrod at the beginning of the twentieth century first suggested that genes encode the information for specific polypeptides, but it was not until the work of George Beadle and E. L. Tatum in the 1940s that the "one gene, one enzyme" hypothesis was experimentally established. The hypothesis was later revised to "one gene, one polypeptide chain".

Fig. 1. The general structure of gene

Genes consist of a transcribed region and regulatory regions. The regulatory regions are the promoter and the terminator (Fig. 1). A sequence of DNA that is transcribed from a single promoter is called a transcription unit. In eukaryotes, transcription units are usually single genes (monocistronic); in bacteria and other prokaryotes, several genes may be grouped together to form a single (polycistronic) transcription unit under the control of one promoter, an operon. The transcribed region consists of a coding region, which contains the information for a macromolecule (polypeptide or RNA), and non-coding regions at its ends.

The promoter region is a sequence at the 5' end that is required for the enzyme RNA polymerase to bind to the gene at the correct position relative to the region to be transcribed, that is, the transcription unit. Nucleotides in the promoter are numbered with negative values (-). The first nucleotide of the transcribed region is +1. The prokaryotic promoter is relatively simple to define in physical terms, whereas a typical eukaryotic promoter is generally much larger, with DNA sequences positioned many thousands of base pairs away still able to have profound effects on the rate at which a gene is transcribed. The terminator region halts the migration of the RNA polymerase molecule once it has traversed the gene's transcription unit.

EUKARYOTIC GENES

Classification of nuclear genes in eukaryotes

In eukaryotic cells the genes are classified into three classes:
- Class I genes, which encode the 5.8S, 18S and 28S rRNAs. They are transcribed by RNA polymerase I.
- Class II genes, which encode mRNAs and snRNAs. They are transcribed by RNA polymerase II.
- Class III genes, which encode tRNAs and 5S rRNA. They are transcribed by RNA polymerase III.

The nuclear genome may contain a set of genes derived by duplication from an ancestral gene, called a gene family. Its members may be clustered together, dispersed on different chromosomes, or both. The members of a repetitive gene family usually have related or even identical functions (e.g. the genes for rRNA or histone proteins). Families of non-repetitive genes consist of copies derived from a single gene that retain a related role but differ in structure and function (genes for globins, HLA, etc.). Pseudogenes are sequences that resulted from duplication but are inactive in all tissues. They accumulate mutations and, over time, may give rise to a new member of the family with distinctive features.

Peculiarities of the organization of structural (class II) genes in eukaryotic cells

In a eukaryotic structural gene, the promoter consists of hundreds of nucleotides located at the 5' end of the gene. This region contains a combination of consensus sequences with sites recognized by transcription factors. The promoter of structural genes, which is recognized by RNA polymerase II, contains the Goldberg-Hogness box, or TATA box, a short distance upstream of the initiation site (+1).
The TATA box directs RNA polymerase II to the site of transcription initiation. At position -75 there is the CAAT box, which controls the binding of specific proteins during the initiation of transcription. At nucleotide -90 there is another sequence, the GC box, which may be present in several copies and ensures the correct movement of RNA polymerase. Other consensus sequences bind specific regulatory proteins during gene activation. These are necessary for the differential expression of genes in different tissues and at different periods of the cell cycle.

Fig. 2. Structure of the class II promoter (elements shown from 5' to 3': GC, CAAT at -75, GC, TATA at -20, START)

Coding region

It has been shown that gene structure in eukaryotic cells is discontinuous (Fig. 3). The transcribed regions of eukaryotic genes contain coding regions called exons and non-coding regions called introns. The number of exons and introns varies and depends on the complexity of the encoded protein:
- The β-globin gene contains three exons and two introns.
- The gene for procollagen contains 51 exons and 50 introns.
- The gene for haemophilia A contains 26 exons and 25 introns (the DNA sequence consists of bp, and the mRNA molecule has a length of 9000 bp and constitutes only 0.2%).
- The gene that encodes dystrophin contains more than 76 exons (2.5 million bp in the gene, but only 14000 bp in the mRNA).
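The exon and intron counts listed above follow a simple pattern: introns lie between consecutive exons, so a gene with n exons has n - 1 introns. The sketch below checks that pattern and shows how one might compute the share of a gene that survives in the mature mRNA. The function names are hypothetical, and the lengths used for the fraction are made-up illustration values, not figures from the text.

```python
def intron_count(exon_count: int) -> int:
    """Introns separate consecutive exons, so n exons imply n - 1 introns."""
    if exon_count < 1:
        raise ValueError("a gene needs at least one exon")
    return exon_count - 1


def mrna_fraction(gene_bp: int, mrna_bp: int) -> float:
    """Fraction of the gene length that remains in the mature mRNA."""
    if gene_bp <= 0 or not 0 <= mrna_bp <= gene_bp:
        raise ValueError("lengths must satisfy 0 <= mrna_bp <= gene_bp, gene_bp > 0")
    return mrna_bp / gene_bp


# Exon counts taken from the list above.
EXON_COUNTS = {"globin": 3, "procollagen": 51, "haemophilia A gene": 26}

for name, exons in EXON_COUNTS.items():
    print(f"{name}: {exons} exons, {intron_count(exons)} introns")

# Hypothetical toy gene: 100 000 bp long with a 5 000 bp mRNA.
print(f"mRNA share of toy gene: {mrna_fraction(100_000, 5_000):.1%}")
```

The same `mrna_fraction` call can be applied to any gene once both lengths are known.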

Fig. 3. Structure of a class II gene (labels: promoter, transcribed region, leading sequence, coding region, terminator; point of initiation of transcription, site of initiation of translation, site of termination of translation, site of polyadenylation)

Exons

Exons are the coding sequences of a gene; they are present in the RNA precursor and in the mRNA, and are reflected in the amino acid sequence of the protein. At the 5' end, the first exon contains the universal site of initiation of translation (ATG). At the 3' end of the last exon lies one of the three codons (TAA, TAG, TGA) that terminate protein synthesis, followed by a short untranslated sequence. The number of exons differs between genes.

Introns

Introns are the non-coding sequences of genes; they are present in the precursor RNA but not in the mRNA. During splicing, the introns are removed from the precursor RNA. Introns are usually longer than exons. Each intron has a GT sequence at its 5' end and an AG sequence at its 3' end. Special enzymes recognize these sequences when the non-coding regions are removed.

Peculiarities of the organization of genes coding for rRNA and tRNA

Genes for rRNA (5.8S, 18S, 28S) are organized in polycistronic transcription units and form the nucleolus organizer (located on chromosomes 13, 14, 15, 21 and 22 in many copies). Transcription units are separated by non-transcribed sequences called spacers. In humans, the transcription unit for class I genes does not contain introns and has a length of ~ bp (Fig. 4). The promoters are recognized by RNA polymerase I. Promoters have a bipartite structure: a core sequence that controls the initiation of transcription and an additional upstream control element (UCE).

Fig. 4. Structure of eukaryotic ribosomal genes (labels: spacer, transcription unit, repetitive unit)
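The exon and intron rules described above (introns bounded by GT and AG, translation starting at ATG and ending at TAA, TAG or TGA) can be illustrated with a minimal sketch. The sequence is a made-up toy example written in DNA letters, the intron coordinates are supplied by hand rather than predicted, and the function names are hypothetical; real splice-site recognition depends on far more than these two dinucleotides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(precursor: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) half-open index pairs.

    Each intron must begin with GT and end with AG, as stated in the text.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = precursor[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow the GT...AG rule")
        exons.append(precursor[position:start])
        position = end
    exons.append(precursor[position:])
    return "".join(exons)


def coding_sequence(mature: str) -> str:
    """Return the sequence from the first ATG to the first in-frame stop codon."""
    start = mature.find("ATG")
    if start == -1:
        return ""
    for i in range(start, len(mature) - 2, 3):
        if mature[i:i + 3] in STOP_CODONS:
            return mature[start:i + 3]
    return ""  # no in-frame stop codon found


# Toy gene: exon 1 + intron + exon 2 (all sequences invented for illustration).
exon1, intron, exon2 = "CCATGGCT", "GTAAGTTTTCAG", "GAATAAGG"
precursor = exon1 + intron + exon2

mature = splice(precursor, [(len(exon1), len(exon1) + len(intron))])
print(mature)
print(coding_sequence(mature))
```

Passing the intron coordinates explicitly keeps the sketch honest: it checks the GT...AG rule instead of pretending that the rule alone is enough to locate introns.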

Genes for 5S rRNA are located outside the nucleolus and are transcribed by RNA polymerase III. These genes form a transcription unit in which the gene for 5S rRNA is repeated. The promoter is located inside the gene, between +55 and +80. Genes for tRNA are also organized in transcription units, in which the coding sequences are separated by non-coding sequences. Genes for tRNA are transcribed by RNA polymerase III, and their promoter has a structure similar to that of the 5S rRNA genes.

The mitochondrial genome

The mitochondrial genome is a circular DNA molecule 16.6 kb long. Each mitochondrion contains a variable number of these molecules, depending on the energy needs of the cell. Almost all of the nucleotides are coding sequences; regulatory fragments are very short. Human mitochondrial DNA does not contain introns. All the genes form two transcription units: one on the heavy strand, with the HSP promoter, and another on the light strand, with the LSP promoter. The mitochondrial DNA contains 13 structural genes, 2 genes for rRNA (12S and 16S) and 22 genes for tRNA (Fig. 5). Mitochondrial genes are inherited through the maternal line.

Fig. 5. Structure of mitochondrial genome

The nuclear genome cooperates with the mitochondrial genome: about 90 nuclear genes encode proteins that support the mitochondria, including ribosomal proteins, aminoacyl-tRNA synthetases, and DNA and RNA polymerases. Some proteins encoded by nuclear genes help control the number of mitochondria per cell and the quantity of proteins synthesized by these organelles.

THE PECULIARITIES OF PROKARYOTIC GENE ORGANIZATION

Fig. 6. Structure of the operon (labels: promoter, genes 1 to 4, terminator; polycistronic RNA; proteins 1 to 4)

The genetic apparatus of prokaryotic cells is represented by circular molecules of DNA (the nucleoid and plasmids). The prokaryotic genome contains only a few non-coding fragments.
The genes have a simple structure and do not contain introns. Many genes involved in the same metabolic pathway form polycistronic transcription units called operons. The operon contains a single promoter at the 5' end, several structural genes (their number equals the number of proteins involved in a particular metabolic process) and the terminator at the 3' end (Fig. 6). Some operons contain other control fragments (operator regions, attenuation regions, etc.). The promoter of prokaryotic genes contains specific elements: the Pribnow box at position -10, responsible for initiating the local unwinding of the DNA, and the TTGACA box at position -35, where RNA polymerase binds (Fig. 7).

Fig. 7. Structure of promoter of a prokaryotic gene

The other gene classes (for rRNA, tRNA) are organized in mixed transcription units, separated by spacers. Fig. 8 shows the structure of such a cluster of genes.

Fig. 8. Structure of prokaryotic genes for tRNAs and rRNAs (labels: promoters P1 and P2; gene order 16S, tRNA, 23S, 5S, tRNA; primary transcript)
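The -35 and -10 elements of the prokaryotic promoter described above can be located in a sequence with a simple pattern search. The sketch below makes several assumptions that are not in the text: it uses TATAAT as the Pribnow box consensus, allows a spacer of 15 to 19 bp between the two boxes, and requires exact matches, whereas real promoters usually deviate from the consensus. The function name and the test sequence are hypothetical.

```python
import re

# -35 box (TTGACA), a spacer of assumed length 15-19 bp, then the -10 box.
# TATAAT is the commonly cited Pribnow box consensus (assumption, not from the text).
PROMOTER_PATTERN = re.compile(r"(TTGACA)([ACGT]{15,19})(TATAAT)")


def find_promoters(dna: str) -> list[tuple[int, int]]:
    """Return (index of -35 box, index of -10 box) for each exact consensus match."""
    return [
        (match.start(1), match.start(3))
        for match in PROMOTER_PATTERN.finditer(dna.upper())
    ]


# Invented test sequence: 2 bp, -35 box, 17 bp spacer, -10 box, 7 bp tail.
sequence = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CCCCCCG"

for minus35, minus10 in find_promoters(sequence):
    print(f"-35 box at index {minus35}, -10 box at index {minus10}")
```

A position weight matrix would tolerate mismatches better than an exact regular expression; the exact match is used here only to keep the idea visible.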