Chapter I. Review of Literature and Introduction


1.1 GENERAL INTRODUCTION

The complexity of life forms arises from genomic content and from the interactions that occur among the basic components, namely DNA, RNA and proteins. Several approaches have been used to relate genome sequence to organism complexity and species diversity (Stellwag, E. 2004). Analysis of genome size, termed C-value estimation, proposed that the genome sizes of related species are similar (Gregory, T. R. 2001); this had to be refuted after closely related species of Drosophila and of fishes were found to differ greatly in genomic content. This led to the analysis of genes in the genome as a means to study species divergence, termed G-value estimation (Hahn, M. W. et al. 2002). Progress in genome sequencing then revealed large tracts of repeats, transposable elements and non-coding regions in higher eukaryotes, which render such approaches insufficient for these comparisons (Taft, R. J. et al. 2007). The study of gene duplication events, sequence similarity and expression patterns of the genome are a few of the more recent approaches undertaken (Wagner, A. 2000).

The biggest surprise of genome-scale studies has been the discovery of thousands of non-coding transcripts in the human transcriptome. Among the most recent developments in gene regulation, the expanding role of RNA is considered important enough that it could change the trajectory of our understanding of basic biological processes like growth and development (Mattick, J. S. 2004). Besides structural non-protein-coding RNAs (ncRNA) such as tRNA and rRNA, several other long and small RNAs have been found to influence gene regulation directly or indirectly (Figure 1.1). The role of ncRNA in regulating monoallelic expression, as in X-chromosome inactivation, genomic imprinting and allelic exclusion, is known (Yang, P. K. et al. 2007).
Long ncRNAs are also known to affect genome-wide rearrangements during ciliate differentiation (Nowacki, M. et al. 2008) and T-cell receptor recombination (Bassing, C. H. et al. 2002). The second category of ncRNAs is the small non-protein-coding RNAs. The major groups of small ncRNAs include small nucleolar RNA (snoRNA), small nuclear RNA (snRNA), small interfering RNA (siRNA), microRNA (miRNA) and Piwi-interacting RNA (piRNA). Other variants of these small RNAs occur in association with specialized genomic contexts such as repeats: repeat-associated siRNA (rasiRNA), trans-acting siRNA (tasiRNA) and small Cajal-body-specific RNA (scaRNA). Table 1.1 describes the major roles played by these small ncRNAs.

Figure 1.1: Summary of the components of the RNA world

Table 1.1: Major classes of small non-coding RNA and their biological role

Type of small RNA | Abbreviation | Length | Major functions | Reference
Small nucleolar RNA | snoRNA | 35 nt | Guides chemical modification in rRNA, tRNA etc. | (Bachellerie, J. P. et al. 2002)
Small nuclear RNA | snRNA | 100 nt | Splicing, transcription regulation | (Wolff, T. et al. 1994)
Small interfering RNA | siRNA | 20-25 nt | Post-transcriptional mRNA degradation | (Huttenhofer, A. et al. 2002)
microRNA | miRNA | 17-25 nt | Post-transcriptional mRNA degradation / silencing | (Lai, E. C. 2003)
Piwi-interacting RNA | piRNA | 26-30 nt | Silencing of retrotransposons in germline | (Girard, A. et al. 2006)
Repeat-associated siRNA | rasiRNA | 24-29 nt | Heterochromatin packaging, silencing genes | (Klenov, M. S. et al. 2007)
Trans-acting siRNA | tasiRNA | 21-22 nt | Post-transcriptional silencing, in plants | (Yoshikawa, M. et al. 2005)
Small Cajal-body-specific RNA | scaRNA | 35 nt | Guides chemical modification of spliceosomal RNAs | (Stanek, D. et al. 2006)

In very general terms, small RNAs mediate gene regulatory networks by base pairing with complementary regions of mature transcripts. Although the mode of binding and the effects produced vary greatly among the classes mentioned above, their combined regulatory potential is very significant. The two basic modes of gene regulation are:

1. transcriptional regulation
2. post-transcriptional silencing (or RNA interference)

Transcriptional regulation by small RNA influences splicing, heterochromatin packaging, RNA editing etc., and can also result in activation of transcription (Li, L. C. et al. 2006b), while post-transcriptional silencing reduces the translational efficiency of targeted transcripts or degrades them. Apart from these functional consequences, the timing of the regulation also differs: the effect of RNA activation is seen only about 48 hours after transfection of the small RNA, while the effect of RNA interference (RNAi) is seen much faster (within hours). RNA activation has been proposed to occur when small double-stranded RNAs (dsRNA) bind to the upstream regions of a gene, thereby altering the methylation pattern and leading to increased transcriptional activity (Li, L. C. et al. 2006a).

RNA interference is a very well-studied phenomenon prevalent in organisms ranging from viruses to members of the fungal, animal and plant kingdoms. In fungi, RNA interference was generally known as quelling (Fulci, V. et al. 2007), while in plants it was known as post-transcriptional gene silencing (PTGS). RNAi refers to the binding of small RNAs, along with a protein complex, to mature transcripts in the cytoplasm, thereby inhibiting translation either by degrading the targeted transcript or by other means.
This mechanism of gene regulation was first observed in the early 1990s, when addition of a color-producing chalcone synthase gene in petunia caused loss of flower color instead of the expected darkening (Napoli, C. et al. 1990). In 1992, reversal of albino strains of Neurospora crassa was observed on addition of exogenous RNA that inhibits the al-1 or al-3 genes, a phenomenon termed quelling (Romano, N. et al. 1992). In viruses, this phenomenon was observed when exogenous RNA was introduced into a virus-infected plant. The major breakthrough came in 1998, when it was demonstrated that potent RNA-mediated silencing of genes can be brought about by introducing double-stranded RNA (Fire, A. et al. 1998).

1.2 microRNA AS KEY REGULATORY MOLECULES

microRNAs (miRNA) are a relatively recently identified class of regulatory molecules. The discovery of miRNA resulted from genetic experiments in the Horvitz lab on cell lineages of Caenorhabditis elegans, in which heterochronic genes, which regulate developmental timing, were being studied. Several lin (lineage) mutants were obtained, of which the first to be studied was lin-4. In such mutants, the first larval stage (L1) is reiterated even at later stages of development. Another mutant, lin-14, has the converse effect of skipping the L1 stage entirely (Ruvkun, G. et al. 2004). It was also noticed that the lin-4 mutant had inappropriate amounts of lin-14, and that mutations in the 3' UTR of lin-14 reduce the effect (Ruvkun, G. et al. 1989). It was thus identified that lin-4 acts through the 3' UTR. Further experiments identified the precise location of lin-4 and found it to be a non-protein-coding region that is transcribed. Its effect on lin-28 was also established in follow-up studies (Arasu, P. et al. 1991). Such studies diversified to report cases in several genes and ncRNAs such as let-7, which was found to be conserved across bilaterian species including flies and humans.

Small RNA cloning in human revealed large numbers of nt long sequences, which led to the identification of such RNAs. These were sometimes obtained as dsRNA, called miRNA-miRNA*, while the single-stranded molecules were called miRNA (earlier named small temporal RNA or stRNA) (Kawaji, H. et al. 2008). It was then noted that these molecules arise from stem-loop structures which are mostly conserved across several species, as seen in plants (Zeng, Y. et al. 2005). Based on these features, several computational approaches were developed to identify possible miRNA-forming regions (Huang, T. H. et al. 2007), (Hertel, J. et al. 2006), (Artzi, S. et al. 2008).
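
The stem-loop property that such approaches rely on can be illustrated with a deliberately naive check. This is a sketch only and not the method of any of the cited tools: the function name `stem_pairs`, the fixed central loop length, the decision to accept G:U pairs and the toy sequence are all assumptions made for illustration. Real predictors use thermodynamic folding and cross-species conservation.

```python
# Watson-Crick pairs plus G:U wobble (accepting wobble is an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pairs(seq, loop_len=4):
    """Count paired positions when seq is folded back on itself.

    The i-th base from the 5' end is compared with the i-th base from the
    3' end; the central loop_len bases are left unpaired. Bulges and
    shifted alignments are not considered.
    Returns (number of paired positions, stem length).
    """
    seq = seq.upper().replace("T", "U")
    stem_len = (len(seq) - loop_len) // 2
    if stem_len <= 0:
        return 0, 0
    paired = sum((seq[i], seq[-1 - i]) in PAIRS for i in range(stem_len))
    return paired, stem_len


# Hypothetical toy hairpin: 8-nt arm, 4-nt loop, reverse-complementary arm.
paired, stem_len = stem_pairs("GGCAUCGAUUCGUCGAUGCC")
fraction_paired = paired / stem_len
```

A high paired fraction only indicates that a sequence could form a hairpin under this crude model; it says nothing about whether the region is a genuine miRNA precursor.
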
This led to the identification of several miRNAs in many species. Although the phenotypic output of different miRNAs is not shared, their biogenesis and mode of interaction with target transcripts are shared among all or most miRNAs (Faller, M. et al. 2008). These are small ncRNAs which range from nt in length. They arise from large non-protein-coding transcripts called primary miRNAs (pri-miRNA), which range from ,000 nt in length. The pri-miRNAs are transcribed mostly by RNA polymerase II, while some of them (~50 human miRNAs) are also known to be transcribed by RNA polymerase III (Zhou, H. et al. 2008). They are processed to shorter hairpin-forming molecules, ~ nt long, by the RNase III enzyme Drosha in association with DiGeorge syndrome critical region gene 8 (DGCR8) or Pasha (partner for Drosha, in the case of Drosophila). This complex is called the microprocessor complex (Gregory, R. I. et al. 2004). The hairpin structure is known as the precursor miRNA or pre-miRNA. The pre-miRNAs are actively transported to the cytoplasm by the action of the nuclear export receptor Exportin-5 (Ran binding protein 21)-GTP system. In the cytoplasm, another RNase III enzyme, Dicer, cleaves the loop and trailing parts of the hairpin structure. The stem region is then incorporated into a protein complex to form a ribonucleoprotein complex (RNP), also called miRNP.

Several proteins are present in this complex, a few of which are still not characterized. The complex comprises the Dicer enzyme, the Argonaute (AGO) proteins and several RNA-binding proteins such as TRBP (transactivation response element RNA binding protein), GW182 (a glycine-tryptophan repetitive sequence domain containing RNA binding protein) and PACT (interferon-inducible dsRNA-dependent protein kinase activator A) in humans, and Loquacious in flies (Gregory, R. I. et al. 2006). Although the exact role of the RNA-binding proteins is not fully understood, they are thought to act along with Dicer to process the pre-miRNA and also to facilitate target inhibition. The AGO proteins consist of three domains: the PAZ, the Mid and the Piwi domains. The PAZ and Mid domains bind the 3' and 5' ends of the miRNA respectively, which enables the Piwi domain to position itself in the center so as to bring about the endonucleolytic cleavage observed in some miRNA-target interactions (Wang, Y. et al. 2008). AGO1 and AGO2 are most widely associated with plant miRNPs (Fagard, M. et al. 2000). An important component of the RNP is the helicase, which unwinds the stem region by opening up the intramolecular hydrogen bonding. One of the strands remains associated with the RNP while the other is degraded. The thermodynamic stability of the stem region determines which strand is degraded.
The strand with the weaker 3' end is degraded (Krol, J. et al. 2004). Another parameter is the strand preference determined by the AGO protein in the RNP. AGO1 selectively incorporates the strand that carries a 5' uridine, while AGO2 and AGO4 prefer miRNAs with a 5' adenosine (Mi, S. et al. 2008). The strand that remains integrated in the protein complex is called the guide strand (since it is this fragment that leads the miRNP to the target transcript) and the strand that is degraded is called the passenger strand (since it is this strand that remains attached to the now mature miRNA) (Krol, J. et al. 2004). Sometimes, both strands of the pre-miRNA can act as mature miRNAs (Ro, S. et al. 2007). Once the mature miRNA is loaded into the RNP, the complex is capable of silencing target transcripts. The complex is now called the RNA-induced silencing complex (RISC) or miRNA-induced silencing complex (miRISC) (Figure 1.2).

Figure 1.2: An overview of miRNA biogenesis and action
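
The two strand-selection criteria described in this section (the relative stability of the duplex ends and the 5' nucleotide preference of the AGO protein) can be sketched as follows. This is an illustrative simplification and not a published method: the function names, the use of hydrogen-bond counts over the terminal four base pairs as a stand-in for thermodynamic stability, the overhang-free duplex model and the toy sequences are all assumptions. The AGO preferences encoded are only those stated in the text.

```python
# Hydrogen bonds per pair, used here as a crude proxy for stability (assumption).
H_BONDS = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2,
           ("G", "U"): 2, ("U", "G"): 2}

# 5' nucleotide preferences as stated in the text (Mi, S. et al. 2008).
AGO_BY_5P_NT = {"U": ["AGO1"], "A": ["AGO2", "AGO4"]}


def _rna(seq):
    return seq.upper().replace("T", "U")


def end_strength(five_prime_strand, other_strand, k=4):
    """Hydrogen bonds in the k terminal pairs at the 5' end of five_prime_strand."""
    n = len(other_strand)
    k = min(k, n)
    return sum(H_BONDS.get((five_prime_strand[i], other_strand[n - 1 - i]), 0)
               for i in range(k))


def pick_guide(strand_a, strand_b, k=4):
    """Return (guide, passenger), or None when both ends score equally.

    Both strands are written 5'->3', have equal length and are assumed to
    pair end to end (overhangs ignored). Following the text, the strand
    whose 3' end lies at the weaker duplex end is treated as degraded.
    """
    a, b = _rna(strand_a), _rna(strand_b)
    if len(a) != len(b):
        raise ValueError("this sketch needs strands of equal length")
    end_a5 = end_strength(a, b, k)  # end holding a's 5' end and b's 3' end
    end_b5 = end_strength(b, a, k)  # end holding b's 5' end and a's 3' end
    if end_a5 == end_b5:
        return None
    return (b, a) if end_b5 < end_a5 else (a, b)


def preferred_ago(mirna):
    """AGO proteins whose stated 5' nucleotide preference matches the miRNA."""
    return AGO_BY_5P_NT.get(_rna(mirna)[:1], [])


# Hypothetical toy duplex: A:U-rich at one end, G:C-rich at the other.
guide, passenger = pick_guide("AUUAGCAUGGCC", "GGCCAUGCUAAU")
agos = preferred_ago(guide)
```

In this toy duplex the A:U-rich end holds the 3' end of the second strand, so the first strand is returned as the guide regardless of the order in which the two strands are passed.
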

Once the miRISC is formed, the complex is ready to target specific transcripts that harbor complementary sites in their 3' UTRs. The mechanism by which miRNAs bring about translational inhibition is not thoroughly understood (Pillai, R. S. et al. 2007), and there is no single unifying manner in which RNAi occurs. Interaction of the miRISC with a transcript can lead to degradation of the transcript, reduction of its translational ability or even its activation (Pillai, R. S. 2005a). The most common of these, however, is reduction in the translational ability of target transcripts. This is brought about by three broad mechanisms:

1. Repression of translation after initiation: This mechanism became clear when miRNAs and target mRNAs were found to co-sediment with polyribosomes in sucrose gradients (Kim, J. et al. 2004).
2. Repression of translation by preventing initiation: It has been demonstrated that the AGO proteins have cap-binding ability, which tethers the ends of mRNAs and thus blocks translation (Lytle, J. R. et al. 2007).
3. Repression of translation at elongation: Proteolytic cleavage of the nascent polypeptide chain produced by incomplete translation of the target transcript, due to the block created by the miRISC in the 3' UTR (Maroney, P. A. et al. 2006).

Genome-wide microarray studies and recent proteomic analyses of miRNAs and their targets have indicated that a large fraction of miRNA targets are degraded as a result of the miRNA-target interaction (Liu, J. 2008). This is brought about by deadenylation and decapping, which lead to decay of the target transcript (Eulalio, A. et al. 2007), (Eulalio, A. et al. 2009). It is also possible that, where the miRNA binds the target with almost complete complementarity, the target site is cleaved by the AGO protein (Mallory, A. C. et al. 2008).
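
The degree of complementarity between a miRNA and a candidate site can be made concrete by pairing the two sequences position by position. The sketch below is illustrative only: the function names, the gap-free alignment, the optional position window and the toy sequences are assumptions of this example, and the source implies no particular threshold for "almost complete" complementarity.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def _rna(seq):
    return seq.upper().replace("T", "U")


def pairing_profile(mirna, site):
    """Classify each miRNA position (5'->3') against a target site.

    Both sequences are written 5'->3'; the site is reversed so that the
    duplex is antiparallel. Returns a list of 'WC' (Watson-Crick),
    'GU' (wobble) or 'MM' (mismatch), one entry per miRNA position.
    """
    m, s = _rna(mirna), _rna(site)[::-1]
    if len(m) != len(s):
        raise ValueError("this sketch needs a gap-free, equal-length alignment")
    profile = []
    for pair in zip(m, s):
        if pair in WATSON_CRICK:
            profile.append("WC")
        elif pair in WOBBLE:
            profile.append("GU")
        else:
            profile.append("MM")
    return profile


def summarise(profile, start=1, end=None):
    """Count pair types over 1-based miRNA positions start..end (inclusive)."""
    window = profile[start - 1:end]
    return {kind: window.count(kind) for kind in ("WC", "GU", "MM")}


# Hypothetical 8-nt toy example: wobble at position 1, mismatch at position 8.
profile = pairing_profile("UGAGGUAG", "AUACCUCG")
whole = summarise(profile)          # {'WC': 6, 'GU': 1, 'MM': 1}
window = summarise(profile, 2, 7)   # {'WC': 6, 'GU': 0, 'MM': 0}
```

The window argument is included so that the same counts can be restricted to any stretch of the miRNA rather than the whole duplex.
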
The interaction of a miRNA with its target site is mediated by intermolecular hydrogen bonds between complementary sequences. In most cases, the binding is through incomplete complementarity. Several studies have been performed to understand the precise binding pattern (Watanabe, Y. et al. 2007a), (Stark, A. et al. 2003), (Lewis, B. P. et al. 2003a). An extensive study was done on the Drosophila wing imaginal disc using the interaction of mir-7 with the EGFP 3' UTR (Watanabe, Y. et al. 2007a). Point mutations were created serially from the 5' end of the miRNA. Even when the first base (with respect to the 5' end of the miRNA) was mutated, the RNAi effect was unaffected. But mutation in any of bases 2-7 resulted in loss of the miRNA effect. Hence, the region spanning bases 2-7 is termed the seed region. It was also noted that single or double mutations in subsequent bases did not reduce the miRNA effect. Another set of experiments identified that more than three G:U wobble pairs between the miRNA and target site reduce the RNAi effect (Doench, J. G. et al. 2004a). A G:U wobble in the seed region is also detrimental for RNAi. Hence, the following rules could be deduced (Figure 1.3):

1. miRNA binds to regions of complementarity in the 3' UTR of transcripts.
2. A seed stretch of bases 2-7 towards the 5' end of the miRNA should be completely complementary to the target.
3. Approximately half of the remaining bases should be non-complementary to the target site.
4. An optimum minimal free energy (MFE) should be maintained to form a stable duplex.
5. G:U wobbles should not be present in the seed region.
6. Not more than three G:U wobbles should be present in the miRNA target site.

Figure 1.3: Binding modality of miRNA to target site.

Later studies in other systems and miRNA-target pairs identified more characteristics of miRNA target sites (Nielsen, C. B. et al. 2007a), (Saxena, S. et al. 2003) (Figure 1.4). The basic miRNA binding mechanism described above can be further classified as seed-based (pairing well at the seed region) and canonical (significant pairing at the seed region as well as towards the 3' end of the miRNA). Where a seed match is not observed, there can still be a good miRNA target site, provided a significant number of bases have perfect complementarity towards the 3' end of the miRNA. This type of binding is termed 3' compensatory binding.

Figure 1.4: Classification of types of binding modality of miRNA to target site.

Multiplicity and cooperativity are two consequences of the above modes of miRNA interaction with the target site (John, B. et al. 2004). Since totally complementary sites are not required for a miRNA to identify targets, one miRNA can have more than one target (multiplicity), while one particular transcript can be a target of many miRNAs (cooperativity).

The involvement of miRNAs in several pathways is being uncovered by experimental approaches such as conditional deletion of the miRNA processing machinery of specific miRNAs or the use of artificial miRNA. The earliest observed phenotypic effects of miRNA-mediated gene silencing were in development and morphogenesis (Cao, X. et al. 2007). Effects soon became evident in metabolism, apoptosis and organogenesis (Esau, C. et al. 2006), (Chan, J. A. et al. 2005), (Zhao, Y. et al. 2007). Their role in haematopoiesis and stem-cell differentiation is also evident (Chen, C. Z. et al. 2004), (Suh, M. R. et al. 2004). These molecules have also emerged as key regulators of several stages of the immune response. Examples include the role of mir-155 during the active immune response, by negatively regulating switching to IgG1, and also during inflammation (Thai, T. H. et al. 2007); the role of the mir-146 family in regulating proinflammatory cytokines (Taganov, K. D. et al. 2006); the role of mir-150 in lymphocyte development (Zhou, B. et al. 2007); the importance of mir-181a in T-cell differentiation (Li, Q. J. et al. 2007a); and the role of mir family in B and T-cell differentiation (Mendell, J. T. 2008).

miRNAs are also an integral component of the host-pathogen relationship. Human miRNAs are known to target five key genes in the HIV genome, causing reduction of viral proliferation (Hariharan, M. et al. 2005), (Ahluwalia, J. K. et al. 2008). miRNA-mediated interaction is especially relevant in viral-host interaction, where the genome of the pathogen is open to interaction with components of the host genome. Several miRNAs are known to be downregulated or deleted in cancers of several types (Calin, G. A. et al. 2006). They are also found to be regulators of several diseases such as asthma, cardiovascular diseases, diabetes, Fragile X syndrome etc. (Tan, Z. et al. 2007), (Williams, A. E. 2008), (Li, Y. et al. 2008) (Table 1.2).

Table 1.2: Brief list of functions of human miRNAs

miRNAs involved in basic functions
Function | miRNAs involved | Reference
stem cell differentiation | hsa-mir-302b, -302c, -302d | (Suh, M. R. et al. 2004)
haematopoiesis | hsa-mir-142, -155, -188, -233 | (Chen, C. Z. et al. 2004)
cardiac and skeletal muscle development | hsa-mir-1, -2 | (Zhao, Y. et al. 2007)
neurogenesis | hsa-mir-124, -134 | (Cao, X. et al. 2007)
insulin secretion | hsa-mir-375 | (Poy, M. N. et al. 2004)
cholesterol metabolism | hsa-mir-122 | (Esau, C. et al. 2006)
Apoptosis | hsa-mir-21, -24 | (Chan, J. A. et al. 2005)
Cytokine regulation | hsa-mir-106a | (Sharma, A. et al. 2009) [part of this thesis work]

miRNAs involved in disease
Disease | miRNAs involved | Reference
HIV progression | hsa-mir-29a, -29b, p, -378, -149 | (Hariharan, M. et al. 2005) [part of this thesis work]
Fragile X syndrome | hsa-mir-134 | (Li, Y. et al. 2008)
Diabetes | hsa-mir-375 | (Williams, A. E. 2008)
Coronary Artery Disease | hsa-mir-1 | (Tan, Z. et al. 2007)
Asthma | hsa-mir-148, -152 | (Tan, Z. et al. 2007)

miRNA nomenclature follows precise rules, which has helped to categorize miRNAs functionally (Griffiths-Jones, S. et al. 2006).
On acceptance of a publication describing a novel miRNA, the author submits the entry and its details to the miRBase registry. The first three characters of the name are an abbreviation of the species from which the miRNA arises. After a hyphen come the three letters mir, again followed by a hyphen. The number after this is the next in sequence after the last existing miRNA. If a miRNA with an identical mature sequence is discovered at a different chromosomal locus, a numerical suffix (-1, -2 etc.) is added to the existing name. Some miRNAs have almost the same sequence, with a few changes in the mature miRNA sequence; these are given an alphabetical suffix (29a, 29b etc.). A few miRNAs are known to arise from both strands of the stem structure in the pre-miRNA. In this case, the miRNA arising from the 5' arm of the pre-miRNA is named with a -5p suffix while the other miRNA is named with a -3p suffix (Figure 1.5).

Figure 1.5: Nomenclature of miRNA: (A) The miRNAs hsa-mir-29a and -29b differ by a few bases and hence have an alphabetical suffix (a, b) in the miRNA name. The miRNAs hsa-mir-29b-1 and hsa-mir-29b-2 have identical sequence but arise from different chromosomal loci; hence they have the same miRNA name with a numerical suffix (-1, -2). (B) The miRNAs hsa-mir-29a and hsa-mir-29a* arise from the same precursor hairpin, from opposite strands of the stem structure (marked in purple). Hence, one of the miRNAs is named with the suffix *.

Although it is widely recognized that miRNAs arise from non-coding regions of the genome, several miRNAs also arise from genic regions (Figure 1.6). As per miRBase version 9, in which 470 human miRNAs have been catalogued, a major part of the known human miRNAs arise from genic locations. They are present in exons of genes that have alternate splice variants, in the UTRs of genes or in the introns of protein-coding genes. In introns, they can occur either in the same orientation as the source gene or in the opposite orientation (Figure 1.7).
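
The naming rules described earlier in this section can be sketched as a small parser. This is a simplified illustration based only on the rules stated here (species prefix, the letters mir, a number, an optional letter suffix, an optional locus number, and an optional -5p/-3p or * suffix). The function name `parse_mirna_name` and the regular expression are assumptions of this example and do not reproduce the full miRBase conventions; names that follow other schemes, such as let-7, are simply not matched.

```python
import re

_NAME = re.compile(
    r"^(?P<species>[a-z]{3})-"              # three-letter species abbreviation
    r"mir-"                                 # the letters 'mir'
    r"(?P<number>\d+)"                      # sequential number
    r"(?P<variant>[a-z])?"                  # letter: closely related mature sequence
    r"(?:-(?P<locus>\d+))?"                 # number: identical sequence, other locus
    r"(?:-(?P<arm>[35]p)|(?P<star>\*))?$",  # hairpin arm, or '*'
    re.IGNORECASE,
)


def parse_mirna_name(name):
    """Split a miRNA name into its parts, or return None if it does not match."""
    match = _NAME.match(name.strip())
    if match is None:
        return None
    locus = match.group("locus")
    return {
        "species": match.group("species").lower(),
        "number": int(match.group("number")),
        "variant": match.group("variant"),
        "locus": int(locus) if locus is not None else None,
        "arm": match.group("arm"),
        "star": match.group("star") is not None,
    }


# Names taken from Figure 1.5.
for name in ("hsa-mir-29a", "hsa-mir-29b-1", "hsa-mir-29b-2", "hsa-mir-29a*"):
    print(name, parse_mirna_name(name))
```

Two names that differ only in the locus field (as with hsa-mir-29b-1 and hsa-mir-29b-2) denote identical mature sequences from different loci, whereas a difference in the variant field (29a versus 29b) denotes closely related but distinct sequences.
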

Figure 1.6: Organization of miRNA in the genic region: The blue boxes marked E indicate exons while the red line indicates intronic regions. The last exon may be untranslated in certain splice variants, where it becomes part of the 3' UTR. miRNAs are known to be present within exons, in introns (in both sense and antisense orientation with respect to the source gene), in the UTR regions and also in the mixed UTR region (depicted by the last-exon miRNA).

Figure 1.7: Distribution of miRNA in the genome: 55% of the known human miRNAs arise from genic regions. 41% of these arise from introns, encoded in the same orientation as the source gene (represented as Sense).

Three major questions are addressed as part of this thesis work:

1. Possible cascading effect on regulatory networks as a result of transcription factors being targeted by intronic miRNA
2. Model for the role of intron-derived miRNAs in coordinated regulation of several proteins
3. Potential effect of single nucleotide polymorphisms in and around the miRNA binding sites