Chi To
BME 230 Final Project

Relationship of Gene Types and Introns

Abstract: The relationship between Gene Ontology classification and changes in intron length over the course of evolution is important information that can help determine the origin of introns. The statistics show that dog genes whose biological process is actin cytoskeleton organization and biogenesis or actin filament-based process tend to have many deletions in their introns.

Introduction

Most eukaryotes contain multiple introns per gene. This requires hundreds of thousands to millions of individual intron gains to have occurred throughout eukaryotic evolution. For example, in the human genome an extremely large percentage of bases lies in introns compared with the small percentage in exons (33% of the genome versus less than 2%). Five different hypotheses for the origin of introns have been proposed:

(i) Intron transposition: An intron from one gene is spliced out of an mRNA transcript. That intronic RNA sequence then reinserts into a previously intronless site of a transcript of the same or a different gene. That structure is then retroposed to give a DNA copy of the gene containing an intron at the new site.
(ii) Transposon insertion: A transposon inserts into a contiguous coding region and is transformed into an intron.
(iii) Tandem genomic duplication: A region including part or all of an exon with an internal AGGT is duplicated. The two homologous AGGTs are then used as 5′ and 3′ splicing boundaries for a new intron.
(iv) Intron transfer: A gene undergoes a gene conversion or simple double recombination with an intron-containing paralog.
(v) Self-splicing type II intron insertion: A type II intron, presumably from an organelle of the same organism, inserts into a contiguous region of coding sequence of a nuclear genome and is then converted to a spliceosomal intron.
[1] To determine the origin of introns, it is important to find out whether the introns of a species were gained or lost in comparison with those of its ancestors [2]. Intron early mean
Materials and Methods

The UCSC genome browser was used to get a BED file with the data for all exons in the Boreoeutherian genome (the ancestral genome) [3]. From this data, the exon coordinates and the gene IDs were extracted, and a Perl program called getintron.pl used the extracted data to create a script containing the information for all introns in the Boreoeutherian genome. This script was then used by a program called chaintoaxt, which produced a pairwise sequence alignment between the Boreoeutherian genome and the dog's genome, and another pairwise sequence alignment between the Boreoeutherian genome and the human genome [4]. The outputs of these pairwise sequence alignments were then used by another Perl program called getidalign.pl, which determined the number of deletions and insertions in the dog and human genomes. The number of deletions and insertions was determined by counting gaps, and only gaps longer than 100 bp were counted; shorter gaps were treated as noise. Running getidalign.pl produced four output files. Each output file lists the gene coordinates of only those introns that had a deletion or insertion of more than 100 bp. Another Perl program called getgene.pl was then used to match the information in each output file to the gene IDs of the selected introns. Then, the four gene ID output files (insertion in dog, deletion in dog, insertion in human, and deletion in human) were uploaded to the GOstat website, which computed the Gene Ontology statistics for the gene IDs in each file using the hypergeometric distribution [5].

Result
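The p-values reported in the tables come from the hypergeometric statistics described in Materials and Methods. As a minimal sketch of that kind of test, and not a reproduction of GOstat itself, the Python function below computes the probability of seeing at least a given number of genes with a G.O. term in a selected gene list when the list is drawn at random from the full gene set. The function name enrichment_p_value and all counts in the example are hypothetical and are not taken from the project data; any multiple-testing correction that GOstat may apply is not included.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, selected_annotated):
    """Upper-tail hypergeometric probability P(X >= selected_annotated).

    population         -- total number of genes considered
    annotated          -- genes in the population carrying the G.O. term
    selected           -- genes in the list being tested
    selected_annotated -- genes in that list carrying the G.O. term
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("counts must fit inside the population")
    # X cannot exceed the number of annotated genes or the list size.
    upper = min(annotated, selected)
    lower = max(selected_annotated, 0)
    total = comb(population, selected)
    # Sum the probabilities of every outcome at least as extreme as observed.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(lower, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    p = enrichment_p_value(population=20, annotated=5,
                           selected=4, selected_annotated=2)
    print(f"p = {p:.4f}")  # prints "p = 0.2487"
```

A small value means the selected list contains more genes with the term than random selection would usually give; with small gene lists, as in this project, the tail probability tends to stay large.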
Table 1: Insertion in Human
GO ID       Category            Term                                                            P-value
GO:0006813  Biological process  potassium ion transport                                         0.28
GO:0005886  Cellular component  protein complex                                                 0.28

Table 2: Deletion in Human
GO ID       Category            Term                                                            P-value
GO:0003779  Molecular function  actin binding                                                   0.23
GO:0008092  Molecular function  cytoskeletal protein binding                                    0.23
GO:0005096  Molecular function  GTPase activator activity                                       0.23

Table 3: Insertion in Dog
GO ID       Category            Term                                                            P-value
GO:0016772  Molecular function  transferase activity, transferring phosphorus-containing groups 0.65
GO:0005057  Molecular function  receptor signaling protein activity                             0.65
GO:0016301  Molecular function  kinase activity                                                 0.65

Table 4: Deletion in Dog
GO ID       Category            Term                                                            P-value
GO:0030036  Biological process  actin cytoskeleton organization and biogenesis                  0.0598
GO:0030029  Biological process  actin filament-based process                                    0.0598
GO:0007010  Biological process  cytoskeleton organization and biogenesis                        0.0598

Figure 1

Discussion: The results of the Gene Ontology (G.O.) statistics show that, in human, genes whose biological process is potassium ion transport tend to have many insertions in their introns (Table 1). However, the p-value for this result is fairly high (0.28): an enrichment at least this strong would be expected by chance about 28% of the time. Table 2 shows that human genes whose molecular function is actin binding, cytoskeletal protein binding, or GTPase activator activity tend to have many deletions in their introns. However, the p-value of this outcome is also high (0.23), so our prediction is unreliable. Table 3 shows that dog genes whose molecular function is transferase activity (transferring phosphorus-containing groups),
receptor signaling protein activity, or kinase activity tend to have many insertions in their introns. However, as with Tables 1 and 2, the p-value is high (0.65), so the predicted G.O. classification is very unreliable. One explanation for the high p-values in Tables 1, 2, and 3 may be the size of our data set: the data used in this experiment was small, and the result could be improved by increasing it. Table 4 shows a more interesting result: dog genes whose biological process is actin cytoskeleton organization and biogenesis or actin filament-based process tend to have many deletions in their introns. The p-value here is much smaller (0.0598), so an enrichment like this would arise by chance only about 6% of the time. Figure 1 was generated from the UCSC genome browser and shows a very interesting feature: more than ¾ of the dog intron and about ½ of the human intron were deleted during evolution relative to the original ancestral intron.

Conclusion: Information about the molecular function, biological process, and cellular component of genes, in relation to how their introns gain insertions or deletions, may be meaningful for research on the origin of introns. By knowing the relationship between Gene Ontology classification and introns, researchers can narrow their experiments down to the specific types of genes that have insertions or deletions.

Reference:
1/ Roy, S. W., The origin of recent introns: transposons? Genome Biology 2004, page 2.
2/ Coghlan, A. and Wolfe, K., Origins of recently gained introns in Caenorhabditis. www.pnas.org/cgi/doi/10.1073/pnas.0308192101
3/ UCSC genome browser, http://genome-test.cse.ucsc.edu/
4/ Baertsch, R., chaintoaxt for pairwise alignment.
5/ GOstat, http://gostat.wehi.edu.au/ (for the G.O. statistics)