Gene Networks for Neuro-Psychiatric Disease

International Journal of Emerging Engineering Research and Technology
Volume 1, Issue 2, December 2013, PP 8-13

Brijendra Gupta, Prof. R.B. Mishra
Department of Computer Engg., I.I.T (BHU), Varanasi, India

Abstract: A genome-wide association study (GWAS) detects genetic variants associated with a specific disease or trait by comparing variant frequencies between a case population and a control population. Both common variants and rare mutations may be involved in the pathogenesis of common diseases. The common disease-rare variant (CD-RV) hypothesis holds that multiple rare variants are the main factor influencing some common diseases. We apply variant prediction before statistically analyzing proteins. In this paper, we prioritize candidate genetic variants through the association principle, on the assumption that genetic variants associated with similar diseases share some common physicochemical properties. We focus on a specific class of genetic variants called single amino acid polymorphisms (SAPs), and use physicochemical features of amino acids, sequence investigation of genes, and multiple alignment of protein families to prioritize candidate SAPs through STRING for specific diseases.

Keywords: Common disease-rare variant, single amino acid polymorphisms, amino acids, STRING

1. INTRODUCTION

Human variation data is the most valuable information to emerge from the Human Genome Project (HGP). The current issue is how to use this data optimally to better understand trait association and to speed up progress towards treatments. There are still numerous open questions about the exact relationship between genetic variants, their phenotypes and diseases. Many prediction tools and databases exist, but only a few cover mutations across all genes. Most are gene-centric and carry very little information about the proteome.
A molecular view of protein interactions in disease emphasizes the relationship between disease networks and functional networks. The completion of the Human Genome Project [International Human Genome Sequencing Consortium, 2001] has provided a huge volume of data that characterizes all human genes and helps define the role of genetic diversity in health and disease. Very few traits have a single genetic origin; most depend on combinations of several genetic factors together with environmental influences. To relate genetic and phenotypic variation, locus-specific mutation databases are useful: where one exists for a given protein, a direct link to it complements the data in UniProtKB/Swiss-Prot by focusing on a single gene and providing more complete information on each variant. SNPs help to detect regions of the human genome involved in disease among coding and regulatory regions. Non-synonymous SNPs lead to an amino acid change in the protein product and constitute nearly half of the known gene lesions responsible for human inherited disease [7]. Bioinformatics tools are expected to predict the functional effect of variants; the variants predicted to be functional are then carried forward into the statistical analysis. Single nucleotide polymorphisms (SNPs) lead to single amino acid polymorphisms (SAAPs) in proteins, which can affect protein function and structure and cause human diseases [1].

Several existing methods, such as PolyPhen [2], MSRV [3], SIFT [4], and KBAC [5], identify disease-associated SAAPs by treating the task as a binary classification problem, but give no information about which specific disease a SAAP is associated with. Because they provide only a classification result, their contribution to clinical applications is limited [6].

Swiss-Prot is a manually curated knowledgebase that provides protein sequences and functional annotations. For each SAP, the knowledgebase records the effect on protein structure and function, the position of the mutation, the resulting amino acid change, and any involvement in disease. The Online Mendelian Inheritance in Man (OMIM) database enhances the integration of phenotype information with protein-level data. The Swiss-Prot database gives direct access to the newly improved Swiss-Prot variant pages, which make it possible to query for similar traits as well as for the protein function and molecular details of each variant. There are several ways to get mutation-related information from the knowledgebase. For example, one may want to find protein entries with data on variants. Disease-related keywords are also regularly created to allow easy retrieval of proteins involved in complex disorders.

The position of a protein-centered mutation resource can be compared with genomic and phenotypic information. Proteins are key participants in most cellular activities and help in understanding genomic variation and its phenotypic consequences. UniProtKB/Swiss-Prot offers a protein-centered view of variant data, which complements the gene-centered or disease-centered view of most SNP-related databases.
The knowledgebase does not simply list amino acid changes predicted from nucleotide variations by translating single nucleotide substitutions at the DNA level; it also stores, when available, information from direct protein sequencing and characterization. By offering numerous links to genomic resources, UniProtKB/Swiss-Prot can be regarded as an integration platform between genomics and proteomics. As for phenotypic data, UniProtKB/Swiss-Prot is linked to OMIM. Presently, 2,601 genetic diseases described in the knowledgebase have a direct link to OMIM phenotype entries. Users can follow the links in the disease comment lines to retrieve more detailed disease information. Each OMIM entry gives a summary of up-to-date knowledge about a genetically determined phenotype, with information on inheritance and representative allelic variants. OMIM also provides the cytogenetic map location of disease genes. As OMIM is mainly directed to geneticists, it is not specifically concerned with sequence-oriented data. There is therefore a low correspondence between allelic variants and sequence information, and it is difficult to map allelic variants onto a sequence.

2. METHODS

At a position that is fully conserved in an alignment, mutation to any other amino acid is predicted to disrupt protein function. If a position in an alignment contains the hydrophobic amino acids isoleucine, leucine and valine, then this position is assumed to accept only amino acids of hydrophobic character: changes to other hydrophobic amino acids are predicted to be tolerated, whereas changes to other residues (such as charged or polar ones) are predicted to affect protein activity. STRING predicts non-conservative amino acid changes as deleterious, where non-conservative substitutions are defined as those with negative scores in an amino acid substitution scoring matrix. Some positions will differ between the test protein and the other sequences; depending on the amino acids at these positions, we may predict their importance for protein activity.
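The alignment-based rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the coarse grouping of amino acids into four physicochemical classes are assumptions made only for illustration, and a real predictor would use a substitution scoring matrix and weighted sequence alignments.

```python
# Coarse physicochemical classes of the 20 amino acids.
# This grouping is an illustrative assumption, not taken from the paper.
CLASSES = {
    "hydrophobic": set("AVILMFW"),
    "polar": set("STNQCYGP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def residue_class(aa):
    """Return the class name of a one-letter amino acid code."""
    for name, members in CLASSES.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid: {aa!r}")


def predict_substitution(column, new_residue):
    """Predict the effect of a substitution at one alignment position.

    column      -- residues observed at that position across the aligned
                   sequences, e.g. "ILVIL" (gaps "-" are ignored)
    new_residue -- the substituted amino acid
    """
    observed = set(column.upper()) - {"-"}
    new = new_residue.upper()
    if new in observed:
        return "tolerated"            # already seen at this position
    if len(observed) == 1:
        return "affects function"     # fully conserved position
    observed_classes = {residue_class(aa) for aa in observed}
    if len(observed_classes) == 1 and residue_class(new) in observed_classes:
        return "tolerated"            # same physicochemical character
    # Conservative default: anything else is flagged.
    return "affects function"


if __name__ == "__main__":
    for column, new in [("ILVIL", "M"), ("ILVIL", "D"), ("IIIII", "L")]:
        print(column, "->", new, ":", predict_substitution(column, new))
```

The sketch flags every unseen residue at a mixed-class position, which is deliberately strict; relaxing that default is a design choice that depends on how many neutral substitutions one is willing to carry into the statistical analysis.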
This information can eliminate functionally neutral substitutions and increase selectivity for deleterious substitutions. Single-nucleotide polymorphisms (SNPs) are a major kind of genetic variant at the population scale, but polymorphisms at the proteome level remain elusive. Here, we refer to amino acid variants at the proteome level that derive from SNPs within coding regions as single amino acid polymorphisms (SAPs), and develop a pipeline of targeted proteomics to detect and quantify SAP peptides. Because a SNP in a coding region may not change the amino acid sequence of the protein, SNPs within coding regions are divided into two categories: synonymous and non-synonymous. Non-synonymous polymorphisms result in variation of the amino acid sequence and have an impact on gene function. Among SNPs located in the coding regions of 200 human exomes, less than half were synonymous and the rest were non-synonymous [8]. This suggests that non-synonymous polymorphisms are widely distributed in the human genome; only non-synonymous polymorphisms can be detected both qualitatively and quantitatively at the proteome level. Non-synonymous point mutations underlying Mendelian disorders cause functional change in proteins, e.g. hemoglobin in anemia [9]. Substitution of amino acids in certain important regions of a protein may change its normal structure and result in specific pathological or physiological changes, which may arise from either post-translational modifications or the abundance of functional proteins.

The Swiss-Prot database [10] is used to provide information on single amino acid polymorphisms. In our study, we focus on SAAPs that correspond to an OMIM accession. Query by disease enables search by disease name, by OMIM identifier, or by disease category. The second type of query is protein centric. Finally, variants present in Swiss-Prot can be searched through their molecular characteristics.
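The distinction between synonymous and non-synonymous coding SNPs drawn earlier in this section can be made concrete with a short Python sketch. It assumes the standard genetic code and a single-base change within one codon; the function name `classify_coding_snp` is hypothetical and is not part of any pipeline described in this paper. Note that in this sketch a change to a stop codon is also reported as non-synonymous.

```python
from itertools import product

# Standard genetic code, written in the usual TCAG ordering
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-base change within one codon.

    Returns a tuple (kind, reference_amino_acid, altered_amino_acid),
    where kind is "synonymous" or "non-synonymous".
    """
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must have exactly three bases")
    if sum(a != b for a, b in zip(ref, alt)) != 1:
        raise ValueError("codons must differ at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    # GAG -> GTG replaces glutamate (E) with valine (V), the kind of
    # change seen in the sickle-cell hemoglobin variant.
    print(classify_coding_snp("GAG", "GTG"))
    # GAA -> GAG leaves glutamate unchanged.
    print(classify_coding_snp("GAA", "GAG"))
```

Only the non-synonymous case produces a SAP that could, in principle, be observed at the proteome level.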
Several attributes of the amino acid affected by the mutation can be specified: the conservation score of the residue, its sequential and structural environment, its surface accessibility, and its involvement in interfaces are all adjustable parameters. Variants can also be selected by the position of the mutation or the type of amino acid change. We constructed a PPI network from the Swiss-Prot database to observe genes for detecting gene-disease associations. The lists of genes for all five neuro-psychiatric diseases are shown in Figures 1-5.

Figure 1. Representing the amino acid presence within the Genes for ADHD
Figure 2. Representing the amino acid presence within the Genes for DEMENTIA
Figure 3. Representing the amino acid presence within the Genes for MOOD DISORDER
Figure 4. Representing the amino acid presence within the Genes for OCD
Figure 5. Representing the amino acid presence within the Genes for SCHIZOPHRENIA

Thirty-five genes are the most highly interacting in the above network graphs and are described in Table 1. These genes have major roles in brain activity, and their mutations are involved in various neuro-psychiatric diseases.

Table 1. Representing 35 Genes location, mutation effects and Amino Acid richness for all 5 Neuropsychiatric Diseases

The Random Walk With Restart (RWR) algorithm, applied to the STRING database, groups the associated genes involved in the transcriptional response to neuro-psychiatric disease; the resulting network is shown in Figure 6.

Figure 6. Representing 63 Genes interactions for all 5 Neuro-psychiatric Diseases

Juvenile Batten disease impairs mental and motor development, causing difficulty with speaking, walking and intellectual activities. Kufs disease type A is a type of neuronal ceroid lipofuscinosis (NCL) characterized by seizures and problems with movement and intellectual function. Lipopigments accumulate in the lysosomes of nerve cells (neurons) in the brain, resulting in cell dysfunction and eventually cell death. Amyotrophic lateral sclerosis (ALS) is a condition characterized by progressive movement problems and muscle wasting; the associated mutations change single protein building blocks in the FUS protein. Frontotemporal dementia (FTD) is a progressive brain disorder that affects personality, behavior, and language. Aceruloplasminemia has been identified when mutations substitute one protein building block (amino acid) for another in the ceruloplasmin protein, resulting in an unstable protein that quickly breaks down (degrades). Mutations in the IRE segment of the FTL gene prevent it from binding IRP, the mechanism by which ferritin production is matched to iron levels; this results in excess ferritin being formed and in the release of excess iron in nerve cells (neurons) of the brain. Huntington disease causes emotional problems, uncontrolled movements, and loss of thinking ability. Supranuclear palsy is related to abnormalities in the tau protein: the defective tau protein assembles into abnormal clumps within neurons and other brain cells. Parkinson disease is a condition characterized by progressive problems with movement and balance. Dementia is a loss of intellectual functions. In Alzheimer disease, an increased number of protein clumps, called amyloid plaques, in the brain leads to the death of neurons. Perry syndrome is a progressive brain disease characterized by a pattern of movement abnormalities known as parkinsonism, together with psychiatric changes and abnormally slow breathing. Hereditary cerebral amyloid angiopathy is characterized by stroke and a decline in intellectual function. Neonatal encephalopathy causes small head size, movement disorders, slow breathing and seizures. Hereditary sensory and autonomic neuropathy is characterized by a gradual loss of intellectual functions (dementia), deafness, and sensory problems in the feet. Wilms tumor, aniridia, genitourinary anomalies, and mental retardation; certain common genetic variations (polymorphisms) in the BDNF gene have been associated with an increased risk of developing psychiatric disorders such as bipolar disorder, anxiety, schizophrenia, and eating disorders.
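Returning to the Random Walk With Restart (RWR) step mentioned above for the 63-gene network: the paper does not give its formulation, so the following Python sketch shows only the textbook form of the algorithm on an unweighted, undirected graph. The gene names G1 to G4, the restart probability of 0.3, and both function names are hypothetical and chosen purely for illustration; STRING confidence scores could be used as edge weights in a fuller version.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def random_walk_with_restart(adjacency, seeds, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W spreads each node's probability equally over its neighbours,
    and p0 is uniform over the seed nodes.
    """
    nodes = sorted(adjacency)
    seeds = [s for s in seeds if s in adjacency]
    if not seeds:
        raise ValueError("no seed node is present in the network")
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # An isolated node keeps its own walking mass.
                nxt[n] += (1.0 - restart) * p[n]
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


if __name__ == "__main__":
    # Hypothetical four-gene chain; G1 plays the role of a known disease gene.
    network = build_adjacency([("G1", "G2"), ("G2", "G3"), ("G3", "G4")])
    scores = random_walk_with_restart(network, seeds=["G1"])
    for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(gene, round(score, 4))
```

Genes that end up with a high steady-state probability are those most closely connected to the seed genes, which is one way a set of candidate genes can be grouped around known disease genes.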
A DRD3 mutation creates a neurotransmission problem. 22q11.2 deletion syndrome leads to abnormal regulation of catechol-O-methyltransferase levels in the brain and a risk of behavioral problems and mental illness. Ataxia neuropathy spectrum involves problems with coordination and balance (ataxia) and disturbances in nerve function (neuropathy). Alpers-Huttenlocher syndrome impairs muscle-, nerve-, and brain-related functions and is characterized by seizures and loss of mental and movement abilities.

3. RESULTS

The 35 genes were plotted again with the STRING network; the resulting condensed network is shown in Figure 7.

Figure 7. Representing 35 most interactive Genes interactions for all 5 Neuro-psychiatric Diseases

The STRING network presents a complete outline of the known information on each variant, improved by newly added features that include the display of the conservation score of the mutated residue at the sequence and structural level, ligand binding or post-translational changes, and residues involved in protein-protein interaction, with 3D information where available.

4. CONCLUSION

The NCBI portal provides access to this wealth of data by making it possible to gather the proteins and variants of similar diseases, and by allowing variants to be selected using a range of sequence and structural parameters. The classification is based on trait similarities and incorporates comprehensive structured phenotype information to enhance the neuro-psychiatric disease query. STRING provides a unique environment and search facility for finding the relationship between human variants and traits, focusing particularly on the human proteome and offering search possibilities that directly link the molecular details of single amino acid polymorphisms to disease classification.

REFERENCES

[1] J. Wu, W. Zhang, R. Jiang, Comparative study of ensemble learning approaches in the identification of disease mutations, BMEI
[2] V. Ramensky, P. Bork, S. Sunyaev, Human non-synonymous SNPs: server and survey, Nucleic Acids Res, 2002, 30:
[3] R. Jiang, H. Yang, L. Zhou, C.C. Kuo, F. Sun, et al., Sequence-based prioritization of nonsynonymous single-nucleotide polymorphisms for the study of disease mutations, Am J Hum Genet, 2007, 81:
[4] P.C. Ng, S. Henikoff, SIFT: Predicting amino acid changes that affect protein function, Nucleic Acids Res, 2003, 31:
[5] D.J. Liu, S.M. Leal, A novel adaptive method for the analysis of next-generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions, PLoS Genet 6, 2010, e
[6] D. Altshuler, M. Daly, L. Kruglyak, Guilt by association, Nat Genet, 2000, 26:
[7] M. Krawczak, E.V. Ball, I. Fenton, P.D. Stenson, S. Abeysinghe, N. Thomas, D.N. Cooper, Human gene mutation database-a biomedical information and research resource, Hum. Mutat., 2000, 15,
[8] Y. Li, N. Vinckenbosch, G. Tian, et al., Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants, Nat. Genet., 2010, 42,
[9] A. Kutlar, Sickle cell disease: a multigenic perspective of a single gene disorder, Hemoglobin, 2007, 31,
[10] T.U. Consortium, The Universal Protein Resource (UniProt) in 2010, Nucleic Acids Res, 2010, 38: D