Inquest of the SNP in Cystic Fibrosis A Bioinformatic Approach

International Journal of Current Microbiology and Applied Sciences
ISSN: 2319-7706 Volume 6 Number 8 (2017) pp. 1255-1263
Journal homepage: http://www.ijcmas.com

Original Research Article
https://doi.org/10.20546/ijcmas.2017.608.152

Balakrishnaraja Rengaraju 1*, Keerthana 1, Akila 1, K. Pavithra 1, D. Vinotha 1, Sri Harsha Challapalli 2 and Arunava Das 1

1 Bannari Amman Institute of Technology, Sathyamangalam-638401, India
2 ETH Technical University, Zurich-8093, Switzerland
*Corresponding author

Keywords: SNP, CFTR gene, Cystic fibrosis, BLAST, SWISSPROT.

Article Info: Accepted: 17 June 2017. Available Online: 10 August 2017.

ABSTRACT

Single nucleotide polymorphisms (SNPs) are DNA sequence variations that arise from the alteration of a single nucleotide in the genome sequence. These variations have a major impact on the human response to diseases and therapeutic drugs. Although an SNP may or may not cause a disease, it can help to determine the probability that the disease will occur. One well-known lethal genetic disorder is cystic fibrosis. It is an autosomal recessive disease caused by an alteration in the CFTR gene (Cystic Fibrosis Transmembrane conductance Regulator gene), which codes for a chloride channel protein found in the membranes lining the respiratory and digestive tracts. Chronic rhinosinusitis, a disorder that shares some symptoms with cystic fibrosis, can also be traced to alterations in the same CFTR gene. A comparative study of the two diseases indicates that cystic fibrosis requires two copies of the altered gene, whereas a single copy may influence the occurrence of chronic rhinosinusitis. Our SNP analysis of this gene reveals a point mutation at locus 5618 of the nucleotide sequence that converts the amino acid histidine to proline, thus affecting the three-dimensional structure of the protein.
Introduction

Genes, the basic units of inheritance, lie along the length of the DNA helix and are made up of nucleotide sequences. These sequences are different combinations of the four letters of the alphabet of life: A, T, G and C. The arrangement of these nitrogenous bases is the same in almost all genomes of a species, except at certain loci that may exhibit variation. When such a variation arises from a change in a single nucleotide, it is referred to as a single nucleotide polymorphism (SNP). For example, the change of GGAATC to GGAATT can be considered an SNP. A variant must be present in at least 1% of the population to be considered an SNP. In the three-billion-base human genome, there is one SNP for every 100-300 bases. SNPs can be found in either the coding or the non-coding regions of the genome. In the former case they can affect the biological properties of a protein, occasionally leading to disease. Even in the latter case, when they are not responsible for the disease, they may serve as genetic markers because of their proximity to the mutated regions. These markers can aid in designing therapeutic drugs for various genetic disorders. SNPs are indicative of how a person responds to diseases, microbes, toxins, drugs and therapies. Hence they are of great value in pharmacogenomics, diagnostics and biomedical research. SNPs can also help unravel evolutionary pathways because they are the consequences of past mutations, which prove unique when two individuals are compared (Riordan et al., 1989).

Cystic fibrosis (CF) is an inherited disease believed to be caused by an SNP. It is characterized by abnormal fluid secretion. It is a recessively inherited condition caused by mutation of the CFTR gene (Southern et al., 2007). The Cystic Fibrosis Transmembrane conductance Regulator (CFTR) gene plays a key role in chloride ion transport across the membrane. Alteration of the same gene can also result in a different disorder of the mucous membrane known as chronic rhinosinusitis. The gene sequence of CFTR can be analyzed using bioinformatics tools, and the exact position of the SNP can be determined.

Being a killer disease, cystic fibrosis has always been an area of extensive research. Studies have been carried out to identify its root cause and possible remedies. Cystic fibrosis can interfere with the normal functioning of the liver, lungs, gastrointestinal tract, pancreas, sweat glands and male reproductive system. Pathogens are associated with chronic lung infections and an increased risk of death in patients with CF (Sá-Correia et al., 2017). The chemical properties of the mucus are altered, resulting in the secretion of thicker-than-normal mucus that obstructs the airways and the ducts of the digestive organs. The major symptom is the presence of an excessive amount of salt in sweat. Because the disease is caused by a recessive allele, a person is afflicted only when the altered gene is inherited from both parents. A person who has just a single copy is a carrier of the disease.
When two such carriers have a baby, there is a 25% chance that the child will not have the disease at all (homozygous dominant), a 25% chance that the child will show the symptoms of cystic fibrosis (homozygous recessive) and a 50% chance that he or she will be a carrier (heterozygous), as illustrated in Table 1. The clinical manifestations of CF involve several systems; the most common symptom is recurrent pulmonary infection (Xu et al., 2017). Although the 3D structure of the entire CFTR protein has not been experimentally determined, a model depicting its structure has been proposed. CFTR is made up of five domains: two membrane-spanning domains (MSD1 and MSD2) that form the chloride ion channel, two nucleotide-binding domains (NBD1 and NBD2) that bind and hydrolyze ATP, and a regulatory (R) domain (Zhang et al., 2011) (Fig. 1).

Cystic fibrosis is a chronic inflammatory disease. Apart from the CFTR gene, it is also affected by environmental factors and other genetic factors. Mutations in the gene responsible for cystic fibrosis may also be associated with the development of another disorder, chronic rhinosinusitis, in the general population. Chronic rhinosinusitis is a long-lasting inflammatory disease of the nasal and associated mucosa, with a variety of signs and symptoms. Anterior and posterior purulent discharges are the significant findings in its diagnosis. A large number of patients also have rhinitis, of either allergic or infectious origin. Physiologic testing in these patients showed normal indices of nasal epithelial sodium transport with a slight reduction in CFTR-mediated chloride conductance. Though the same gene is involved in both cystic fibrosis and chronic rhinosinusitis, the alterations in the gene may or may not be the same. Those affected by chronic rhinosinusitis can carry a single copy of a CF-associated CFTR mutation or a different variant of the same gene known as M470V.

Insight of CFTR gene

The CFTR gene is located at q31.2 on the long arm (q) of human chromosome 7. It is 250000 base pairs long and contains 27 exons. Lap-Chee Tsui and John Riordan first isolated the gene in 1989. The coding sequence spans 4443 base pairs. The CFTR protein is 1480 amino acids long and has a molecular weight of 168173 Da. It is classified as an ABC (ATP-Binding Cassette) transporter, a family whose members transport sugars, peptides, inorganic phosphates and metal cations. CFTR itself is specifically involved in the transport of chloride ions across the cell membrane. Model studies on the protein show that it consists of five domains: two membrane-spanning domains that form the chloride ion channel, two nucleotide-binding domains that bind and hydrolyze ATP, and a regulatory domain containing several serine residues that can be phosphorylated by protein kinase A (PKA), which is activated by the second messenger cAMP. The presence of the regulatory domain is a unique feature of CFTR, and the addition or removal of phosphate groups can regulate the movement of chloride ions across the membrane. Once normal CFTR protein is synthesized, it is transported to the endoplasmic reticulum and the Golgi bodies for post-translational modification before being integrated into the cell membrane. Mutation of the gene results in a severe loss of chloride ion transport, which disrupts the balance between sodium and chloride ions that is needed to maintain a thin mucus layer. This imbalance produces a thick mucus layer, paving the way for prolonged infections.

SNP analysis of the CFTR gene

Using the tools of bioinformatics, one can store and search data, and analyze and determine the relationships between macromolecular sequences, structures and biochemical pathways.
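As a minimal illustration of sequence comparison, the single-base change used as an example in the Introduction (GGAATC to GGAATT) can be located with a few lines of Python. This is only a sketch: the function name `single_base_differences` is hypothetical, it assumes two already-aligned sequences of equal length with no gaps, and it is not the BLAST-based procedure used in this study.

```python
def single_base_differences(reference, variant):
    """Return (position, reference_base, variant_base) for every mismatch.

    Positions are 1-based. Both sequences are assumed to be aligned already
    and of equal length; gaps are not handled.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    pairs = zip(reference.upper(), variant.upper())
    return [
        (index + 1, ref_base, var_base)
        for index, (ref_base, var_base) in enumerate(pairs)
        if ref_base != var_base
    ]


print(single_base_differences("GGAATC", "GGAATT"))  # [(6, 'C', 'T')]
```

A real SNP call additionally requires population frequency data (the 1% threshold mentioned in the Introduction), which a pairwise comparison like this cannot provide.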
Generally, we encounter three major forms of DNA sequence: genomic DNA, cDNA and recombinant DNA (rDNA). cDNA comprises only exons and is reverse transcribed from mRNA, whereas rDNA is a hybrid of a cloning vector and foreign DNA, synthesized in vitro. Our analysis is mainly of genomic DNA sequences, which contain both coding (exon) and non-coding (intron) sequences. DNA and protein sequences are expressed in several standard formats, of which the most widely used are the NBRF/PIR (National Biomedical Research Foundation / Protein Information Resource) format, the FASTA format and the GDE format. The available databases can be classified as primary databases that contain raw sequence data, secondary databases that contain information on sequence patterns, and organism-specific databases that pertain to a particular species. The DNA sequence of interest can be obtained from any of the three primary sequence databases, which are repositories of an enormous amount of information: GenBank (NCBI), the Nucleotide Sequence Database (EMBL) and the DNA Data Bank of Japan (DDBJ). The sequences thus obtained can be cross-checked using SWISS-PROT, a collection of confirmed protein sequences that is curated and updated regularly.

BLAST and FASTA are used to compare the sequence of interest with the database sequences. They are preferred to dynamic programming algorithms because of their greater speed. While BLAST performs local alignment and can report many high-scoring regions of similarity, FASTA, also a heuristic local-alignment method, reports only the highest-scoring region for each database sequence. The high-scoring local alignments are known as high-scoring segment pairs (HSPs). By default, BLAST seeds its search with words of length 3 for protein sequences or 11 for nucleotide sequences, whereas FASTA uses word (ktup) sizes of 2 for protein sequences or 6 for nucleotide sequences. This sequence comparison can provide knowledge about the protein function.

Once the position of the SNP is known, it is essential to find out whether it appears in the coding region of the gene. Protein-encoding genes feature an Open Reading Frame (ORF), consisting of a series of sense codons bounded by a start and a stop codon. Using the ORF Finder at NCBI, we can locate the open reading frames in the given sequence. Further, we can find the mutated nucleotide and the corresponding amino acid. It has been proposed that genes interacting with the CF transmembrane conductance regulator (CFTR) and the epithelial sodium channel (ENaC) are potential modifiers. Therefore, the impact of single-nucleotide polymorphisms (SNPs) in several of these interactors on CF disease outcome was assessed (Gallati et al., 2012).

Materials and Methods

Materials involved: DNA sequence of the CFTR gene; GenBank (NCBI) database; SWISS-PROT; SNP database; ORF Finder of NCBI.

The NCBI Gene ID of the CFTR gene (1080) was obtained from the search engine of the NCBI homepage. The nucleotide sequence was retrieved using the accession number NM_000492. This sequence was searched with BLAST against an SNP database. The BLAST result showed variations at three loci, namely 5618, 5732 and 5826. To check whether these SNPs lie within the coding regions, the sequences were fed to the ORF Finder. The exon of interest was obtained and the codons were read. To confirm the protein, the nucleotide sequence was translated and searched with BLAST against SWISS-PROT (ID: P13569).

Results and Discussion

Sequence extraction

The nucleotide sequence was obtained from the NCBI database, and the bioinformatics tools were used to identify the mutated protein associated with cystic fibrosis.

NUCLEOTIDE SEQUENCE NM_000492
Homo sapiens cyst...[gi:6995995]
>gi|90421312|ref|NM_000492.3| Homo sapiens cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7) (CFTR), mRNA
AATTGGAAGCAAATGACATCACAGCA GGTCAGAGAAAAAGGGTTGAGCGGCA GGCACCCAGAGTAGTAGGTCTTTGGCA TTAGGAGCTTGAGCCCAGACGGCCCTA GCAGGGACCCCAGCGCCCGAGAGACC ATGCAGAGGTCGCCTCTGGAAAAGGCC AGCGTTGTCTCCAAACTTTTTTTCAGCT GGACCAGACCAATTTTGAGGAAAGGA TACAGACAGCGCCTGGAATTGTCAGAC ATATACCAAATCCCTTCTGTTGATTCTG CTGACAATCTATCTGAAAAATTGGAAA GAGAATGGGATAGAGAGCTGGCTTCA AAGAAAAATCCTAAACTCATTAATGCC CTTCGGCGATGTTTTTTCTGGAGATTTA TGTTCTATGGAATCTTTTTATATTTAGG GGAAGTCACCAAAGCAGTACAGCCTCT CTTACTGGGAAGAATCATAGCTTCCTA TGACCCGGATAACAAGGAGGAACGCT CTATCGCGATTTATCTAGGCATAGGCT TATGCCTTCTCTTTATTGTGAGGACACT GCTCCTACACCCAGCCATTTTTGGCCTT CATCACATTGGAATGCAGATGAGAATA GCTATGTTTAGTTTGATTTATAAGAAG ACTTTAAAGCTGTCAAGCCGTGTTCTA GATAAAATAAGTATTGGACAACTTGTT AGTCTCCTTTCCAACAACCTGAACAAA TTTGATGAAGGACTTGCATTGGCACAT TTCGTGTGGATCGCTCCTTTGCAAGTG GCACTCCTCATGGGGCTAATCTGGGAG

1259 TTGTTACAGGCGTCTGCCTTCTGTGGA CTTGGTTTCCTGATAGTCCTTGCCCTTT TTCAGGCTGGGCTAGGGAGAATGATGA TGAAGTACAGAGATCAGAGAGCTGGG AAGATCAGTGAAAGACTTGTGATTACC TCAGAAATGATTGAAAATATCCAATCT GTTAAGGCATACTGCTGGGAAGAAGC AATGGAAAAAATGATTGAAAACTTAA GACAAACAGAACTGAAACTGACTCGG AAGGCAGCCTATGTGAGATACTTCAAT AGCTCAGCCTTCTTCTTCTCAGGGTTCT TTGTGGTGTTTTTATCTGTGCTTCCCTA TGCACTAATCAAAGGAATCATCCTCCG GAAAATATTCACCACCATCTCATTCTG CATTGTTCTGCGCATGGCGGTCACTCG GCAATTTCCCTGGGCTGTACAAACATG GTATGACTCTCTTGGAGCAATAAACAA AATACAGGATTTCTTACAAAAGCAAGA ATATAAGACATTGGAATATAACTTAAC GACTACAGAAGTAGTGATGGAGAATG TAACAGCCTTCTGGGAGGAGGGATTTG GGGAATTATTTGAGAAAGCAAAACAA AACAATAACAATAGAAAAACTTCTAAT GGTGATGACAGCCTCTTCTTCAGTAAT TTCTCACTTCTTGGTACTCCTGTCCTGA AAGATATTAATTTCAAGATAGAAAGAG GACAGTTGTTGGCGGTTGCTGGATCCA CTGGAGCAGGCAAGACTTCACTTCTAA TGGTGATTATGGGAGAACTGGAGCCTT CAGAGGGTAAAATTAAGCACAGTGGA AGAATTTCATTCTGTTCTCAGTTTTCCT GGATTATGCCTGGCACCATTAAAGAAA ATATCATCTTTGGTGTTTCCTATGATGA ATATAGATACAGAAGCGTCATCAAAGC ATGCCAACTAGAAGAGGACATCTCCAA GTTTGCAGAGAAAGACAATATAGTTCT TGGAGAAGGTGGAATCACACTGAGTG GAGGTCAACGAGCAAGAATTTCTTTAG CAAGAGCAGTATACAAAGATGCTGATT TGTATTTATTAGACTCTCCTTTTGGATA CCTAGATGTTTTAACAGAAAAAGAAAT ATTTGAAAGCTGTGTCTGTAAACTGAT GGCTAACAAAACTAGGATTTTGGTCAC TTCTAAAATGGAACATTTAAAGAAAGC TGACAAAATATTAATTTTGCATGAAGG TAGCAGCTATTTTTATGGGACATTTTCA GAACTCCAAAATCTACAGCCAGACTTT AGCTCAAAACTCATGGGATGTGATTCT TTCGACCAATTTAGTGCAGAAAGAAGA AATTCAATCCTAACTGAGACCTTACAC CGTTTCTCATTAGAAGGAGATGCTCCT GTCTCCTGGACAGAAACAAAAAAACA ATCTTTTAAACAGACTGGAGAGTTTGG GGAAAAAAGGAAGAATTCTATTCTCAA TCCAATCAACTCTATACGAAAATTTTC CATTGTGCAAAAGACTCCCTTACAAAT GAATGGCATCGAAGAGGATTCTGATGA GCCTTTAGAGAGAAGGCTGTCCTTAGT ACCAGATTCTGAGCAGGGAGAGGCGA TACTGCCTCGCATCAGCGTGATCAGCA CTGGCCCCACGCTTCAGGCACGAAGGA GGCAGTCTGTCCTGAACCTGATGACAC ACTCAGTTAACCAAGGTCAGAACATTC ACCGAAAGACAACAGCATCCACACGA AAAGTGTCACTGGCCCCTCAGGCAAAC TTGACTGAACTGGATATATATTCAAGA AGGTTATCTCAAGAAACTGGCTTGGAA ATAAGTGAAGAAATTAACGAAGAAGA CTTAAAGGAGTGCTTTTTTGATGATAT GGAGAGCATACCAGCAGTGACTACAT GGAACACATACCTTCGATATATTACTG 
TCCACAAGAGCTTAATTTTTGTGCTAA TTTGGTGCTTAGTAATTTTTCTGGCAGA GGTGGCTGCTTCTTTGGTTGTGCTGTGG CTCCTTGGAAACACTCCTCTTCAAGAC AAAGGGAATAGTACTCATAGTAGAAA TAACAGCTATGCAGTGATTATCACCAG CACCAGTTCGTATTATGTGTTTTACATT TACGTGGGAGTAGCCGACACTTTGCTT GCTATGGGATTCTTCAGAGGTCTACCA CTGGTGCATACTCTAATCACAGTGTCG AAAATTTTACACCACAAAATGTTACAT TCTGTTCTTCAAGCACCTATGTCAACCC TCAACACGTTGAAAGCAGGTGGGATTC TTAATAGATTCTCCAAAGATATAGCAA TTTTGGATGACCTTCTGCCTCTTACCAT ATTTGACTTCATCCAGTTGTTATTAATT GTGATTGGAGCTATAGCAGTTGTCGCA GTTTTACAACCCTACATCTTTGTTGCAA CAGTGCCAGTGATAGTGGCTTTTATTA TGTTGAGAGCATATTTCCTCCAAACCT CACAGCAACTCAAACAACTGGAATCTG

1260 AAGGCAGGAGTCCAATTTTCACTCATC TTGTTACAAGCTTAAAAGGACTATGGA CACTTCGTGCCTTCGGACGGCAGCCTT ACTTTGAAACTCTGTTCCACAAAGCTC TGAATTTACATACTGCCAACTGGTTCTT GTACCTGTCAACACTGCGCTGGTTCCA AATGAGAATAGAAATGATTTTTGTCAT CTTCTTCATTGCTGTTACCTTCATTTCC ATTTTAACAACAGGAGAAGGAGAAGG AAGAGTTGGTATTATCCTGACTTTAGC CATGAATATCATGAGTACATTGCAGTG GGCTGTAAACTCCAGCATAGATGTGGA TAGCTTGATGCGATCTGTGAGCCGAGT CTTTAAGTTCATTGACATGCCAACAGA AGGTAAACCTACCAAGTCAACCAAACC ATACAAGAATGGCCAACTCTCGAAAGT TATGATTATTGAGAATTCACACGTGAA GAAAGATGACATCTGGCCCTCAGGGG GCCAAATGACTGTCAAAGATCTCACAG CAAAATACACAGAAGGTGGAAATGCC ATATTAGAGAACATTTCCTTCTCAATA AGTCCTGGCCAGAGGGTGGGCCTCTTG GGAAGAACTGGATCAGGGAAGAGTAC TTTGTTATCAGCTTTTTTGAGACTACTG AACACTGAAGGAGAAATCCAGATCGA TGGTGTGTCTTGGGATTCAATAACTTT GCAACAGTGGAGGAAAGCCTTTGGAG TGATACCACAGAAAGTATTTATTTTTTC TGGAACATTTAGAAAAAACTTGGATCC CTATGAACAGTGGAGTGATCAAGAAAT ATGGAAAGTTGCAGATGAGGTTGGGCT CAGATCTGTGATAGAACAGTTTCCTGG GAAGCTTGACTTTGTCCTTGTGGATGG GGGCTGTGTCCTAAGCCATGGCCACAA GCAGTTGATGTGCTTGGCTAGATCTGT TCTCAGTAAGGCGAAGATCTTGCTGCT TGATGAACCCAGTGCTCATTTGGATCC AGTAACATACCAAATAATTAGAAGAA CTCTAAAACAAGCATTTGCTGATTGCA CAGTAATTCTCTGTGAACACAGGATAG AAGCAATGCTGGAATGCCAACAATTTT TGGTCATAGAAGAGAACAAAGTGCGG CAGTACGATTCCATCCAGAAACTGCTG AACGAGAGGAGCCTCTTCCGGCAAGCC ATCAGCCCCTCCGACAGGGTGAAGCTC TTTCCCCACCGGAACTCAAGCAAGTGC AAGTCTAAGCCCCAGATTGCTGCTCTG AAAGAGGAGACAGAAGAAGAGGTGCA AGATACAAGGCTTTAGAGAGCAGCAT AAATGTTGACATGGGACATTTGCTCAT GGAATTGGAGCTCGTGGGACAGTCACC TCATGGAATTGGAGCTCGTGGAACAGT TACCTCTGCCTCAGAAAACAAGGATGA ATTAAGTTTTTTTTTAAAAAAGAAACA TTTGGTAAGGGGAATTGAGGACACTGA TATGGGTCTTGATAAATGGCTTCCTGG CAATAGTCAAATTGTGTGAAAGGTACT TCAAATCCTTGAAGATTTACCACTTGT GTTTTGCAAGCCAGATTTTCCTGAAAA CCCTTGCCATGTGCTAGTAATTGGAAA GGCAGCTCTAAATGTCAATCAGCCTAG TTGATCAGCTTATTGTCTAGTGAAACT CGTTAATTTGTAGTGTTGGAGAAGAAC TGAAATCATACTTCTTAGGGTTATGAT TAAGTAATGATAACTGGAAACTTCAGC GGTTTATATAAGCTTGTATTCCTTTTTC TCTCCTCTCCCCATGATGTTTAGAAAC ACAACTATATTGTTTGCTAAGCATTCC AACTATCTCATTTCCAAGCAAGTATTA GAATACCACAGGAACCACAAGACTGC ACATCAAAATATGCCCCATTCAACATC 
TAGTGAGCAGTCAGGAAAGAGAACTT CCAGATCCTGGAAATCAGGGTTAGTAT TGTCCAGGTCTACCAAAAATCTCAATA TTTCAGATAATCACAATACATCCCTTA CCTGGGAAAGGGCTGTTATAATCTTTC ACAGGGGACAGGATGGTTCCCTTGATG AAGAAGTTGATATGCCTTTTCCCAACT CCAGAAAGTGACAAGCTCACAGACCTT TGAACTAGAGTTTAGCTGGAAAAGTAT GTTAGTGCAAATTGTCACAGGACAGCC CTTCTTTCCACAGAAGCTCCAGGTAGA GGGTGTGTAAGTAGATAGGCCATGGGC ACTGTGGGTAGACACACATGAAGTCCA AGCATTTAGATGTATAGGTTGATGGTG GTATGTTTTCAGGCTAGATGTATGTAC TTCATGCTGTCTACACTAAGAGAGAAT GAGAGACACACTGAAGAAGCACCAAT CATGAATTAGTTTTATATGCTTCTGTTT TATAATTTTGTGAAGCAAAATTTTTTCT CTAGGAAATATTTATTTTAATAATGTTT CAAACATATATAACAATGCTGTATTTT

AAAAGAATGATTATGAATTACATTTGT ATAAAATAATTTTTATATTTGAAATATT GACTTTTT ATGGCACTAGTATTTCTATGAAATATT ATGTTAAAACTGGGACAGGGGAGAAC CTAGGGTGATATTAACCAGGGGCCATG AATCACCTTTTGGTCTGGAGGGAAGCC TTGGGGCTGATGCAGTTGTTGCCCACA GCTGTATGATTCCCAGCCAGCACAGCC TCTTAGATGCAGTTCTGAAGAAGATGG TACCACCAGTCTGACTGTTTCCATCAA GGGTACACTGCCTTCTCAACTCCAAAC TGACTCTTAAGAAGACTGCATTATATT TATTACTGTAAGAAAATATCACTTGTC AATAAAATCCATACATTTGTGTGAAA

Fig.1 Diagrammatic representation of the cystic fibrosis condition through gene regulation

Fig.2 CFTR protein showing point mutation

Table.1 Explanation of the cystic fibrosis based on genetic pattern

                        MOTHER
                        N                    n
FATHER       N          Non-carrier (NN)     Carrier (Nn)
             n          Carrier (Nn)         Cystic fibrosis (nn)
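The genotype ratios summarized in Table 1 can be reproduced by enumerating every combination of parental alleles. The sketch below assumes simple Mendelian inheritance, with N as the normal allele and n as the CF allele as in the table; the `cross` helper and the `PHENOTYPE` mapping are illustrative names and not part of the original study.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Phenotype labels taken from Table 1.
PHENOTYPE = {"NN": "Non-carrier", "Nn": "Carrier", "nn": "Cystic fibrosis"}


def cross(mother, father):
    """Return the probability of each offspring genotype for two parents.

    Each parent is given as a two-letter genotype such as "Nn". Every
    allele is assumed to be passed on with equal probability.
    """
    # sorted() puts the uppercase allele first, so "nN" and "Nn" are merged.
    counts = Counter("".join(sorted(pair)) for pair in product(mother, father))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


for genotype, probability in cross("Nn", "Nn").items():
    print(genotype, PHENOTYPE[genotype], probability)
```

For two carrier parents this yields 1/4 non-carrier, 1/2 carrier and 1/4 affected, matching the 25%, 50% and 25% figures quoted in the Introduction.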

Strand: plus/plus. Identities: 427/430 (99%)

Query:   5418  gtatgttagtgcaaattgtcacaggacagcccttctttccacagaagctccaggtagagg  5477
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Subject:    1  gtatgttagtgcaaattgtcacaggacagcccttctttccacagaagctccaggtagagg  60

Query:   5478  gtgtgtaagtagataggccatgggcactgtgggtagacacacatgaagtccaagcattta  5537
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Subject:   61  gtgtgtaagtagataggccatgggcactgtgggtagacacacatgaagtccaagcattta  120

Query:   5538  gatgtataggttgatggtggtatgttttcaggctagatgtatgtacttcatgctgtctac  5597
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Subject:  121  gatgtataggttgatggtggtatgttttcaggctagatgtatgtacttcatgctgtctac  180

Query:   5598  actaagagagaatgagagacacactgaagaagcaccaatcatgaattagttttatatgct  5657
               |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
Subject:  181  actaagagagaatgagagacmcactgaagaagcaccaatcatgaattagttttatatgct  240

Query:   5658  tctgttttataattttgtgaagcaaaattttttctctaggaaatatttattttaataatg  5717
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Subject:  241  tctgttttataattttgtgaagcaaaattttttctctaggaaatatttattttaataatg  300

Query:   5718  tttcaaacatatattacaatgctgtattttaaaagaatgattatgaattacatttgtata  5777
               |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
Subject:  301  tttcaaacatatataacaatgctgtattttaaaagaatgattatgaattacatttgtata  360

Query:   5778  aaataatttttatatttgaaatattgactttttatggcactagtatttttatgaaatatt  5837
               |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||
Subject:  361  aaataatttttatatttgaaatattgactttttatggcactagtatttctatgaaatatt  420

Query:   5838  atgttaaaac  5847
               ||||||||||
Subject:  421  atgttaaaac  430

This result shows the comparison of the CFTR gene with the most similar SNP database sequence. In the subject sequence, m is the IUPAC ambiguity code for a or c.
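The three loci reported in the Methods can be recovered directly from the alignment blocks that contain a mismatch. The following is a sketch under stated assumptions: the query and subject rows are those of the alignment above as reconstructed from the published output, `find_mismatches` is a hypothetical helper, and gaps are not handled because this alignment contains none.

```python
# (query start coordinate, query row, subject row) for the three
# alignment blocks above that contain a mismatch.
BLOCKS = [
    (5598,
     "actaagagagaatgagagacacactgaagaagcaccaatcatgaattagttttatatgct",
     "actaagagagaatgagagacmcactgaagaagcaccaatcatgaattagttttatatgct"),
    (5718,
     "tttcaaacatatattacaatgctgtattttaaaagaatgattatgaattacatttgtata",
     "tttcaaacatatataacaatgctgtattttaaaagaatgattatgaattacatttgtata"),
    (5778,
     "aaataatttttatatttgaaatattgactttttatggcactagtatttttatgaaatatt",
     "aaataatttttatatttgaaatattgactttttatggcactagtatttctatgaaatatt"),
]


def find_mismatches(blocks):
    """Return (query_position, query_base, subject_base) for each mismatch."""
    found = []
    for query_start, query, subject in blocks:
        if len(query) != len(subject):
            raise ValueError("alignment rows must have equal length")
        for offset, (q_base, s_base) in enumerate(zip(query, subject)):
            if q_base != s_base:
                found.append((query_start + offset, q_base, s_base))
    return found


print(find_mismatches(BLOCKS))
# [(5618, 'a', 'm'), (5732, 't', 'a'), (5826, 't', 'c')]
```

The subject base m at position 5618 is an ambiguity code (a or c), which is how the SNP database entry records both alleles at that site.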
As is evident from the identities value, out of the 430 nucleotides there are three positions at which the bases differ. Note that both sequences are compared on the plus strand (plus/plus). The E value (Expect value) of 0.0 is indicative of the highest similarity. The ORF Finder output shows that locus 5618 lies in a coding region. The triplet 'cac' codes for the amino acid histidine.

5539 atgtataggttgatggtggtatgttttcaggctagatgtatgtac
     M  Y  R  L  M  V  V  C  F  Q  A  R  C  M  Y
5584 ttcatgctgtctacactaagagagaatgagagacacactgaagaa
     F  M  L  S  T  L  R  E  N  E  R  H  T  E  E
5629 gcaccaatcatgaattag
     A  P  I  M  N  *

Translated protein sequence

>sp|P13569|CFTR_HUMAN Cystic fibrosis transmembrane conductance regulator (CFTR) (cAMP-dependent chloride channel) - Homo sapiens (Human).
MQRSPLEKASVVSKLFFSWTEPILRKGYR QRLELSDIYQIPSVDSADNLSEKLEREWDR ELASKKNPKLINALRRCFFWRFMFYGIFLY LGEVTKAVQPLLLGRIIASYDPDNKEERSI AIYLGIGLCLLFIVRTLLLHPAIFGLHHIGM QMRIAMFSLIYKKTLKLSSRVLDKISIGQL VSLLSNNLNKFDEGLALAHFVWIAPLQVA LLMGLIWELLQASAFCGLGFLIVLALFQA GLGRMMMKYRDQRAGKISERLVITSEMIE NIQSVKAYCWEEAMEKMIENLRQTELKLT RKAAYVRYFNSSAFFFSGFFVVFLSVLPYA LIKGIILRKIFTTISFCIVLRMAVTRQFPWA VQTWYDSLGAINKIQDFLQKQEYKTLEYN LTTTEVVMENVTAFWEEGFGELFEKAKQ NNNNRKTSNGDDSLFFSNFSLLGTPVLKDI NFKIERGQLLAVAGSTGAGKTSLLMMIMG ELEPSEGKIKHSGRISFCSQFSWIMPGTIKE NIIFGVSYDEYRYRSVIKACQLEEDISKFAE KDNIVLGEGGITLSGGQRARISLARAVYK DADLYLLDSPFGYLDVLTEKEIFESCVCKL MANKTRILVTSKMEHLKKADKILILHEGSS YFYGTFSELQNLQPDFSSKLMGCDSFDQFS AERRNSILTETLHRFSLEGDAPVSWTETKK QSFKQTGEFGEKRKNSILNPINSIRKFSIVQ KTPLQMNGIEEDSDEPLERRLSLVPDSEQG EAILPRISVISTGPTLQARRRQSVLNLMTHS VNQGQNIHRKTTASTRKVSLAPQANLTEL DIYSRRLSQETGLEISEEINEEDLKECFFDD MESIPAVTTWNTYLRYITVHKSLIFVLIWC LVIFLAEVAASLVVLWLLGNTPLQDKGNS THSRNNSYAVIITSTSSYYVFYIYVGVADT LLAMGFFRGLPLVHTLITVSKILHHKMLHS VLQAPMSTLNTLKAGGILNRFSKDIAILDD LLPLTIFDFIQLLLIVIGAIAVVAVLQPYIFV ATVPVIVAFIMLRAYFLQTSQQLKQLESEG RSPIFTHLVTSLKGLWTLRAFGRQPYFETL FHKALNLHTANWFLYLSTLRWFQMRIEMI FVIFFIAVTFISILTTGEGEGRVGIILTLAMNI MSTLQWAVNSSIDVDSLMRSVSRVFKFID MPTEGKPTKSTKPYKNGQLSKVMIIENSH VKKDDIWPSGGQMTVKDLTAKYTEGGNA ILENISFSISPGQRVGLLGRTGSGKSTLLSA FLRLLNTEGEIQIDGVSWDSITLQQWRKAF GVIPQKVFIFSGTFRKNLDPYEQWSDQEIW KVADEVGLRSVIEQFPGKLDFVLVDGGCV LSHGHKQLMCLARSVLSKAKILLLDEPSA HLDPVTYQIIRRTLKQAFADCTVILCEHRIE AMLECQQFLVIEENKVRQYDSIQKLLNER SLFRQAISPSDRVKLFPHRNSSKCKSKPQIA ALKEETEEEVQDTRL

This is the protein sequence obtained from SWISS-PROT. Figure 2 shows the E-chain of the protein molecule. The normal protein contains two histidine residues, at positions 608 and 620. The histidine at 620 is converted into proline by a point mutation of cac to ccc. The histidine residue is depicted in red and the proline residue in green.

In conclusion, the SNP analysis indicates that the disorder cystic fibrosis is the result of a change in a single nucleotide of the CFTR gene. This study underlines the significance of single nucleotide polymorphisms. Though environmental and lifestyle factors play a major role in causing a disease, the genetic factors are more important to consider, as they can be determined in advance and treated accordingly. The same dysfunction may also contribute to chronic rhinosinusitis. This type of analysis using bioinformatics is a novel approach to the identification and characterization of mutated genes. It is accurate and faster than wet-lab techniques. This strategy can be followed for a variety of other genetic disorders and can prove useful in drug design.
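The codon change described above (cac to ccc, histidine to proline) can be checked against the standard genetic code. The sketch below is illustrative only: it assumes the reading frame that starts at position 5539 as reported in the Results, uses the standard codon table, and the names `translate` and `substitute` are hypothetical helpers rather than part of ORF Finder or any other tool used in the study.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

ORF_START = 5539  # coordinate of the first base, as given in the Results
ORF = (
    "atgtataggttgatggtggtatgttttcaggctagatgtatgtac"
    "ttcatgctgtctacactaagagagaatgagagacacactgaagaa"
    "gcaccaatcatgaattag"
)


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def substitute(sequence, sequence_start, position, new_base):
    """Return the sequence with the base at a given coordinate replaced."""
    offset = position - sequence_start
    if not 0 <= offset < len(sequence):
        raise ValueError("position lies outside the sequence")
    return sequence[:offset] + new_base + sequence[offset + 1:]


mutant = substitute(ORF, ORF_START, 5618, "c")
print(translate(ORF))     # MYRLMVVCFQARCMYFMLSTLRENERHTEEAPIMN*
print(translate(mutant))  # MYRLMVVCFQARCMYFMLSTLRENERPTEEAPIMN*
```

Replacing the a at 5618 with c changes the codon at 5617-5619 from cac to ccc, so the only difference between the two translations is the H to P substitution.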
References

Four case reports of Chinese cystic fibrosis patients and literature review (2017).

A survey of newborn screening for cystic fibrosis in Europe (2007).

Exploring cystic fibrosis using bioinformatics tools: A module designed for the freshman biology course (2011).

Identification of SNPs in the cystic fibrosis interactome influencing pulmonary progression in cystic fibrosis (2012).

Riordan JR, Rommens JM, Kerem B, Alon N, Rozmahel R, Grzelczak Z, et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 1989; 245:1066-73. (AN: 89368940)

Structure of O-Antigen and Hybrid Biosynthetic Locus in Burkholderia cenocepacia Clonal Variants Recovered from a Cystic Fibrosis Patient (2017).

How to cite this article: Balakrishnaraja Rengaraju, Keerthana, Akila, K. Pavithra, D. Vinotha, Sri Harsha Challapalli and Arunava Das. 2017. Inquest of the SNP in Cystic Fibrosis A Bioinformatic Approach. Int.J.Curr.Microbiol.App.Sci. 6(8): 1255-1263. doi: https://doi.org/10.20546/ijcmas.2017.608.152