STUDYING THE SECONDARY STRUCTURE OF ACCESSION NUMBER USING CETD MATRIX

Similar documents
Two Mark question and Answers

Basic Local Alignment Search Tool

Changing Mutation Operator of Genetic Algorithms for optimizing Multiple Sequence Alignment

Bioinformatic analysis of similarity to allergens. Mgr. Jan Pačes, Ph.D. Institute of Molecular Genetics, Academy of Sciences, CR

Dynamic Programming Algorithms

Data Retrieval from GenBank

Introduction to Cellular Biology and Bioinformatics. Farzaneh Salari

Annotation and the analysis of annotation terms. Brian J. Knaus USDA Forest Service Pacific Northwest Research Station

Introduction to BIOINFORMATICS

The University of California, Santa Cruz (UCSC) Genome Browser

Types of Databases - By Scope

Textbook Reading Guidelines

Genome Sequence Assembly

Engineering Genetic Circuits

Making Sense of DNA and Protein Sequences. Lily Wang, PhD Department of Biostatistics Vanderbilt University

ELE4120 Bioinformatics. Tutorial 5

Protein Bioinformatics Part I: Access to information

Bioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 5a Protein sequence databases

G4120: Introduction to Computational Biology

PCA and SOM based Dimension Reduction Techniques for Quaternary Protein Structure Prediction

BIOINFORMATICS IN BIOCHEMISTRY

Chapter 2: Access to Information

ENGR 213 Bioengineering Fundamentals April 25, A very coarse introduction to bioinformatics

Algorithms in Bioinformatics ONE Transcription Translation

(Very) Basic Molecular Biology

What Are the Chemical Structures and Functions of Nucleic Acids?

Lecture 2: Central Dogma of Molecular Biology & Intro to Programming

Introduction to Bioinformatics

Introduction to Bioinformatics

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

Evolutionary Genetics. LV Lecture with exercises 6KP

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Data Mining for Biological Data Analysis

Gene-centered resources at NCBI

Why learn sequence database searching? Searching Molecular Databases with BLAST

Exploring Similarities of Conserved Domains/Motifs

Examination Assignments

This is the accepted version of this conference paper: Buckingham, Lawrence and Hogan, James and Mann, Scott

Big picture and history

Overview of Health Informatics. ITI BMI-Dept

Entrez and BLAST: Precision and Recall in Searches of NCBI Databases

Unit 1: DNA and the Genome. Sub-Topic (1.3) Gene Expression

Genome Resources. Genome Resources. Maj Gen (R) Suhaib Ahmed, HI (M)

COMPARATIVE ANALYSIS AND INSILICO CHARACTERIZATION OF LEA, DREB AND OsDHODH1 GENES INVOLVED IN DROUGHT RESISTANCE OF ORYZA SATIVA

DNA is normally found in pairs, held together by hydrogen bonds between the bases

Viewing the Proteome from Oligopeptides and Prediction of Protein Function

DNA Sequence Alignment based on Bioinformatics

ipac: identifcation of Protein Amino acid Mutations

Finding Regularity in Protein Secondary Structures using a Cluster-based Genetic Algorithm

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling

Compiled by Mr. Nitin Swamy Asst. Prof. Department of Biotechnology

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1

Estimating Cell Cycle Phase Distribution of Yeast from Time Series Gene Expression Data

Sequence logos for DNA sequence alignments

An Evolutionary Optimization for Multiple Sequence Alignment

Database Searching and BLAST Dannie Durand

BLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments

What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases.

Admission Exam for the Graduate Course in Bioinformatics. November 17 th, 2017 NAME:

Article A Teaching Approach From the Exhaustive Search Method to the Needleman Wunsch Algorithm

Iterated Conditional Modes for Cross-Hybridization Compensation in DNA Microarray Data

Lab 1: A review of linear models

Supplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line

MATH 5610, Computational Biology

Textbook Reading Guidelines

Bioinformatics : Gene Expression Data Analysis

03-511/711 Computational Genomics and Molecular Biology, Fall

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis

CAP 5510/CGS 5166: Bioinformatics & Bioinformatic Tools GIRI NARASIMHAN, SCIS, FIU

Biotechniques ELECTROPHORESIS. Dr. S.D. SARASWATHY Assistant Professor Department of Biomedical Science Bharathidasan University Tiruchirappalli


SeeDNA: A Visualization Tool for K -string Content of Long DNA Sequences and Their Randomized Counterparts

MARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305

Outline. Structure of DNA DNA Functions Transcription Translation Mutation Cytogenetics Mendelian Genetics Quantitative Traits Linkage

Multiple choice questions (numbers in brackets indicate the number of correct answers)

Bioinformatics, in general, deals with the following important biological data:

Name Date Class. The Central Dogma of Biology

NCBI web resources I: databases and Entrez

DNA genetic artificial fish swarm constant modulus blind equalization algorithm and its application in medical image processing

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS*

Chapter 5: The Structure and Function of Large Biological Molecules

Gene Prediction in Eukaryotes

An Analytical Upper Bound on the Minimum Number of. Recombinations in the History of SNP Sequences in Populations

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Molecular Biology and Pooling Design

GENOME ANALYSIS AND BIOINFORMATICS

Agilent Software Tools for Mass Spectrometry Based Multi-omics Studies

Homework 4. Due in class, Wednesday, November 10, 2004

Bioinformatics Database of Some Leguminous Trees in Anand district of Gujarat state in India Sagar Patel 1, Kalpesh Anjaria 2, Hetalkumar Panchal 3

Class XII Chapter 6 Molecular Basis of Inheritance Biology

Exam MOL3007 Functional Genomics

Reveal Motif Patterns from Financial Stock Market

Sequence Based Function Annotation

Assembly and Analysis of Extended Human Genomic Contig Regions

Single-cell sequencing

Extracting Database Properties for Sequence Alignment and Secondary Structure Prediction

Introduc)on to Databases and Resources Biological Databases and Resources

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem.

Transcription:

Vol. 4, No.4,. STUDYING THE SECONDARY STRUCTURE OF ACCESSION NUMBER USING CETD MATRIX Anamika Dutta Department of Statistics, Gauhati University, Guwahati-784, Assam, India anamika.dut8@gmail.com Kishore K. Das Department of Statistics, Gauhati University, Guwahati-784, Assam, India daskkishore@gmail.com Professional Paper doi:.597/jouproman4-57 Abstract: This paper, we have tried to analyze about the Secondary Structure of nucleotide sequences of rice. The data have been collected from NCBI (National Centre for Biotechnology Information) using Nucleotide as data base. All the programs were developed using R programming language using sequinr package. Here, we have used CETD matrix method to study the prediction. The conclusions are drawn accordingly. Key words: Secondary Structure, ATD Matrix, RTD Matrix, CETD Matrix. Introduction: Proteomics as the name suggest is a very vast field of research. Here we have tried to study the secondary structure of accession number for rice (Molina et al., ) using CETD matrix (Kuppuswami et al., 5). We have considered 4 accession numbers for rice from the paper Cho et al., where the accession number were already been used for other studies (Cho et al., ). The Accession Numbers for rice (Cho et al., ) are: D758, M49, X58877, Z9, X755, U7, U75, X49, D789, D785, D794, L4, D9, U478, M959, X58, U844, D4, U77, X559, U7, D, U49, L758 Primary structure of protein is defined as first level of protein i.e., the sequence of amino acids is basically known as the primary structure of protein. The primary structure of protein has different characteristics which is dependent on hydrogen bonding and is known as secondary structure of protein. In other words, secondary structure is the second level of primary structure (Technical Brief 9). Here we only secondary structure of the nucleotide sequence of rice has been considered. Out of the secondary structures we have selected only six particles viz. Helix Residue, Beta Sheet Residue, Beta Turn Residue, Helix Region, Beta Sheet Region and Beta Turn Region. The objective of the paper is:. To first split the accession numbers into some groups using lottery method.. To find which is the most conservative accession number group using CETD matrix. Method and materials: We have used NCBI (Pruitt et al., 7 and Altschul et al., 99) nucleotide data base to find the sequences of rice. Later, using Chaufasman Algorithm and sequinr package in R programming language, the value of six parts of secondary structure of protein for 4 accession numbers have been found. 5

Vol. 4, No.4,. Firstly, the Initial Raw Data matrix has to be created (Porchelvi, S. R. and Vanitha, R. (), Narayanamoorthy et al., and Kuppuswami et al., 5) which can be constructed by taking the accession numbers in rows and the secondary particles in columns. Next we contrive Average Time Dependent (ATD) Matrix (Porchelvi, S. R. and Vanitha, R. (), Narayanamoorthy et al., and Kuppuswami et al., 5) which is obtained by dividing the entries of Initial Raw Data matrix by the difference between the upper range and the lower range of accession number. Following the mean (µ j ) and standard deviation (σ j ) of each column has been found separately for ATD matrix. Now using the mean and standard deviation for each column and choosing a parameter α lying between [,] and construct a Refined Time Dependent Matrix (RTD Matrix) using the formulae (Porchelvi, S. R. and Vanitha, R. (), Narayanamoorthy et al., and Kuppuswami et al., 5): If a ij (µ j - α σ j ) then e ij = - If a ij ϵ (µ j - α σ j, µ j + α σ j ) then e ij = If a ij (µ j + α σ j ) then e ij = a ij are the elements of ATD matrix and e ij are the elements of RTD matrix (Porchelvi, S. R. and Vanitha, R.()). The RTD matrix consists of only -, and elements. Succeeding, each row of the RTD matrix has to be added. The maximum sum gives the group of accession number which is giving a better secondary prediction. Lastly, we combine all the RTD matrices and get a Combined Effective Time Dependent Data Matrix (CETD matrix) Porchelvi, S. R. and Vanitha, R. (), Narayanamoorthy et al., and Kuppuswami et al., 5). The row sum is obtained for CETD matrix and conclusions are made using the CETD matrix. Subsequently, all the matrices are represented using line diagram.. Estimation: The grouping of the accession numbers into six parts has been done using lottery method. We have numbered the accession numbers like: as D758, as D, as M49, 4 as D789, 5 as X58877, as Z9, 7 as D4, 8 as L758, 9 as X755, as U7, as U75, as D794, as X49, 4 as D785, 5 as L4, as D9, 7 as U478, 8 as M959, 9 as X58, as U844, as U77, as X559, as U7 and 4 as U49. The numbering has been done so that we can group the accession number unbiasedly using Lottery Method; a method used in simple random sampling. The numbering of accession number has been done accordingly. Subsequently, the 4 accession number has been divided into groups are -4, 5-8, 9-, -, 7- and -4. With this our first objective gets fulfilled. We define, X : Helix Residue X : Beta Sheet Residue X : Beta Turn Residue X 4 : Helix Region X 5 : Beta Sheet Region X : Beta Turn Region Here, CETD matrix has been used in order to check which group of accession number is giving a conservative region. The raw data of 4 accession number has been converted in the form of matrix, known as Initial Raw Data matrix.

Vol. 4, No.4,. Table : Initial Raw Data Matrix Accession Number X X X X 4 X 5 X -4 4 4 5 48 5 4 arbitrary values of α has been selected viz..,.5 and.. Table 4: Values of (µ j - α σ j ) and (µ j + α σ j ) for different values of α 5-8 99 54 59 8 74 9 9-5 9 8 5 5-87 85 9 9 7-9 9 4 4 7-4 45 495 55 4 7..5.95 8.4.8.8 5.9 5.9.89 4. 5.8.9 5.8 5.7 4.47 7..7.4 5.4 5. 9.7 5.7 5.5. 5.9 5.77 Table : The ATD Matrix. 7...7.8 4.58 5.8.8 9.4.99.5.4. Access ion No. X X X X 4 X 5 X -4 5.75 7.5.5 5-8 7.5 8.5 9.75 7 8.5 7.5 9-5.5 98.5-7.75 5 4.5 8 7.5 7.5 7-98.5 5.5.5 4.5-4.75.75 8.75 8.5 5.5 Table : The Mean and Standard Deviation of the above ATD Matrix Let us create the RTD matrices for various values of α. The RTD matrix for α =. The RTD matrix for α =.5 Mean (µ).9. 4..7 5.5 5.54 Standard Deviation (σ) 49.7.9 9.54.87.7.54 It has already been mentioned that the parameter α for the construction in RTD matrix lies in the range [,], so three 7

Vol. 4, No.4,. 8 The RTD matrix for α =. Adding the elements of the three RTD matrices, we get the CETD matrix, which is given below: The CETD matrix The row sum of CETD matrix 8 9 8

Vol. 4, No.4,. From figure 4, that is the line diagram above, it can be seen that the highest peak is in the second group i.e., in the group 5-8. Which implies the region from 5 to 8 is conservative. The second objective fulfills here.. Conclusion: From the CETD matrix, it is seen that the row sum of CETD matrix is highest in the interval 5-8 which means that the group of accession numbers viz. X58877, Z9, D4 and L758 are giving a better secondary prediction than the other intervals of accession numbers, which means that the region is conservative and the sequence of the accession numbers are identical. Also it has been seen that the lowest negative is in the interval 7- which means that the group of accession numbers viz. U478, M959, X58 and U844 are giving the least secondary prediction among the other intervals. It concludes that the sequences in these accession numbers are least alike. 4. Acknowledgement: The author Miss. Anamika Dutta thank to Department of Science and Technology (DST), India for providing financial assistance for carrying out this work as an INSPIRE Fellow. 5. Reference:. Cho, Y. G., Ishii, T., Temnykh, S., Chen, X., Lipovich, L., McCouch, R. S., Park, D. W., Ayres, N., and Cartinhour, S. (). Diversity of microsatellites derived from genomic libraries and GenBank sequences in rice.(oryza sativa L.) Theor Appl Genet, Vol., pages: 7-7. Springer-Verlag. Pruitt, K. D., Tatusova, T., and Maglott, D. R. (7). NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic acids research, 5, pages: D D5. Molina, J., Sikora, M., Garud, N., Flowers, M. J., Rubinstein, S., Reynolds, A., Huang, P., Jackson, S., Schaal, A. B., Bustamante, D. C., Boyko, R. A., and Purugganan, D. M. (). Molecular evidence for a single evolutionary origin of domesticated rice. PNAS, Vol. 8, No., pages: 85-85 4. Kuppuswami, G., Sujatha, R., Vasantha Kandaswamy, W.B. (5). Study of traffic flow using CETD Matrix. Indian Journal of Science of Technology, Vol 8(4): :5 5. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J.(99). Basic local alignment search tool. Journal of molecular biology, 5():4 4. Porchelvi, S. R. and Vanitha, R. (). On studying a health problem using fuzzy matrices. International Review of Fuzzy Mathematics, Vol., No., page: 5-7. Narayanamoorthy, S., V. M. S. and Sivakamasundari, K. (). Fuzzy cetd matrix to estimate the maximum age group victims of pesticide endosulfan problems faced in Kerala. International Journal of Mathematics and Computer, Vol., Issue, page: 7-8. Particle Sciences Drug Development Services, Technical Brief 9, Vol 8 9