IMAGE HIDING IN DNA SEQUENCE USING ARITHMETIC ENCODING Prof. Samir Kumar Bandyopadhyay 1* and Mr. Suman Chakraborty

Similar documents
1. DNA, RNA structure. 2. DNA replication. 3. Transcription, translation

Fishy Amino Acid Codon. UUU Phe UCU Ser UAU Tyr UGU Cys. UUC Phe UCC Ser UAC Tyr UGC Cys. UUA Leu UCA Ser UAA Stop UGA Stop

Level 2 Biology, 2017

Biomolecules: lecture 6

Protein Synthesis. Application Based Questions

ANCIENT BACTERIA? 250 million years later, scientists revive life forms

Biomolecules: lecture 6

Bioinformatics CSM17 Week 6: DNA, RNA and Proteins

CONVERGENT EVOLUTION. Def n acquisition of some biological trait but different lineages

Degenerate Code. Translation. trna. The Code is Degenerate trna / Proofreading Ribosomes Translation Mechanism

Deoxyribonucleic Acid DNA. Structure of DNA. Structure of DNA. Nucleotide. Nucleotides 5/13/2013

How life. constructs itself.

Folding simulation: self-organization of 4-helix bundle protein. yellow = helical turns

The combination of a phosphate, sugar and a base forms a compound called a nucleotide.

Important points from last time

iclicker Question #28B - after lecture Shown below is a diagram of a typical eukaryotic gene which encodes a protein: start codon stop codon 2 3

Honors packet Instructions

UNIT I RNA AND TYPES R.KAVITHA,M.PHARM LECTURER DEPARTMENT OF PHARMACEUTICS SRM COLLEGE OF PHARMACY KATTANKULATUR

p-adic GENETIC CODE AND ULTRAMETRIC BIOINFORMATION

Just one nucleotide! Exploring the effects of random single nucleotide mutations

7.016 Problem Set 3. 1 st Pedigree

(a) Which enzyme(s) make 5' - 3' phosphodiester bonds? (c) Which enzyme(s) make single-strand breaks in DNA backbones?

Chapter 10. The Structure and Function of DNA. Lectures by Edward J. Zalisko

Chemistry 121 Winter 17

Protein Synthesis: Transcription and Translation

A Zero-Knowledge Based Introduction to Biology

PROTEIN SYNTHESIS Study Guide

Human Gene,cs 06: Gene Expression. Diversity of cell types. How do cells become different? 9/19/11. neuron

UNIT (12) MOLECULES OF LIFE: NUCLEIC ACIDS

Basic Biology. Gina Cannarozzi. 28th October Basic Biology. Gina. Introduction DNA. Proteins. Central Dogma.

CISC 1115 (Science Section) Brooklyn College Professor Langsam. Assignment #6. The Genetic Code 1

Codon Bias with PRISM. 2IM24/25, Fall 2007

PRINCIPLES OF BIOINFORMATICS

DNA Base Data Hiding Algorithm Mohammad Reza Abbasy, Pourya Nikfard, Ali Ordi, and Mohammad Reza Najaf Torkaman

Describe the features of a gene which enable it to code for a particular protein.

Data Hiding Based on Contrast Mapping Using DNA Medium

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Gene Prediction

Enduring Understanding

Chapter 13 From Genes to Proteins

Molecular Level of Genetics

CHAPTER 12- RISE OF GENETICS I. DISCOVERY OF DNA A. GRIFFITH (1928) 11/15/2016

Station 1: DNA Structure Use the figure above to answer each of the following questions. 1.This is the subunit that DNA is composed of. 2.

Gene Expression REVIEW Packet

Bioinformation by Biomedical Informatics Publishing Group

DNA sentences. How are proteins coded for by DNA? Materials. Teacher instructions. Student instructions. Reflection

INTRODUCTION TO THE MOLECULAR GENETICS OF THE COLOR MUTATIONS IN ROCK POCKET MICE

7.013 Problem Set

Chapter 10. The Structure and Function of DNA. Lectures by Edward J. Zalisko

Inheritance of Traits

BIOL591: Introduction to Bioinformatics Comparative genomes to look for genes responsible for pathogenesis

PGRP negatively regulates NOD-mediated cytokine production in rainbow trout liver cells

Chapter 3: Information Storage and Transfer in Life

UNIT (12) MOLECULES OF LIFE: NUCLEIC ACIDS

Emergence of the Canonical Genetic Code

Evolution of protein coding sequences

Mechanisms of Genetics

Daily Agenda. Warm Up: Review. Translation Notes Protein Synthesis Practice. Redos

DNA Begins the Process

7-9/99 Neuman Chapter 23

FROM MOLECULES TO LIFE

Name Date Class. The Central Dogma of Biology

FROM MOLECULES TO LIFE

Protein Synthesis Fairy Tale

7.013 Exam Two

SUPPLEMENTARY INFORMATION

PROTEIN SYNTHESIS WHAT IS IT? HOW DOES IT WORK?

Problem Set 3

Translating the Genetic Code. DANILO V. ROGAYAN JR. Faculty, Department of Natural Sciences

Sections 12.3, 13.1, 13.2

Lecture 19A. DNA computing

Four different segments of a DNA molecule are represented below.

If stretched out, the DNA in chromosome 1 is roughly long.

Part 1: The Flower. Activity #60 Mendel, First Geneticist (Part 1 and Part 2) Challenge Question: Initial Thoughts:

It has not escaped our notice that the specific paring we have postulated immediately suggest a possible copying mechanism for the genetic material

Genes & Inheritance Series: Set 1. Copyright 2005 Version: 2.0

Forensic Science: DNA Evidence Unit

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 1. Bioinformatics 1: Biology, Sequences, Phylogenetics

Today in Astronomy 106: polymers to life

Basic concepts of molecular biology

Today in Astronomy 106: the important polymers and from polymers to life

BIOLOGY. Gene Expression. Gene to Protein. Protein Synthesis Overview. The process in which the information coded in DNA is used to make proteins

2. Examine the objects inside the box labeled #2. What is this called? nucleotide

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Basic concepts of molecular biology

Gene Prediction. Srivani Narra Indian Institute of Technology Kanpur

6. Which nucleotide part(s) make up the rungs of the DNA ladder? Sugar Phosphate Base

MCB 407 (MICROBIAL GENETICS)

DNA stands for deoxyribose nucleic acid

DNA: The Molecule of Heredity

MOLECULAR EVOLUTION AND PHYLOGENETICS

Student Exploration: RNA and Protein Synthesis Due Wednesday 11/27/13

Chapter 7 DNA Structure and Gene Function

Quiz 2 on Wednesday 4/3 from 11-noon. Bring IDs to exam. Review Session 4/1 from 7-9 pm in Tutoring Session 4/2 from 4-6 pm in

BIOSTAT516 Statistical Methods in Genetic Epidemiology Autumn 2005 Handout1, prepared by Kathleen Kerr and Stephanie Monks

Q1. Lysozyme is an enzyme consisting of a single polypeptide chain of 129 amino acids.

1 How DNA Makes Stuff

Keywords: DNA methylation, deamination, codon usage, genome, genomics

Biology. Genes.

1. The diagram below shows an error in the transcription of a DNA template to messenger RNA (mrna).

Why are proteins important?

Transcription:

Volume 2, No. 4, April 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMAGE HIDING IN DNA SEQUENCE USING ARITHMETIC ENCODING Prof. Samir Kumar Bandyopadhyay 1* and Mr. Suman Chakraborty 1 Dept. of Computer Sc. & Engg, University of Calcutta 92 A.P.C. Road, Kolkata 700009, India skb1@vsnl.com 2 B.P. Poddar Institute of Management and Technology 137, V.I.P. Road, Kolkata 700052, India Abstract: Recently, biological techniques become more and more popular, as they are applied to many kinds of applications, authentication protocols, biochemistry, and cryptography. One of the most interesting biology techniques is deoxyribo nucleic acid and using it in such domains. Hiding secret data in deoxyribo nucleic acid becomes an important and interesting research topic. Some researchers hide the secret data in transcribed deoxyribo nucleic acid, translated ribo nucleic acid regions, or active coding segments where it doesn't mention to modify the original sequence, but others hide data in non-transcribed deoxyribo nucleic acid, non-translated ribo nucleic acid regions, or active coding segments. Unfortunately, these schemes either alter the functionalities or modify the original deoxyribo nucleic acid sequences. DNA has the ability to store large amount of digital data. This paper presents a method to hide an image in DNA sequence using arithmetic encoding. Keywords : DNA, mrna, Arithmetic Encoding and Decoding INTRODUCTION Today, network technologies have improved a lot so that more and more people access the remote facilities and send or receive various kinds of digital data over the Internet. However, the Internet is a public but insecure channel to transmit data. Thus, important information must be manipulated to be concealed while delivered via the Internet such that only the authorized receiver can get it. There are two main methods for concealing secret message traditional encryption and steganography. The information in DNA is stored as a code made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Human DNA consists of about 3 billion bases, and more than 99 percent of those bases are the same in all people. The order, or sequence, of these bases determines the information available for building and maintaining an organism, similar to the way in which letters of the alphabet appear in a certain order to form words and sentences. DNA bases pair up with each other, A with T and C with G, to form units called base pairs. Each base is also attached to a sugar molecule and a phosphate molecule. Together, a base, sugar, and phosphate are called a nucleotide. Nucleotides are arranged in two long strands that form a spiral called a double helix. The structure of the double helix is somewhat like a ladder, with the base pairs forming the ladder s rungs and the sugar and phosphate molecules forming the vertical sidepieces of the ladder [1-4]. An important property of DNA is that it can replicate, or make copies of itself. Each strand of DNA in the double helix can serve as a pattern for duplicating the sequence of bases. This is critical when cells divide because each new cell needs to have an exact copy of the DNA present in the old cell. This is shown in figure 1. Figure 1 DNA is a double helix formed by base pairs attached to a sugarphosphate backbone. mrna is transcribed from DNA, carrying information for protein synthesis. Three consecutive nucleotides in mrna encode an amino acid or a stop signal for protein synthesis. The trinucleotide is known as a codon. This is shown in figure 2. 2. Figure -2 The sequence relationship of DNA, mrna and the encoded peptide JGRCS 2010, All Rights Reserved 167

The sequence of mrna is complementary to DNA's template strand, and thus the same as DNA's coding strand, except that T is replaced by U. Image Arbitrarily chosen mrna Sequence Corresponding Amino acid Arithmetic coding Corresponding mrna Sequence Amino acid Figure 3 Image Hiding Scheme Decoding Algorithm 1: Image embedding technique in mrna Step1:Choose a arbitrary mrna sequence for binary sequence of image. Step2: Convert mrna sequence into amino acid sequence. ARITHMETIC CODING In arithmetic coding, a message is represented by an interval of real numbers between 0 and 1. As the message becomes longer, the interval needed to represent it becomes smaller, and the number of bits needed to specify that interval grows. Successive symbols of the message reduce the size of the interval in accordance with the symbol probabilities generated by the model. The more likely symbols reduce the range by less than the unlikely symbols and hence add fewer bits to the message [5-6]. Before anything is transmitted, the range for the message is the entire interval [0, l), denoting the half-open interval 0 5 x < 1. As each symbol is processed, the range is narrowed to that portion of it allocated to the symbol. For example, if we are going to encode the random message BILL GATES, we would have a probability distribution that looks like this: Character Probability --------- ----------- SPACE 1/10 A 1/10 B 1/10 E 1/10 G 1/10 I 1/10 L 2/10 S 1/10 T 1/10 Once the character probabilities are known, the individual symbols need to be assigned a range along a probability line, which is nominally 0 to 1. It doesn t matter which characters are assigned which segment of the range, as long as it is done in the same manner by both the encoder and the decoder. The nine character symbol set use here would look like this: Character Probability Range --------- ----------- ----------- SPACE 1/10 0.00-0.10 A 1/10 0.10-0.20 B 1/10 0.20-0.30 E 1/10 0.30-0.40 G 1/10 0.40-0.50 I 1/10 0.50-0.60 L 2/10 0.60-0.80 S 1/10 0.80-0.90 T 1/10 0.90-1.00 Each character is assigned the portion of the 0-1 range that corresponds to its probability of appearance. Note also that the character owns everything up to, but not including the higher number. So the letter T in fact has the range 0.90 0.9999. The most significant portion of an arithmetic coded message belongs to the first symbol to be encoded. When encoding the message BILL GATES, the first symbol is B. In order for the first character to be decoded properly, the final coded message has to be a number greater than or equal to 0.20 and less than 0.30. What we do to encode this number is keep track of the range that this number could fall in. So after the first character is encoded, the low end for this range is 0.20 and the high end of the range is 0.30. After the first character is encoded, we know that our range for our output number is now bounded by the low number and the high number. What happens during the rest of the encoding process is that each new symbol to be encoded JGRCS 2010, All Rights Reserved 168

will further restrict the possible range of the output number. The next character to be encoded, I, owns the range 0.50 through 0.60. If it was the first number in our message, we would set our low and high range values directly to those values. But I is the second character. So what we do instead is say that I owns the range that corresponds to 0.50-0.60 in the new subrange of 0.2 0.3. This means that the new encoded number will have to fall somewhere in the 50th to 60th percentile of the currently established range. Applying this logic will further restrict our number to the range 0.25 to 0.26. Algorithm: The algorithm to accomplish this for a message of any length is shown below: Set low to 0.0 Set high to 1.0 While there are still input symbols do get an input symbol code_range = high - low. high = low + range*high_range(symbol) low = low + range*low_range(symbol) End of While output low The algorithm for decoding the incoming number looks like this: get encoded number Do find symbol whose range straddles the encoded number output the symbol range = symbol low value - symbol high value subtract symbol low value from encoded number divide encoded number by range until no more symbols Example For the above image corresponding mrna sequence is CUU CCG UGC GAU GUA GCC GGU AUC UUU GGA CAU UGG UAU AUU UCA UGC which is chosen arbitrarily. Converting this mrna sequence into amino acid sequence with the help of Table-1.The amino acid sequence is Leu Pro Cys Asp Val Ala Gly Ile Phe Gly His Trp Tyr Ile Ser Cys. Table-1.mRNA to amino acid mapping There are only twenty distinct amino acids, encoded from the mrna codon. This clearly shows that some codons might be mapped to the same amino acids. For example, the codons GCU, GCC, GCA and GCA are mapped to the same amino acid Leu. Due to this repetition we can get different mrna sequence for same amino acid sequence. Arithmetic coding is used for generating different mrna and converting the image into a decimal number. JGRCS 2010, All Rights Reserved 169

ARITHMETIC ENCODING Samir K. Bandyopadhyay et al, Journal of Global Research in Computer Science,2 (4), April 2011, 0-0.16 Leu CUU 0.16-0.32 Leu CUC 0.32-0.48 Leu CUG 0.48-0.64 Leu UUA 0.64-0.80 Leu UUA 0.8-1 Leu UUG 0.16-0.2 Pro CCU 0.2-0.24 Pro CCC 0.24-0.28 Pro CCA 0.28-0.32 Pro CCG 0.24-0.26 Cys UGU 0.26-0.28 Cys UGC 0.26-0.27 Asp GAU 0.27-0.28 Asp GAC 0.27-O.2725 Val GUU 0.2725-0.275 Val GUC 0.275-0.2775 Val GUA 0.2775-0.278 Val GUB 0.2725-0.273125 Ala GCU 0.273125-0.27375 Ala GCC 0.27375-0.274375 Ala GCA 0.274375-0.275 Ala GCG 0.2725-0.2726563 Gly GGU 0.2726563-0.2728126 Gly GGC 0.2728126-0.2729689 Gly GGA 0.2729689-0.2731252 Gly GGG 0.2728126-0.2728642 Ile AUU 0.2728642-0.2729158 Ile AUC 0.2729158-0.2734318 Ile AUA 0.2728126-0.2728384 Phe UUU O.2728384-0.2728642 Phe UUC 0.2728384-0.2728449 Gly GGU 0.2728449-.2728514 Gly GGC 0.2728514-0.2728579 Gly GGA 0.2728579-0.2728584 Gly GGG 0.2728499-0.2728482 His CAU 0.2728482-0.2728515 His CAC ----- Trp UGG 0.2728482-0.2728499 Try UAU 0.2728499-0.2728515 Try UAC 0.2728499-0.2728505 Ile AUU 0.2728505-2728511 Ile AUC 0.2728511-0.2728517 Ile AUA 0.2728505-0.272850596 Ser UCU 0.272850596-0.272850692 Ser UCC 0.272850692-0.272850788 Ser UCA 0.272850788-0.272850884 Ser UCG 0.272850884-0.27285098 Ser CGU 0.27285098-0.272851076 Ser CGC 0.272850788-0.272850836 Cys UGU 0.272850836-0.272850884 Cys UGC JGRCS 2010, All Rights Reserved 170

From this method got the different mrna sequence of same amino acid sequence. The sequence is CUC CCA UGC GAC GUC GCU GGA AUU UUC GGC CAG UGG UAC AUC UCG UGU. It will produce the same image. CONCLUSION Today, network technologies have improved a lot so that more and more people access the remote facilities and send or receive various kinds of digital data over the Internet. However, the Internet is a public but insecure channel to transmit data. Thus, important information must be manipulated to be concealed while delivered via the Internet such that only the authorized receiver can get it. There are two main methods for concealing secret message traditional encryption and steganography. DNA has many characteristics which make it a perfect steganographic media. These techniques depend on the high randomness of the DNA to hide any message without being noticed. This paper presents a method to hide an image in DNA sequence using arithmetic encoding. REFERENCE [1] Peterson P., Hiding in DNA, in Proceedings of Muse, pp. 22, 2001. [2] Rijmen P., Advanced Encryption Standard, in Proceedings of Federal Information Processing Standards Publications, National Institute of Standards and Technology, pp. 19-22, 2001. [3] Rivest L., Shamir A., and Adleman L., A Method for Obtaining Digital Signature and Public Key Cryptosystem, Computer Journal of Communications of the ACM, vol. 21, no. 2, pp. 120-126, 1978. [4] Saeb M., El-abd E., and El-Zanaty M., On Covert Data Communication Channels Employing DNA Recombinant and Mutagenes is based Steganographic Techniques, Computer Journal of Bio Systems, vol. 57, no. 2, pp. 13-22, 2000 [5] Shimanovsky B., Feng J., and Potkonjak M., Hiding Data in DNA, Springer, UK, 2003. [6] Smid E. and Branstad M., Data Encryption Standard, in Proceedings of Federal Information Processing Standards Publications, National Institute of Standards and Technology, pp. 550-559, 1988 JGRCS 2010, All Rights Reserved 171