Research on Intrusion Detection based on Immunology Principle. Guannan GONG

Guannan GONG, Liang HU
College of Computer Science and Technology, Jilin University, Changchun, P.R. China

Abstract
This paper reviews recent applications of immunological principles to intrusion detection, focusing on the antibody clonal selection principle, immune networks, and danger theory. It summarizes the immunological principles and algorithms in use and introduces some of the most debated issues. Finally, based on an analysis of the open problems of intrusion detection, it discusses directions for future research.

Keywords: Artificial Immune System, Intrusion Detection, Negative Selection, Clonal Selection, Idiotypic Networks, Danger Theory

1 Introduction
In recent years, as computer technology has developed and networks have grown in scale, systems have suffered more and more intrusions and attacks, and the problems of network and information security have become increasingly serious. Detecting and preventing intrusions and attacks, and thereby protecting computer systems, networks, and the wider information infrastructure, has become an important problem. The Intrusion Detection System (IDS) was first discussed by James P. Anderson in a 1980 paper; its core component is the module that detects and analyzes intrusions. Traditional intrusion detection models have a number of shortcomings. The Human Immune System (HIS), by contrast, protects the body against pathogens such as bacteria and viruses, and offers distributed protection, diversity, adaptability, and robustness. Because these properties closely match the requirements of an IDS, intrusion detection techniques inspired by the HIS have become a research focus. This paper summarizes the application of immunological principles to intrusion detection, together with some newer immunological theories.
Some problems that merit deeper research are also identified.

The dominant theories in immunology so far are the antibody clonal selection principle and immune networks. Artificial Immune System (AIS) models based on these two theories can be roughly divided into the Network Model (NM) and the Negative Selection Model (NSM) [1]; the latter is already widely used in building IDSs. The immune mechanisms involved include the immune response, feature recognition by the immune system, pattern recognition, clonal selection and expansion, immune memory, and self/non-self discrimination. Recently, as human immunology has developed, a new immunological theory, danger theory, has attracted interest, and research on its application to intrusion detection is growing.

2 Antibody Clonal Selection Principle
The central idea of the antibody clonal selection principle is that a foreign antigen selects complementary cells, previously in a resting state, for cloning. The activation, proliferation, and effector function of the selected cell clones constitute the cellular process of the immune response. Clones that react to self-antigens, by contrast, are suppressed or eliminated. Recognizing foreign antigens is therefore the key.

The underlying principle is that the main task of immune cells is to distinguish self cells from variant cells: a self cell is a normal cell of the body, while a variant cell is a harmful foreign cell (also called an antigen). Each lymphocyte can recognize, through binding, a small number of structurally similar antigens. The body's lymphocytes are produced mainly in the bone marrow and thymus, which hold libraries of candidate lymphocyte genes. A lymphocyte is produced by combining gene segments selected at random from the gene library. Because the process is random, a newly generated lymphocyte may mistake self cells for variant cells, which must not be allowed. To reduce this possibility, randomly generated lymphocytes must pass a negative selection process before they leave the bone marrow and thymus.
During negative selection, the lymphocytes are exposed to many self cells, and any lymphocyte that binds a self cell is deleted as invalid. Lymphocytes that pass negative selection are considered mature; they are released from the bone marrow and thymus, enter the blood, and begin detecting antigens. If a lymphocyte binds more than a threshold number of antigens within a limited time, it is activated to kill the antigens. If it is not activated within the defined time, it dies and is replaced by a new cell.

Once lymphocytes are activated, the clonal selection phase begins. In this phase an activated lymphocyte generates several copies of itself, and these copies are called memory cells. Memory cells differ from ordinary lymphocytes in having a lower activation threshold and a longer life cycle. They therefore speed up recognition when an antigen that has appeared in the body before appears again, which ensures the effectiveness of the HIS.

Intrusion detection based on this theory usually builds a set of detectors to match antigens. Detectors are generated at random and become mature detectors after passing the negative selection phase. Mature detectors monitor network data, and a detector is activated when the number of anomalies it has matched exceeds the threshold. A message is then sent to the operator, who decides whether it is a real intrusion. If it is, the detector is promoted to a memory detector. In general, normal users and normal network activity are considered self, and everything else non-self. The genes that make up self and non-self are expressed as attributes describing the activity, such as the source IP address, destination IP address, and service port of a TCP SYN request packet. The complementary relationship between detector and antigen is implemented by a matching algorithm [2].

Three main IDS designs exist: 1) the model based on immune tolerance (negative selection) proposed by the group of Forrest and Hofmeyr at the University of New Mexico; 2) the multi-agent model proposed by the group of Dasgupta at the University of Memphis; 3) the master-slave IDS model proposed by the group of Kim and Bentley at University College London. Since these three models were proposed, research on and applications of clonal selection and negative selection in intrusion detection have focused mainly on improving and refining them [3]. The main research topics are as follows.

1) Negative Selection Algorithm
Kim studied the role of negative selection in an AIS for network intrusion detection [3]. The experimental results show a serious scaling problem when the algorithm is applied to real network traffic data: generating enough detectors takes so long that the algorithm cannot be used in a real IDS. The conclusion is that negative selection is better suited to filtering out invalid detectors than to generating mature detectors. The authors had in fact already recognized the excessive computational cost, and had proposed a negative selection algorithm based on niching [4].
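The detector life cycle described in this section (random generation, negative selection against self, then activation once a match count reaches a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited systems: genes are assumed to be fixed-length binary strings, the r-contiguous-bits rule is assumed as the matching algorithm, and the names `r_contiguous_match`, `generate_detectors`, and `monitor` are hypothetical.

```python
import random


def r_contiguous_match(detector: str, sample: str, r: int) -> bool:
    """Return True if the two strings agree on at least r contiguous positions."""
    run = 0
    for d, s in zip(detector, sample):
        run = run + 1 if d == s else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, length, r, rng, max_tries=100_000):
    """Negative selection: keep random candidates that match no self string.

    max_tries is an assumed safety limit; the number of tries needed grows
    quickly as the self set grows, which is the scaling problem noted above.
    """
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        if candidate in detectors:
            continue  # skip duplicates
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)  # candidate is now a mature detector
    return detectors


def monitor(detectors, traffic, r, activation_threshold):
    """Count matches per detector; report a detector when it reaches the threshold."""
    counts = {d: 0 for d in detectors}
    alerts = []
    for sample in traffic:
        for d in detectors:
            if r_contiguous_match(d, sample, r):
                counts[d] += 1
                if counts[d] == activation_threshold:
                    alerts.append((d, sample))  # to be confirmed by the operator
    return alerts


if __name__ == "__main__":
    rng = random.Random(42)
    self_strings = ["00000000", "11110000"]  # toy encoding of normal activity
    mature = generate_detectors(self_strings, n_detectors=5, length=8, r=4, rng=rng)
    print(mature)
    print(monitor(mature, ["01011010", "00000000"], r=4, activation_threshold=1))
```

In this sketch, promoting a confirmed detector to a memory detector would amount to lowering its activation threshold and extending its lifetime; that step is omitted to keep the example short.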
As an extension of negative selection, the Negative Characterization (NC) method has been proposed [5], with the Positive Characterization (PC) method given for comparison. The experimental results show that PC is more accurate but needs more time and space, so generating detectors with the NC method is feasible.

2) Clonal Selection Algorithm
The master-slave IDS model proposed by Kim's group has three phases: negative selection, clonal selection, and gene library evolution. To address the scaling problem of using negative selection alone, a static clonal selection algorithm with a negative selection operator was proposed [6]. This method, however, cannot adapt to a continually changing environment. A dynamic clonal selection algorithm was therefore proposed, which can both learn changing self activity and predict non-self activity [7,8].

3) Gene Representation
The original negative selection algorithm represents genes as binary strings, but this representation was found unable to express affinity relationships accurately [9]. Real-valued representations were therefore introduced, such as randomized real-valued negative selection [10], which overcomes this problem and can also estimate the number of detectors needed to cover the non-self space. In anomaly detection experiments it performs better than the binary representation [11].

4) Matching Rule
Most matching functions can be divided into two classes, distance and similarity [12]. Distance measures how different two sequences are, while similarity measures how alike they are. Matching rules based on statistical, physical, and binary measures of distance or similarity have been studied and compared. Most AISs use Hamming distance or the r-contiguous matching rule, but the r-contiguous rule becomes inefficient, and needs a large number of detectors for a given matching coverage, as the string length increases. An improved, simplified r-contiguous rule was therefore proposed: the r-contiguous template (r-chunks). This rule not only addresses these problems but is also easier to analyze mathematically.

In general, most IDSs based on negative selection and self/non-self discrimination suffer from many false positives and are hard to use in large real networks. Although a series of improvements and refinements have been made to the method, new theories are needed to solve these problems thoroughly.

3 The Idiotypic Network Theory of Immunology
Immune network theory is mainly the idiotypic network hypothesis, first put forward by N. K. Jerne in 1973 [15]. Its basic idea is that an antibody can recognize not only antigens but also other antibodies. Antigens carry antigenic determinants (epitopes) that antibodies can recognize and bind. An antibody has a combining site (paratope) that recognizes antigens, and it also carries antigenic determinants that other antibodies can recognize and bind; such a determinant is called an idiotypic antigenic determinant, or idiotope.
An antibody is thus composed of a paratope and an idiotope. It can recognize other antibodies and can in turn be recognized by them. This stimulation can keep propagating and gives the whole system a network structure. The interactions between antibody and antigen, and between antibody and antibody, make up the idiotypic network regulation mechanism. Idiotypic effects have been analyzed in detail [16]; they are proposed to help explain how the memory of an infection is maintained. Moreover, mutual suppression among similar antibodies helps maintain antibody diversity in the system. One idiotypic network equation was proposed as follows [17]:

$$\frac{dx_i}{dt} = c\left[\sum_{j=1}^{N} m_{ji}\, x_i x_j \;-\; k_1 \sum_{j=1}^{N} m_{ij}\, x_i x_j \;+\; \sum_{j=1}^{n} m_{ji}\, x_i y_j\right] - k_2 x_i \qquad (a)$$

That is, the rate of change of antibody concentration equals c [(antibodies recognized) - (I am recognized) + (antigens recognized)] - (death rate). Here N is the number of antibodies; n is the number of antigens; x_i (or x_j) is the concentration of antibody i (or j); y_j is the concentration of antigen j; c is a rate constant; k_1 is the suppression constant; k_2 is the death rate; m_ji is the matching function between antibody i and antibody (or antigen) j. The first term is the stimulation antibody i receives when it recognizes other antibodies; the second term, weighted by k_1, is the effect on antibody i of being recognized by other antibodies; the third term is the stimulation it receives when it recognizes antigens. Idiotypic interactions can be positive or negative, so the number of antibodies can increase or decrease. In practice the equation can be simplified to suit the situation at hand [16].

Recent applications of idiotypic network theory focus mostly on optimization problems, such as multimodal function optimization, on recommender systems, and on building immune response models. One function optimization algorithm simulates the antibody search mechanism and combines it with idiotypic network regulation. Antibodies represent candidate solutions of the function being optimized, and antigens represent the local and global optima of the multimodal function that are to be found. The algorithm searches for global and local optima by constructing a clonal selection operator (comprising two operations, hypermutation cloning and optimization) and by using B-cells to maintain multiple antibodies at the same time.

Recommender systems predict and recommend items using collaborative filtering. So far they are used mostly in e-commerce, where a seller recommends goods to customers and helps them find what they need and complete the purchase. The group of Uwe Aickelin at the University of Nottingham proposes an artificial immune system recommender model that uses the known preferences of users as antibodies and the new preferences to be matched as antigens.
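Returning to equation (a), its behavior can be explored numerically. The sketch below advances the antibody concentrations by one explicit Euler step. It is an illustration under assumptions rather than the procedure of [17]: the function name `idiotypic_step`, the nested-list layout of the matching matrices, the fixed step size, and the clamping of concentrations at zero are all choices made here for the example.

```python
def idiotypic_step(x, y, m_ab, m_ag, c, k1, k2, dt):
    """Advance antibody concentrations x by one Euler step of equation (a).

    x    : list of N antibody concentrations
    y    : list of n antigen concentrations
    m_ab : N x N matrix, m_ab[j][i] = m_ji for antibody i and antibody j
    m_ag : n x N matrix, m_ag[j][i] = m_ji for antibody i and antigen j
    """
    n_antibodies, n_antigens = len(x), len(y)
    new_x = []
    for i in range(n_antibodies):
        # Stimulation from the antibodies that antibody i recognizes.
        recognizes = sum(m_ab[j][i] * x[i] * x[j] for j in range(n_antibodies))
        # Effect of antibody i being recognized by other antibodies.
        recognized = sum(m_ab[i][j] * x[i] * x[j] for j in range(n_antibodies))
        # Stimulation from the antigens that antibody i recognizes.
        antigens = sum(m_ag[j][i] * x[i] * y[j] for j in range(n_antigens))
        dx = c * (recognizes - k1 * recognized + antigens) - k2 * x[i]
        # Clamp at zero (an assumption): a concentration cannot be negative.
        new_x.append(max(0.0, x[i] + dt * dx))
    return new_x


if __name__ == "__main__":
    # Two antibodies, one antigen; antibody 0 matches the antigen,
    # and antibody 1 recognizes antibody 0.
    x = [1.0, 1.0]
    y = [2.0]
    m_ab = [[0.0, 0.5],
            [0.0, 0.0]]
    m_ag = [[1.0, 0.0]]
    for _ in range(10):
        x = idiotypic_step(x, y, m_ab, m_ag, c=1.0, k1=0.5, k2=0.3, dt=0.05)
    print(x)
```

A simplified variant, as the text notes is common in applications, would drop one of the three interaction terms or hold the antigen concentrations fixed, as is done here.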
In this recommender model, the antibodies whose concentration increases over time are taken to form the well-matched subset, and the process ends once that subset has emerged. The aim of the system is not to find the single best-matching antibody (an optimization problem) but to find a set of antibodies that match about equally well yet are clearly distinct from one another. The model has been applied to movie recommendation: the interaction between antibody and antigen is used for matching, the interaction between antibody and antibody is used to maintain diversity, and the matching algorithm is k-nearest-neighbor. The model has also been generalized to website recommendation. Affinity measures were studied as well, comparing two methods of computing correlation coefficients, Kendall's tau and weighted Kappa. Weighted Kappa performs better for movie recommendation, and the results indicate that the system is robust in the sense that it performs well provided an appropriate affinity measure is chosen [20,21].

Based on the immune response process of the HIS, the idiotypic immune network hypothesis, and the binding, stimulation, and replication of B-cells, an artificial immune response network model has been proposed [21]. In this model, antibodies and antigens are both represented as binary strings. The properties of an antibody are determined by its epitope and paratope: the epitope determines how the antibody itself is recognized as an antigen, while the paratope determines which antigens it can bind. An antibody can be written as (p, e), where p is the paratope and e is the epitope. The matching algorithm measures the Hamming distance between two strings; if the distance is below a defined threshold, the match succeeds. The total stimulation of a B-cell is given by:

$$S' = \left(k_1 \sum_{i=1}^{m} S_{Ag_i}\, c_i \;+\; k_2 \sum_{j=1}^{n} S_j\, c_j \;-\; k_3 \sum_{k=1}^{n} H_k\, c_k\right) c \qquad (b)$$

$$S_{Ag} = \begin{cases} 0, & d(p, e_{Ag}) > D \\ 1 - d(p, e_{Ag}), & d(p, e_{Ag}) \le D \end{cases} \qquad (c)$$

$$C = p \oplus e, \qquad d(p, e) = \frac{1}{N}\sum_{i=1}^{N} C_i \qquad (d)$$

$$S_j = \begin{cases} 0, & d(p, e_j) > D \\ 1 - d(p, e_j), & d(p, e_j) \le D \end{cases} \qquad (e)$$

$$H_k = \begin{cases} 0, & d(e, p_k) > D \\ 1 - d(e, p_k), & d(e, p_k) \le D \end{cases} \qquad (f)$$

In these formulas, Ag_i, j, and k index the antigens around the B-cell, the antibodies that stimulate it, and the antibodies that suppress it, respectively; c is the antibody concentration of the B-cell; c_i, c_j, and c_k are the concentrations of Ag_i, j, and k; k_1, k_2, and k_3 are weighting coefficients; S_Ag is the stimulation of the B-cell by Ag; e_Ag is the epitope of Ag; d(p, e_Ag) is the distance between the paratope of the antibody and the epitope of the antigen in the normalized data space; D is the binding threshold; N is the number of bits in p and e; C_i is the i-th bit of C; S is the stimulation from other B-cells; H is the suppression from other B-cells. The first term of (b) is therefore the stimulation from the binding of antibody and antigen; the second is the stimulation from the binding of the antibody's paratope to the epitopes of other antibodies; the third is the suppression from the binding of the antibody's epitope to the paratopes of other antibodies.

When a B-cell receives a defined amount of stimulation from the antibodies and antigens around it, it clones and expands. Clonal expansion is modeled by a sigmoid function: the replication level is low when stimulation is low; the number of cells grows rapidly through cloning once stimulation reaches the threshold; and the population settles into a new steady state when suppression exceeds stimulation.

Although no published work yet applies idiotypic regulation to intrusion detection, the clonal selection algorithm can be used to solve optimization problems [22]. Since most of their functions are similar, idiotypic regulation could plausibly replace the clonal selection algorithm in an IDS.

4 Analysis of Danger Theory
Danger theory was first proposed by Matzinger in 1994 [23]. It holds that the key factor triggering an immune response is the level of danger signals generated by an intruder, not the foreignness of the intruder. The response is thus driven not by changes in self but by external influence: effector cells are activated when danger signals are recognized. Dangerous intruders can induce cell stress and cell death, which generate danger signals that are recognized by Antigen Presenting Cells (APCs). Danger signals fall into two kinds: internal danger signals, and external signals caused by intruders such as bacteria. Both kinds can stimulate APCs and produce an immune response.

The group of Uwe Aickelin at the University of Nottingham has focused in recent years on applying danger theory to intrusion detection. The goal is to build a computational model of danger theory, that is, to develop new algorithms by defining, studying, and detecting danger signals, and then to build an IDS with a low false positive rate. It has been observed that current AISs are based on self/non-self discrimination [25]: immune cells respond only to non-self elements, through negative selection, so self/non-self discrimination is the key to the IDS. However, many examples show that the HIS does not rely on self/non-self discrimination alone to decide whether to mount an immune response.
Because most IDSs are based on self/non-self discrimination, they inevitably suffer from false positives and scaling problems, and are hard to use in large networks. Combining danger theory, AIS, and intrusion detection raises a series of questions still to be resolved, such as how to define danger signals; suitable danger signals could overcome the limitations of the self/non-self approach. According to this line of research, using danger theory in intrusion detection could detect both known and unknown attacks, reduce the false positive rate, and solve the scaling problem of the negative selection model.

As discussed above, cell stress and cell death can generate danger signals, and cell death has two modes: apoptosis and necrosis. Apoptosis is the normal mode of death, which proceeds through well-defined pathways and regulators; necrosis is abnormal death following cell stress, and causes complete cell lysis. The alerts of a new IDS can accordingly be divided into two classes: Apoptosis Alerts (AA) and Necrosis Alerts (NA). An AA is a low-level, noisy alert; it does not by itself amount to significant abnormal activity, but it is a precondition of an attack. An NA appears when the system is under serious attack. Apoptosis has a suppressive effect while necrosis has a stimulatory effect. Although the difference between them is not as large as once thought, the relationship between them is fundamental to danger theory.
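The two alert classes described above can be illustrated with a small sketch. The text does not specify how AA and NA are separated or combined, so everything concrete here is an assumption made for illustration: the severity score, the `na_threshold` cut-off, and the correlation rule that confirms an NA only when enough AAs precede it within a time window. The names `AlertKind`, `Alert`, `classify`, and `correlate` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class AlertKind(Enum):
    APOPTOSIS = "AA"  # low-level, noisy alert; a precondition of an attack
    NECROSIS = "NA"   # alert raised when the system is under serious attack


@dataclass(frozen=True)
class Alert:
    time: float
    kind: AlertKind
    description: str


def classify(severity: float, na_threshold: float) -> AlertKind:
    """Assumed rule: a severity score at or above the threshold is an NA."""
    return AlertKind.NECROSIS if severity >= na_threshold else AlertKind.APOPTOSIS


def correlate(alerts, window: float, min_aa: int):
    """Return the NAs preceded by at least min_aa AAs within the time window.

    This treats AAs as supporting context for an NA, one hypothetical reading
    of "AA is a precondition of an attack".
    """
    confirmed = []
    for na in alerts:
        if na.kind is not AlertKind.NECROSIS:
            continue
        support = sum(
            1
            for a in alerts
            if a.kind is AlertKind.APOPTOSIS and 0 <= na.time - a.time <= window
        )
        if support >= min_aa:
            confirmed.append(na)
    return confirmed


if __name__ == "__main__":
    events = [
        Alert(1.0, classify(0.2, 0.8), "port scan"),
        Alert(2.0, classify(0.3, 0.8), "failed login"),
        Alert(3.0, classify(0.9, 0.8), "service crash"),
        Alert(50.0, classify(0.95, 0.8), "isolated spike"),
    ]
    for alert in correlate(events, window=5.0, min_aa=2):
        print(alert)
```

Requiring supporting AAs before acting on an NA is one way such a scheme might suppress isolated false positives; other combinations are equally compatible with the description above.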

Related to IDSs based on danger theory (DT) is a two-signal model, in which the immune system needs two signals to be stimulated: one arises when an anomaly (non-self) is detected, and the other when the body is damaged. The immune system then sets its response according to the degree of damage alone, so the strength of the response is related to the damage. This damage-response mechanism lets the immune system avoid responding to false positives, and addresses the high false positive rate of IDSs based on negative selection.

5 Conclusions
AISs modeled on human immunity have developed rapidly in recent years. Thanks to their distributed nature, diversity, robustness, adaptability, and specific recognition, they are well suited to use in IDSs. So far, research on antibody clonal selection is the most mature; it has the largest number of applications in intrusion detection and the best results. However, the negative selection algorithm can handle only simple problems, and the ability of the clonal selection algorithm to cope with dynamic problems is limited, so a practical IDS is still a long way off, and other theories need to be studied. Immune network theory, for its part, is mature, but its applications are limited, and most research focuses on optimization problems. The only difference between idiotypic regulation and antibody clonal selection is whether antibodies interact with one another, so combining idiotypic regulation with the negative selection algorithm is worth studying. Finally, danger theory is a new theory in human immunology; many of its principles are not yet clearly understood, and more research is needed. How to use these theories and principles together to build a complete IDS is our future work.

References
[1] AICKELIN U, GREENSMITH J, TWYCROSS J. Immune System Approaches to Intrusion Detection: A Review [A]. Proceedings International Conference on Artificial Immune Systems [C]. Catania, Italy.
[2] HOFMEYR S, FORREST S. Architecture for an AIS [J]. Evolutionary Computation, 2000, 7 (1).
[3] KIM J, BENTLEY P. Evaluation of negative selection in an artificial immune system for network intrusion detection [A]. Proceedings of the GECCO [C].
[4] KIM J, BENTLEY P. Negative Selection and Niching by an Artificial Immune System for Network Intrusion Detection [A]. Proceedings of Genetic and Evolutionary Computation Conference (GECCO '99) [C]. Orlando, Florida.
[5] DASGUPTA D, GONZALEZ F. An Immunity-Based Technique to Characterize Intrusions in Computer Networks [J]. IEEE Transactions on Evolutionary Computation, 2002, 6 (3).
[6] KIM J, BENTLEY P. The Artificial Immune System for Network Intrusion Detection: An Investigation of Clonal Selection with a Negative Selection Operator [A]. The Congress on Evolutionary Computation (CEC-2001) [C]. Seoul, Korea.
[7] KIM J, BENTLEY P. Towards an Artificial Immune System for Network Intrusion Detection: An Investigation of Dynamic Clonal Selection [A]. The Congress on Evolutionary Computation [C].
[8] KIM J, BENTLEY P. Immune Memory in the Dynamic Clonal Selection Algorithm [A]. Proceedings of the First International Conference on Artificial Immune Systems (ICARIS) [C]. Canterbury.
[9] GONZALEZ F, DASGUPTA D, GOMEZ J. The Effect of Binary Matching Rules in Negative Selection [A]. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO) [C].
[10] GONZALEZ F, DASGUPTA D, NINO L. A Randomized Real-Valued Negative Selection Algorithm [A]. Proceedings of the 2nd International Conference on Artificial Immune Systems [C]. Edinburgh, UK.
[11] GONZALEZ F, DASGUPTA D. Anomaly Detection Using Real-Valued Negative Selection [J]. Genetic Programming and Evolvable Machines, 2003, 4 (4).
[12] HARMER PK, WILLIAMS PD, GUNSCH GH, et al. An Artificial Immune System Architecture for Computer Security Applications [J]. IEEE Transactions on Evolutionary Computation, 2002, 6 (3).
[13] SINGH S. Anomaly detection using negative selection based on the r-contiguous matching rule [A]. Proceedings of the 1st International Conference on Artificial Immune Systems (ICARIS) [C]. University of Kent at Canterbury.
[14] BALTHROP J, ESPONDA F, FORREST S, et al. Coverage and Generalization in an Artificial Immune System [A]. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2002) [C]. Morgan Kaufmann, New York.
[15] JERNE NK. Towards a network theory of the immune system [J]. Annals of Immunology, 1973, 125C.
[16] CAYZER S, AICKELIN U. On the Effects of Idiotypic Interactions for Recommendation Communities in Artificial Immune Systems [A]. Proceedings of the 1st International Conference on Artificial Immune Systems (ICARIS-2002) [C]. Canterbury, UK.
[17] FARMER JD, PACKARD NH, PERELSON AS. The immune system, adaptation, and machine learning [J]. Physica, 1986, 22D.
[18] CAYZER S, AICKELIN U. A Recommender System based on the Immune Network [A]. Proceedings CEC2002 [C]. Honolulu, USA.
[19] MORRISON T, AICKELIN U. An Artificial Immune System as a Recommender for Web Sites [A]. Proceedings of the 1st International Conference on Artificial Immune Systems (ICARIS-2002) [C]. Canterbury, UK.
[20] AICKELIN U, CHEN Q. On Affinity Measures for Artificial Immune System Movie Recommenders [A]. Proceedings RASC-2004, The 5th International Conference on Recent Advances in Soft Computing [C]. Nottingham, UK.
[21] AICKELIN U, CHEN Q. Movie Recommendation Systems Using An Artificial Immune System [A]. 6th International Conference in Adaptive Computing in Design and Manufacture [C]. Bristol, UK.
[22] NUNES L, FERNANDO J. The Clonal Selection Algorithm with Engineering Applications [A]. In Workshop Proceedings of GECCO '00, Workshop on Artificial Immune Systems and Their Applications [C]. Las Vegas, USA.
[23] MATZINGER P. Tolerance, Danger and the Extended Family [J]. Annual Reviews of Immunology, 1994, 12.
[24] MATZINGER P. The Danger Model: A Renewed Sense of Self [J]. Science, 2002, 296.
[25] AICKELIN U, CAYZER S. The Danger Theory and Its Application to Artificial Immune Systems [A]. Proceedings of the 1st International Conference on Artificial Immune Systems (ICARIS-2002) [C]. Canterbury, UK, 2002.
[26] AICKELIN U, BENTLEY P, CAYZER S, et al. Danger Theory: The Link between AIS and IDS? [A]. Proceedings ICARIS-2003, 2nd International Conference on Artificial Immune Systems [C].
[27] GREENSMITH J, AICKELIN U, TWYCROSS J. Detecting Danger: Applying a Novel Immunological Concept to Intrusion Detection Systems [A]. 6th International Conference in Adaptive Computing in Design and Manufacture [C]. Bristol, UK.
[28] HOFMEYR S. The implications of immunology for secure systems design [J]. Computers & Security, 2004, 23.