1998 IEEE. Reprinted with permission.


Höglund A. J. and Hätönen K., 1998, Computer Network User Behaviour Visualisation Using Self Organising Maps, Proceedings of the 8th International Conference on Artificial Neural Networks (ICANN 1998), vol. 2. © 1998 IEEE. Reprinted with permission. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Helsinki University of Technology's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

Computer Network User Behaviour Visualisation Using Self Organising Maps

Albert J. Höglund
Nokia Research Center, Nokia Group
Helsinki, Finland

Kimmo Hätönen
Nokia Research Center, Nokia Group
Helsinki, Finland

Abstract

Computer systems are vulnerable to abuse by insiders and to penetration by outsiders. The amount of monitoring data generated in computer networks is enormous, and tools are needed to ease the work of system operators. Anomaly detection attempts to recognise abnormal behaviour in order to detect intrusions. A prototype Anomaly Detection System has been constructed that provides automatic anomaly detection and user behaviour visualisation. The system consists of a data gathering component, a user behaviour visualisation component, an automatic anomaly detection component and a user interface. This paper focuses on the user behaviour visualisation component, which is based on large Self Organising Maps. The construction and usage of the component are presented, together with a discussion of comments from the test usage of the Anomaly Detection System.

1 Introduction

Computers and computer networks are becoming increasingly important. Computer networks are normally protected from unauthorised use by security mechanisms such as passwords and access controls. However, if an abuser or intruder manages to bypass these mechanisms and gains access to vital information, the potential loss is enormous. The loss can be reduced by detecting intruders or abusers at an early stage.

The two main intrusion detection techniques are rule-based misuse detection and anomaly detection. Rule-based misuse detection attempts to recognise specific behaviours that are known to be improper: if a user follows certain intrusive patterns, the user is classified as an intruder. Anomaly detection, on the other hand, attempts to recognise anomalous or abnormal user behaviour. Abnormal behaviour is suspected if the current behaviour deviates sufficiently from the previous behaviour, which is assumed to be normal. Anomaly detection thus deals with behaviour that is not known in advance, whereas rule-based misuse detection deals with predefined violations. This difficulty is also the advantage of anomaly detection: it can detect types of intrusion that have never occurred before. References on anomaly detection and misuse intrusion detection research can be found in [1, 2, 3, 4].

A prototype Anomaly Detection System for the UNIX environment has been constructed [5]. The system provides automatic anomaly detection and user behaviour visualisation, and consists of a data gathering component, a user behaviour visualisation component, an automatic anomaly detection component and a user interface. This paper presents the user behaviour visualisation component.

The objective of the user behaviour visualisation was to visualise the behaviour of users during a given period in a simple way. Such a component is useful because the amount of data generated by users in a computer network is enormous. The amount of information is reduced by selecting a set of features that characterises the behaviour of the network users. This feature set should form a daily fingerprint of each network user, so it has to be selected carefully. Even after feature selection has reduced the data, comparing and analysing user behaviour remains difficult. The visualisation problem is therefore tackled with the Self Organising Map, which is used to display the user behaviour in two dimensions. The Self Organising Map is a neural network based on unsupervised learning, and it is suitable for visualising and interpreting large, high-dimensional data sets.

2 Methods and Implementation

2.1 Software and Environment

The prototype Anomaly Detection System was built in the UNIX environment. Data gathering and data processing are performed on separate servers, which enhances security and makes it more difficult to disturb the operation of the system.
The routines of the prototype Anomaly Detection System are written in C and Perl, and the Self Organising Maps are built with the procedures of SOM_PAK [6]. The user interface uses Netscape¹ to view the HTML pages generated by the system.

¹ The Netscape browser can be downloaded from

2.2 Data Gathering and Scaling

User account logs of more than 600 users were stored over a period of 400 days. These logs give information on the processes run by the users, including CPU times, characters transmitted and blocks read. The selection of the features describing user behaviour is discussed in Section 2.3.
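The step from raw account logs to one feature vector per user and day can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the prototype's C/Perl code: the `ProcessRecord` layout and the feature list are hypothetical, the four CPU windows borrow the labels CPU05-11, CPU11-17, CPU17-23 and CPU23-05 shown in Figure 2, and each process is credited entirely to the window and calendar day in which it started.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record layout: one entry per finished process, roughly the
# information a UNIX process accounting log provides.
@dataclass
class ProcessRecord:
    user: int
    start: datetime
    cpu_seconds: float
    chars_transmitted: int
    blocks_read: int


# Hypothetical feature list; the paper's final set has 16 features.
FEATURES = ["Processes", "CPU", "CPU05-11", "CPU11-17", "CPU17-23",
            "CPU23-05", "Characters", "Blocks"]


def window_name(hour):
    """Map an hour of the day to one of the four CPU time windows."""
    if 5 <= hour < 11:
        return "CPU05-11"
    if 11 <= hour < 17:
        return "CPU11-17"
    if 17 <= hour < 23:
        return "CPU17-23"
    return "CPU23-05"  # 23:00-04:59, wrapping past midnight


def daily_features(records):
    """Aggregate process records into one feature vector per (user, day)."""
    acc = defaultdict(lambda: dict.fromkeys(FEATURES, 0.0))
    for r in records:
        v = acc[(r.user, r.start.date())]
        v["Processes"] += 1
        v["CPU"] += r.cpu_seconds
        # Simplification: all CPU time of a process is credited to the window
        # (and calendar day) in which the process started.
        v[window_name(r.start.hour)] += r.cpu_seconds
        v["Characters"] += r.chars_transmitted
        v["Blocks"] += r.blocks_read
    return {key: [v[f] for f in FEATURES] for key, v in acc.items()}
```

Days on which a user starts no process produce no vector in this sketch. The no-usage cluster discussed in Section 3 suggests such days are represented in the data, so a caller would have to add them explicitly, for example as all-zero vectors.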

Since the magnitudes of the features vary greatly, logarithmic or linear scaling was considered necessary. The features were scaled according to (1) and (2), where f is the feature in question and max[·] denotes its maximum value. Dividing by the maximum scales each feature to the range [0, 1]. Histograms were studied to determine which of the two scalings was more suitable for each feature.

$$ f_{\text{Log scaled}} = \frac{\ln(f + 1)}{\max[\ln(f + 1)]} \tag{1} $$

$$ f_{\text{Lin scaled}} = \frac{f}{\max[f]} \tag{2} $$

2.3 Feature Selection

Feature selection provides a means of reducing the enormous amount of data generated by computer network users. The feature selection problem can be stated as follows: objects are described by a large set of features, and the objective is to find the subset of features that best distinguishes the objects from each other.

The user behaviour visualisation component uses features that characterise user behaviour over a 24-hour period. First, an initial set of 34 features was derived. It included features describing CPU time and transmitted characters for different services, as well as session, process and login information. The initial set was then reduced to 16 features by omitting features with a strong linear dependency on other features and by omitting very noisy features. Linear dependency was checked with a correlation test, and noisy features were found by closely examining the variances. The selection was carried out with careful consideration.

2.4 User Behaviour Visualisation

The idea is to use Self Organising Maps (SOM) to visualise user behaviour during a given period in a simple way. The SOM is an effective tool for visualising high-dimensional data [7, 8]. Its principal goal is to transform an incoming signal pattern of arbitrary dimension into a one- or two-dimensional discrete map, and to perform this transformation in a topologically ordered fashion. The algorithm and the detailed theory of the SOM can be found in [7]. The user behaviour visualisation component uses two-dimensional SOMs.
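The vectors the SOM is trained on are prepared with the scaling of equations (1) and (2) and the linear-dependency check of Section 2.3. A minimal Python sketch of both steps follows. Each list passed to the scaling functions is one feature column over the whole data set. The greedy filter and its correlation threshold of 0.95 are assumptions for illustration, since the paper does not state how the correlation test was applied, and the manual examination of variances for noisy features is not automated here.

```python
import math


def log_scale(values):
    """Equation (1): ln(f + 1) divided by its maximum. Assumes f >= 0."""
    logs = [math.log(v + 1.0) for v in values]
    top = max(logs)
    return [x / top if top > 0 else 0.0 for x in logs]


def lin_scale(values):
    """Equation (2): f divided by its maximum. Assumes f >= 0."""
    top = max(values)
    return [v / top if top > 0 else 0.0 for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.95):
    """Greedy filter: keep a feature only if its |r| with every feature
    kept so far is below the (hypothetical) threshold."""
    kept = []
    for name, col in columns.items():
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, `lin_scale([0, 5, 10])` gives `[0.0, 0.5, 1.0]`, and with made-up columns `drop_correlated({"CPU": [1, 2, 3, 4], "CPU05-11": [2, 4, 6, 8.5], "Finger": [0, 1, 0, 1]})` returns `["CPU", "Finger"]`: the second column tracks `CPU` almost exactly (r ≈ 0.998) and is dropped. The greedy order means the first of two dependent features is the one kept.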
The maps are constructed using the whole data set for the whole period, that is, the data of all users for all days in the period. A large number of tests indicated that maps of size 18x14 give sufficient accuracy. The lattice is hexagonal (six neighbours) and the neighbourhood function is of the bubble type [6, 7]. In this paper the map is labelled with the user number and the number of hits on the neuron, separated by an underscore; for example, the label 42_20 marks a neuron hit by 20 of the daily feature vectors of user 42. The maps are visualised using the U-matrix method [9, 10, 11], in which the neurons are marked with dots and the distances between them are shown as greyscales: the darker the cell between two neurons, the greater the distance between them. In addition, the user behaviour visualisation component provides the actual values of the neurons of the map, a connection to the real data, and feature statistics.
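The map construction described above can be sketched in pure Python: a hexagonal lattice, a bubble neighbourhood, labels of the user_hits form, and a U-matrix. This is an illustrative sketch, not SOM_PAK's implementation. The offset-row lattice layout, the random initialisation in [0, 1] (matching the scaled inputs), the random sample order, the linearly decreasing learning rate and radius, and all default parameter values apart from the 18x14 map size are assumptions. The U-matrix is simplified to one value per neuron, the mean distance to its lattice neighbours, rather than the full matrix with in-between cells used in the greyscale plots.

```python
import math
import random
from collections import Counter, defaultdict


def hex_positions(xdim, ydim):
    """Offset-row hexagonal lattice: odd rows are shifted by half a unit, so
    each interior neuron has six neighbours at lattice distance 1."""
    return [(x + 0.5 * (y % 2), y * math.sqrt(3) / 2)
            for y in range(ydim) for x in range(xdim)]


def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def bmu(codebook, x):
    """Index of the best-matching unit (smallest Euclidean distance to x)."""
    return min(range(len(codebook)), key=lambda i: sqdist(codebook[i], x))


def train_som(data, xdim=18, ydim=14, steps=10000, alpha0=0.05,
              radius0=10.0, seed=1):
    """Train a SOM with a bubble neighbourhood on a hexagonal lattice."""
    rng = random.Random(seed)
    pos = hex_positions(xdim, ydim)
    dim = len(data[0])
    # Inputs are scaled to [0, 1], so codebook vectors start there too.
    codebook = [[rng.random() for _ in range(dim)] for _ in pos]
    for t in range(steps):
        x = data[rng.randrange(len(data))]
        frac = 1.0 - t / steps                 # decays linearly towards 0
        alpha = alpha0 * frac
        radius = 1.0 + (radius0 - 1.0) * frac  # shrinks linearly towards 1
        c = bmu(codebook, x)
        for i, p in enumerate(pos):
            # Bubble: every neuron within the radius gets the same update,
            # neurons outside it are unchanged (small float tolerance).
            if math.dist(p, pos[c]) <= radius + 1e-9:
                w = codebook[i]
                for k in range(dim):
                    w[k] += alpha * (x[k] - w[k])
    return codebook, pos


def hit_labels(codebook, data, users):
    """Label neurons 'user_hits'; each hit is one daily vector of that user."""
    hits = defaultdict(Counter)
    for x, u in zip(data, users):
        hits[bmu(codebook, x)][u] += 1
    return {i: [f"{u}_{n}" for u, n in sorted(c.items())]
            for i, c in hits.items()}


def u_matrix(codebook, pos):
    """Per-neuron U-matrix value: mean codebook distance to lattice neighbours."""
    values = []
    for i, p in enumerate(pos):
        nbrs = [j for j, q in enumerate(pos) if j != i and math.dist(p, q) < 1.01]
        dists = [math.sqrt(sqdist(codebook[i], codebook[j])) for j in nbrs]
        values.append(sum(dists) / len(dists) if dists else 0.0)
    return values
```

With the scaled daily vectors as `data` and the matching user numbers as `users`, `hit_labels` yields labels of the 42_20 form, and `u_matrix` gives the values a greyscale plot would shade, darker for larger values. The pure-Python loops keep the sketch dependency-free; an 18x14 map with the default 10000 steps runs in seconds.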

3 Using the User Behaviour Visualisation Component

The user behaviour visualisation component of the Anomaly Detection System prototype has been in test usage for several months. The following user behaviour clustering examples illustrate its use.

[Figure 1: Map of size 18x14 trained with the whole set of data and labelled with the usage of users 0, 42 and 127. Labels have the form user_hits, e.g. 0_67, 42_20, 127_26.]

[Figure 2: Feature distribution for the main usage cluster of user 42 compared with the deviation on the right and the deviation on the left. Series: left, main, right. Features: Processcont, Feature 2, CPU, CPU05-11, CPU11-17, CPU17-23, CPU23-05, Characters, Features 9-14, Finger, Feature 16.]

The behaviour of three users during a period of 79 days is visualised in Figure 1. The behaviour of user 42 is neatly clustered: Figure 1 shows that user 42 has one main usage cluster in the upper middle of the map, with two deviations from this cluster, one on the right and one on the left. Figure 2 explains these deviations by comparing the feature distribution of the neuron with 20 hits in the main usage cluster with the feature distributions of the deviations on the left and on the right. Network usage in the right-hand deviation is lighter than normal, with fewer processes, less CPU time and fewer characters transmitted; this deviation can be explained by a breakdown in the network. The left-hand deviation, on the other hand, is somewhat anomalous. Usage there is heavier than normal; in particular, CPU times in the morning and in the afternoon are higher than normal. Another anomaly in the left-hand deviation is the heavy use of the finger service, which is not used at all in the normal behaviour.

The behaviour of hundreds of users has been analysed during the test usage. For example, there is a well-bounded no-usage cluster in the upper right corner of the map in Figure 1, to which users who rarely use the computer network are mostly mapped. Figure 1 shows that user 127 has 26 days with no usage during the period of 79 days. Another well-bounded cluster is located in the upper left corner of the map; a system daemon id with user number 0 is mapped to it (see Figure 1). The behaviour of this id can be considered very regular, since it is mapped to only four neurons on the map. Misuse of the system daemon id by an abuser or intruder would be very critical, because the id has greater privileges in the network than normal users. Deviations in the behaviour of the system daemon id are easy to notice with the user behaviour visualisation component.

There are also users who are mapped to clearly separated usage clusters, which means that they have two working modes. An example is user 127: Figure 1 shows that the behaviour of user 127 falls into three clearly separated clusters.
One is the no-usage cluster and the other two are the normal working modes. These modes can be analysed further using the same procedure as for user 42 above. Two working modes may arise, for example, when a user works on more than one project. Some users, of course, behave very irregularly, and their behaviour is therefore less neatly clustered on the map.

4 Conclusions and Discussion

The initial feedback from the test usage of the Anomaly Detection System has been quite positive. Comments such as "The user behaviour visualisation component gives a quick overview of the user behaviour" were encouraging. The feedback also indicated that the component is practical for analysing user behaviour that has been reported as anomalous. Section 3 showed how the Self Organising Map in the user behaviour visualisation component can be used to analyse user behaviour. The user feedback also included comments on necessary improvements. Better connections to the real data for further analysis were suggested and have since been implemented. Improvements in the labelling of the map and in map region classification have also been suggested and will be implemented in the future.

The general impression of the authors is that the Self Organising Map provides a good method for reducing the dimensionality of the data and for comparing and visualising the behaviour of network users. Examples and test usage, however, give only indications of the performance of the user behaviour visualisation component. Simulation experiments will therefore be performed to evaluate further both the user behaviour visualisation component and the automatic anomaly detection component. A publication on the automatic anomaly detection component is also in preparation.

References

[1] Javitz H S, Valdes A, Lunt T F, Tamaru A, Tyson M, Lowrance J. Next generation intrusion-detection expert system (NIDES): Statistical algorithms rationale and rationale for proposed resolver. Technical report, Computer Science Laboratory, SRI International, Menlo Park, California, USA.
[2] Kumar S, Spafford E H. A pattern matching model for misuse intrusion detection. In Proceedings of the 17th National Computer Security Conference, October 1994.
[3] Lankewicz L B. A nonparametric pattern recognition approach to anomaly detection. Doctoral Thesis, Tulane University.
[4] Lunt T F. A survey of intrusion detection techniques. Computers and Security 1993; 12(4).
[5] Höglund A. An Anomaly Detection System for Computer Networks. Master's thesis, Helsinki University of Technology, Helsinki.
[6] Kohonen T, Hynninen J, Kangas J, Laaksonen J. Manual of SOM_PAK, The Self-Organising Map Program Package, Version 3.1, April 7, 1995.
[7] Kohonen T. Self Organising Maps. Second Edition. Springer-Verlag, Heidelberg.
[8] Neural Networks Research Centre & Laboratory of Computer and Information Science. Triennial Report. Helsinki University of Technology.
[9] Iivarinen J, Kohonen T, Kangas J, Kaski S. Visualising the clusters on the Self Organising Map. Multiple Paradigms for Artificial Intelligence (SteP94), Finnish Artificial Intelligence Society.
[10] Kraaijveld M A, Mao J, Jain A K. A non-linear projection method based on Kohonen's topology preserving maps. Proceedings of the 11th International Conference on Pattern Recognition (11ICPR), 41-45, Los Alamitos, CA. IEEE Comput. Soc. Press.
[11] Ultsch A. Self organised feature maps for monitoring and knowledge acquisition of a chemical process. In Gielen S, Kappen B, editors, Proceedings of the International Conference on Artificial Neural Networks (ICANN93), London. Springer-Verlag, 1993.