Diagnosing breast cancer with an improved artificial immune recognition system


Soft Comput DOI /s
METHODOLOGIES AND APPLICATION

Mahmoud Reza Saybani (1,2), Teh Ying Wah (1), Saeed Reza Aghabozorgi (1), Shahaboddin Shamshirband (3), Miss Laiha Mat Kiah (3), Valentina Emilia Balas (4)

Springer-Verlag Berlin Heidelberg 2015

Communicated by V. Loia.

Correspondence: Mahmoud Reza Saybani, saybani@gmail.com; Shahaboddin Shamshirband, shamshirband@um.edu.my

1 Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
2 Department of Computer Networks, Markaz-e Elmi Karbordi Bandar Abbas 1, University of Applied Science, Bandar Abbas, Iran
3 Department of Computer System and Technology, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
4 Department of Automation and Applied Informatics, Aurel Vlaicu University of Arad, Arad, Romania

Abstract Breast cancer is the top cancer in women worldwide. Scientists are looking for early detection strategies, which remain the cornerstone of breast cancer control. Consequently, there is a need to develop an expert system that helps medical professionals to accurately diagnose this disease. The artificial immune recognition system (AIRS) has been used successfully for diagnosing various diseases. However, little effort has been undertaken to improve its classification accuracy. To increase the classification accuracy, this study introduces a new hybrid system that incorporates a support vector machine, fuzzy logic, and a real-world tournament selection mechanism into AIRS. The Wisconsin Breast Cancer data set was used as the benchmark data set; it is available through the machine learning repository of the University of California at Irvine. The classification performance was measured through tenfold cross-validation, Student's t test, sensitivity and specificity.
With an accuracy of 100 %, the proposed method was able to classify the breast cancer dataset successfully.

Keywords Artificial intelligence, Artificial immune recognition system, Breast cancer diagnosis, Classification, Data mining

1 Introduction

It is an unfortunate fact that breast cancer is increasing worldwide. Mammography screening is very expensive, and only countries with a good health system can benefit from it. Therefore, early detection strategies, which remain the cornerstone of breast cancer control, are needed (WHO 2012). Artificial intelligence and data mining techniques have been increasingly applied in the medical domain to identify hidden regularities or patterns and to help medical practitioners make decisions (Elouedi et al. 2014). Although there are advancements in cancer treatment, early diagnosis is still very important for increasing the survival rates of breast cancer patients (Lavanya and Rani 2012). Medical practitioners use their experience to draw diagnostic inferences from patients' examinations, physical condition, and history. The obtained information can sometimes be uncertain, insufficient, and misleading; this makes the diagnostic task difficult even for experienced doctors (Sarkar and Leong 2009).

Modern computer technologies have enabled medical doctors and scientists to store huge amounts of records. It is obviously impossible for an unassisted human to process this tremendous amount of data (Cios and William Moore 2002). Combinations of mathematical models, biology, and artificial intelligence have entered a period of rapid discovery in

medicine and are used to help physicians diagnose various diseases more accurately. An expert system helps medical professionals to diagnose diseases faster and to reduce human errors (Polat et al. 2007).

The human immune system acts like a distributed learning system. It learns how to identify patterns and then uses memory cells to remember previously identified patterns (Dasgupta 1998). Scientists have been motivated by these very significant computational features to create artificial immune systems (AISs) (Timmis et al. 2004). AIS is a hybrid system (De Castro and Timmis 2002) and a class of adaptive algorithm (Brownlee 2005) that abstracts the structure and function of the vertebrate immune system into computational systems (Cai 2014; Yang 2011). AIS was built chiefly for feature extraction or data clustering; therefore, its performance for classification was not satisfactory. Watkins (2001) introduced the Artificial Immune Recognition System (AIRS) and focused on classification problems (Brownlee 2005). AIRS significantly improved the classification performance (Polat et al. 2007).

Human beings use their intelligence to make decisions such as categorizing items or classifying patients. Applications of artificial intelligence techniques in diagnosing diseases have been increasing, and AIRS is one of the techniques that have been successfully used in medical classification problems (Chikh et al. 2012). Different classification tools offer different advantages; for instance, k-nearest neighbor is one of the best classifiers for specific problems; however, it is computationally expensive (Goodman et al. 2002). Artificial neural networks perform well once trained and can generalize from the training data set; however, very often the best network architecture for a specific task is unknown (Goodman et al. 2002).
One of the advantages of AIRS is that one does not need to know the appropriate settings in advance and, most importantly, it is self-determined. It has been shown that on certain problems, random settings of AIRS parameters lead to classification accuracies that are only a few percentage points lower than those obtained with optimized parameter settings (Goodman et al. 2002). AIRS is considered to be a clever (Brownlee 2011) supervised classifier that has shown significant achievements on a wide range of classification problems (Brownlee 2005; Pendharkar et al. 1999). Cuevas et al. (2012) argued that bio-inspired computing has proven to be useful for classification problems. Some applications of AIRS are discussed later.

Although AIRS has been shown to have many good features, it still has the potential to perform even better. Jenhani and Elouedi showed that relatively few efforts have been made to improve the classification accuracy of AIRS. The majority of published articles have applied AIRS to solve a particular classification problem (Jenhani and Elouedi 2012). AIRS2 is a more efficient version of the AIRS algorithm. AIRS3 keeps track of the number of antigens for each memory cell, and while AIRS2 uses regular k-nearest neighbor for classification, AIRS3 uses weighted k-nearest neighbor (Jenhani and Elouedi 2012). Golzari et al. identified that the selection pressure of AIRS is very high during resource competition; this leads to loss of diversity and may produce premature memory cells. The consequence of high selection pressure is decreased classification accuracy (Golzari et al. 2009a). Polat et al. (2007) identified that the classification performance of AIRS can be improved by reducing the number of resources and the training time.
In addition, AIRS uses k-nearest neighbor (kNN) as a classifier. In machine learning, kNN was developed in such a way that it identifies patterns of data without requiring an exact match to any cases or stored patterns, which means, relatively speaking, lower accuracy. Furthermore, the choice of k affects the performance of kNN: if k is too small, the result might be sensitive to noise in the data, and if k is too large, the neighborhood may include points from other classes (Wu et al. 2008).

This paper proposes the following modifications to the process of AIRS2 with the goal of improving its classification accuracy. To reduce and control the number of resources, this paper uses fuzzy logic in the resource allocation phase of the algorithm. To reduce the loss of diversity and the generation of premature memory cells, the real-world tournament selection (RWTS) method is introduced into the resource competition phase of the algorithm. Finally, instead of using kNN as a classifier, this paper uses a support vector machine (SVM), which is a very robust classifier and has shown good performance in many applications (Joachims 1998). The effects of these changes were analyzed by applying the new model to the real-world Wisconsin breast cancer dataset (WBCD). The WBCD was taken from the UCI machine learning repository (Frank and Asuncion 2010). The obtained classification accuracy was then compared with that of other classifiers using the same dataset. We call this new hybrid FUZZY-SVM-RWTS-based AIRS2 system FSRAIRS2 and show that it obtains the highest classification accuracy among the classifiers reported by other researchers.

The rest of this paper is divided into four major parts. The first part, background, presents human immune principles and their applications in artificial immune systems. The second part, literature review, summarizes and analyzes the work of previous related studies. The third part, material and methods, describes the methodology and the algorithm in detail. The fourth part, the results, presents the results of this study and compares them with the results obtained through other well-known algorithms. This section is followed by the discussion and conclusion of the study.

2 Background

2.1 Human immune principles and artificial immune systems

When infectious organisms or other invaders enter the human body, the immune system's task is to attack and try to eliminate them (Timmis et al. 2004). The human immune system contains B and T lymphocytes. B lymphocytes originate in the bone marrow, where they can mature into B cells; T lymphocytes mature into T cells in the thymus (Clark 2007). Once pathogens have been identified, B cells are stimulated and, with the help of T cells, use a defense mechanism to lock onto the pathogens. B cells undergo cloning and somatic hypermutation, and antibodies are produced. T cells have the function of destroying the invading germs that have been identified by the immune system (Timmis and Neal 2001). Once antibodies are produced, they are distributed all over the body, so that if the same antigen attacks the body again, the immune system uses the existing antibodies to lock onto the antigens and asks T cells to destroy them. Detailed information about the immune system can be found in Sompayrac (2012) and Travers et al. (2008). Table 1 shows the mapping between the terminologies used for the immune system and AIRS (Watkins and Timmis 2002).

Table 1 Mapping between the immune system and AIRS

Immune system            AIRS
Antibody                 Feature vector
Recognition ball (RB)    Combination of feature vector and vector class
Shape-space              The possible values of the data vector
Clonal expansion         Reproduction of ARBs that are well matched with antigens
Antigen                  Training data
Affinity maturation      Random mutation of ARB and removal of the lowest stimulated ARBs
Immune memory            Memory set of mutated ARBs
Metadynamics             Continual removal and creation of ARBs and memory cells

2.2 Artificial immune recognition system

AIRS is a supervised learning algorithm that was inspired by the natural immune system. It was developed by Watkins and Boggess (2002), who drew the idea from the work of Timmis and Neal (2001). This was the first version of AIRS, which was later dubbed AIRS1. Watkins and Boggess (2002) showed that the performance of AIRS1 on various benchmark classification problems was comparable to that of other highly regarded supervised learning techniques on the same benchmarks. Based on the experience gained from AIRS1, Watkins and Timmis decided to refine the process by reducing the complexity of AIRS1's approach while maintaining the accuracy of the results, and introduced AIRS2 (Brownlee 2011). Watkins et al. showed that AIRS2 is simpler and computationally more efficient than AIRS1 and declared AIRS2 the standard AIRS implementation (Joachims 1998). Details about the differences between AIRS1 and AIRS2 and information about AIRS2's algorithm can be found in Watkins and Timmis (2002).

Since this paper introduces a new hybrid system, FSRAIRS2, it is important to note that AIRS itself is an assembly of elements and procedures that were developed for supervised and unsupervised AIS algorithms (Brownlee 2005). Watkins used kNN inside AIRS to perform the classification tasks. Here, kNN is one of the components that have been added to other elements and procedures to make what is known as AIRS; therefore, AIRS could also be interpreted as a hybrid AIRS-kNN algorithm. Some researchers consider AIRS a pre-processor for kNN (Seeker and Freitas 2007). The literature review reveals that most researchers have used the term AIRS in their work; however, most of them do not explicitly state whether they refer to AIRS1 or AIRS2. The focus of this paper is to make changes to the AIRS2 algorithm.

2.3 Literature review

Goodman et al.
compared the performance of AIRS with other classifiers on the same multiple-class problems and concluded that AIRS was competitive with the top five to eight of those classifiers (Meng et al. 2005; Goodman et al. 2003). Comprehensive benchmark studies by Meng et al. (2005) on AIRS revealed that AIRS delivers reasonable results and can be used for real-world classification tasks. Marwah and Boggess (2002) investigated several different algorithms for circumstances in which two or more classes were tied on the number of memory cells among the k most strongly stimulated memory cells. Their report showed that, for one of the testbeds, the accuracy of AIRS was on average higher than any reported at the UCI repository for that testbed.

Goodman et al. investigated the behavior of AIRS on a number of publicly available classification problems. As they increased the number of classes, they kept the number of features constant and compared their results with Kohonen's learning vector quantization (LVQ), which is a well-known classifier. They traced the behavior of both classifiers across those publicly available classification problems. They concluded that AIRS's average performance on one of the problems was the best for that problem (Goodman et al. 2002). In another study, Goodman et al. (2003) examined AIRS empirically. They replaced one of the two possible sources of its classification power with alternative modifications. They reported that the results were slightly less effective, although not to a statistically significant degree. Their modifications provided a fast test version of AIRS. They concluded that AIRS's classification power lies in its replacement and maintenance of its memory cell population.

The effects of adding non-Euclidean distance measures to the classical AIRS algorithm were explored by Hamaker and Boggess, who used four well-known classification problems with various proportions of nominal, real, and discrete features. They suggested searching further for the non-Euclidean distance metric that performs best for a given data set (Hamaker and Boggess 2004). In another study, Polat et al. increased the classifier's performance by incorporating fuzzy logic into the resource allocation mechanism of AIRS. They argued that the linearity of the resource allocation mechanism of classical AIRS is the reason for excessive resource usage, and that the classifier therefore takes longer to train and requires a higher number of memory cells. They stated that their algorithm achieved higher classification accuracy than other known classifiers on the same data sets (Polat et al. 2007; Polat and Güneş 2008). Golzari et al.
incorporated the real-world tournament selection algorithm into the resource competition phase of AIRS1 and were able to achieve higher classification accuracy. According to their report, during the competition phase AIRS1 uses a mechanism with high selection pressure, which causes loss of diversity and may therefore generate premature memory cells. They concluded that their experimental results on benchmark datasets from the UCI machine learning repository showed significantly higher classification accuracy than AIRS1 in all cases (Golzari et al. 2009a, b).

A hybrid method involving AIRS and fuzzy-weighted preprocessing was developed by Polat et al. (2007). They used this method to classify a thyroid disease dataset. A comparison of their method with the classical AIRS on the same data set showed that their methodology performed better. Saidi et al. and Chikh et al. used fuzzy k-nearest neighbor in connection with the reduced memory cell pool of AIRS2 to assign a class membership to each instance. They noted that the classification time of kNN depends on the number of data points used and argued that reducing this number is useful for the algorithm. They compared their results with those obtained from AIRS2 and claimed to have achieved higher classification accuracy than AIRS2 (Chikh et al. 2012; Saidi et al. 2011).

Another application of AIRS can be seen in the work of Forouzideh et al., who used AIRS for text-document classification. They compared different versions of AIRS with the multi-layer perceptron (MLP) and RBF as a simple neural approach and concluded that the different versions of AIRS performed better than both MLP and RBF. They claimed that, because of the high performance of AIRS, more successful applications of AIRS in a wide range of content mode classification are expected in the future (Forouzideh et al. 2011). Chen et al. (2008) reported a promising ability of AIRS in analyzing microarray data.
In another study, Le and Mo-Yuen incorporated the E-algorithm into AIRS in order to create a new Fuzzy-AIRS (FAIRS); they used the new algorithm to identify the cause of power outages. According to their report, the results of FAIRS were compared with the results of AIRS and the E-algorithm. They reported that FAIRS achieved performance comparable with the other algorithms; however, FAIRS was significantly faster in computing time than the E-algorithm (Le and Mo-Yuen 2008).

2.4 Application of AIRS in diagnosing diseases

Classification systems have found many applications with significant results in diagnosing various diseases. One of the most important decision-making tools in medicine is classification (Kahramanli and Allahverdi 2008), and data classification problems are common in many fields including medicine (Tunç 2012). Huang et al. (2012) indicated that, for saving patients' lives, the classification accuracy of medical data sets has to be optimal. Kodaz et al. (2009) developed an AIRS-based model for diagnosing thyroid disease. They reported a classification accuracy of %. Polat et al. (2006) developed a hybrid method of fuzzy-weighted pre-processing and AIRS to diagnose heart disease and reported that their method was capable of achieving a classification accuracy of 96.3 %. Latifoǧlu et al. (2007) showed that their AIRS-based method could assist physicians in making the final judgment in diagnosing atherosclerosis patients. They achieved a classification accuracy of %. Another medical application of AIRS can be seen in the work of Kara et al. (2009), who used information gain-based AIRS to classify microorganism species and achieved a classification accuracy of %. A recent application of AIRS in the field of medical diagnosis was developed by Chikh et al. (2012), who presented a model to classify diabetes diseases. They reported that their model, MAIRS2, achieved an accuracy of 89.1 % on the diabetes data set. In another study on predicting type 2 diabetes among pregnant women, Lin et al. (2011) used AIRS and compared its result with that of SVM and logistic regression; according to their report, AIRS achieved the highest classification accuracy among these three classifiers, with an accuracy of 62.8 % on the diabetes data set. Recently, Shamshirband et al. (2014) used AIRS and fuzzy labeling to diagnose tuberculosis. They reported that their method achieved an accuracy of %.

2.5 Related work with Wisconsin breast cancer dataset

There are many studies in the literature on the medical diagnosis of breast cancer using WBCD. The following summarizes the classification accuracies achieved by researchers who applied their models to WBCD. Golzari et al. (2009b, 2011) achieved accuracies of 97.17, 97.04, 97.1, and % with AIRS1, EXP1AIRS, EXP2AIRS, and RWTSAIRS1, respectively. Daoudi et al. used median filtering for cloning in their AIS-based model and achieved a classification accuracy of 94.4 % on the WBCD data set (Daoudi et al. 2013). Lavanya and Rani (2012) introduced a hybrid CART classifier with feature selection and bagging techniques and applied their model to the WBCD data set; they reported a classification accuracy of %. Elouedi et al. (2014) proposed a hybrid clustering model for breast cancer prognosis using the WBCD data set. They achieved a classification accuracy of %. Polat et al. (2007) introduced Fuzzy-AIRS, applied their model to WBCD, and reported a classification accuracy of %. Accuracies of 96.7, 96.8, and 97.2 %, obtained through the Optimized-LVQ, Big-LVQ, and AIRS methods, respectively, were reported by Goodman et al. (2002). Setiono (2000) achieved an accuracy of 98.1 % with his application of the neuro-rule method.
The classification accuracy obtained by Peña-Reyes and Sipper (1999) was %; they used a Fuzzy-Genetic Algorithm (Fuzzy-GA). A classification accuracy of 97.2 % was reported by Bennett and Blue (1998), who used SVM as a classifier. Nauck and Kruse (1999) used neuro-fuzzy techniques and obtained an accuracy of %. Hamilton et al. (1996) reported a classification accuracy of % using the Rule Induction through Approximate Classification (RIAC) method; they achieved a classification accuracy of 96 % with C4.5 on the same dataset. Using the Linear Discriminant Analysis (LDA) method, Ster and Dobnikar (1996) reported an accuracy of 96.8 %. The Simplified Artificial Immune System (SAIS) was developed by Leung et al. (2007). To ensure global optimization of AIS, this classifier was also tested on WBCD; the classification accuracy reported for this data set was 96.6 %. Oliveira et al. (2012) developed a new algorithm named Clonal Selection Classifier with Data Reduction (CSCDR) with the goal of optimizing the clonal selection process; when this algorithm was applied to WBCD, it achieved a classification accuracy of %. An approach to classifying breast cancer using advanced multi-dimensional fuzzy neural networks was introduced by Naghibi et al. (2012), who used a Fuzzy Gaussian potential neural network (FGPNN) and a hierarchical fuzzy neural network (HFNN). They reported that simulation results demonstrated the effectiveness of both FGPNN and HFNN; the highest reported accuracy on WBCD was 98.2 %.

3 Material and methods

3.1 Fuzzy resource allocation

Fuzzy logic (FL) was introduced by Zadeh (1965), who presented a new way of processing data: instead of using a crisp set of rules, he proposed using partial set membership. FL has found applications in various fields including artificial intelligence. One of the stages of the AIRS algorithm deals with resource competition.
The goal of resource competition is to improve the selection probability of high-affinity ARBs for subsequent steps. Resource competition depends on the number of resources allocated to each ARB; this allocation is based on the affinity between the ARB and the antigen, and on its class. The resource allocation for each ARB in AIRS2 is calculated by multiplying the stimulation value by the clonal rate, as shown in Eq. (1):

\[ \mathrm{ARB.res} = \mathrm{ARB.stim} \times \mathit{clonalrate} \tag{1} \]

where ARB.res, ARB.stim and clonalrate are the number of allocated resources, the stimulation value and a user-defined value, respectively. Marwah and Boggess (2002) employed a different resource allocation method, in which the antigen classes that occurred more frequently obtained more resources. Both AIRS2 and the algorithm proposed by Marwah and Boggess use linear resource allocation, and this linearity results in a higher number of memory cells and a longer classification time (Polat et al. 2007; Golzari et al. 2008). To solve these problems, Eq. (1) is replaced with Eq. (2):

\[ \mathrm{ARB.res} = \mathrm{FuzzyStimulation}(\mathrm{ARB.stim}) \times \mathit{clonalrate} \tag{2} \]

where the FuzzyStimulation method was developed to control the stimulation value in a non-linear way. Figure 1 illustrates the pseudocode for the resource allocation of FSRAIRS2. The algorithm uses fuzzy control language to convert the input variable stimulation into degrees of membership for the membership functions defined on this variable; the term resources has been chosen for the output

variable because this study wants to use the output value for controlling the resource allocation of FSRAIRS2.

Fig. 1 Resource allocation for FSRAIRS2

    FOREACH ARB DO
        FIS fis ← null                  // FuzzyStimulation
        fis ← FIS.load(control file)    // LOAD fuzzy control file (see Figure 2)
        input ← stimulation             // set input variable
        fis.evaluate()                  // evaluate
        output ← fis.resources          // return resources
        // use output (= resources) to calculate resource allocation
        res ← output * ClonalRate
        resources ← resources + res
        ARB.orderByResources            // order by allocated resources
        return resources
    ENDFOR

The linguistic variable for resources has to be converted into a value; this is done through a process called defuzzification. Figure 2 shows the fuzzy control language that is used for the resource allocation of FSRAIRS2. To simplify the defuzzification, special membership functions, the singletons, are used. This paper uses Center of Gravity for Singletons (COGS) as the defuzzification method, which is defined as:

\[ \text{Result of defuzzification} = \frac{\sum_{i=1}^{n} r_i \, \mu_i}{\sum_{i=1}^{n} \mu_i} \tag{3} \]

where r is the output variable, n is the number of singletons, and μ is the membership function.

ARB.stim, on the x-axis of Fig. 3, takes values between 0 and 1, and its membership value is determined through the input membership functions. The amount of allocated resource is calculated through membership functions; it can be any number between 0 and 10 and is denoted on the x-axis of Fig. 4. The FuzzyStimulation method returns a real value that is then used to calculate the allocated resource according to Eq. (2).

The linguistic values were chosen in such a way that ARBs with stimulation values between 0.5 and 1 get a higher allocated resource number than ARBs with stimulation values between 0 and 0.5. For the case that no rule activates the defuzzifier, the default value for the output variable is set to zero. Figure 2 shows the rules that were used in this study to infer the fuzzy algorithm.
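The fuzzy resource allocation of Eqs. (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the membership points and singleton values are copied from Fig. 2, while the function names (`membership`, `fuzzy_stimulation`, `allocate_resources`) and the choice to hold a term's end-point membership constant outside its listed points are assumptions made here. Because each output term is driven by exactly one rule with a single input, MIN activation and MAX accumulation reduce to the membership degree of the matching input term.

```python
# Piecewise-linear input terms for 'stimulation' (points taken from Fig. 2).
INPUT_TERMS = {
    "VeryLow":  [(0.0, 1.0), (0.2, 0.0)],
    "Low":      [(0.1, 0.0), (0.2, 1.0), (0.3, 1.0), (0.4, 0.0)],
    "Medium":   [(0.3, 0.0), (0.4, 1.0), (0.5, 1.0), (0.6, 0.0)],
    "High":     [(0.5, 0.0), (0.6, 1.0), (0.7, 1.0), (0.8, 0.0)],
    "VeryHigh": [(0.7, 0.0), (0.8, 1.0), (1.0, 1.0)],
}

# Output singletons for 'resources' (values taken from Fig. 2).
OUTPUT_SINGLETONS = {
    "VeryLow": 0.1, "Low": 0.35, "Medium": 0.5, "High": 0.75, "VeryHigh": 1.0,
}


def membership(points, x):
    """Degree of membership of x in a piecewise-linear term.

    Assumption: outside the listed points, the nearest end-point value holds.
    """
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return 0.0  # not reached for well-formed, sorted point lists


def fuzzy_stimulation(stim, default=0.0):
    """COGS defuzzification, Eq. (3): sum(r_i * mu_i) / sum(mu_i)."""
    degrees = {term: membership(pts, stim) for term, pts in INPUT_TERMS.items()}
    total = sum(degrees.values())
    if total == 0.0:
        return default  # no rule fired; Fig. 2 sets DEFAULT := 0
    weighted = sum(OUTPUT_SINGLETONS[t] * mu for t, mu in degrees.items())
    return weighted / total


def allocate_resources(stim, clonal_rate):
    """Eq. (2): non-linear allocation via the fuzzy stimulation value."""
    return fuzzy_stimulation(stim) * clonal_rate


def allocate_resources_linear(stim, clonal_rate):
    """Eq. (1): the linear allocation of AIRS2, for comparison."""
    return stim * clonal_rate
```

For instance, a stimulation of 0.15 belongs partly to VeryLow (degree 0.25) and partly to Low (degree 0.5), so the sketch returns (0.1 × 0.25 + 0.35 × 0.5) / 0.75 ≈ 0.27, whereas a stimulation of 0.9 lies entirely in VeryHigh and maps to the singleton 1.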
This study used the MIN fuzzy operator for the activation process, by which the degree of fulfillment of the rules acts on the output. The MIN operator is defined as $\min(\mu_1(x), \mu_2(x))$. The MAX fuzzy operator was used for the accumulation of the final results. The MAX operator is defined as $\max(\mu_1(x), \mu_2(x))$. Figures 3 and 4 illustrate the membership functions for the input and output variables, stimulation and resources, respectively.

3.2 Resource competition with real-world tournament selection mechanism

One of the useful, popular, and robust mechanisms in genetic algorithms is tournament selection. The selection pressure of tournament selection depends directly on the tournament size. When the number of competitors increases, the resulting selection pressure increases too (Miller and Goldberg 1995). Lee et al. (2008) compared RWTS with the conventional tournament selection (TS) mechanism and found that the former has a higher selection pressure with a relatively small loss of diversity and a higher sampling accuracy than the latter. They concluded that under similar selection pressure RWTS maintains more diversity than TS, and that higher pressure and sampling accuracy would improve the performance of a selection strategy. Lee et al. (2008) also argued that excessive use of high pressure in a selection strategy is not appropriate because it may cause premature convergence. Their results showed that RWTS causes only a small loss of

diversity, which is an indication that RWTS does not have an excessively high selection pressure.

Fig. 2 Fuzzy control language for resource allocation of FSRAIRS2

    FUNCTION_BLOCK Fuzzy_Command_File

    // Define input variables
    VAR_INPUT
        stimulation : REAL;
    END_VAR

    // Define output variable
    VAR_OUTPUT
        resources : REAL;
    END_VAR

    // Fuzzify input variable 'stimulation'
    FUZZIFY stimulation
        TERM VeryLow  := (0, 1) (0.2, 0);
        TERM Low      := (0.1, 0) (0.2, 1) (0.3, 1) (0.4, 0);
        TERM Medium   := (0.3, 0) (0.4, 1) (0.5, 1) (0.6, 0);
        TERM High     := (0.5, 0) (0.6, 1) (0.7, 1) (0.8, 0);
        TERM VeryHigh := (0.7, 0) (0.8, 1) (1, 1);
    END_FUZZIFY

    // Defuzzify output variable 'resources'
    DEFUZZIFY resources
        TERM VeryLow  := 0.1;
        TERM Low      := 0.35;
        TERM Medium   := 0.5;
        TERM High     := 0.75;
        TERM VeryHigh := 1;
        // Use 'Center Of Gravity Singleton' as defuzzification method
        METHOD : COGS;
        // Default value is 0 (if no rule activates defuzzifier)
        DEFAULT := 0;
    END_DEFUZZIFY

    RULEBLOCK RB
        // Use 'min' for 'and'
        AND : MIN;
        // Use 'min' for activation method
        ACT : MIN;
        // Use 'max' for accumulation method
        ACCU : MAX;
        RULE 1 : IF stimulation IS VeryLow THEN resources IS VeryLow;
        RULE 2 : IF stimulation IS Low THEN resources IS Low;
        RULE 3 : IF stimulation IS Medium THEN resources IS Medium;
        RULE 4 : IF stimulation IS High THEN resources IS High;
        RULE 5 : IF stimulation IS VeryHigh THEN resources IS VeryHigh;
    END_RULEBLOCK

    END_FUNCTION_BLOCK

RWTS is widely used to find a winner in sports. The players or teams are randomly grouped pairwise, in sequence, each with a neighbor. If the number of players is odd, the last individual is paired randomly with someone from the same tournament level. The pairs compete, and the winner of each pair goes to the next tournament level. In the next round this procedure repeats by building new pairs and starting the competition again. At the end, one champion is identified.
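The pairing procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's resource competition routine: the function name `real_world_tournament`, the `fitness` callback, and the decision to shuffle once and then pair winners in sequence at each later level are choices made here for the example.

```python
import random


def real_world_tournament(players, fitness, rng=None):
    """Return the champion of a pairwise knockout tournament.

    players: non-empty sequence of competitors
    fitness: function mapping a competitor to a comparable score
    rng:     optional random.Random instance (assumed, for reproducibility)
    """
    if not players:
        raise ValueError("at least one player is required")
    rng = rng or random.Random()

    level = list(players)
    rng.shuffle(level)  # random grouping before neighbors are paired

    while len(level) > 1:
        # Pair each player with its neighbor in sequence: (0,1), (2,3), ...
        pairs = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            # Odd count: the last player meets a random player of this level.
            pairs.append((level[-1], rng.choice(level[:-1])))
        # The fitter member of each pair advances to the next level.
        level = [max(pair, key=fitness) for pair in pairs]

    return level[0]
```

A call such as `real_world_tournament(arbs, fitness=lambda arb: arb.resources)` would apply the same idea to ARBs, where `arb.resources` is a hypothetical attribute holding the allocated resources. Since the fittest player wins every pairing it takes part in, it always reaches the final level.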
The champion of the tournament is the player with the best fitness value among all participating players. For the resource competition, ARBs of each class represent individuals. ARBs enter the competition. Each ARB competes with a neighbor. This algorithm considers ARB A as a neighbor of ARB B, if ARB B is the first ARB from the same class as ARB A,

and it is generated in the system after ARB A. The competition depends on the amount of resources allocated to the ARBs. This amount is calculated for each ARB during the resource allocation process. This paper uses real-world tournament selection (RWTS), which has shown good results in connection with AIRS1 in the work of Golzari et al. (2009a, b). Figure 5 illustrates the pseudocode for the resource competition of FSRAIRS2. As seen in Fig. 5, resources are allocated to ARBs based on their fuzzy stimulation value, and then pairs of ARBs are built as described above. These pairs undergo a competition; the winner goes to the next level and the loser is removed from the list. At the end the strongest survives, and the resources allocated to the losers are removed.

[Fig. 3 Membership function for stimulation]
[Fig. 4 Membership function for resources]

3.3 Support vector machines

The support vector machine (SVM) has its roots in advanced statistical learning theory (SVMS.org 2010) and is known as one of the most accurate and robust algorithms among all well-known algorithms. It has profound theoretical foundations, is insensitive to the number of dimensions, and needs only a dozen instances for training (Wu et al. 2008). Superior performance of SVMs in comparison with other algorithms has been documented for many types of data, including biomedical data (Statnikov 2011). SVM is able to maximize the predictive accuracy of a model without overfitting the training instances. Applications of SVM can be found in various disciplines, including bioinformatics, bio sequence analysis, classification, customer relationship management, concept extraction, text mining, character recognition, and voice and speech recognition. Vapnik et al. at AT&T Bell Laboratories introduced SVM in 1995; however, the soft margin version of SVM was introduced later by Cortes and Vapnik (1995).
SVM tries to find an optimal hyperplane in a multidimensional space that separates cases of different classes. Such a hyperplane can be constructed through an iterative training algorithm. In a two-class learning task, for example, SVM finds the best hyperplane by maximizing the margin between the two classes. The margin is the shortest distance between the hyperplane and the closest data points of either class (Wu et al. 2008). When the training data cannot be separated without error, SVM separates the training set with a minimal number of errors, also known as classification errors (Cortes and Vapnik 1995). Conceptually, the erroneous training instances are set aside, and the remaining, error-free part of the training set can then be separated without errors (Cortes and Vapnik 1995). SVM models use a cost parameter, C, to allow some flexibility in separating the classes. C creates a soft margin that allows some misclassifications (Cortes and Vapnik 1995). Increasing C improves the classification accuracy for the training data; however, this may also lead to overfitting, because a higher cost of misclassified points forces the model to fit the training data more closely (Dtreg.com 2012). Finding the optimal hyperplane is an optimization problem, and Cortes and Vapnik showed that its solution is obtained at the saddle point of the Lagrangian function (Cortes and Vapnik 1995). This classical optimization problem is solved by standard quadratic programming (QP) techniques and programs (Cortes and Vapnik 1995). The complexity of the calculations does not depend on the dimension of the input space but on the number of support vectors (SVs), which are a small subset of the training vectors (Laura Auria 2008); they also assist model interpretation (Campbell and Ying 2011). Similarity and dissimilarity of data objects are quantified by kernels, which can be constructed for a broad range of data objects (Campbell and Ying 2011).
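
To make the role of C concrete, the soft-margin problem can be written in its commonly used textbook form. The symbols $w$, $b$, $\xi_i$, $n$ and the labels $y_i \in \{-1, +1\}$ are notation assumed for this sketch and are not taken from the paper.

```latex
\min_{w,\, b,\, \xi} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
```

Each slack variable $\xi_i$ measures how far instance $x_i$ violates the margin. A large C makes violations expensive, so the solution favors few training errors at the price of a narrower margin; a small C tolerates more violations in exchange for a wider margin.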
There are a number of kernels that can be used in SVM models. These include linear, polynomial, radial basis function (RBF) and sigmoid kernels (Statsoft 2013). This research uses the RBF, which is the most popular choice of kernel type used in SVMs. This is chiefly due to its finite and localized responses across the full range of the real x-axis (Statsoft 2013). The RBF is defined as:

$$K(X_i, Y_i) = \exp\left(-\gamma \lVert X_i - Y_i \rVert^{2}\right),$$

where $\gamma$ is a kernel parameter. Increasing the value of $\gamma$ can improve the classification accuracy on the training data, but this can also lead to overfitting. A comprehensive explanation of SVM can be found in the work of Statnikov (2011), Cortes and Vapnik (1995) and Chang and Lin (2011).

3.4 Pseudocode of FSRAIRS2

In general, the AIRS algorithm was developed with the goal of producing memory cells that can be used to classify data. The artificial memory cells represent several characteristics of the human immune system. These memory cells represent memory B cells that have undergone a maturation process in the human body.

    Resources Allowed ← Total Resources
    Fuzzy-Stimulate ARBs
    //Allocate resources to ARBs based on their Fuzzy-Stimulated value
    Allocated Resource ← Sum of allocated resources of ARBs
    Current pair ← first ARB pair
    //Continue until the resources are below a threshold
    DOWHILE (Allocated Resource > Resources Allowed)
        NumResToBeRemoved ← Allocated Resource − Resources Allowed
        FOR Current pair
            MinARB.resource ← Minimum (of resources allocated to the elements of the pair)
            //Check if element can be removed
            IF MinARB.resource ≤ NumResToBeRemoved
                Allocated Resource ← Allocated Resource − MinARB.resource
                Delete MinARB.resource from population
            ELSE
                //Decrement resources
                MinARB.resource ← MinARB.resource − NumResToBeRemoved
                Allocated Resource ← Allocated Resource − NumResToBeRemoved
            ENDIF
            IF current pair is last ARB pair
                GOTO first ARB pair
            ELSE
                GOTO next ARB pair
            ENDIF
        ENDFOR
    ENDDOWHILE
    //Compare Allocated Resources of ARBs
    //The best ARB is always the one with the most resources
    BestARB ← ARB with the most Allocated Resources
    Return BestARB

Fig. 5 Pseudocode for resource competition of FSRAIRS2
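
As an illustration only, the resource competition of Fig. 5 can be sketched in Python as follows. The `ARB` class, the function name `compete_for_resources`, and the pairing of each ARB with the next one in the list (wrapping around at the end) are assumptions made for this sketch. The fuzzy resource allocation is not reproduced, so the resources of each ARB are taken as given.

```python
from dataclasses import dataclass, replace


@dataclass
class ARB:
    """Hypothetical artificial recognition ball: a label and its allocated resources."""
    name: str
    resources: float


def compete_for_resources(arbs, resources_allowed):
    """Trim a same-class ARB list until its total resources fit the allowance.

    Each ARB is paired with its successor in the list (an assumed reading of
    the neighbor rule). In every pair the ARB with fewer resources pays: it is
    deleted if it holds no more than the current excess, otherwise it is
    decremented by the excess. Returns the survivors and the richest survivor.
    """
    pool = [replace(a) for a in arbs]  # copies, so the caller's ARBs stay untouched
    allocated = sum(a.resources for a in pool)
    i = 0
    while allocated > resources_allowed and len(pool) > 1:
        i %= len(pool)                 # after the last pair, go back to the first
        j = (i + 1) % len(pool)
        excess = allocated - resources_allowed
        k = i if pool[i].resources <= pool[j].resources else j
        if pool[k].resources <= excess:
            allocated -= pool[k].resources
            del pool[k]                # the loser leaves the population
        else:
            pool[k].resources -= excess
            allocated = resources_allowed
        i += 1                         # move on to the next pair
    best = max(pool, key=lambda a: a.resources)
    return pool, best
```

With four ARBs holding 5, 1, 4 and 2 resources and an allowance of 8, this sketch removes the ARBs holding 1 and 2 resources and trims the ARB holding 4 down to 3, which leaves a total of exactly 8. If a single ARB remains and still exceeds the allowance, the loop stops without trimming it; a fuller implementation would need to decide how to handle that case.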
FSRAIRS2 has inherited many features from AIRS2 and AIRS1; therefore, the following description of the pseudocode shares the common structure introduced by Watkins et al. (2004). Figure 6 illustrates the pseudocode of the FSRAIRS2 algorithm used in this paper. This is a new hybrid algorithm that uses fuzzy logic, RWTS, SVM, and the general concept of AIRS2.

    I. Initialize and normalize dataset
    II. Seed the memory cell pool (M), if desired.
    III. For each training example (antigen) do the following:
        1. If M is empty, add antigen to M.
        2. Select the memory cell (mc) in M of the same classification having the highest affinity to antigen.
        3. Clone mc in proportion to its affinity to antigen.
        4. Mutate each clone and add to the B-cell pool (ARB).
        5. Use Fuzzy-logic to allocate resources to ARB. Use RWTS-mechanism for removing the weak cells (population control of ARB).
        6. Calculate the average stimulation of ARB to antigen and check for termination. If the termination condition is satisfied, goto step 9.
        7. Clone and mutate a random selection of B-cells in ARB based upon their stimulation.
        8. Loop back to step 5.
        9. Select the B-cell in ARB with the highest affinity to antigen (candidate). If candidate has a higher affinity to antigen than mc add candidate to M. If mc and candidate are sufficiently similar, then remove mc from M.
        return M
    //prepare content of M for SVM
    IV. Transfer M to SVM format
    //run SVM as classifier
    V. Perform SVM classification using M.

Fig. 6 Pseudocode for FSRAIRS2

FSRAIRS2 consists of five stages. In the first stage, data are normalized and initialized, and there is an option for seeding the memory cell pool. This pool represents a collection of recognition elements that form the classifier produced at the end of the training phase.

In the second stage, the algorithm uses the training data only once to prepare the classifier. One at a time, each antigen is exposed to the memory pool. The antigen stimulates the recognition cells in the pool, and each cell is assigned a stimulation value. For the affinity maturation process, the memory cell with the maximum stimulation value is selected as the best match memory cell. The selected memory cell is used to create a number of mutated clones, which are then added to the ARB pool. FSRAIRS2 uses this pool to refine the mutated clones of the best match memory cell for a specific antigen.

In the third stage, the algorithm deals with competition for resources in the development process of a candidate memory cell. After the ARB pool has been generated, the process of competition starts. Competition for resources is necessary to control the size of the ARB pool, as well as to promote those ARBs that have greater affinity (stimulation) to the antigen being trained on. The goal of this stage is to develop a memory cell that is most successful in accurately classifying a given antigen. To achieve this goal the algorithm uses three mechanisms. The first mechanism is system-wide resource competition. Resources are allocated to each ARB; the size of the allocation is determined by the normalized stimulation value of the ARB, which also represents the fitness of the ARB to become a recognizer of an antigen. The second mechanism uses mutation for shape-space exploration and diversification.
The third mechanism determines the stop condition, which depends on the average stimulation threshold. In step 5 of the third stage, FSRAIRS2 uses fuzzy logic to refine the resource allocation for ARBs, and then it uses RWTS to identify and remove the weak cells during resource competition (population control of the ARB pool). Then the potential candidate memory cell is introduced into the set of already established memory cells for training. If the memory cell candidate is more stimulated by the training antigen than the memory cell match, the candidate is added to the set of memory cells. If, in addition, the affinity between the memory cell candidate and the memory cell match is less than a threshold, the candidate replaces the memory cell match in the pool of memory cells. The above process repeats until all antigens have been introduced to the system.
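
The memory cell introduction described above can be sketched as follows. This is a minimal illustration under stated assumptions: affinity is taken to be the Euclidean distance between feature vectors (so a smaller value means more similar cells), stimulation is taken to be one minus the normalized distance, and replacement is only considered once the candidate has been accepted. The function names and the `max_distance` and `threshold` parameters are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors (smaller means more similar)."""
    return math.dist(a, b)


def stimulation(cell, antigen, max_distance):
    """Assumed stimulation in [0, 1]: one minus the normalized distance to the antigen."""
    return 1.0 - distance(cell, antigen) / max_distance


def introduce_candidate(memory, mc_match, candidate, antigen, threshold, max_distance):
    """Return an updated copy of the memory cell pool after offering the candidate.

    The candidate joins the pool only if it is more stimulated by the antigen
    than the current best match; the match is then dropped if the two cells
    are closer to each other than the threshold.
    """
    updated = list(memory)
    if stimulation(candidate, antigen, max_distance) > stimulation(mc_match, antigen, max_distance):
        updated.append(candidate)
        if distance(candidate, mc_match) < threshold:
            updated.remove(mc_match)
    return updated
```

Returning a new list instead of modifying `memory` in place keeps the sketch easy to test; a training loop would simply reassign the pool after each antigen.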

In stage 4, the proposed algorithm prepares the content of M for the SVM classifier by reformatting it to the format used by the SVM. In stage 5, when training for all antigens is completed, the algorithm presents the formatted M to the SVM classifier for classification. This research used WLSVM, which is a wrapper of LibSVM (Chang and Lin 2011) inside WEKA (Hall et al. 2009), to address the classification problem.

3.5 Used data set

This study used the WBCD, taken from the UCI machine learning repository (Frank and Asuncion 2010). The dataset consists of 699 samples that were collected by Dr. W.H. Wolberg at the University of Wisconsin Madison Hospitals from needle aspirates of human breast tissue. The WBCD consists of nine features obtained from fine needle aspirates, each of which is represented as an integer value between 1 and 10. The measured features are as follows: (F1) Clump Thickness, (F2) Uniformity of Cell Size, (F3) Uniformity of Cell Shape, (F4) Marginal Adhesion, (F5) Single Epithelial Cell Size, (F6) Bare Nuclei, (F7) Bland Chromatin, (F8) Normal Nucleoli, and (F9) Mitoses. Of the samples, 458 (65.5 %) belong to the benign class and the remaining 241 (34.5 %) belong to the malignant class. Table 2 shows the parameter settings that were used to run the algorithms. The values of the parameters were obtained through stepwise modification of each parameter, aiming at maximizing the classification accuracy.

Table 2 Parameters used for AIRS1, AIRS2 and FSRAIRS2 algorithms

| Parameters | AIRS1 | AIRS2 | FSRAIRS2 |
| Affinity threshold scalar (ATS) | | | |
| Clonal rate | | | |
| Hyper-mutation rate | | | |
| Seed cell | | | |
| Stimulation value | | | |
| Total resources | | | |
| k of kNN | 3 | 3 | n/a |
| SVM type | n/a | n/a | nu-SVC |
| Kernel function | n/a | n/a | RBF |
| γ | n/a | n/a | 1 |
| C | n/a | n/a | 7 |
| Cache memory size | n/a | n/a | 100 MB |
To determine a value for the ATS, we used values in the range of [0.1, 0.5] with an interval of 0.1; for the clonal rate, values in the range of [8, 12] with an interval of 1; for the hypermutation rate, values in the range of [0.1, 3] with an interval of 0.1; for the seed cell, values in the range of [1, 10] with an interval of 1; for the stimulation value, values in the range of [1, 10] with an interval of 1; for the total resources, values in the range of [100, 300] with an interval of 50; for k, values in the range of [1, 5] with an interval of 1; for the SVM type, C-SVC and nu-SVC; for the kernel function, linear, RBF, polynomial and sigmoid functions; for γ, values in the range of [1, 5] with an interval of 1; for C, values in the range of [1, 10] with an interval of 1; and for the cache memory size, values in the range of [50, 200] with an interval of 50. To reduce the number of possible combinations of so many parameters, we tuned one parameter at a time: as soon as we reached the optimal value for the first parameter, we kept it constant and varied the second parameter. As soon as its optimal value was reached, we kept the first two parameters constant and moved to the next parameter. We repeated this process for all AIRS parameters; once all of their values were determined, we moved on to determine the values for the SVM part. The displayed values are the optimal values we obtained for each algorithm (AIRS1, AIRS2, and FSRAIRS2). As can be seen from Table 2, FSRAIRS2 has more user-settable parameters than AIRS1 and AIRS2, which gives the user more flexibility in optimizing the model. The best results were obtained with the above settings.
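
The one-parameter-at-a-time procedure described above can be sketched as follows. The function name `tune_one_at_a_time`, the dictionary-based interface, and the `score` callable are assumptions for this sketch; in the study the score would be the cross-validated classification accuracy of the classifier, which is not reproduced here.

```python
def tune_one_at_a_time(grid, score, start):
    """Coordinate-wise search over a parameter grid.

    grid  : dict mapping parameter name -> list of candidate values,
            visited in insertion order (the tuning order).
    score : callable taking a dict of parameter values and returning a
            number, where higher is better.
    start : dict with an initial value for every parameter in grid.

    Each parameter is varied while all others are held at their best
    values found so far; the best value is then frozen.
    """
    best = dict(start)
    best_score = score(best)
    for name, values in grid.items():
        for value in values:
            trial = {**best, name: value}
            trial_score = score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
    return best, best_score


# Hypothetical usage with a toy score that peaks at ATS = 0.2 and clonal rate = 10.
# The ranges follow the text; the score function is a stand-in, not a classifier.
grid = {
    "ats": [0.1, 0.2, 0.3, 0.4, 0.5],
    "clonal_rate": [8, 9, 10, 11, 12],
}


def toy_score(params):
    return -((params["ats"] - 0.2) ** 2) - (params["clonal_rate"] - 10) ** 2


best_params, best_value = tune_one_at_a_time(grid, toy_score, {"ats": 0.1, "clonal_rate": 8})
```

This strategy evaluates roughly the sum of the grid sizes instead of their product, which is what makes it tractable for twelve parameters. The trade-off is that it can miss combinations in which two parameters only work well together.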
3.6 Applied parameters

4 Performance and evaluation measurements

4.1 Classification accuracy

The accuracy of a classifier refers to its ability to correctly predict the class label of new or previously unseen data. The accuracy is calculated by dividing the number of correctly classified instances by the total number of instances in the test phase. The classification accuracy for the dataset was measured according to Eq. (4) (Watkins 2001):

$$\mathrm{accuracy}(T) = \frac{\sum_{i=1}^{|T|} \mathrm{assess}(t_i)}{|T|}, \quad t_i \in T \qquad (4)$$

$$\mathrm{assess}(t) = \begin{cases} 1, & \text{if } \mathrm{classify}(t) = t.c \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

where $T$ is the set of data items to be classified (the test set), $t \in T$, $t.c$ is the class of the item $t$, and $\mathrm{classify}(t)$ returns the classification of $t$ by AIRS.

4.2 N-fold cross-validation

Classification accuracy is a critical design goal of any classifier (Watkins 2001). To make test results more reliable, researchers use n-fold cross-validation. It helps to minimize the bias associated with the random sampling used during the training phase (Delen et al. 2005). N-fold cross-validation is a robust approach to estimating the predictive accuracy of classification algorithms. In this approach, the instances are randomly divided into n stratified subsets of equal size. Each instance is put in exactly one subset. At each iteration, n − 1 subsets are merged to form the training set, and the classification accuracy of the algorithm is measured on the remaining subset. This process is repeated n times, choosing a different subset as the test set each time. Therefore, every data instance is used n − 1 times for training and once for testing. The final predictive accuracy is computed over all folds in the usual manner, by dividing the number of correct classifications taken over all folds by the number of data instances in all folds. This study used tenfold cross-validation for evaluation purposes.

4.3 Data reduction

One important feature of AIRS2 is its capability to perform the classification task with a reduced number of data instances. These reduced data sets are the evolved memory cells, which are produced from the original training instances and are used to characterize a given class of data (Watkins et al. 2004).
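
Looking back at Sects. 4.1 and 4.2, the accuracy measure of Eqs. (4) and (5) and the stratified n-fold procedure can be sketched in plain Python. The function names, the round-robin way of dealing instances into folds, and the `majority_class` stand-in classifier are assumptions for this sketch; the study itself ran its evaluation inside WEKA.

```python
import random
from collections import Counter, defaultdict


def accuracy(true_labels, predicted_labels):
    """Eqs. (4)-(5): share of test items whose predicted class equals the true class."""
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)


def stratified_folds(labels, n_folds, seed=0):
    """Assign every instance index to exactly one fold, dealing each class round-robin."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(offset + pos) % n_folds].append(idx)
        offset += len(members)
    return folds


def cross_validate(features, labels, n_folds, train_and_predict, seed=0):
    """Pooled accuracy: correct predictions over all folds / number of instances."""
    correct = 0
    for test_idx in stratified_folds(labels, n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = train_and_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        correct += sum(1 for i, p in zip(test_idx, predictions) if labels[i] == p)
    return correct / len(labels)


def majority_class(train_x, train_y, test_x):
    """Stand-in classifier: always predicts the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label for _ in test_x]
```

Any classifier can be plugged in through `train_and_predict`, as long as it accepts training features, training labels and test features and returns one predicted label per test item.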
Therefore, it is important to verify how the proposed algorithm performs in this respect. The data reduction percentage is calculated from Eq. (6) (Brownlee 2005).

$$\text{Data Reduction} = \left(1 - \frac{\text{Total Number Memory Cells}}{\text{Total Number Training Instances}}\right) \times 100 \qquad (6)$$

4.4 Student's t test

This paper uses Student's t test for testing the statistical significance of the difference between the means of two results (Jackson 2012; Ian et al. 2011). In order to make decisions, it is common to make guesses or assumptions about the populations involved. Such assumptions are called statistical hypotheses, which may or may not be true. The hypotheses consist of a null hypothesis ($H_0$), which is tested against an alternative hypothesis denoted by $H_1$. There are two types of errors when we make decisions. A Type I error occurs when we reject a hypothesis that should be accepted; when we accept a hypothesis that should be rejected, we make a Type II error. In both cases a wrong decision has been made. In practice, the maximum probability with which we are willing to risk a Type I error is known as the level of significance of the test. It is common to represent this probability by α, and (1 − α) represents the level of confidence of the test. Equations (7) and (8) present the null hypothesis and its alternative hypothesis:

$$H_0:\ \text{AIRS2 performance mean} = \text{Proposed Algorithm performance mean} \qquad (7)$$

$$H_1:\ \text{AIRS2 performance mean} \neq \text{Proposed Algorithm performance mean} \qquad (8)$$

In other words, this paper examines the hypothesis ($H_0$) that the mean values of the two results are the same. The two-tailed t test is used in this paper to decide whether to accept or reject the null hypothesis. We determine the p value, which is the probability of observing a difference at least as large as the measured one if the two means were in fact the same. If p < α, we infer that the observed difference between the means is unlikely to be due to chance alone, and therefore we conclude that there is a statistically significant difference in the means.
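
As a rough illustration of the test described in this section, the statistic for two independent samples of accuracies can be computed as follows. The paper does not state which variant of the t test was applied, so the unequal-variance (Welch) form used here is an assumption, as are the function names. The p value is approximated with the standard normal distribution, which is only reasonable for large samples such as the 100 results per algorithm used in this study.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """t statistic for two independent samples without assuming equal variances."""
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


def two_tailed_p_large_sample(t):
    """Two-tailed p value, treating t as standard normal (large-sample approximation)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def significantly_different(sample_a, sample_b, alpha=0.05):
    """Reject H0 (equal means) when the approximate p value falls below alpha."""
    return two_tailed_p_large_sample(welch_t(sample_a, sample_b)) < alpha
```

If both samples have zero variance the standard error is zero and the statistic is undefined, so such input would need separate handling. For small samples the p value should be taken from the t distribution with the appropriate degrees of freedom instead of the normal approximation.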
This paper uses α = 0.05.

5 Results

Comparing the mean classification accuracy for the WBCD (Original) obtained with AIRS2 (n1 = 100, M = %, SD = 2.19) to the one obtained with FSRAIRS2 (n2 = 100, M = 100 %, SD = 0.0, p < 0.05), the results show that the difference is statistically significant at the 95 % confidence level (α = 0.05) for this data set and that FSRAIRS2 achieved significantly higher classification accuracy than AIRS2. Both sensitivity and specificity for FSRAIRS2 on WBCD were 100 %. For the WBCD data set, reported and obtained classification accuracies of AIRS-based classifiers are shown in Table 3: Polat et al. (2007) achieved an accuracy of 97.2 and % for AIRS and Fuzzy-AIRS, respectively. Golzari Hormozi (2011) reported accuracies of 97.1 % and % for AIRS1 and RWTSAIRS1, respectively. Goodman et al. (2002) obtained an accuracy of 97.2 %. All results of this study were obtained by averaging 10 runs of tenfold cross-validation.

Table 3 Comparison of classification accuracy obtained by AIRS-based models on WBCD

| Researcher | Method | Classification accuracy (%) |
| This study | FSRAIRS2 | 100 |
| Polat et al. (2007) | Fuzzy-AIRS | |
| Golzari Hormozi (2011) | RWTSAIRS1 | |
| Polat et al. (2007) | AIRS | 97.2 |
| Goodman et al. (2002) | AIRS | 97.2 |
| Golzari Hormozi (2011) | AIRS1 | 97.1 |
| This study | AIRS1 | |
| This study | AIRS2 | |

Table 4 Comparing classification accuracy of FSRAIRS2 with other classifiers on WBCD

| Researcher | Method | Classification accuracy (%) |
| This study | FSRAIRS2* | 100 |
| Polat et al. (2007) | Fuzzy-AIRS* | |
| Naghibi et al. (2012) | FGPNN, FNN | 98.2 |
| Setiono (2000) | Neuro-Rule 2a | 98.1 |
| Peña-Reyes and Sipper (1999) | Fuzzy-GA | |
| Bennett and Blue (1998) | SVM | |
| This study | AIRS1* | |
| Ster and Dobnikar (1996) | LDA* | 96.8 |
| Goodman et al. (2002) | Big-LVQ* | 96.8 |
| Goodman et al. (2002) | Optimized LVQ* | 96.7 |
| This study | MLP* | 96.6 |
| This study | AIRS2* | |
| Nauck and Kruse (1999) | NEFCLASS* | |
| Hamilton et al. (1996) | RIAC* | |
| Quinlan (1996) | C4.5* | |
| This study | Naïve Bayes* | 93.3 |
| This study | J48* | 93 |
| This study | LibSVM* | |

* tenfold cross-validation, + 5-fold cross-validation

The high classification accuracy obtained with the proposed model can be explained by the positive effect of its multiple parameters. Our results for data reduction (see the Data reduction subsection below) suggest that some unimportant memory cells have been discarded, and this may have led to improved classification performance. This research introduced SVM into AIRS, and SVM is known to be a robust classifier that has shown good performance in many applications (Joachims 1998). Although both SVM and kNN are considered to be among the top 10 classifiers (Wu et al. 2008), studies that are independent of the application of kNN in AIRS have shown that SVM outperforms kNN (Huang et al. 2011), especially when the number of features increases (Hmeidi et al. 2008). Researchers of a recent study have also argued that using kNN reduces the classification accuracy of AIRS (Jenhani and Elouedi 2012). Taken together, the improved data reduction capability and the use of SVM with optimized parameter settings have led to the high classification performance of the proposed model.

5.1 Comparing classification accuracy of FSRAIRS2 with other classifiers

The classification accuracy obtained with FSRAIRS2 for the Wisconsin breast cancer dataset is the highest among the compared classifiers reported by other researchers. The comparison of FSRAIRS2 with these classifiers with respect to classification accuracy is given in Table 4.

5.2 Data reduction

Results show that both AIRS2 and FSRAIRS2 are capable of reducing the data sets needed to perform the classification tasks. As Table 5 shows, FSRAIRS2 was able to perform the

An Efficient and Effective Immune Based Classifier

An Efficient and Effective Immune Based Classifier Journal of Computer Science 7 (2): 148-153, 2011 ISSN 1549-3636 2011 Science Publications An Efficient and Effective Immune Based Classifier Shahram Golzari, Shyamala Doraisamy, Md Nasir Sulaiman and Nur

More information

ARTIFICIAL IMMUNE SYSTEM CLASSIFICATION OF MULTIPLE- CLASS PROBLEMS

ARTIFICIAL IMMUNE SYSTEM CLASSIFICATION OF MULTIPLE- CLASS PROBLEMS 1 ARTIFICIAL IMMUNE SYSTEM CLASSIFICATION OF MULTIPLE- CLASS PROBLEMS DONALD E. GOODMAN, JR. Mississippi State University Department of Psychology Mississippi State, Mississippi LOIS C. BOGGESS Mississippi

More information

ARTIFICIAL IMMUNE SYSTEMS FOR ILLNESSES DIAGNOSTIC

ARTIFICIAL IMMUNE SYSTEMS FOR ILLNESSES DIAGNOSTIC ARTIFICIAL IMMUNE SYSTEMS FOR ILLNESSES DIAGNOSTIC Hiba Khelil, Abdelkader Benyettou SIMPA Laboratory University of Sciences and Technology of Oran, PB 1505 M naouer, 31000 Oran, Algeria hibakhelil@yahoo.fr,

More information

AUTOMATED INTERPRETABLE COMPUTATIONAL BIOLOGY IN THE CLINIC: A FRAMEWORK TO PREDICT DISEASE SEVERITY AND STRATIFY PATIENTS FROM CLINICAL DATA

AUTOMATED INTERPRETABLE COMPUTATIONAL BIOLOGY IN THE CLINIC: A FRAMEWORK TO PREDICT DISEASE SEVERITY AND STRATIFY PATIENTS FROM CLINICAL DATA Interdisciplinary Description of Complex Systems 15(3), 199-208, 2017 AUTOMATED INTERPRETABLE COMPUTATIONAL BIOLOGY IN THE CLINIC: A FRAMEWORK TO PREDICT DISEASE SEVERITY AND STRATIFY PATIENTS FROM CLINICAL

More information

Diagnosis of Breast Cancer by Combining the Techniques of Data Mining and Artificial Immune System

Diagnosis of Breast Cancer by Combining the Techniques of Data Mining and Artificial Immune System Diagnosis of Breast Cancer by Combining the Techniques of Data Mining and Artificial Immune System 1 Esmat Banihashem; 2 Touraj Banirostam 1 Department of Computer Engineering, Electronic Branch, Islamic

More information

AN ALGORITHM FOR REMOTE SENSING IMAGE CLASSIFICATION BASED ON ARTIFICIAL IMMUNE B CELL NETWORK

AN ALGORITHM FOR REMOTE SENSING IMAGE CLASSIFICATION BASED ON ARTIFICIAL IMMUNE B CELL NETWORK AN ALGORITHM FOR REMOTE SENSING IMAGE CLASSIFICATION BASED ON ARTIFICIAL IMMUNE B CELL NETWORK Shizhen Xu a, *, Yundong Wu b, c a Insitute of Surveying and Mapping, Information Engineering University 66

More information

MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE

MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE Wala Abedalkhader and Noora Abdulrahman Department of Engineering Systems and Management, Masdar Institute of Science and Technology, Abu Dhabi, United

More information

SEISMIC ATTRIBUTES SELECTION AND POROSITY PREDICTION USING MODIFIED ARTIFICIAL IMMUNE NETWORK ALGORITHM

SEISMIC ATTRIBUTES SELECTION AND POROSITY PREDICTION USING MODIFIED ARTIFICIAL IMMUNE NETWORK ALGORITHM Journal of Engineering Science and Technology Vol. 13, No. 3 (2018) 755-765 School of Engineering, Taylor s University SEISMIC ATTRIBUTES SELECTION AND POROSITY PREDICTION USING MODIFIED ARTIFICIAL IMMUNE

More information

Predictive Analytics Using Support Vector Machine

Predictive Analytics Using Support Vector Machine International Journal for Modern Trends in Science and Technology Volume: 03, Special Issue No: 02, March 2017 ISSN: 2455-3778 http://www.ijmtst.com Predictive Analytics Using Support Vector Machine Ch.Sai

More information

BIOINFORMATICS THE MACHINE LEARNING APPROACH

BIOINFORMATICS THE MACHINE LEARNING APPROACH 88 Proceedings of the 4 th International Conference on Informatics and Information Technology BIOINFORMATICS THE MACHINE LEARNING APPROACH A. Madevska-Bogdanova Inst, Informatics, Fac. Natural Sc. and

More information

Immune Programming. Payman Samadi. Supervisor: Dr. Majid Ahmadi. March Department of Electrical & Computer Engineering University of Windsor

Immune Programming. Payman Samadi. Supervisor: Dr. Majid Ahmadi. March Department of Electrical & Computer Engineering University of Windsor Immune Programming Payman Samadi Supervisor: Dr. Majid Ahmadi March 2006 Department of Electrical & Computer Engineering University of Windsor OUTLINE Introduction Biological Immune System Artificial Immune

More information

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING Abbas Heiat, College of Business, Montana State University, Billings, MT 59102, aheiat@msubillings.edu ABSTRACT The purpose of this study is to investigate

More information

Swarm Intelligence Approach for Breast Cancer Diagnosis

Swarm Intelligence Approach for Breast Cancer Diagnosis Swarm Intelligence Approach for Breast Cancer Diagnosis Hoda Zamani Faculty of Computer Engineering, Najafabad branch, Islamic Azad University, Najafabad, Iran Mohammad-Hossein Nadimi-Shahraki* Faculty

More information

Breast Cancer Diagnostic Factors Elimination via Evolutionary Neural Network Pruning

Breast Cancer Diagnostic Factors Elimination via Evolutionary Neural Network Pruning Breast Cancer Diagnostic Factors Elimination via Evolutionary Neural Network Pruning Adam V. Adamopoulos Democritus University of Thrace, Department of Medicine, Medical Physics Laboratory, 681 00, Alexandroupolis,

More information

Random forest for gene selection and microarray data classification

Random forest for gene selection and microarray data classification www.bioinformation.net Hypothesis Volume 7(3) Random forest for gene selection and microarray data classification Kohbalan Moorthy & Mohd Saberi Mohamad* Artificial Intelligence & Bioinformatics Research

More information

When to Book: Predicting Flight Pricing

When to Book: Predicting Flight Pricing When to Book: Predicting Flight Pricing Qiqi Ren Stanford University qiqiren@stanford.edu Abstract When is the best time to purchase a flight? Flight prices fluctuate constantly, so purchasing at different

More information

Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest

Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest 1. Introduction Reddit is a social media website where users submit content to a public forum, and other

More information

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ANALYTICAL MODEL DEVELOPMENT AGENDA Enterprise Miner: Analytical Model Development The session looks at: - Supervised and Unsupervised Modelling - Classification

More information

CHAPTER 4 PROPOSED HYBRID INTELLIGENT APPROCH FOR MULTIPROCESSOR SCHEDULING

CHAPTER 4 PROPOSED HYBRID INTELLIGENT APPROCH FOR MULTIPROCESSOR SCHEDULING 79 CHAPTER 4 PROPOSED HYBRID INTELLIGENT APPROCH FOR MULTIPROCESSOR SCHEDULING The present chapter proposes a hybrid intelligent approach (IPSO-AIS) using Improved Particle Swarm Optimization (IPSO) with

More information

Study on the Application of Data Mining in Bioinformatics. Mingyang Yuan

Study on the Application of Data Mining in Bioinformatics. Mingyang Yuan International Conference on Mechatronics Engineering and Information Technology (ICMEIT 2016) Study on the Application of Mining in Bioinformatics Mingyang Yuan School of Science and Liberal Arts, New

More information

Artificial Immune Systems

Artificial Immune Systems Artificial Immune Systems Dr. Mario Pavone Department of Mathematics & Computer Science University of Catania mpavone@dmi.unict.it http://www.dmi.unict.it/mpavone/ Biological Immune System (1/4) Immunology

More information

Data Selection for Semi-Supervised Learning

Data Selection for Semi-Supervised Learning Data Selection for Semi-Supervised Learning Shafigh Parsazad 1, Ehsan Saboori 2 and Amin Allahyar 3 1 Department Of Computer Engineering, Ferdowsi University of Mashhad Mashhad, Iran Shafigh.Parsazad@stu-mail.um.ac.ir

More information

An optimization framework for modeling and simulation of dynamic systems based on AIS

An optimization framework for modeling and simulation of dynamic systems based on AIS Title An optimization framework for modeling and simulation of dynamic systems based on AIS Author(s) Leung, CSK; Lau, HYK Citation The 18th IFAC World Congress (IFAC 2011), Milano, Italy, 28 August-2

More information

Fraud Detection for MCC Manipulation

Fraud Detection for MCC Manipulation 2016 International Conference on Informatics, Management Engineering and Industrial Application (IMEIA 2016) ISBN: 978-1-60595-345-8 Fraud Detection for MCC Manipulation Hong-feng CHAI 1, Xin LIU 2, Yan-jun

More information

Artificial Immune Systems Tutorial

Artificial Immune Systems Tutorial Artificial Immune Systems Tutorial By Dr Uwe Aickelin http://www.aickelin.com Overview Biological Immune System. Artificial Immune System (AIS). Comparison to other Algorithms. Applications of AIS: Data

More information

Introduction to Artificial Intelligence. Prof. Inkyu Moon Dept. of Robotics Engineering, DGIST

Introduction to Artificial Intelligence. Prof. Inkyu Moon Dept. of Robotics Engineering, DGIST Introduction to Artificial Intelligence Prof. Inkyu Moon Dept. of Robotics Engineering, DGIST Chapter 9 Evolutionary Computation Introduction Intelligence can be defined as the capability of a system to

More information

Predicting Corporate Influence Cascades In Health Care Communities

Predicting Corporate Influence Cascades In Health Care Communities Predicting Corporate Influence Cascades In Health Care Communities Shouzhong Shi, Chaudary Zeeshan Arif, Sarah Tran December 11, 2015 Part A Introduction The standard model of drug prescription choice

More information

Index Terms: Customer Loyalty, SVM, Data mining, Classification, Gaussian kernel

Index Terms: Customer Loyalty, SVM, Data mining, Classification, Gaussian kernel Volume 4, Issue 12, December 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Gaussian Kernel

More information

A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG 1, Hong-Yu FU 1

A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG 1, Hong-Yu FU 1 International Conference on Management Science and Management Innovation (MSMI 2014) A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG

More information

Prediction of Success or Failure of Software Projects based on Reusability Metrics using Support Vector Machine

Prediction of Success or Failure of Software Projects based on Reusability Metrics using Support Vector Machine Prediction of Success or Failure of Software Projects based on Reusability Metrics using Support Vector Machine R. Sathya Assistant professor, Department of Computer Science & Engineering Annamalai University

More information

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 02 Data Mining Process Welcome to the lecture 2 of

More information

Biological immune systems

Biological immune systems Immune Systems 1 Introduction 2 Biological immune systems Living organism must protect themselves from the attempt of other organisms to exploit their resources Some would-be exploiter (pathogen) is much

More information

The Nottingham eprints service makes this work by researchers of the University of Nottingham available open access under the following conditions.

Aickelin, Uwe (2003) Artificial Immune System and Intrusion Detection Tutorial. In: Introduction Tutorials in Optimization, Search and Decision Support Methodologies, Nottingham, UK. Access from the University

An Implementation of genetic algorithm based feature selection approach over medical datasets

Dr. A. Shaik Abdul Khadir #1, K. Mohamed Amanullah #2 #1 Research Department of Computer Science, KhadirMohideen College,

Strength in numbers? Modelling the impact of businesses on each other

Amir Abbas Sadeghian amirabs@stanford.edu Hakan Inan inanh@stanford.edu Andres Nötzli noetzli@stanford.edu. INTRODUCTION In many cities,

Data Mining for Biological Data Analysis

Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Data Mining Course by Gregory Piatetsky-Shapiro available at www.kdnuggets.com Jiawei Han

Evaluating Workflow Trust using Hidden Markov Modeling and Provenance Data

Mahsa Naseri and Simone A. Ludwig Abstract In service-oriented environments, services with different functionalities are combined

Immune Network based Ensembles

Nicolás García-Pedrajas 1 and Colin Fyfe 2 1- Dept. of Computing and Numerical Analysis University of Córdoba (SPAIN) e-mail: npedrajas@uco.es 2- the Dept. of Computing University

Implementation of Artificial Immune System Algorithms

K. Sri Lakshmi Associate Professor, Department of CSE Abstract An artificial immune system (AIS) that is distributed, robust, dynamic, diverse and

Predicting Restaurants Rating And Popularity Based On Yelp Dataset

CS 229 MACHINE LEARNING FINAL PROJECT Yiwen Guo, ICME, Anran Lu, ICME, and Zeyu Wang, Department of Economics, Stanford University Abstract

Gene Expression Data Analysis

Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu BMIF 310, Fall 2009 Gene expression technologies (summary) Hybridization-based

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong

Machine learning models can be used to predict which recommended content users will click on a given website.

A Review of a Novel Decision Tree Based Classifier for Accurate Multi Disease Prediction Sagar S. Mane 1 Dhaval Patel 2

IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 02, 2015 ISSN (online): 2321-0613

DNA Gene Expression Classification with Ensemble Classifiers Optimized by Speciated Genetic Algorithm

Kyung-Joong Kim and Sung-Bae Cho Department of Computer Science, Yonsei University, 134 Shinchon-dong,

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

The Company Minitab helps companies and institutions to spot trends, solve problems and discover valuable

A Soft Classification Model for Vendor Selection

Arpan K. Kar, Ashis K. Pani, Bijaya K. Mangaraj, and Supriya K. De Abstract This study proposes a pattern classification model for usage in the vendor selection

TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended.

Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed.

Fuzzy Logic based Short-Term Electricity Demand Forecast

P.Lakshmi Priya #1, V.S.Felix Enigo #2 # Department of Computer Science and Engineering, SSN College of Engineering, Chennai, Tamil Nadu, India

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0047 ISSN (Online): 2279-0055 International

Artificial Immune Systems and Data Mining: Bridging the Gap with Scalability and Improved Learning

Olfa Nasraoui, Fabio González Cesar Cardona, Dipankar Dasgupta The University of Memphis A Demo/Poster

SOFTWARE DEVELOPMENT PRODUCTIVITY FACTORS IN PC PLATFORM

Abbas Heiat, College of Business, Montana State University-Billings, Billings, MT 59101, 406-657-1627, aheiat@msubillings.edu ABSTRACT CRT and ANN

Abdulrazzaq A. Abdulrazzaq and Jaffer K. Ali Basrah, Iraq

International Journal of Mechanical Engineering and Technology (IJMET) Volume 10, Issue 01, January 2019, pp. 1686-1709, Article ID: IJMET_10_01_169 Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=10&itype=01

Feature selection methods for SVM classification of microarray data

Mike Love December 11, 2009 SVMs for microarray classification tasks Linear support vector machines have been used in microarray experiments

Using AI to Make Predictions on Stock Market

Alice Zheng Stanford University Stanford, CA 94305 alicezhy@stanford.edu Jack Jin Stanford University Stanford, CA 94305 jackjin@stanford.edu 1 Introduction

Amit Kumar Nandanwar A.P. CSE Department, VNS College, Bhopal, India

Volume 6, Issue 4, April 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Support Classifier

Support Vector Machines (SVMs) for the classification of microarray data. Basel Computational Biology Conference, March 2004 Guido Steiner

Overview Classification problems in machine learning context Complications

Molecular Diagnosis Tumor classification by SVM and PAM

Florian Markowetz and Rainer Spang Practical DNA Microarray Analysis Berlin, Nov 2003 Max-Planck-Institute for Molecular Genetics Dept. Computational

CHAPTER 1 INTRODUCTION

This thesis is aimed to develop a computer based aid for physicians. Physicians are prone to making decision errors, because of high complexity of medical problems and due to cognitive

Feature Selection Algorithms in Classification Problems: An Experimental Evaluation

MICHAEL DOUMPOS, ATHINA SALAPPA Department of Production Engineering and Management Technical University of Crete University

MINIMIZE THE MAKESPAN FOR JOB SHOP SCHEDULING PROBLEM USING ARTIFICIAL IMMUNE SYSTEM APPROACH

AHMAD SHAHRIZAL MUHAMAD, 1 SAFAAI DERIS, 2 ZALMIYAH ZAKARIA 1 Professor, Faculty of Computing, Universiti Teknologi

Customer Relationship Management in marketing programs: A machine learning approach for decision. Fernanda Alcantara

F.Alcantara@cs.ucl.ac.uk CRM Goal Support the decision taking Personalize the best individual

Solving Protein Folding Problem Using Hybrid Genetic Clonal Selection Algorithm

Adel Omar Mohamed and Abdelfatah A. Hegazy, Amr Badr College of Computing & Information Technology, Arab Academy Abstract:

An Introduction to Artificial Immune Systems

Jonathan Timmis Computing Laboratory University of Kent at Canterbury CT2 7NF. UK. J.Timmis@kent.ac.uk http:/www.cs.kent.ac.uk/~jt6 AIS October 2003 1 Novel

A New Approach to Artificial Immune Systems and its Application in Constructing On-line Learning Neuro-Fuzzy Systems

The Open Artificial Intelligence Journal, 2008, 2, -0 Mu-Chun Su *,, Po-Chun Wang 2

Big Data. Methodological issues in using Big Data for Official Statistics

Giulio Barcaroli Istat (barcarol@istat.it) Big Data Effective Processing and Analysis of Very Large and Unstructured data for Official Statistics.

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification

Final Project Report Alexander Herrmann Advised by Dr. Andrew Gentles December

Predicting Toxicity of Food-Related Compounds Using Fuzzy Decision Trees

Daishi Yajima, Takenao Ohkawa, Kouhei Muroi, and Hiromasa Imaishi Abstract Clarifying the interaction between cytochrome P450 (P450)

Neural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University

Contents Biological problem: promoter modeling Basics of neural networks Perceptrons

2015 The MathWorks, Inc. 1

Machine Learning with MATLAB (Basics) Senior Application Engineer 엄준상 (Manager) Machine Learning is Everywhere Solution is too complex for hand written rules or equations

Predicting Corporate 8-K Content Using Machine Learning Techniques

Min Ji Lee Graduate School of Business Stanford University Stanford, California 94305 E-mail: minjilee@stanford.edu Hyungjun Lee Department

Internet Electronic Journal of Molecular Design

CODEN IEJMAT ISSN 1538-6414 November 2006, Volume 5, Number 11, Pages 542-554 Editor: Ovidiu Ivanciuc Special issue dedicated to Professor Lemont B. Kier

E-Commerce Sales Prediction Using Listing Keywords

Stephanie Chen (asksteph@stanford.edu) 1 Introduction Small online retailers usually set themselves apart from brick and mortar stores, traditional brand

Predicting prokaryotic incubation times from genomic features Maeva Fincker - Final report

Maeva Fincker - mfincker@stanford.edu Final report Introduction We have barely scratched the surface when it comes to microbial diversity.

Neural Networks and Applications in Bioinformatics

Contents Yuzhen Ye School of Informatics and Computing, Indiana University Biological problem: promoter modeling Basics of neural networks Perceptrons

Forecasting Electricity Consumption with Neural Networks and Support Vector Regression

Available online at www.sciencedirect.com Procedia - Social and Behavioral Sciences 58 (2012) 1576-1585 8th International Strategic Management Conference

NEW RESULTS IN COMPUTER AIDED DIAGNOSIS (CAD) OF BREAST CANCER USING A RECENTLY DEVELOPED SVM/GRNN ORACLE HYBRID

WALKER H. LAND, JR., Research Professor Binghamton NY TIMOTHY MASTERS President, TMAIC

A resource limited artificial immune system algorithm for supervised classification of multi/hyper-spectral remote sensing imagery

International Journal of Remote Sensing Vol. 28, No. 7, 10 April 2007, 1665-1686

Department of Economics, University of Michigan, Ann Arbor, MI

Comment Lutz Kilian Department of Economics, University of Michigan, Ann Arbor, MI 489-22 Frank Diebold's personal reflections about the history of the DM test remind us that this test was originally designed

A Fuzzy Multiple Attribute Decision Making Model for Benefit-Cost Analysis with Qualitative and Quantitative Attributes

M. Ghazanfari and M. Mellatparast Department of Industrial Engineering Iran University

Methods for Multi-Category Cancer Diagnosis from Gene Expression Data: A Comprehensive Evaluation to Inform Decision Support System Development

Alexander Statnikov M.S., Constantin F. Aliferis M.D.,

TIMETABLING EXPERIMENTS USING GENETIC ALGORITHMS. Liviu Lalescu, Costin Badica

University of Craiova, Faculty of Control, Computers and Electronics Software Engineering Department, str.tehnicii, 5, Craiova,

Classification in Marketing Research by Means of LEM2-generated Rules

Reinhold Decker and Frank Kroll Department of Business Administration and Economics, Bielefeld University, D-33501 Bielefeld, Germany;

Permutation Free Encoding Technique for Evolving Neural Networks

Anupam Das, Md. Shohrab Hossain, Saeed Muhammad Abdullah, and Rashed Ul Islam Department of Computer Science and Engineering, Bangladesh

CONNECTING CORPORATE GOVERNANCE TO COMPANIES PERFORMANCE BY ARTIFICIAL NEURAL NETWORKS

Darie MOLDOVAN, PhD * Mircea RUSU, PhD student ** Abstract The objective of this paper is to demonstrate the utility

Gene Selection in Cancer Classification using PSO/SVM and GA/SVM Hybrid Algorithms

Laboratoire d'Informatique Fondamentale de Lille Enrique Alba, José García-Nieto, Laetitia Jourdan and El-Ghazali Talbi

Learning theory: SLT what is it? Parametric statistics small number of parameters appropriate to small amounts of data

Predictive Genomics, Biology, Medicine Ex. Find mean m and standard deviation s for

The usage of Big Data mechanisms and Artificial Intelligence Methods in modern Omnichannel marketing and sales

Today's IT service providers offer a large set of tools supporting sales and marketing activities

Study on Talent Introduction Strategies in Zhejiang University of Finance and Economics Based on Data Mining

International Journal of Statistical Distributions and Applications 2018; 4(1): 22-28 http://www.sciencepublishinggroup.com/j/ijsda doi: 10.11648/j.ijsd.20180401.13 ISSN: 2472-3487 (Print); ISSN: 2472-3509

Reliable classification of two-class cancer data using evolutionary algorithms

BioSystems 72 (23) 111-129 Kalyanmoy Deb, A. Raji Reddy Kanpur Genetic Algorithms Laboratory (KanGAL), Indian Institute of

Systems Identification

LECTURE 3 - OVERVIEW Premises for system identification Supporting theories WHAT IS IDENTIFICATION? Relation between the real object and its model - process of matching

PCA and SOM based Dimension Reduction Techniques for Quaternary Protein Structure Prediction

Sanyukta Chetia Department of Electronics and Communication Engineering, Gauhati University-781014, Guwahati,

Introduction to Machine Learning for Longitudinal Medical Data

Orlando Doehring, Ph.D. Unit 2, 2A Bollo Lane, London W4 5LE, UK orlando.doehring@phastar.com Machine learning for healthcare 1 Machine Learning

How hot will it get? Modeling scientific discourse about literature

Project Aims Natalie Telis, CS229 ntelis@stanford.edu Many metrics exist to provide heuristics for quality of scientific literature,

CHAPTER 8 APPLICATION OF CLUSTERING TO CUSTOMER RELATIONSHIP MANAGEMENT

8.1 Introduction Customer Relationship Management (CRM) is a process that manages the interactions between a company and its customers.

Identifying Gene Subnetworks Associated with Clinical Outcome in Ovarian Cancer Using Network Based Coalition Game

Abolfazl Razi, Fatemeh Afghah 2, Vinay Varadan Case Comprehensive Cancer Center, Case

Innovative optimization methods and techniques for electromagnetic applications

Stochastic algorithms, Part 3: Artificial Immune Systems M. Repetto Dipartimento Ingegneria Elettrica Industriale - Politecnico

GA-ANFIS Expert System Prototype for Prediction of Dermatological Diseases

Digital Healthcare Empowering Europeans R. Cornet et al. (Eds.) 2015 European Federation for Medical Informatics (EFMI). This article is published online with Open Access by IOS Press and distributed

Applications of Machine Learning to Predict Yelp Ratings

Kyle Carbon Aeronautics and Astronautics kcarbon@stanford.edu Kacyn Fujii Electrical Engineering khfujii@stanford.edu Prasanth Veerina Computer

Promoting safety at railroad crossings by reducing traffic delays

A.G. Hobeika 1, and L. Bang 2 1 Department of Civil and Environmental Engineering Virginia Tech, Virginia, USA 2 Systems Division, ITT Industries

Automated Empirical Selection of Rule Induction Methods Based on Recursive Iteration of Resampling Methods

Shusaku Tsumoto, Shoji Hirano, and Hidenao Abe Department of Medical Informatics, Faculty of Medicine,