Raj Kumar 1, Ms. Sonia 2 1 Assistant Professor, 2 M.Tech student. Department of CSE Jind Institute of Engineering & Technology, Jind(Haryana)


SVM & KNN Based Classification of Health Care Data Set Using WEKA Tools

Raj Kumar 1, Ms. Sonia 2
1 Assistant Professor, 2 M.Tech student
Department of CSE, Jind Institute of Engineering & Technology, Jind (Haryana)

Abstract: We focus on comparing a variety of techniques, approaches and tools and their impact on the health care sector. The goal of a data mining application is to turn data (facts, numbers, or text that can be processed by a computer) into knowledge or information. The main purpose of data mining applications in healthcare systems is to develop an automated tool for identifying and disseminating relevant healthcare information. In this research we have reduced the error rate and increased the speed of the existing SVM-KNN. We have used Matlab to make a comparative analysis of the performance of the traditional model and the proposed model.

1. INTRODUCTION

The purpose of data mining is to extract useful information from large databases or data warehouses. Data mining applications are used for both commercial and scientific purposes. This study mainly discusses data mining applications on the scientific side. Scientific data mining distinguishes itself in the sense that the nature of the datasets is often very different from that of traditional market-driven data mining applications. In this work, a detailed survey is carried out on data mining applications in the healthcare sector, the types of data used and the details of the information extracted. Data mining algorithms applied in the healthcare industry play a significant role in the prediction and diagnosis of diseases. A large number of data mining applications are found in medical-related areas such as the medical device industry, the pharmaceutical industry and hospital management. Health care institutions lack appropriate information systems to produce reliable reports with respect to information other than purely financial and volume-related statements. Data mining tools answer questions that were traditionally too time-consuming and complex to resolve.
They prepare databases for finding predictive information. Some sample data mining applications are:

Developing models to detect fraudulent phone or credit-card activity.
Predicting good and poor sales prospects.
Predicting whether a heart attack is likely to recur among those with cardiac disease.
Identifying factors that lead to defects in a manufacturing process.

2. Data Mining to Improve Primary Care Reporting

The first initiative mines historical enterprise data warehouse (EDW) data to enable primary care providers (PCPs) to meet population health regulatory measures. This clinic's PCPs must demonstrate to regulatory bodies that they are giving appropriate screenings and treatment to certain populations of patients. Their focus to date has been on A1c screenings, mammograms for women over 40, and flu shots. The EDW and analytics applications have enabled PCPs to track their compliance rate and to take measures to ensure that patients receive the screenings they need. The Health Catalyst Advanced Application for Primary Care shows the trend of compliance rates and of specific measurements over time, so the clinic can view how a patient's A1c or LDL results are trending.

3. WEKA

Waikato Environment for Knowledge Analysis (Weka) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License.

DESCRIPTION

Weka (pronounced to rhyme with Mecca) is a workbench [1] that contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to these functions. Its advantages include:

Free availability under the GNU General Public License.
Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform.
A comprehensive collection of data preprocessing and modeling techniques.
Ease of use due to its graphical user interfaces.
Weka supports several standard data mining tasks, more specifically data preprocessing, clustering, classification, regression, visualization, and feature selection. All of Weka's techniques are predicated on the assumption that the data is available as one flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal attributes, but some other attribute types are also supported).

DOI: /IJCSC Page 50

4. SVM

SVM can be used in linear or non-linear ways with the use of a kernel. When you have a limited set of points in many dimensions, SVM tends to be very good because it should be able to find the linear separation that should exist. SVM is good with outliers, as it uses only the most relevant points (the support vectors) to find a linear separation. SVM needs to be tuned: the cost "C", the choice of kernel and the kernel's parameters are critical hyperparameters of the algorithm.

KNN

KNN needs to be carefully tuned: the choice of K and of the distance metric is critical. With many data points, KNN has computational performance problems. In a very low-dimensional space you can use an RP-tree or KD-tree to improve performance. With a higher number of dimensions you need an approximation to the nearest-neighbour problem, and whenever an approximation is used we have to consider whether KNN with approximate nearest neighbours is still better than other algorithms.

The KNN algorithm proceeds as follows:

1. For each training example <x, f(x)>, add the example to the list training_examples.
2. Given a query instance xq to be classified,
3. Let x1, x2, ..., xk denote the k instances from training_examples that are nearest to xq.
4. Return the class that is most common among these k instances.

If K = 5 and three of the five nearest neighbours of the query instance xq are classified as negative, then xq is classified as negative.

Comparison between SVM and KNN

KNN is a lazy learning method, whereas SVM is an eager learner; both are intuitive. KNN has some nice properties: it is automatically non-linear, it can detect linearly or non-linearly distributed data, and it tends to perform very well with a lot of data points. KNN is also very sensitive to bad features (attributes), so feature selection is important. KNN is also sensitive to outliers, and removing them before using KNN tends to improve results.

5. RESULTS OF COMPARATIVE STUDY

In this research we present a comparative study on diabetes data from the healthcare sector using Weka tools. Weka is used mainly to predict results from the data recorded on healthcare problems. Different data mining tools are used to assess the accuracy level on different healthcare problems. In this study, the following attributes have been analyzed and evaluated: preg, plas, pres, skin, insu, mass, pedi, age and class. There are 9 attributes and 768 instances. Of these, 500 are tested negative and 268 are tested positive.

Fig 1 Instances tested positive & tested negative for diabetes

Fig 2 Weka classifiers in rules
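The four KNN steps listed in this section can be sketched in a few lines of Python. This is a minimal illustration only: the Euclidean distance, the function name knn_classify and the toy points are assumptions made for the example and are not taken from the paper or from Weka.

```python
import math
from collections import Counter

def knn_classify(training_examples, xq, k):
    """Classify query xq by majority vote among its k nearest training examples."""
    # Step 3: order the stored (x, label) pairs by Euclidean distance to xq
    # and keep the k nearest ones (Euclidean distance is an assumption here).
    by_distance = sorted(training_examples, key=lambda ex: math.dist(ex[0], xq))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Step 4: return the class that is most common among the k neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]

# Step 1: the stored training examples (hypothetical toy data, not the diabetes set).
training_examples = [
    ((1.0, 1.0), "negative"),
    ((1.5, 2.0), "negative"),
    ((2.0, 1.0), "negative"),
    ((3.0, 3.0), "positive"),
    ((3.5, 2.5), "positive"),
    ((8.0, 8.0), "positive"),
    ((9.0, 9.0), "positive"),
]

# Step 2: a query instance to be classified.
xq = (2.2, 2.0)
print(knn_classify(training_examples, xq, k=5))
```

With k = 5, three of the five nearest neighbours of this query point are negative, so the vote returns the negative class, mirroring the K = 5 case discussed in the text. Sorting every training example for each query is the simple approach; the tree structures mentioned above would replace that step for larger data sets.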

Fig 3 Instances tested negative & tested positive

Fig 4 Weka clustering with EM

Fig 5 Mean & standard deviation of age

Fig 6 Weka Explorer

Proposed work

In the tangent distance case, it is quite remarkable that SVM-KNN can improve on the performance of NN at very small additional cost, even though the latter already performs very well in comparison to human performance.

Fig 7 SVM-KNN error rate and time cost

Result and discussion

We are therefore encouraged to think that SVM-KNN is an ideal classification method when the proper invariance structure of the underlying data set is captured in the distance function.
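Error rate and time cost as a function of k are the two quantities reported for the models. As a rough sketch of how such numbers might be measured, the Python code below runs a leave-one-out evaluation of plain KNN for several values of k and records the CPU time. The evaluation protocol, the function names and the toy data are all assumptions made for illustration. The sketch does not reproduce the proposed SVM-KNN model or the paper's measurements.

```python
import math
import time
from collections import Counter

def knn_predict(train, xq, k):
    # Majority vote among the k training points nearest to xq (Euclidean distance).
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], xq))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_error(data, k):
    # Hold out each point in turn, classify it from the rest, count the mistakes.
    errors = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, x, k) != label:
            errors += 1
    return errors / len(data)

# Hypothetical toy data: two clusters plus one positive outlier among the negatives.
data = [
    ((1.0, 1.0), "negative"), ((1.0, 2.0), "negative"),
    ((2.0, 1.0), "negative"), ((2.0, 2.0), "negative"),
    ((1.5, 1.5), "positive"),
    ((6.0, 6.0), "positive"), ((6.0, 7.0), "positive"),
    ((7.0, 6.0), "positive"), ((7.0, 7.0), "positive"),
]

for k in (1, 3, 5):
    start = time.process_time()          # CPU time, not wall-clock time
    err = leave_one_out_error(data, k)
    elapsed = time.process_time() - start
    print(f"k={k}  error rate={err:.3f}  cpu sec={elapsed:.6f}")
```

On this toy set the single outlier misleads all of its neighbours when k = 1, while k = 3 and k = 5 misclassify only the outlier itself, which illustrates why the choice of k matters for the error rate.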

In this research we have reduced the error rate and increased the speed of the existing SVM-KNN. We have used Matlab to make a comparative analysis of the performance of the traditional model and the proposed model.

Matlab code to compare the error rate of both models:

x = [ ];    % values of k
y1 = [ ];   % error rate of the traditional model
y2 = [ ];   % error rate of the proposed model
plot(x, y1, 'r+-')
hold on
plot(x, y2, 'b+-')
legend('traditional', 'proposed')
xlabel('k')
ylabel('error Rate')
title('comparison between traditional and proposed error rate')

Fig 8 Comparison of error rate between both models

Matlab code to compare the time complexity of both models:

x = [ ];    % values of k
y1 = [ ];   % CPU seconds of the traditional model
y2 = [ ];   % CPU seconds of the proposed model
plot(x, y1, 'r+-')
hold on
plot(x, y2, 'b+-')
legend('traditional', 'proposed')
xlabel('k')
ylabel('cpu SEC')
title('comparison between traditional and proposed Time complexity')

Fig 9 Comparison of time complexity between both models

6. Conclusion

The main aim of our research is to use data mining in the health care system. In this research we have reduced the error rate and increased the speed of the existing SVM-KNN. We have used Matlab to make a comparative analysis of the performance of the traditional model and the proposed model. The Internet of Things concept arises from the need to manage, automate, and explore all devices, instruments, and sensors in the world. Data mining involves discovering novel, interesting, and potentially useful patterns in data and applying algorithms to the extraction of hidden information. The quantity and quality of many health care interventions are improved through the results of science, such as advances through the medical model of health, which focuses on the eradication of illness through diagnosis and effective treatment.

Reference

1. Hao Zhang, Alexander C. Berg, SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition.
2. Data Mining Applications in Healthcare, 2005, International Journal of Innovative Research in Computer & Communication Engineering, Vol. 7, Issue 14.
3. Boris Milovic, Milan Milovic, Prediction & Decision Making in Health Care using Data Mining, 2012, International Journal of Innovative Research in Computer & Communication Engineering, Vol. 13, Issue 15.
4. MD. Ezaz Ahmed, Knowledge Discovery in Health Care Datasets Using Data Mining Tools, 2012.
5. Salim Diwani, Suzan Mishol, Daniel S. Kayange, Overview Applications of Data Mining in Health Care: Case Study of Arusha Region, 2013.
6. M. Durairaj, V. Ranjani, Data Mining Applications in Healthcare Sector: A Study, 2013.
7. Manaswini Pradhan, Data Mining & Health Care: Techniques of Application, 2014.
8. Dalia AbdulHadi AbdulAmeer, Medical Data Mining: Health Care Knowledge Discovery Framework Based on Clinical Big Data Analysis, 2015.
9. M. Durairaj, V. Ranjani, "Data mining applications in healthcare sector", Assistant Professor in Dept. of Computer Science.
10. K. Srinivas, B. Kavitha Rani & Dr. A. Govrdhan, Applications of Data Mining Techniques in Healthcare & Prediction of Heart Attacks, International Journal on Computer Science & Engineering (2010).
11. Shweta Kharya, Using Data Mining Techniques for Diagnosis & Prognosis of Cancer Disease, International Journal of Computer Science, Engineering & Information Technology (IJCSEIT), Vol. 2, No. 2, April.
12. Arvind Sharma & P.C. Gupta, Predicting Number of Blood Donors through their Age & Blood Group by using Data Mining Tool, International Journal of Communication & Computer Technologies, Volume 01, No. 6, Issue 02, September.
13. Nikita Jain & Vishal Srivastava, "Data mining techniques", International Journal of Research in Engineering & Technology.
14. N. Zhong & J. Liu (eds.), Intelligent Technologies for Information Analysis, New York: Springer.