A REVIEW OF DATA MINING TECHNIQUES FOR AGRICULTURAL CROP YIELD PREDICTION

Akshitha K 1, Dr. Rajashree Shettar 2
1 M.Tech Student, Dept. of CSE, RV College of Engineering, Bengaluru, India
2 Prof., Dept. of Computer Science and Engineering, RVCE, Bengaluru, India

ABSTRACT: Data mining has become a prominent research area that has gained considerable attention in agricultural crop yield analysis. Predicting crop yield is essential because it has a great impact on the yearly production of crops in a particular region, and a farmer needs to estimate the aggregate yield expected for a given year. Data mining techniques such as K-Means, K-Nearest Neighbors (KNN), Artificial Neural Networks (ANN) and Support Vector Machines (SVM) have recently been applied to yield prediction in agriculture. Yield prediction is regarded as an important agricultural problem that remains to be resolved with the existing data mining techniques. This study considers the research issues found in state-of-the-art work on agricultural crop yield prediction. It also investigates existing models that can be used to solve yield prediction problems for any agricultural field, and highlights the data models that achieve high accuracy and generality in predicting crop yield. The main contribution of this paper is the research gap section, which describes the open research issues and their impact on agricultural data mining applications.

Keywords: Data Mining Techniques, Agricultural Yield Prediction, Artificial Neural Network, K-Means, Support Vector Machine.
[1] INTRODUCTION
For the past two decades, yield prediction, which is used to gauge the agricultural growth of a country and to guide future investment in agricultural fields, has largely been made by farmers on the basis of their previous experience. This leads to situations where farmers fail to estimate yield accurately, e.g. inaccurate estimation of future agricultural
production based on the past 5 years' data on rainfall and the corresponding crop production in a particular field. The main aim of agricultural production is to maximize crop yield at minimum cost [1] [2]. There is noteworthy evidence that early detection and management of crop yield issues can protect a farmer's investment in a particular field and also help to generate yield and profit every year [3]. Recent analysis shows that various factors, such as regional climate change and agricultural soil characteristics, have a significant impact on agricultural growth and production. Early prediction of crop yield can be used by managers to avoid losses under unfavorable conditions [4]. In earlier days most farmers relied on their long-term experience to predict crop yield, which sometimes led them in the wrong direction. Existing research trends show that two kinds of approaches have been introduced to achieve efficient crop yield prediction: the first is traditional mathematical approaches and the second is applications of artificial intelligence [5]. Various data mining models have also been proposed for crop yield prediction. Data mining is the process of extracting meaningful information from a data set whose records are correlated with one another in some way. This study reviews existing research trends in crop yield prediction within agricultural data mining. It also highlights the contribution that state-of-the-art data mining and classification techniques of the past 5 years, such as K-means and KNN, have made to agricultural yield prediction.
The main contribution of this study is the research gap section, which discusses the research issues of data classification techniques and the wide adoption of ANN in existing studies as a means of achieving higher yield prediction accuracy. The paper is organized as follows: Section II summarizes the background of existing data mining techniques. Sections III and IV describe applications of data mining in agriculture and the existing techniques, respectively, whereas Section V summarizes the whole paper.

[2] APPLICATIONS OF DATA MINING IN AGRICULTURE
Researchers have carried out a number of state-of-the-art experimental prototypes in the past to identify an efficient data mining technique for agricultural yield prediction. A Naive Bayes data mining model has been designed for classifying soil samples and can be used to analyze large experimental soil profile data sets [11]. A decision tree algorithm can also be used in data mining to predict soil fertility [12]. The
overall objective of the research on yield prediction was to measure the accuracy of land utilization for agricultural and non-agricultural areas over the past five years. The authors in [12] [13] used the k-means model for estimating crop yield. Data mining methodologies designed for use in the agricultural domain are reviewed in [14]. Fig. 2 represents the applications of different types of data mining models.

Fig 2: Application of Different types of Data Mining Models

The application of the k-means method to agriculture is discussed below. The k-means algorithm has been applied to soil classification using GPS-based technologies in [15]. Classification of plant, soil and residue regions of interest in color images, grading apples before marketing, monitoring water quality changes, detecting weeds in precision agriculture and forecasting wine yield are some of the applications where the k-means approach is used. Knowing in advance that the wine fermentation process could get stuck or be slow can help farmers take measures to secure a good yield [16].

The application of k-nearest neighbor models to agriculture is as follows. The k-nearest neighbor algorithm has been used for simulating daily precipitation and other weather variables, for estimating soil water parameters and for climate forecasting [8].
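The k-nearest neighbor idea mentioned above can be illustrated with a minimal sketch. It is not the method of [8]: it is a plain k-NN regression that estimates a weather variable as the mean of the k most similar past observations. The function name `knn_predict`, the feature choice (temperature, humidity) and all numbers are assumptions made up for illustration.

```python
import math

def knn_predict(train, query, k=3):
    """Predict a target as the mean of the k nearest training targets."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda sample: math.dist(sample[0], query))
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / k

# Hypothetical records: (temperature in C, humidity in %) -> rainfall in mm.
history = [
    ((30.0, 40.0), 0.0),
    ((28.0, 55.0), 2.0),
    ((25.0, 80.0), 12.0),
    ((24.0, 85.0), 15.0),
    ((26.0, 75.0), 9.0),
]

# The three closest days have 12, 15 and 9 mm of rainfall.
estimate = knn_predict(history, (25.0, 82.0), k=3)
print(estimate)  # 12.0
```

In this sketch the features are used unscaled; with real data, attributes measured in different units would normally be standardized first so that no single attribute dominates the distance.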
The applications of neural networks to agriculture, in predicting the flowering and maturity dates of soybean and in forecasting water resource parameters, are discussed in [14]. The Support Vector Machine (SVM) approach is used in agriculture for crop classification and for analyzing scenarios of changing climatic parameters. Fig. 3 shows the architecture of a crop prediction system. It includes an input module that is responsible for taking data from the farmer, who has to provide the area of land, district, financial status and city. After the city is selected, climatic data are retrieved automatically from the crop knowledge base on the basis of altitude, longitude and latitude. The feature selection module is responsible for selecting a subset of attributes from the crop data. The crop knowledge base consists of farm knowledge, e.g. region id, district name, soil type, water pH, rainfall, humidity, sunlight, land information, environmental parameters, city, pesticide information, and crop information such as crop type and seed type. The knowledge base also includes samples of crops with the corresponding farm knowledge, environmental parameters and pesticide information. After attribute subset selection, the data pass to classification and association rule mining, which group similar content. Prediction rules are then applied to the output of clustering to obtain results in terms of crop, pesticide and cost [13]. The data mining techniques used by researchers in the past are highlighted in review studies, which show that a comparison among them is possible. Different data mining techniques are used to predict different weather parameters such as humidity, temperature and wind gust.
The attributes used for the comparison are applications, authors, data mining techniques, algorithms, attributes, time period, dataset size, accuracy rate, advantages and disadvantages. The techniques yield different results, each with its pros and cons. The main outcome is captured by the no free lunch theorem, which states that there is no universally best data mining algorithm. This creates the need to choose the appropriate learning algorithm for a given problem. For weather prediction, decision tree and k-means clustering turn out to be good, with higher prediction accuracy than other data mining techniques. Regression techniques could not produce accurate predictions. It is also observed that, as the dataset size increases, accuracy first increases but then decreases beyond a certain point. One reason may be overfitting of the training data sets [15]. Table 1 tabulates the work done in the area of agriculture by different researchers and the application of various techniques to the available agricultural data.
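The point that no algorithm is best for every problem, and that a model which fits its training data perfectly may still predict poorly, can be illustrated with a small hold-out comparison. The sketch below is only an illustration under stated assumptions: the helper names (`knn_predict`, `rmse`, `select_k`), the candidate models (k-NN regressors with different k) and the rainfall and yield figures are all made up and do not come from the reviewed studies.

```python
import math

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda sample: abs(sample[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def rmse(train, data, k):
    """Root mean squared error of the k-NN predictor on `data`."""
    squared = [(knn_predict(train, x, k) - y) ** 2 for x, y in data]
    return math.sqrt(sum(squared) / len(squared))

def select_k(train, valid, candidates):
    """Return the candidate k with the lowest validation error, plus all scores."""
    scores = {k: rmse(train, valid, k) for k in candidates}
    return min(scores, key=scores.get), scores

# Made-up (seasonal rainfall in mm, yield in t/ha) pairs, for illustration only.
train = [(100, 2.0), (200, 3.1), (300, 3.9), (400, 5.2), (500, 5.8), (600, 7.1)]
valid = [(120, 2.3), (330, 4.3), (560, 6.6)]

best_k, valid_scores = select_k(train, valid, candidates=[1, 2, 3, 6])
train_error_k1 = rmse(train, train, 1)  # k = 1 reproduces the training data exactly
print(best_k, valid_scores, train_error_k1)
```

On this toy data the k = 1 model has zero training error but is not the best on the held-out points, where k = 2 gives the lowest error. This is why candidate techniques are compared on data that was not used for fitting.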
Analysis of current research trends shows that most studies use K-means clustering together with neural networks, an approach that still needs integration of a feature selection technique and efficient prediction capability for remote areas. Many existing studies also combine a decision tree mechanism with a neural network model, which requires data transformation, extra computation and handling of continuous data in order to produce an efficient yield prediction model. However, after evaluating the existing studies, it has been found that the combination of decision tree and neural network models gives the best configuration for achieving higher prediction accuracy.

Author                      Problem of Interest                                   Technique Applied
--------------------------  ----------------------------------------------------  ----------------------
Sanjay et al [29]           Classification and prediction                         Neural Networks
Somvanshi et al [30]        Modeling and prediction                               Neural Networks
Jagielska et al [31]        Automated knowledge acquisition                       K-means
Tellaeche [32]              A vision-based hybrid classifier                      K-means
Verheyen et al [33]         High resolution continuous soil classification        Fuzzy set
Urtubia et al [34]          Prediction of industrial wine problem fermentations   Fuzzy set
Veenadhari et al [35]       Crop productivity mapping                             K-nearest Neighbor
Shalvi and Claris [36]      Medical data mining techniques                        K-nearest Neighbor
Altannar et al [37]         Agricultural and Environmental Sciences               Support Vector Machine
Rajagopalan and Lal [38]    Daily precipitation and other weather variables       Support Vector Machine

Table 1: Application of Data Mining Techniques to agricultural data

[6] CONCLUSION
Agriculture is one of the most significant application areas, especially in developing countries like India. The use of information technology in agriculture can change the way decisions are made and help farmers obtain better yields. Data mining plays a critical role in decision making on several issues related to the agriculture field. This paper discusses the role of data mining in agriculture and the related work of several authors in the agricultural domain. It also discusses various data mining applications for solving different agricultural problems. The paper integrates the work of different researchers in one place so that researchers can obtain information on the current state of data mining techniques and applications in the agricultural field. The study highlights some of the significant contributions of neural network models in agricultural data mining and also suggests the flexibility of ANN for future research.

REFERENCES
[1] Han, J., Kamber, M., & Pei, J. (2006). Data mining: concepts and techniques. Morgan Kaufmann.
[2] http://www.publishyourarticles.net/knowledge-hub/essay/essay-on-the-importance-ofagriculture-in-the-indian-economy.html
[3] Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37.
[4] Mucherino, A., Papajorgji, P., & Pardalos, P. (2009). Data mining in agriculture (Vol. 34). Springer.
[5] Beniwal, S., & Arora, J. (2012). Classification and feature selection techniques in data mining. International Journal of Engineering Research & Technology (IJERT), 1(6).
[6] Rokach, L., & Maimon, O. Clustering Methods. Chap. 15.
[7] Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3), 645-678.
[8] Andritsos, P. Data Clustering Techniques. University of Toronto, Department of Computer Science. ftp://ftp.cs.toronto.edu/csrg-technical-reports/443/depth.pdf
[9] Srikant, R., Vu, Q., & Agrawal, R. (1997, August). Mining Association Rules with Item Constraints. In KDD (Vol. 97, pp. 67-73).
[10] Agrawal, R., Imieliński, T., & Swami, A. (1993, June). Mining association rules between sets of items in large databases. In ACM SIGMOD Record (Vol. 22, No. 2, pp. 207-216). ACM.
[11] Zaki, M. J. (1999). Parallel and distributed association mining: A survey. IEEE Concurrency, 7(4), 14-25.
[12]
[13] Gholap, J. (2012). Performance tuning of J48 algorithm for prediction of soil fertility. Asian Journal of Computer Science and Information Technology, 2(8), 251-252.
[14] Megala, S., & Hemalatha, M. (2011). A Novel Datamining Approach to Determine the Vanished Agricultural Land in Tamilnadu. International Journal of Computer Applications, 23.
[15] Ramesh, D., & Vishnu Vardhan, B. (2013). Data Mining Techniques and Applications to Agricultural Yield Data. International Journal of Advanced Research in Computer and Communication Engineering, 2(9).
[16]
[17] Ramesh, V., & Ramar, K. (2011). Classification of Agricultural Land Soils: A Data Mining Approach. Agricultural Journal, 6, 82-86.

Author[s] brief Introduction
Akshitha K, M.Tech student, RVCE. Flat No. 017, DSMAXSWASTIK, 20th Cross, Veernajaneya Nagar, Turahalli, Uttarahalli Hobli, Bengaluru 560061. Ph. No. 9880141284