World Academy of Science, Engineering and Technology
International Science Index, Computer and Information Engineering waset.org/publication/14600

Using Data Mining Techniques for Estimating Minimum, Maximum and Average Daily Temperature Values

S. Kotsiantis, A. Kostoulas, S. Lykoudis, A. Argiriou, K. Menagias

Abstract: Estimates of temperature values at a specific time of day, from daytime and daily profiles, are needed for a number of environmental, ecological, agricultural and technical applications, ranging from natural hazard assessment and crop growth forecasting to the design of solar energy systems. The scope of this research is to investigate the efficiency of data mining techniques in estimating minimum, maximum and mean temperature values. To this end, a number of experiments have been conducted with well-known regression algorithms using temperature data from the city of Patras in Greece. The performance of these algorithms has been evaluated using standard statistical indicators, such as the correlation coefficient and the root mean squared error.

Keywords: regression algorithms, supervised machine learning.

I. INTRODUCTION
Weather data are generally classified as either synoptic data or climate data. Synoptic data are the real-time data provided for use in aviation safety and forecast modelling. Climate data are the official data record, usually provided after some quality control has been performed on them. Special networks also exist in many countries and may in some cases be used to provide supplementary climate data. Knowledge of the meteorological data at a site is essential for meteorological, pollution and energy application studies and development. In particular, temperature data are used to determine the thermal behaviour of buildings (heating and cooling loads, heat losses and gains) [2]. They are also an explicit requirement for sizing studies of solar thermal [13] and/or PV systems [18], [5]. Another major sector where temperature data are fundamental is the estimation of biometeorological parameters at a site [16].
S. Kotsiantis is with the Department of Computer Science and Technology, University of Peloponnese, Greece (phone: +302610-997833; fax: +302610-997313; e-mail: sotos@math.upatras.gr).
S. Lykoudis is with the National Observatory of Athens, Institute for Environmental Research and Sustainable Development, GR-15236 Palia Pendeli, Greece.
A. Argiriou is with the University of Patras, Department of Physics, Section of Applied Physics, GR-26500 Patras, Greece.

In advanced energy system designs, the profile of any meteorological parameter is a prerequisite for managing system operation on a daily and/or hourly basis. Also, simulations of the long-term performance of energy plants require detailed and accurate meteorological data as input. This knowledge may be obtained either from existing data banks or, where no detailed data are available, by estimation methodologies and techniques. Now that smart buildings have become a reality, artificial intelligence techniques must be embedded in building management systems (BMS) so that the energy profile (loads, gains, etc.) of a following time period (next hour, next day) can be determined in advance. This will lead to more effective energy management of the building or the energy plant.

Weather data from automated weather stations have also become an important component for prediction and decision making in agriculture and forestry. The data collected from such stations are used in predictions of insect and disease damage in crops, orchards, turfgrasses and forests [4]; in deciding on crop-management actions such as irrigation [1]; in estimating the probability of occurrence of forest fires [9]; and in many other applications [19].

The scope of this research is to investigate the efficiency of data mining techniques in estimating minimum, maximum and average temperature values. A number of experiments have been conducted with well-known regression algorithms using temperature data from the city of Patras in Greece. The performance of these algorithms has been evaluated using standard statistical indicators. The following section describes the data set of our study.
Section III describes the regression algorithms used, Section IV presents the experimental results for the representative regression algorithms and, finally, Section V discusses the conclusions and some future research directions.

II. DESCRIPTION OF OUR DATASET
The temperature values used in this paper were obtained from the meteorological station of the Laboratory of Energy and Environmental Physics of the Department of Physics of the University of Patras. The collected data cover a four-year period (2002-2005). This station records temperature, relative humidity and rainfall data on an hourly basis (8760 measurements per year). For the needs of this work, minimum, maximum and average temperature values for the city of Patras were calculated from the data bank of that station. The minimum, maximum and average temperature values were inserted into new data banks with reference to the day of the year (D) (1-365). The data were also processed per month. In that case, average daily temperatures were registered with reference to the number of the month (1-12), the number of the day of the month (1-30) and finally the day of the year (D) (1-365).

International Scholarly and Scientific Research & Innovation 1(2) 2007 401

Many methods have been proposed worldwide for the estimation or prediction of monthly, daily or even hourly values of different meteorological parameters [10], [11], [12], [14], based mainly on the analysis of past data. One such simple method is the one proposed in [15]. It results from processing temperature measurements made by the Hellenic National Meteorological Service (HNMS) at different sites in Greece. The analysis of these data shows that the yearly variation of the average, maximum and minimum values of daily temperature can be expressed by the following equation [15]:

$$T(D) = A + B \sin\left(\frac{360}{365}(D - f)\right) \qquad (1)$$

where D is the day of the year (1-365), A is the average yearly temperature in °C, B is the amplitude of the yearly temperature variation in °C and f is the phase shift, expressed in days (equivalently, 360f/365 degrees). These parameters are constants that depend on the site. Their values have been calculated for a number of Greek cities using the least squares method. For Patras, the values for the calculation of the average daily temperature are given in the table below (based on temperature data of the period 1960-1974). The parameters of Eq. (1) have also been re-estimated using the 2002-2005 data.

TABLE I: VALUES OF A, B AND f FOR THE CITY OF PATRAS
| Parameter | Based on 1960-1974 data | Based on 2002-2005 data |
|---|---|---|
| A (°C) | 17.339 | 18.351 |
| B (°C) | -7.47 | -8.65 |
| f (days) | -59.691 | -62.908 |
| Goodness of fit | 0.8872 | 0.8881 |

III. DATA MINING ALGORITHMS USED
The problem of regression in data mining consists in obtaining a functional model that relates the value of a continuous target variable $y$ to the values of the variables $x_1, x_2, \dots, x_n$ (the predictors). This model is obtained using samples of the unknown regression function. These samples describe different mappings between the predictor and the target variables.
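Eq. (1) is itself a regression model of this kind. Because its sine term expands as B sin(ω(D - f)) = c1 sin(ωD) + c2 cos(ωD), with ω = 2π/365, c1 = B cos(ωf) and c2 = -B sin(ωf), the parameters A, B and f can be estimated by ordinary linear least squares. The Python sketch below is our own illustration, not the procedure of [15]. It assumes that f is measured in days, and the names `eq1`, `fit_eq1` and `solve_3x3` are hypothetical. The fitted B is returned as non-negative, since (B, f) and (-B, f + 365/2) describe the same curve.

```python
import math

OMEGA = 2 * math.pi / 365  # 360/365 degrees per day, expressed in radians per day


def solve_3x3(m, v):
    """Solve m x = v for a 3x3 system by Gaussian elimination with partial pivoting."""
    a = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def eq1(d, a, b, f):
    """Eq. (1): T(D) = A + B sin(360/365 (D - f)), with f in days."""
    return a + b * math.sin(OMEGA * (d - f))


def fit_eq1(days, temps):
    """Least-squares estimates of A, B and f in Eq. (1).

    The model is linear in (A, c1, c2), where c1 = B cos(wf) and c2 = -B sin(wf).
    B is returned as non-negative; (B, f) and (-B, f + 365/2) are the same curve.
    """
    rows = [(1.0, math.sin(OMEGA * d), math.cos(OMEGA * d)) for d in days]
    # Normal equations (X^T X) beta = X^T y for the design matrix X = rows
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, temps)) for i in range(3)]
    a, c1, c2 = solve_3x3(xtx, xty)
    b = math.hypot(c1, c2)
    f = math.atan2(-c2, c1) / OMEGA  # phase shift back in days
    return a, b, f


# Check on noise-free data generated from the 1960-1974 Patras parameters
days = list(range(1, 366))
temps = [eq1(d, 17.339, -7.47, -59.691) for d in days]
a, b, f = fit_eq1(days, temps)
print(round(a, 3), round(b, 2), round(f, 3))  # prints 17.339 7.47 122.809
```

The recovered f = 122.809 equals -59.691 + 365/2, i.e. the same curve written with a positive B. Applied to real daily values, `fit_eq1` would return estimates in the same form as the A, B, f table, again up to that sign equivalence.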
The traditional approach for predicting a continuous target is classical linear least-squares regression (LR) [7]. The model constructed in this traditional approach is a linear equation, and it is created by estimating the parameters of this equation from the training set with a computationally simple process. However, the assumption of linearity between the input features and the predicted value introduces a large bias error in most domains. That is why most studies are directed towards nonlinear and non-parametric techniques for the regression problem. For our comparison, the most common regression techniques, namely model trees and rules [20], instance-based learners [3], artificial neural networks and additive regression [8], are used.

Model trees are the counterparts of decision trees for regression tasks. They sort instances according to their attribute values, starting at the root node. The best-known model tree inducer is M5 [20]. A model tree is generated in two stages. The first builds an ordinary decision tree, using as splitting criterion the minimization of the intra-subset variation of the target value (equivalently, the maximization of the standard deviation reduction) [20]. The second prunes this tree back by replacing subtrees with linear regression functions wherever this seems appropriate. The M5rules algorithm produces propositional regression rules in IF-THEN format, using routines for generating a decision list from M5 model trees [21]. The algorithm is able to deal with both continuous and nominal variables, and obtains a piecewise linear model of the data.

Instance-based learning algorithms are lazy-learning algorithms, as they delay the induction or generalization process until prediction is performed. k-Nearest Neighbour (kNN) is based on the principle that the instances within a dataset generally exist in close proximity to other instances that have similar properties [3].
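The kNN principle just stated can be made concrete with a minimal regressor in Python: Euclidean distance and an unweighted mean of the k nearest targets. The function name `knn_predict`, the default k = 3 and the toy data are our own assumptions; this is a generic kNN sketch, not the IB3 configuration used in the experiments.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a numeric target as the mean target of the k nearest training instances.

    train_x: sequence of feature tuples; train_y: matching numeric targets;
    query: feature tuple of the instance to predict. Distance is Euclidean.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training instances")
    # Lazy learning: nothing is fitted in advance, and every prediction scans all
    # stored instances, which is why kNN prediction is costly on large sets.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    return sum(y for _, y in by_distance[:k]) / k


# Usage with day of year as the single feature (toy values, not station data)
train_x = [(1,), (2,), (3,), (10,), (11,)]
train_y = [10.0, 11.0, 12.0, 20.0, 21.0]
print(knn_predict(train_x, train_y, (2,)))  # prints 11.0
```

With the day of the year as the only feature, days 365 and 1 are far apart in Euclidean distance although they are climatologically adjacent; encoding D as (sin(2πD/365), cos(2πD/365)) is one possible remedy, offered here as a suggestion rather than as part of the reported experiments.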
The kNN algorithm first finds the instances closest to the query point in the instance space according to a distance measure, and then outputs the average of the target values of those instances as the prediction for that query instance. Because predicting the target value of a query instance requires measuring its distance to all training instances, which may form a very large set, prediction with kNN is costly. IB3 is a well-known technique for instance-based learning.

Artificial Neural Networks (ANNs) are another method of inductive learning, based on computational models of biological neurons and networks of neurons as found in the central nervous system of humans [17]. Regression with a neural network takes place in two distinct phases. First, the network is trained on a set of paired data to determine the input-output mapping. The weights of the connections between neurons are then fixed and the network is used to predict the numerical target values of a new set of data. The Back Propagation (BP) algorithm [17], the best-known algorithm for estimating the weights of a neural network, was used as the representative of neural networks.

Combining models is not a new concept for the statistical pattern recognition, machine learning or engineering communities, although in recent years there has been an explosion of research exploring creative new ways to combine models. Currently, there are two main approaches to model combination. The first creates a set of learned models by applying one algorithm repeatedly to different samples of the training data; the second applies various learning algorithms to the same sample data. The predictions of the models are then combined according to an averaging scheme. Boosting [6] is a method that uses different subsets of the training data with a single learning method. The boosting approach uses the base models in sequential collaboration, where each new model concentrates more on
the examples where the previous models had high error. Although boosting for regression has not received nearly as much attention as boosting for classification, there is some work examining gradient descent boosting algorithms in the regression context. Additive regression [8] is a well-known boosting method for regression.

IV. RESULTS
For regression methods there is no single evaluation criterion; Table II lists the best-known ones. Fortunately, it turns out that in most practical situations the best regression method remains the best no matter which error measure is used. To compute these criteria for the models in our experiments, we used the freely available source code accompanying [21] for most of the algorithms.

TABLE II: REGRESSOR CRITERIA (p: PREDICTED VALUE, a: ACTUAL VALUE)
| Criterion | Formula |
|---|---|
| Root mean squared error | $\sqrt{\frac{(p_1 - a_1)^2 + \dots + (p_n - a_n)^2}{n}}$ |
| Correlation coefficient | $R = \frac{S_{PA}}{\sqrt{S_P S_A}}$, where $S_{PA} = \frac{\sum_i (p_i - \bar{p})(a_i - \bar{a})}{n - 1}$, $S_P = \frac{\sum_i (p_i - \bar{p})^2}{n - 1}$, $S_A = \frac{\sum_i (a_i - \bar{a})^2}{n - 1}$ |

In the following three tables we present the models' regressor criteria in predicting average daily temperature values using as input a) the previous year's data (2004) in Table III; b) the last two years (2003, 2004) in Table IV; and c) the last three years (2002, 2003, 2004) in Table V.
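The two criteria of Table II can be computed directly from paired predicted and actual values. The following minimal Python sketch follows the formulas literally, with (n - 1) denominators for the terms of R and division by n inside the RMSE; the function names are our own and are not taken from the code of [21].

```python
import math


def correlation_coefficient(p, a):
    """R = S_PA / sqrt(S_P * S_A), with the (n - 1) denominators of Table II."""
    n = len(p)
    if n != len(a) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    p_bar, a_bar = sum(p) / n, sum(a) / n
    s_pa = sum((pi - p_bar) * (ai - a_bar) for pi, ai in zip(p, a)) / (n - 1)
    s_p = sum((pi - p_bar) ** 2 for pi in p) / (n - 1)
    s_a = sum((ai - a_bar) ** 2 for ai in a) / (n - 1)
    return s_pa / math.sqrt(s_p * s_a)


def root_mean_squared_error(p, a):
    """sqrt(((p1 - a1)^2 + ... + (pn - an)^2) / n)."""
    if len(p) != len(a) or not p:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((pi - ai) ** 2 for pi, ai in zip(p, a)) / len(p))


# Toy predicted and actual daily temperatures (hypothetical values)
predicted = [10.0, 12.0, 15.0, 19.0]
actual = [11.0, 12.5, 14.0, 20.0]
print(round(correlation_coefficient(predicted, actual), 4),
      round(root_mean_squared_error(predicted, actual), 4))
```

Because R is unchanged when every prediction is shifted by the same constant, a systematically biased model can combine a reasonable R with a large RMSE; this is one reason to report both criteria side by side.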
TABLE III: PREDICTING AVERAGE DAILY TEMPERATURE VALUES USING AS INPUT THE PREVIOUS YEAR DATA (2004)
| Algorithm | Correlation coefficient | Root mean squared error |
|---|---|---|
| M5 | 0.9288 | 2.6131 |
| M5rules | 0.9233 | 2.7392 |
| Additive regression | 0.9279 | 2.5698 |
| LR | 0.8529 | 5.6715 |
| IB3 | 0.9002 | 3.0715 |
| BP | 0.92 | 3.2724 |

TABLE IV: PREDICTING AVERAGE DAILY TEMPERATURE VALUES USING AS INPUT THE LAST TWO YEARS DATA (2003-2004)
| Algorithm | Correlation coefficient | Root mean squared error |
|---|---|---|
| M5 | 0.9442 | 2.357 |
| M5rules | 0.9417 | 2.4309 |
| Additive regression | 0.9319 | 2.6129 |
| LR | 0.8529 | 5.2118 |
| IB3 | 0.9195 | 2.8602 |
| BP | 0.9099 | 3.7817 |

TABLE V: PREDICTING AVERAGE DAILY TEMPERATURE VALUES USING AS INPUT THE LAST THREE YEARS DATA (2002-2004)
| Algorithm | Correlation coefficient | Root mean squared error |
|---|---|---|
| M5 | 0.9336 | 2.5369 |
| M5rules | 0.9365 | 2.4557 |
| Additive regression | 0.9288 | 2.6334 |
| LR | 0.8529 | 5.5012 |
| IB3 | 0.9205 | 2.7801 |
| BP | 0.9097 | 4.4836 |

As a result, using the temperatures of previous years, experts are in a position to predict the average daily temperature values of the examined year with sufficient precision: the correlation coefficient exceeds 0.92 in the initial forecasts (using the data of the year preceding the examined year) and exceeds 0.94 when the data of the last two years before the examined year are used.

In the following three tables we present the models' regressor criteria in predicting minimum daily temperature values using as input a) the previous year's data (2004) in Table VI; b) the last two years (2003, 2004) in Table VII; and c) the last three years (2002, 2003, 2004) in Table VIII.
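The three input configurations listed above differ only in how many years preceding the examined year are stacked into the training set, while the examined year itself is held out for testing. A minimal sketch of that split follows, under our own assumptions: daily values are stored in a dict keyed by year, and the examined year is taken to be 2005, the last year of the 2002-2005 record (implied by the input windows rather than stated). The function name and the toy series are hypothetical.

```python
def split_by_window(series, test_year, n_previous):
    """Return (train, test) lists of (day_of_year, value) pairs.

    series maps year -> list of (day_of_year, value) pairs. The training set
    stacks the n_previous years immediately before test_year; test_year itself
    is held out for evaluation.
    """
    if n_previous < 1:
        raise ValueError("n_previous must be at least 1")
    train_years = list(range(test_year - n_previous, test_year))
    missing = [y for y in train_years + [test_year] if y not in series]
    if missing:
        raise KeyError(f"no data for year(s) {missing}")
    train = [pair for year in train_years for pair in series[year]]
    test = list(series[test_year])
    return train, test


# Toy series with one day per year (hypothetical values, not station data)
series = {2002: [(1, 9.0)], 2003: [(1, 9.5)], 2004: [(1, 10.0)], 2005: [(1, 10.5)]}
for n in (1, 2, 3):  # one, two and three previous years as input
    train, test = split_by_window(series, 2005, n)
    print(n, len(train), len(test))
```

Any of the regressors compared in this section could then be trained on `train` and scored on `test` with the criteria of Table II.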
TABLE VI: PREDICTING MINIMUM DAILY TEMPERATURE VALUES USING AS INPUT THE PREVIOUS YEAR DATA (2004)
| Algorithm | Correlation coefficient | Root mean squared error |
|---|---|---|
| M5 | 0.9437 | 2.2219 |
| M5rules | 0.9147 | 2.5995 |
| Additive regression | 0.9315 | 2.3552 |
| LR | 0.8648 | 5.2159 |
| IB3 | 0.897 | 2.88 |
| BP | 0.9216 | 2.8999 |

TABLE VII: PREDICTING MINIMUM DAILY TEMPERATURE VALUES USING AS INPUT THE LAST TWO YEARS DATA (2003-2004)
| Algorithm | Correlation coefficient | Root mean squared error |
|---|---|---|
| M5 | 0.945 | 2.1386 |
| M5rules | 0.9424 | 2.182 |
| Additive regression | 0.9317 | 2.3735 |
| LR | 0.8648 | 4.814 |
| IB3 | 0.9217 | 2.5578 |
| BP | 0.9137 | 3.9315 |

TABLE VIII: PREDICTING MINIMUM DAILY TEMPERATURE VALUES USING AS INPUT THE LAST THREE YEARS DATA (2002-2004)
| Algorithm | Correlation coefficient | Root mean squared error |
|---|---|---|
| M5 | 0.9395 | 2.246 |
| M5rules | 0.9391 | 2.2417 |
| Additive regression | 0.9322 | 2.3818 |
| LR | 0.8648 | 5.0424 |
| IB3 | 0.9207 | 2.5569 |
| BP | 0.9137 | 4.1722 |

As a result, using the temperatures of previous years, experts are in a position to predict the minimum daily temperature values of the examined year with sufficient precision: the correlation coefficient exceeds 0.93 in the initial forecasts (using the data of the year preceding the examined year) and exceeds 0.94 when the data of the last two years before the examined year are used.

In the following three tables we present the models' regressor criteria in predicting maximum daily temperature values using as input a) the previous year's data (2004) in Table IX; b) the last two years (2003, 2004) in Table X; and c) the last three years (2002, 2003, 2004) in Table XI.
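M5 gives the highest correlation coefficient in Tables VI-VIII above. To make the idea of a model tree concrete, the Python sketch below builds a depth-one tree on a single numeric predictor: it picks the threshold with the largest standard deviation reduction and fits a least-squares line in each leaf. It keeps only the core idea of M5 (linear models in the leaves of a variance-reducing tree); M5's deeper growth, pruning and smoothing are omitted, so it is an illustration rather than M5 itself, and all names are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # constant predictor in this leaf: fall back to the leaf mean
        return y_bar, 0.0
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def sd_reduction(all_y, left_y, right_y):
    """Standard deviation reduction achieved by splitting all_y into two subsets."""
    n = len(all_y)
    return (statistics.pstdev(all_y)
            - len(left_y) / n * statistics.pstdev(left_y)
            - len(right_y) / n * statistics.pstdev(right_y))


def build_one_split_model_tree(xs, ys, min_leaf=2):
    """Depth-one model tree on one numeric predictor; returns (threshold, predict)."""
    pairs = sorted(zip(xs, ys))
    all_y = [y for _, y in pairs]
    best = None  # (score, threshold, split index)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        score = sd_reduction(all_y, all_y[:i], all_y[i:])
        if best is None or score > best[0]:
            best = (score, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None:
        raise ValueError("not enough distinct x values for a split")
    _, threshold, i = best
    left = fit_line([x for x, _ in pairs[:i]], all_y[:i])
    right = fit_line([x for x, _ in pairs[i:]], all_y[i:])

    def predict(x):
        intercept, slope = left if x <= threshold else right
        return intercept + slope * x

    return threshold, predict


# Toy data with two linear regimes (not temperature data)
xs = list(range(1, 21))
ys = [2 * x if x <= 10 else 50 - x for x in xs]
threshold, predict = build_one_split_model_tree(xs, ys)
print(threshold, predict(5), predict(15))  # prints 10.5 10.0 35.0
```

A single linear model would fit these data poorly, whereas one split plus two leaf regressions reproduces both regimes exactly; this piecewise linear behaviour is what model trees and the rules derived from them exploit.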
TABLE IX: PREDICTING MAXIMUM DAILY TEMPERATURE VALUES USING AS INPUT THE PREVIOUS YEAR DATA (2004)
| Algorithm | Correlation coefficient | Root mean squared error |
|---|---|---|
| M5 | 0.9053 | 3.2671 |
| M5rules | 0.8959 | 3.4213 |
| Additive regression | 0.9044 | 3.1572 |
| LR | 0.8231 | 6.4537 |
| IB3 | 0.873 | 3.7743 |
| BP | 0.8958 | 4.0265 |

TABLE X: PREDICTING MAXIMUM DAILY TEMPERATURE VALUES USING AS INPUT THE LAST TWO YEARS DATA (2003-2004)
| Algorithm | Correlation coefficient | Root mean squared error |
|---|---|---|
| M5 | 0.9196 | 3.0779 |
| M5rules | 0.9168 | 3.0888 |
| Additive regression | 0.9119 | 3.2432 |
| LR | 0.8231 | 5.9401 |
| IB3 | 0.8892 | 3.6713 |
| BP | 0.8826 | 7.5053 |

TABLE XI: PREDICTING MAXIMUM DAILY TEMPERATURE VALUES USING AS INPUT THE LAST THREE YEARS DATA (2002-2004)
| Algorithm | Correlation coefficient | Root mean squared error |
|---|---|---|
| M5 | 0.9165 | 3.1119 |
| M5rules | 0.9163 | 3.1261 |
| Additive regression | 0.9077 | 3.2618 |
| LR | 0.8231 | 6.2729 |
| IB3 | 0.8903 | 3.5475 |
| BP | 0.7942 | 7.7678 |

As a result, using the temperatures of previous years, experts are in a position to predict the maximum daily temperature values of the examined year with sufficient precision: the correlation coefficient reaches 0.90 in the initial forecasts (using the data of the year preceding the examined year) and approaches 0.92 (0.9196 for M5) when the data of the last two years before the examined year are used.

As a general conclusion, the regression algorithms were found to enable experts to predict minimum, maximum and average temperature values with satisfactory accuracy, using as input the temperatures of the previous years. We believe that using as input the temperatures of the two
previous years gives sufficient results. No more historical data are needed: for each of the three target variables, the best correlation coefficient obtained with three years of input (Tables V, VIII and XI) is lower than the best obtained with two years (Tables IV, VII and X).

V. CONCLUSION
Ideally, the market needs timely and accurate weather data. To achieve this, data should be recorded continuously at stations that are properly identified, manned by trained staff or automated with regular maintenance, in good working order and secure from tampering. The stations should also have a long history and not be prone to relocation. The collection and archiving of weather data is important because it provides an economic benefit, although local and national economic needs are not as dependent on high data quality as the weather risk market is.

In this study, it was found that regression algorithms could enable experts to predict minimum, maximum and average temperature values with satisfactory accuracy using as input the temperatures of the previous years. The methods used in this work for the case of Patras should also be tested in other regions with different climatic profiles. In addition, other methodologies (fuzzy logic techniques, etc.) have to be validated in many regions of the country, covering its climatic spectrum and including not only temperature data (on any time basis) but also other meteorological parameters (wind speed, solar radiation, etc.).

REFERENCES
[1] Acock, M. C., Pachepsky, Ya. A., Estimating Missing Weather Data for Agricultural Simulations Using Group Method of Data Handling, Journal of Applied Meteorology, Vol. 39, No. 7, pp. 1176-1184, 2000.
[2] ASHRAE, Handbook of Fundamentals, American Society of Heating, Refrigerating and Air-Conditioning Engineers, New York, 1993.
[3] Atkeson, C. G., Moore, A. W., Schaal, S., Locally weighted learning, Artificial Intelligence Review, 11, (1997) 11-73.
[4] Dinelli, D., 1995: What weather stations can do. Landscape Manage., 34 (3), 6G.
[5] Duffie, J. A., Beckman, W. A., 1991. Solar Engineering of Thermal Processes. New York: John Wiley and Sons.
[6] Duffy, N., Helmbold, D., Boosting Methods for Regression, Machine Learning, 47, (2002) 153-200.
[7] Fox, J. (1997), Applied Regression Analysis, Linear Models, and Related Methods, ISBN: 080394540X, Sage Pubns.
[8] Friedman, J. (2002). Stochastic Gradient Boosting, Computational Statistics and Data Analysis 38(4): 367-378.
[9] Fujioka, F. M., 1995: High resolution fire weather models. Fire Manage. Notes, 57, 22-25.
[10] Gelegenis, J. J., 1999. Estimation of hourly temperature data from their month average values: case study of Greece. Renewable Energy 18, no. 1: 49-60.
[11] Hall, I. J., Generation of a Typical Meteorological Year, Proceedings of the 1978 annual meeting of AS of ISES, Denver USA, 1979.
[12] Jain, P. C., Comparison of techniques for the estimation of daily global irradiation and a new model for the estimation of hourly global irradiation. Solar and Wind Technology 1, no. 2, 1984, pp. 123-134.
[13] Klein, S. A., Beckman, W. A., Duffie, J. A., 1985. A Design Procedure for Solar Heating Systems. Solar Energy 18: 113-127.
[14] Knight, K. M., Klein, S. A., Duffie, J. A., A methodology for the synthesis of hourly weather data. Solar Energy 46, no. 2, 1991, pp. 109-120.
[15] Kouremenos, D. A., Antonopoulos, K. A., Temperature data for 35 Greek cities. In Greek. Athens 1993, Second Edition.
[16] Matzarakis, A., 1995. Human-biometeorological assessment of the climate of Greece. Ph.D. Dissertation, University of Thessaloniki.
[17] Mitchell, T., Machine Learning, McGraw Hill, 1997.
[18] Rahman, S., Chowdhury, B., Simulation of Photovoltaic power systems and their performance prediction. IEEE Transactions on Energy Conversion 3, 440-446 (1988).
[19] Tugay Bilgin and Yilmaz Çamurcu, 2004, A Data Mining Application on Air Temperature Database, in LNCS 3261 - Advances in Information Systems, Springer Berlin / Heidelberg, ISBN 978-3-540-23478-4, pp. 68-76.
[20] Wang, Y., Witten, I. H., Induction of model trees for predicting continuous classes, In Proc. of the Poster Papers of the European Conference on ML, Prague (pp. 128-137).
[21] Witten, I. H., Frank, E., "Data Mining: Practical machine learning tools and techniques", 2nd Edition, Morgan Kaufmann, San Francisco, 2005.

Sotiris Kotsiantis received a diploma in mathematics, a Master's and a Ph.D. degree in computer science from the University of Patras, Greece. He is an adjunct lecturer in the Department of Computer Science and Technology at the University of Peloponnese, Greece. His main research interests are in the fields of machine learning, data mining and knowledge representation. He has about 80 publications to his credit in international journals and conferences.