Available online at ScienceDirect. Procedia Computer Science 83 (2016 )

International Workshop on Big Data and Data Mining Challenges on IoT and Pervasive Systems (BigD2M 2016)

Towards Energy Efficiency Smart Buildings Models based on Intelligent Data Analytics

Aurora González-Vidal, Victoria Moreno-Cano, Fernando Terroso-Sáenz, Antonio F. Skarmeta
Computer Science Faculty, University of Murcia, Spain
{aurora.gonzalez2, mvmoreno, fterroso, skarmeta}@um.es
Aurora González; E-mail address: aurora.gonzalez2@um.es

Abstract

This work presents how to process all the available data coming from smart buildings in order to generate models that predict their energy consumption. For this, we propose a methodology that includes the application of different intelligent data analysis techniques and algorithms that have already been applied successfully in related scenarios, and the selection of the best one depending on the value of the metric chosen for the evaluation. This result depends on the specific characteristics of the target building and the available data. Among the techniques applied to a reference building, Bayesian Regularized Neural Networks and Random Forest are selected because they provide the most accurate predictive results.

© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license. Peer-review under responsibility of the Conference Program Chairs.

Keywords: pervasive computing, smart buildings, energy efficiency, intelligent data analysis techniques

1. Introduction

With the advent of new advances and techniques in Information and Communications Technologies (ICT), every place, everything and everyone can be embraced by embedded technologies that allow connection and communication between them in a non-intrusive and efficient way. This is the technological basis promoted by the popular Internet of Things (IoT) [1].

The high volume of data that can be generated nowadays in urban environments, coming from different data sources, provides a great scenario to achieve intelligent and efficient management systems of energy consumption based on IoT. Big data analytics [3] helps us to leverage the huge amounts of data provided by IoT-based ecosystems in order to reveal insights that help explain, expose and predict knowledge from them [2].

Specifically, in the field of smart buildings, which are a key piece of smart cities, it is increasingly common to apply intelligent algorithms to generate behavioural building models for solving problems like energy efficiency and comfort provisioning [4, 5].

In this paper, a first approach to model the energy consumption of smart buildings is proposed, considering a context that provides a reduced set of data to generate the model and the use of intelligent techniques to identify patterns that could help in modeling the smart building status related to its energy consumption. After selecting a set of recommended techniques, we propose a general procedure to analyze their performance once they have been applied to specific buildings, and the criteria to consider for selecting the optimal one according to the achieved results.

The structure of this paper is as follows. Section 2 reviews the main techniques proposed in the literature to model smart building energy consumption. Section 3 shows how to proceed during the data processing to generate accurate building energy consumption models. Section 4 presents the reference building and its available data, and describes the application of the different techniques as well as the way to select the optimal one. Finally, Section 5 gives some conclusions and an outlook of our future work in this area.

2. Intelligent Data Analysis Techniques for Buildings Modeling

Intelligent data analysis techniques have been widely introduced to model several building system scenarios. Artificial neural networks (ANN) are able to learn the key information patterns within a multidimensional domain. They have been applied in the field of solar energy, for the modeling and design of a solar steam generating plant, for the estimation of heating loads of buildings, etc., and also in heating, ventilating and air-conditioning systems, solar radiation, modeling and control of power-generation systems, load forecasting and refrigeration [6]. The ANN used is the Multi-Layer Perceptron (MLP).

Also, the Bayesian Regularized Neural Network (BRNN) method has been used in the prediction of a series of building energy loads from an environmental input set [7], and the Random Forest model has been applied to predict energy consumption in residential buildings [8]. Likewise, Support Vector Machines (SVM) are proposed and evaluated to predict both the total short-term electricity load and the short-term loads of individual building service systems (air conditioning, lighting, power, and other equipment) in buildings that have electricity sub-metering systems installed [9]. Another common technique for non-linear regression proposed in the literature is Gaussian Processes with Radial Basis Function Kernel (RBF) [10]. It has already been used to forecast electrical load [11] and to estimate the number of occupants in a room according to data related to the room status: motion detection, CO2 reading, sound level, ambient light and door state sensing [12].

3. General Data Modeling Process

The five techniques introduced in the previous section as reference are used to train the energy consumption prediction models of buildings, looking for the optimal configuration of their hyperparameters. For this purpose, we use the R [13] package named caret [14]. This package is a set of functions that attempts to streamline the process of creating predictive models. The five techniques implemented in R enable us to adjust their tuning parameters. Table 1 shows the different values taken for each technique's hyperparameters. For example, we train the MLP model using the size values from 33 to 40 and select the one that reaches the best results according to the evaluation metric.

Using each of these techniques, different building models are generated following the next steps (shown explicitly in Fig. 1):

1. Cleaning and transformation: selecting predictive variables, deleting energy consumption outliers that cannot be related to outliers in the rest of the variables, transforming categorical variables into numerical ones, and dividing the data set into train (75%) and test (25%).
2. Standardization: transforming the variables to have zero mean and unit standard deviation.
3. Principal Components Analysis (PCA) [15]: a common transformation of the data space. PCA is a widely used technique for reducing dimensionality, identifying the directions in which the variance of the observations is accumulated.
4. Validation method: 10-fold cross validation with 5 repetitions over the training data set.
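
Steps 1, 2 and 4 above can be sketched in a few lines. The paper performs them with the caret package in R; the block below is only an illustrative plain-Python stand-in, not the authors' code. The function names, the random shuffling before the split, the use of the sample standard deviation and the toy temperature values are all assumptions made for the example, and PCA (step 3) is left out.

```python
import random
import statistics


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_standardizer(train_values):
    """Mean and sample standard deviation, estimated on the training data only."""
    return statistics.fmean(train_values), statistics.stdev(train_values)


def standardize(values, mean, std):
    """Shift and scale values using the given mean and standard deviation."""
    return [(v - mean) / std for v in values]


def repeated_kfold(n_rows, k=10, repeats=5, seed=0):
    """Yield (training_indices, validation_indices) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_rows))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            validation = folds[held_out]
            training = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            yield training, validation


# Made-up temperature readings, used only to exercise the functions.
temperatures = [10.0 + 0.1 * i for i in range(100)]
train, test = train_test_split(temperatures)        # 75% / 25%
mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)
test_z = standardize(test, mean, std)               # reuse the training statistics
splits = list(repeated_kfold(len(train)))           # 10 folds x 5 repetitions
```

Estimating the mean and standard deviation on the training part only, and reusing them for the test part, keeps information from the test set out of the model fitting; whether the original R pipeline did exactly this is not stated in the text.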

Table 1. Features of the evaluated algorithms

Technique                                                          Function in R    Tuning parameter   Values for tuning
Multi-Layer Perceptron (MLP)                                       mlp              size               {33, 34, 35, 36, 37, 38, 40}
Support Vector Machines with Radial Basis Function Kernel (SVM)    svmRadialCost    cost               {1, 2, 3, 4, 5, 8, 10}
Gaussian Process with Radial Basis Function Kernel (Gauss)         gaussprRadial    sigma              {0.01, 0.05, 0.1, 0.5}
Bayesian Regularized Neural Networks (BRNN)                        brnn             neurons            {2, 3, 4, 5, 10}
Random Forest (RF)                                                 rf               mtry               {2, 3, 4, 5, 6, 7}

5. Evaluation metric: RMSE (Root-Mean-Square Error) and R-Squared. RMSE yields values in the same units as the output of the estimators (kWh in this case), so the results can be interpreted easily. The coefficient of variation (CV) of RMSE, which indicates the uncertainty in the model, is the reference metric.

Fig. 1. Explicit data modeling process

4. Application in the Reference Building

The reference building in which the proposed procedure has been carried out to generate accurate building models is the Technological Transfer Centre (TTC) of the University of Murcia. This building is used by technological companies and some research groups that collaborate with companies developing industrial scientific projects. The building has a wide deployment of sensors and devices integrated in a home automation system that works to improve indoor comfort while saving energy.

The home automation system installed in this reference scenario is called City explorer, which is composed of programmable logic controllers (PLCs) and a SCADA system. On the one hand, the PLCs are able to monitor the sensor status and regulate the infrastructures connected to City explorer. On the other hand, the SCADA system collects data and intercommunicates the PLCs with the actuators of the building. City explorer has been designed and developed at the University of Murcia, and is currently a commercial product provided by Odin Solutions S.L.
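
The evaluation metrics named in the modeling process (RMSE, the coefficient of variation of RMSE, and R-Squared) are not written out as formulas in the text. The sketch below shows one common way to compute them in plain Python. It assumes that CV(RMSE) is the RMSE divided by the mean of the observed values, expressed as a percentage, and that R-Squared is 1 - SS_res / SS_tot; caret may compute R-Squared differently (for example as a squared correlation), so this is an illustration rather than the exact computation behind the reported results. The consumption values are made up.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error, in the same units as the target (kWh here)."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def cv_rmse(actual, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    return 100.0 * rmse(actual, predicted) / mean_actual


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up 8-hour consumption values (kWh) and predictions, for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

error_kwh = rmse(observed, predicted)
error_pct = cv_rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

Because CV(RMSE) is relative to the mean consumption, it allows models with very different consumption ranges to be compared on the same scale, which RMSE in kWh alone does not.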

4.1. The available data

The data available for this building cover only the bare minimum needed to apply an algorithm that predicts its global energy consumption. These data are the outdoor environmental observations and the total energy consumption of the building from 1st December 2014 to 18th February 2016, in intervals of 8 hours: 952 observations in total.

When energy is measured by commercial power meters, many variables are usually provided, but Active Energy is our target. Active Energy is the active power (kW) consumed per time unit, and it depends on the interval of time because it accumulates its value. Hence, in order to have an accurate and meaningful measure of energy consumption (kWh), the intervals of time between observations have to be equal. Measures are considered in intervals of 8 hours, and the origin of the consumption (HVAC, lighting or other electrical equipment) is unknown.

Outdoor environmental measures are acquired from external sources. In this case, the IMIDA (the Research Institute of Agriculture and Food Development of Murcia) has provided us with an hourly historical data set including the following variables: temperature (mean, min and max) (°C), humidity (mean, min and max) (%), radiation (mean and max) (W/m²), wind speed (mean and max) (m/s), wind direction (mean) (degrees), precipitation (mm), dew point (°C) and vapour pressure deficit (kPa).

Fig. 2. Correlation heatmap between consumption and outdoor environmental conditions (variables: Energy_consumption, tmean, tmax, tmin, hrmean, hrmax, hrmin, radmean, radmax, wsmean, wsmax, wdir, prec, dewpt, dpv)

4.2. Correlations between consumption and outdoor environmental conditions

Fig. 2 shows the pairwise correlations between all variables involved in the problem. Focusing on the first row, we see that energy consumption correlates significantly (confidence level 0.95) and positively (blue circle) with the temperature, radiation and wind speed variables, vapour pressure deficit and dew point, and negatively (red circle) with the wind direction and humidity variables.
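
As an illustration of how one such pairwise correlation and its significance could be checked, the sketch below computes a Pearson coefficient and the usual t statistic for the hypothesis of no linear correlation. The text does not say which correlation coefficient or test lies behind Fig. 2, so the choice of Pearson is an assumption, and the temperature and consumption values are invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_t_statistic(r, n):
    """t statistic for H0 'no linear correlation', with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Invented values: mean outdoor temperature (°C) and energy consumption (kWh).
temperature = [8.0, 11.0, 15.0, 19.0, 24.0, 28.0]
consumption = [30.0, 33.0, 41.0, 46.0, 58.0, 61.0]

r = pearson_r(temperature, consumption)
t = correlation_t_statistic(r, len(temperature))

# Two-sided critical value of Student's t for 4 degrees of freedom at the 5% level.
significant = abs(t) > 2.776
```

With the 952 observations of the reference data set the critical value would be close to 1.96, so even moderate correlations can be statistically significant there.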
Since these correlations are significant, all these variables can safely be used as inputs of the energy consumption model of our reference building, because they all have a clear impact on the energy consumption. The exception is precipitation (crossed out in Fig. 2 because its correlation is not significant).

4.3. Occupation of the building

Our goal is to generate the most basic case of study, and occupation information is not usually available in buildings because it requires an exhaustive sensor deployment. We have therefore drawn up an outline based on basic, logical estimations of the building's use:

Moment 1: holidays, weekends and nights (22:00 - 06:00)
Moment 2: regular mornings (06:00 - 14:00)
Moment 3: regular afternoons (14:00 - 22:00)

Fig. 3. Boxplot of the energy consumption (kWh) by moments considering all data (left), and time series of the energy consumption (kWh) by moments during January (right)

Looking at Fig. 3, it is possible to appreciate differences between moments, but in order to have statistical support to define those groups, a Kruskal-Wallis H test [16] was performed to check whether there are differences in energy consumption between them. The test reveals that, indeed, there is a significant difference between groups (H(2) = 547.7, p-value < 0.01). An analysis of the differences by pairs, performing the post-hoc Wilcoxon test [16], determines that it is possible to divide the data into those moments. This reasoning leads us to suggest three different models corresponding to the partitions just mentioned.

Model 1. Range of energy consumption = [3.578, 14.1] kWh, mean of energy consumption = … kWh.
Model 2. Range of energy consumption = [26.01, 86.19] kWh, mean of energy consumption = … kWh.
Model 3. Range of energy consumption = [6.357, …] kWh, mean of energy consumption = … kWh.

4.4. Results

Every model gathers the energy consumption during 8 hours (as described in subsection 4.3), so we have 8 different observations for each environmental input (one each hour). Also, we create two new variables for every attribute by taking its mean and median. To clarify the considered inputs: for moment 1 and, for example, temperature, we will have 11 attributes: temperature at 06:00, at 05:00, ..., at 22:00, the mean temperature over that period and the median temperature.

After training the models using several combinations of inputs, we achieve the best results using day of the week, month, season, mean temperature and mean humidity, with the Random Forest (RF) algorithm for moment 1 (mtry = 4, RMSE = 1 kWh) and moment 3 (mtry = 2, RMSE = 3.87 kWh), and Bayesian Regularized Neural Networks (BRNN) for moment 2 (number of neurons = 2, RMSE = 7.08 kWh). All these values represent between 12.09% and 12.86% of error (CVRMSE).

5. Conclusion and Future Work

In this paper, we have established a basic and successful procedure that can be used at the initial stages of the analysis of energy consumption in smart buildings. This process has been carried out in a reference building from which we have generated different energy consumption models.
Among the techniques analyzed, BRNN and RF provided the most accurate results (mean errors within [1, 7.08] kWh). This procedure will be enriched progressively with the addition of more data sources. The immediate next step is to use and validate the model by trying to predict energy consumption for future days using outdoor environmental predictions. Future work consists of designing a control strategy based on this model to save energy in buildings.

Acknowledgements

This work has been partially funded by MINECO TIN R project (grant BES ) and ERDF funds, by the European Commission through the H2020-ENTROPY EU Project, and the Spanish Seneca Foundation by means of the PD program (grant 19782/PD/15).

Table 2. Results obtained for each moment (see the techniques' acronyms in Table 1)

Moment   Technique   Best parameter   RMSE (kWh)   CV RMSE (%)   R²
1        Gauss       σ = …            …            …             …
1        MLP         size = …         …            …             …
1        SVM         cost = …         …            …             …
1        BRNN        neurons = …      …            …             …
1        RF          mtry = …         …            …             …
2        Gauss       σ = …            …            …             …
2        MLP         size = …         …            …             …
2        SVM         cost = …         …            …             …
2        BRNN        neurons = …      …            …             …
2        RF          mtry = …         …            …             …
3        Gauss       σ = …            …            …             …
3        MLP         size = …         …            …             …
3        SVM         cost = …         …            …             …
3        BRNN        neurons = …      …            …             …
3        RF          mtry = …         …            …             …

References

1. L. Atzori, A. Iera, G. Morabito, The internet of things: A survey, Computer Networks 54 (15) (2010).
2. T. A. R. (auth.), Data Analytics: Models and Algorithms for Intelligent Data Analysis, 1st Edition, Vieweg+Teubner Verlag.
3. C. L. Stimmel, Big Data Analytics Strategies for the Smart Grid, CRC Press.
4. L. G. Swan, V. I. Ugursal, Modeling of end-use energy consumption in the residential sector: A review of modeling techniques, Renewable and Sustainable Energy Reviews 13 (8) (2009).
5. G. K. Tso, K. K. Yau, Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks, Energy 32 (9) (2007).
6. S. A. Kalogirou, Applications of artificial neural-networks for energy systems, Applied Energy 67 (1) (2000).
7. D. MacKay, Bayesian non-linear modeling for the 1993 energy prediction competition, Maximum Entropy and Bayesian Methods (1993).
8. F. Wahid, D.-H. Kim, Prediction methodology of energy consumption based on random forest classifier in Korean residential apartments.
9. Y. Fu, Z. Li, H. Zhang, P. Xu, Using support vector machine to predict next day electricity load of public buildings with sub-metering devices, Procedia Engineering 121 (2015).
10. C. K. Williams, D. Barber, Bayesian classification with Gaussian processes, Pattern Analysis and Machine Intelligence, IEEE Transactions on 20 (12) (1998).
11. D. J. Leith, M. Heidl, J. V. Ringwood, Gaussian process prior models for electrical load forecasting, Probabilistic Methods Applied to Power Systems (2004).
12. S. Mamidi, Y.-H. Chang, R. Maheswaran, Improving building energy efficiency with a network of sensing, learning and prediction agents, in: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, International Foundation for Autonomous Agents and Multiagent Systems, 2012.
13. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2015).
14. M. Kuhn, Building predictive models in R using the caret package, Journal of Statistical Software 28 (5) (2008).
15. H. Abdi, L. J. Williams, Principal component analysis, Wiley Interdisciplinary Reviews: Computational Statistics 2 (4) (2010).
16. J. M. Andy Field, Z. F. Niblett, Discovering Statistics Using R, 1st Edition, Sage Publications Ltd, 2012.