Comprehensive Energy Efficiency Analysis of Buildings Based on FP-Growth Association Rules


International Journal of Electrical Energy, Vol. 3, No. 3, September 2015

Shunfu Lin, Chao Hao, Dongdong Li, and Yang Fu
Shanghai University of Electric Power, Shanghai, China

Xiaodong Tang
Shanghai Electrical Apparatus Research Institute, Shanghai, China

Abstract: Energy efficiency analysis is an important basis for establishing a sound, scientific energy-saving mode and an optimized operation scheme. Data mining technology is well suited to comprehensive energy efficiency analysis because it can process large volumes of data, eliminate redundant information, and uncover hidden information. This paper presents a comprehensive energy efficiency analysis technique for buildings based on frequent pattern growth (FP-Growth) association rules. The technique can effectively analyze the association relationships among the sub-metering data and provide support for the further development of energy-saving programs. The proposed method was used in the energy efficiency analysis program of a commercial building in Shanghai. The results demonstrate the effectiveness and practicality of the proposed technique.

Index Terms: comprehensive energy efficiency analysis, data mining, association rules, FP-growth, cluster analysis

Manuscript received May 5, 2015; revised August 3. doi: /ijoee

I. INTRODUCTION

Statistics show that the electricity consumption of government office buildings and large public buildings accounts for about 22% of the total electricity consumption in China. The average annual electricity consumption per square meter reached 85.4 kWh in China, 10 to 20 times that of normal residential dwellings and 1.5 to 2 times that of similar buildings in Europe and Japan [1]. The electricity consumption of air conditioners increases substantially in summer and winter, making them the major electrical load. The comprehensive energy analysis of buildings based on sub-metering data is an important basis for establishing a sound, scientific energy-saving mode and an optimized operation scheme. It can monitor the energy consumption status and discover and correct energy waste in real time.

The energy efficiency analysis of buildings focuses on the analysis and prediction of the energy consumption characteristics and on the fault detection of main equipment. Ref. [2] proposes an intelligent data analysis method for the automated classification of energy consumption profiles of buildings, including feature extraction of the daily load profiles, outlier detection, canonical variation analysis, and a simple classifier. A case study is presented to demonstrate the performance of the method. Ref. [3] proposes a method for analyzing 15-minute interval electric load data from commercial and industrial facilities. The analysis results help building managers better understand their facility's power consumption over time, and contribute to building demand response, peak load management, and electricity waste elimination.

However, the existing research mainly pays attention to whole buildings or facilities and neglects the association relationships among the sub-branch energy consumption systems. Knowledge of the association relationships among the sub-metering data is essential for establishing a scientific energy-saving mode and optimizing operation. Association analysis is a data mining technique for discovering meaningful relationships hidden in large data sets, which are expressed in the form of association rules or frequent itemsets. Association rules have been widely used in the financial, medical, power marketing, and equipment fault detection fields [4]-[7]. However, association analysis is seldom applied in the energy efficiency analysis of buildings. In power marketing, association analysis is applied to search for association relationships between electricity sales and the electricity price, temperature, precipitation, etc. [8]. Ref. [9] proposes a real-time data analysis system based on association analysis to determine the optimum operation parameters of generator sets and evaluate the operation level of plants.

This paper presents a comprehensive energy efficiency analysis technique based on FP-Growth association rules to reveal the association relationships among the electricity consumption of each sub-system as well as the temperature data. The paper is organized as follows: Section II presents a procedure for the comprehensive energy efficiency analysis of buildings. Section III introduces the foundations of the cluster analysis and association analysis techniques. Section IV presents a case study to illustrate the procedure of the comprehensive energy efficiency analysis. Section V summarizes the main findings of this paper.

II. ENERGY EFFICIENCY ANALYSIS PROCEDURE

Energy efficiency analysis aims to analyze and predict the energy consumption characteristics of buildings. By combining meteorological factors such as temperature, humidity, and sunshine with the sub-metering energy consumption, it can reveal how the environment and temperature affect the energy consumption. Based on the analysis results, building managers can effectively manage the electricity demand and optimize the operation scheme to reduce the power consumption and achieve green buildings [10]-[11].

[Figure 1. Procedure of energy efficiency analysis: database; data pre-processing (total electricity consumption data, electricity consumption of sub-systems, temperature, etc.); data generalization; association rules; analysis results.]

A procedure of energy efficiency analysis for buildings based on cluster analysis and association rules is presented in this paper, as shown in Fig. 1. It includes four steps:

1) Data pre-processing: the sub-metering data may be missing or abnormal because of disturbances or faults during sampling, transmission, and recording. In addition, the characteristics of the load profiles may vary during holidays and weekends, which affects the effectiveness of the data mining. Therefore, data pre-processing should be performed before the cluster and association rule analysis. The main task of the data pre-processing is to complete missing data and clean abnormal data. It is suggested to use the mean value or trend proportion methods to modify abnormal data with a clear cause, and to delete abnormal data with an unclear cause [12]-[15]. After the data pre-processing, the complete energy consumption data are obtained, including the total electricity consumption data, the electricity consumption data of the sub-metering systems, and the weather temperature data.

2) Data generalization: the pre-processed data are generalized with the K-means algorithm to obtain generalization data sets.

3) Association rules: association rule analysis is performed on the generalization data sets with the FP-Growth algorithm to obtain strong association rules among the sub-metering data.

4) Analysis results: the analysis results are given based on the strong association rules.

III. KEY TECHNOLOGIES

A. Cluster Analysis

Cluster analysis groups a set of data, such as the daily load profiles, the electricity consumption data of sub-systems, or the weather temperature, in such a way that the data in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). The greater the similarity within a group and the greater the difference between groups, the better the clustering result [16]. The K-means algorithm is adopted in this paper. The detailed steps are as follows:

1) For a given data set consisting of $n$ $d$-dimensional samples $X=\{x_1, x_2, \ldots, x_i, \ldots, x_n\}$, where $x_i \in \mathbb{R}^d$, select $K$ samples in the data set as the initial clustering centers, each of which represents the center $\mu_k$ ($k=1, 2, \ldots, K$) of a cluster.

2) Calculate the Euclidean distance between each sample and each center $\mu_k$, and assign each sample to the cluster whose center is nearest, forming $K$ clusters $C=\{c_k,\ k=1, 2, \ldots, K\}$, where $c_k$ represents a cluster. Then calculate the sum $J(c_k)$ of the squared distances between each sample and the corresponding clustering center $\mu_k$ as:

$$J(c_k) = \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2 \tag{1}$$

3) Calculate the sum $J(C)$ as follows:

$$J(C) = \sum_{k=1}^{K} J(c_k) = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2 = \sum_{k=1}^{K} \sum_{i=1}^{n} d_{ki} \lVert x_i - \mu_k \rVert^2 \tag{2}$$

where $d_{ki}=1$ if $x_i \in c_k$, and $d_{ki}=0$ if $x_i \notin c_k$.
4) Update each center $\mu_k$ to the mean of the samples assigned to $c_k$ and return to step 2 if $J(C)$ changes; otherwise, end the clustering [17]-[18].

B. Association Rules

Let $I=\{i_1, i_2, \ldots, i_m\}$ be the set of items, and let $D=\{M_1, M_2, \ldots, M_n\}$ be a set of transactions called the database, where each transaction $M_j$ ($j=1, 2, \ldots, n$) in $D$ contains a subset of the items in $I$. A rule is defined as an implication of the form $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The support and confidence of association rules are adopted as the constraints for selecting interesting rules from the set of all possible rules, as shown in Table I, where $\sigma(X)$ is the number of transactions that contain the itemset $X$, expressed as

$$\sigma(X) = \lvert \{ M_j \mid X \subseteq M_j,\ M_j \in D \} \rvert \tag{3}$$

The mining process of association rules mainly consists of the following two separate steps:

1) Generate the frequent itemsets. The minimum support is applied to find all the frequent itemsets.

2) Form the strong association rules. Extract all high-confidence rules from the frequent itemsets to form strong association rules.
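The two-step mining process above can be illustrated with a minimal Python sketch. It is not the FP-Growth algorithm used in the paper: it enumerates candidate itemsets by brute force, which is only practical for a handful of items. The function names (`support_count`, `strong_rules`), the thresholds, and the four sample days are hypothetical and chosen for illustration only; the sketch also assumes that "meeting" a threshold means greater than or equal to it.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def strong_rules(transactions, min_sup, min_conf):
    """Return (X, Y, support, confidence) for every strong rule X => Y."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))

    # Step 1: frequent itemsets (brute-force enumeration, not FP-Growth).
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = support_count(frozenset(combo), transactions)
            if count / n >= min_sup:  # assumption: ">=" threshold
                frequent[frozenset(combo)] = count

    # Step 2: split each frequent itemset l into s => l - s and keep the
    # rules whose confidence sigma(l) / sigma(s) reaches min_conf.
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]  # every subset of l is frequent
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, count / n, conf))
    return rules

# Hypothetical generalized days (level labels only, not the paper's data).
days = [
    {"A3", "E3", "T3"},
    {"A3", "E3", "T2"},
    {"A2", "E2", "T2"},
    {"A2", "E2", "T2"},
]
rules = strong_rules(days, min_sup=0.5, min_conf=1.0)
for lhs, rhs, sup, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

In this toy data the rule A3 => E3 has support 0.5 and confidence 1, while T2 => E2 is rejected because only two of the three T2 days are E2 days.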

TABLE I. PROBABILITY FORMULAS OF ASSOCIATION RULES

| Name | Formula | Explanation |
|---|---|---|
| Support | $s(X \Rightarrow Y) = P(X \cup Y) = \dfrac{\sigma(X \cup Y)}{\lvert D \rvert}$ | The probability that a transaction in $D$ contains both itemsets $X$ and $Y$, representing the significance of the given items |
| Confidence | $c(X \Rightarrow Y) = P(Y \mid X) = \dfrac{\sigma(X \cup Y)}{\sigma(X)}$ | The probability that a transaction containing $X$ also contains $Y$, representing an estimate of the conditional probability |

C. FP-Growth Algorithm

In this paper, in the process of analyzing the electricity consumption of office buildings, all frequent itemsets are found by the FP-Growth algorithm, and then strong association rules are generated.

1) Basic steps of FP-growth algorithm

Input: a data set including the building's total electricity consumption, the electricity consumption of air conditioning, the air conditioning switch status, the weather temperature, etc., and a minimum support threshold.

Output: the frequent itemsets l.

Step 1: Construct the FP-Tree

(a) Scan the data set for the first time and select all frequent items F whose support meets the minimum support threshold, then arrange the frequent items in descending order of support. The result is recorded as the frequent item list L.

(b) Create a tree root labeled null. Scan the database for the second time; for each transaction, sort its frequent items in the order of list L and insert them into the FP-Tree as a branch.

Step 2: Obtain frequent itemsets through FP-Tree mining

(a) If the tree contains only a single path, output the frequent itemsets formed by combining the nodes in the path.

(b) If the tree contains multiple paths, start from each frequent pattern of length 1 (the initial suffix pattern) and collect its prefix paths (the conditional pattern base). Then build the conditional FP-Tree, which is mined recursively. Frequent patterns are generated by connecting the suffix pattern with the frequent itemsets generated from the conditional FP-Tree [19].
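The basic steps above can be sketched in Python as follows. This is a compact illustration under stated assumptions, not the Visual C++ implementation used in the case study: the names `Node`, `_build_tree`, `_mine`, and `frequent_itemsets` are hypothetical, the support threshold is given as an absolute count and applied with "greater than or equal to", ties in the frequency ordering are broken alphabetically, and the single-path shortcut of Step 2(a) is omitted because the recursive case of Step 2(b) also covers it.

```python
from collections import defaultdict

class Node:
    """One FP-Tree node: an item, a count, and a link to its parent."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def _build_tree(weighted, min_count):
    # First scan: count items and keep the frequent ones.
    counts = defaultdict(int)
    for items, weight in weighted:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # List L: frequent items in descending order of count.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}
    # Second scan: insert each transaction as a branch, in the order of L.
    root, header = Node(None, None), defaultdict(list)
    for items, weight in weighted:
        node = root
        for item in sorted((i for i in set(items) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent

def _mine(weighted, min_count, suffix):
    header, frequent = _build_tree(weighted, min_count)
    patterns = {}
    for item, count in frequent.items():
        pattern = suffix | {item}
        patterns[pattern] = count
        # Conditional pattern base: prefix path of every node of this item.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Build and mine the conditional FP-Tree recursively.
        patterns.update(_mine(base, min_count, pattern))
    return patterns

def frequent_itemsets(transactions, min_count):
    """Map each frequent itemset (a frozenset) to its support count."""
    return _mine([(list(t), 1) for t in transactions], min_count, frozenset())

# Hypothetical transactions (level labels only, not the paper's data).
patterns = frequent_itemsets(
    [["E3", "A3", "c1"], ["E3", "A3", "c1"], ["E2", "A2", "c1"], ["E2", "A2", "c0"]],
    min_count=2,
)
for itemset, count in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

Because each conditional pattern base contains only items that precede the suffix item in list L, every frequent itemset is generated exactly once.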
2) Generate strong association rules

Once a frequent itemset l has been generated by the FP-Growth algorithm, strong association rules are generated as follows:

(a) Generate all non-empty proper subsets s of l.

(b) For each such subset s, take the complement r = l - s. If $\sigma(l)/\sigma(s) \ge \text{min\_conf}$, output the rule $s \Rightarrow r$, where min_conf is the minimum confidence threshold.

Because each rule is generated from a frequent itemset l that meets the minimum support threshold, it is necessarily a strong association rule [20].

IV. CASE STUDY

The data in the case study come from office buildings in Shanghai from July to September and include the electricity consumption of the buildings and of the air conditioning units. Fig. 2 is a block diagram of the power distribution of the office buildings. The sub-metering meters include the total meter of the office buildings, the 19th Building meter, the Building C meter, the Building D meter, the total meter of the air conditioners, and the meters of the 4 air-conditioning units.

[Figure 2. Block diagram of the distribution of buildings: 10 kV transformers supplying the 400 V buses CI, CII, DI, and DII, the air-conditioning units AC1# to AC4#, and the loads of the 19F building, Building C, and Building D.]

A. Typical Load Profiles Extracting

The data obtained are meter readings taken once per minute, so they were first converted to power data at 15-minute intervals. The meter readings are divided into groups of 15 minutes; within each group, the reading at the 1st minute is subtracted from the reading at the 15th minute, and the difference is divided by the duration of the group to obtain the average load. To show a clear trend of the load, the electricity consumption of each 15-minute interval is used as one point of the load profile. A total of 42 days of load data, from July 26 to September 5, was obtained after data pre-processing. Fig. 3 shows the load curves of the office buildings for the 30 workdays only.

[Figure 3. Load profile of workdays. Axes: load (kWh) versus time of day (hour), 00:00 to 24:00.]
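The conversion from per-minute meter readings to 15-minute average load described in Section IV-A can be sketched as below. The function name `to_interval_load` and the sample readings are hypothetical. The paper does not state whether the difference between the 15th and 1st readings is divided by 14 or 15 minutes; this sketch assumes the nominal 15-minute group duration.

```python
def to_interval_load(readings_kwh, minutes=15):
    """Convert cumulative per-minute meter readings (kWh) to average load (kW).

    Each group of `minutes` consecutive readings yields one point:
    (last reading - first reading) / group duration in hours.
    Incomplete trailing groups are dropped.
    """
    hours = minutes / 60.0  # assumption: divide by the nominal group duration
    loads = []
    for start in range(0, len(readings_kwh) - minutes + 1, minutes):
        group = readings_kwh[start:start + minutes]
        energy = group[-1] - group[0]  # 15th-minute reading minus 1st-minute reading
        loads.append(energy / hours)
    return loads

# Hypothetical meter that advances by 0.5 kWh every minute for 30 minutes.
readings = [i * 0.5 for i in range(30)]
loads = to_interval_load(readings)
print(loads)
```

A full day of per-minute readings (1440 values) produces the 96 points of one daily load profile.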
The load profiles are mined with the K-means algorithm to obtain clearly different types of profiles. To determine the best clustering result, the Davies-Bouldin (DB) index is adopted to assess the result of the clustering analysis: the smaller the DB index, the better the clustering. Table II lists the DB indexes for k = 2, 3, 4. The DB index is at its minimum when k = 3, so the load profiles of workdays are clustered into three categories. Fig. 4 shows the three categories of normalized workday load profiles for k = 3.

TABLE II. CLUSTER VALIDITY INDEX EVALUATION OF THE TOTAL METER

| Index | k=2 | k=3 | k=4 |
|---|---|---|---|
| DB | | | |

[Figure 4. Load profile of workdays after clustering (cluster1, cluster2, cluster3). Axes: load (kWh) versus time of day (hour), 00:00 to 24:00.]

B. Generalized Data Sets

The power consumption and temperature data were categorized to obtain generalized data sets, which reduces the number of continuous attribute values. Continuous data are discretized by the K-means algorithm so that cluster labels replace the actual data values. The structured, logically clear discretized data sets constitute the data source for the association rules and prepare the necessary conditions for data mining. The specific process is as follows:

The daily consumption of the office buildings, the daily consumption of the air conditioning, and the temperature are each divided by the K-means algorithm into three intervals: a low level, a middle level, and a high level. To study the correlation among the four air conditioning units, a unit is treated as on when its daily consumption is greater than or equal to 100 kWh, and as off when its daily consumption is less than 100 kWh. The resulting generalization grade ranges are shown in Table III. The power data and temperature data were classified according to these ranges, giving the generalization datasets in Table IV.

In Table IV, E1, E2, and E3 stand for the low, middle, and high levels of the electricity consumption of the whole building, respectively. A1, A2, and A3 stand for the three levels of the total electricity consumption of the air conditioners. The three levels of temperature are represented by T1, T2, and T3. The letters a, b, c, and d represent the switch states of the 4 air conditioners: a1 means that air conditioner 1# is on, a0 means that air conditioner 1# is off, and so on.

TABLE III. GENERALIZATION GRADE RANGES

| Generalization item | Low level | Middle level | High level |
|---|---|---|---|
| Electricity consumption of the buildings on workdays (kWh) | 9500~ | ~ | ~19500 |
| Electricity consumption of the air conditioners on workdays (kWh) | 1300~ | ~ | ~8800 |
| Temperature (°C) | 23.5~ | ~ | ~36.5 |

Sub air conditioners (1#, 2#, 3#, 4#): ON (≥100 kWh) is coded as 1; OFF (<100 kWh) is coded as 0.

TABLE IV. GENERALIZATION DATASETS

| Date | Total consumption of the buildings | Total consumption of the air conditioners | 1# | 2# | 3# | 4# | Temperature |
|---|---|---|---|---|---|---|---|
| 2014/7/28 | E3 | A3 | a1 | b1 | c1 | d0 | T2 |
| 2014/7/29 | E3 | A3 | a1 | b1 | c1 | d0 | T3 |
| 2014/7/30 | E3 | A3 | a1 | b1 | c1 | d0 | T3 |
| 2014/7/31 | E3 | A3 | a1 | b1 | c1 | d0 | T3 |
| 2014/8/1 | E3 | A3 | a1 | b1 | c1 | d1 | T2 |
| 2014/8/4 | E3 | A3 | a1 | b1 | c1 | d0 | T3 |
| 2014/8/5 | E3 | A3 | a1 | b1 | c1 | d1 | T3 |
| 2014/8/6 | E3 | A3 | a1 | b1 | c1 | d0 | T3 |
| 2014/8/7 | E3 | A3 | a1 | b1 | c1 | d0 | T3 |
| 2014/8/8 | E3 | A3 | a1 | b1 | c1 | d0 | T2 |
| 2014/8/11 | E2 | A2 | a1 | b1 | c1 | d0 | T2 |
| 2014/8/12 | E2 | A2 | a0 | b1 | c1 | d0 | T2 |
| 2014/8/13 | E2 | A2 | a0 | b1 | c1 | d1 | T2 |
| 2014/8/14 | E2 | A2 | a0 | b1 | c1 | d0 | T2 |
| 2014/8/15 | E2 | A2 | a0 | b1 | c1 | d0 | T2 |
| 2014/8/18 | E1 | A1 | a1 | b1 | c1 | d0 | T1 |
| 2014/8/19 | E1 | A1 | a1 | b1 | c0 | d0 | T |

TABLE IV. GENERALIZATION DATASETS (continued)

| Date | Total consumption of the buildings | Total consumption of the air conditioners | 1# | 2# | 3# | 4# | Temperature |
|---|---|---|---|---|---|---|---|
| 2014/8/20 | E1 | A1 | a1 | b1 | c0 | d0 | T1 |
| 2014/8/21 | E1 | A1 | a0 | b1 | c1 | d0 | T2 |
| 2014/8/22 | E2 | A2 | a0 | b0 | c1 | d0 | T2 |
| 2014/8/25 | E2 | A2 | a1 | b0 | c1 | d0 | T2 |
| 2014/8/26 | E2 | A2 | a1 | b0 | c1 | d0 | T2 |
| 2014/8/27 | E1 | A1 | a1 | b0 | c1 | d0 | T1 |
| 2014/8/28 | E2 | A2 | a0 | b0 | c1 | d0 | T2 |
| 2014/8/29 | E2 | A2 | a0 | b0 | c1 | d0 | T1 |
| 2014/9/1 | E2 | A2 | a0 | b0 | c1 | d0 | T2 |
| 2014/9/2 | E2 | A2 | a1 | b0 | c1 | d0 | T2 |
| 2014/9/3 | E2 | A2 | a1 | b0 | c1 | d0 | T1 |
| 2014/9/4 | E2 | A2 | a0 | b0 | c1 | d0 | T2 |
| 2014/9/5 | E2 | A2 | a0 | b0 | c1 | d0 | T2 |

C. Obtain Association Rules

The generalized data sets and the minimum support threshold are entered into the FP-Growth algorithm, implemented on the Visual C++ platform, to generate the frequent itemsets. The frequent itemsets are then mined to generate strong association rules, which are shown in Table V as the results of the comprehensive energy efficiency analysis.

TABLE V. STRONG ASSOCIATION RULES

| Rule | Strong association rule | Support | Confidence |
|---|---|---|---|
| 1 | A1 ⇒ E | | |
| 2 | A2 ⇒ E | | |
| 3 | A3 ⇒ E | | |
| 4 | d0 ⇒ c | | |
| 5 | a1 ⇒ c | | |
| 6 | b1 ⇒ c | | |
| 7 | T2 ⇒ c | | |
| 8 | T1 ⇒ E | | |
| 9 | T2 ⇒ E | | |
| 10 | T3 ⇒ E | | |
| 11 | A2, c1 ⇒ E | | |
| 12 | T2, d0 ⇒ c | | |
| 13 | d0, a1 ⇒ c | | |
| 14 | a1 ⇒ d0, c | | |

D. Analysis of Rules

1) Effect of air conditioners on the total electricity consumption of the building

Rules 1, 2, and 3 show that when the power consumption of the air conditioners is at a low, middle, or high level, the total electricity consumption of the buildings is at the corresponding level. The load of the air conditioners in summer therefore has a very important impact on the total load of the building.

2) Associations among the air conditioning units

Rules 4, 5, and 6 show that air conditioner 4# (d0) has the greatest impact on air conditioner 3# (c1); that is, the association between air conditioners 3# and 4# is the highest, because the environment and the air temperature are more similar within the same branch.

3) Effect of temperature on the air conditioners

Consider Rule 7 and Rule 12. Rule 7 shows that when the temperature is at the middle level (T2), air conditioner 3# will be on. The support of Rule 12 is slightly lower than that of Rule 7, dropping to 0.5, which indicates that at temperature level T2, air conditioner 3# is on mainly when air conditioner 4# is off; the confidence of this rule reaches 1.

4) Associations between temperature and the total electricity consumption of the building

Rules 8, 9, and 10 show that the temperature and the electricity consumption of the building are very closely related. Rule 9 shows that the temperature and the power consumption are at the middle level most of the time: when the temperature is at the middle level (T2), the total electricity consumption of the building is at the middle level (E2) with a probability of 76.5%. Rule 10 shows that once the temperature rises above 31.5 degrees, the electricity consumption is always at a high level.

V. CONCLUSION AND FURTHER WORK

This article proposes a data-mining-based method for analyzing the energy efficiency of buildings and studies the dynamic comprehensive energy efficiency analysis of office buildings in the context of demand response. The total electricity consumption of the office building, the power consumption of the air conditioners, and the meteorological data were mined by cluster analysis and association analysis, yielding strong association rules that provide evidence for improving the energy efficiency of the building. During the study, data pre-processing played a vital role in improving the accuracy of the results, while the limited amount of data affected the applicability of the results. Further work will focus on addressing the lack of historical data and on considering more factors that affect the electricity consumption of the building, so that richer and more accurate association rules can be obtained.
ACKNOWLEDGMENT

This research is supported by the National Natural Science Foundation of China ( ), Science and Technology Commission of Shanghai (14DZ , 13DZ ), and Shanghai University of Electric Power (Z ).

REFERENCES

[1] J. Lausten, "Energy efficiency requirements in building codes: Energy efficiency policies for new buildings," International Energy Agency Information Paper.
[2] Building Energy Research Centre of Tsinghua University, China Building Energy Saving Annual Development Report 2013, Beijing: China Building Industry Press.
[3] B. Sun, P. B. Luh, Q. S. Jia, Z. O'Neill, and F. Song, "Building energy doctors: An SPC and Kalman filter-based method for system-level fault detection in HVAC systems," IEEE Transactions on Automation Science and Engineering, vol. 11, no. 1.
[4] X. Li, C. P. Bowers, and T. Schnier, "Classification of energy consumption in buildings with outlier detection," IEEE Transactions on Industrial Electronics, vol. 57, no. 11.
[5] J. L. Mathieu, P. N. Price, S. Kiliccote, and M. A. Piette, "Quantifying changes in building electricity use, with application to demand response," IEEE Transactions on Smart Grid, vol. 2, no. 3.
[6] H. Doukas, K. Patlitzianas, K. Iatropoulos, and J. Psarras, "Intelligent building energy management system using rule sets," Building and Environment, vol. 42, no. 10.
[7] H. Doukas, C. Nychtis, and J. Psarras, "Assessing energy-saving measures in buildings through an intelligent decision support model," Building and Environment, vol. 44, no. 2.
[8] M. Chai and S. Song, "Application of association rules in stock analysis," Computer Applications, vol. 25, no. 4.
[9] X. Hou, B. Tian, S. Ge, and Z. Lu, "Application of association rules techniques in electric marketing analysis," Proceedings of the CSU-EPSA, vol. 17, no. 2.
[10] P. Wang, Q. Chen, Y. Dong, L. Li, and P. Fan, "Data mining and its application of performance analysis in thermal power units," Automation of Electric Power Systems, vol. 28, no. 8.
[11] Z. Ma and Y. Zhao, "Model of next generation energy-efficient design software for buildings," Tsinghua Science and Technology, vol. 13, no. S1.
[12] Y. Wang, D. Huang, H. Xiong, and Y. Niu, "Using relational analysis and multi-variable grey model for electricity demand forecasting in smart grid environment," Power System Protection and Control, vol. 40, no. 1.
[13] H. Li and Z. Cai, "Application of association rules in medical data analysis," Microcomputer Development, vol. 13, no. 6.
[14] W. Bao, D. Yu, W. Wang, and Z. Xu, "Sensor fault detection in thermal power plants based on association rule," Proceedings of the CSEE, vol. 23, no. 12.
[15] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Beijing: China Machine Press.
[16] Z. Guo and Z. J. Wang, "Home appliance load modeling from aggregated smart meter data," IEEE Transactions on Power Systems, vol. 30, no. 1, pp. 1-9.
[17] T. Logenthiran, D. Srinivasan, and T. Z. Shun, "Demand side management in smart grid using heuristic optimization," IEEE Transactions on Smart Grid, vol. 3, no. 3.
[18] D. D. Silva, X. Yu, D. Alahakoon, and G. Holmes, "A data mining framework for electricity consumption analysis from meter data," IEEE Transactions on Industrial Informatics, vol. 7, no. 3.
[19] P. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, 2nd ed., Beijing: Posts & Telecom Press.
[20] J. Xiao, J. Zhang, T. Zhu, C. Shi, and H. Zhang, "Analysis of urban power load based on association rules," Automation of Electric Power Systems, vol. 32, no. 17.

Shunfu Lin received his B.S. in 2002 and Ph.D. in 2007 from the University of Science and Technology of China. He worked for the Corporate Technology of Siemens Limited China as a research scientist in power monitoring and control from July 2007 to September. He was a post-doctoral fellow at the University of Alberta, Canada, from October 2009 to October. Dr. Lin is currently a distinguished professor at the Shanghai University of Electric Power. His research interests include power quality and smart grid technology of LV distribution systems.

Chao Hao was born in Hebei, China. He received the B.S. in electrical engineering and automation from Shijiazhuang Tiedao University, Hebei, China. He is currently studying for his M.E. degree in power electronics and power transmission at Shanghai University of Electric Power, Shanghai, China. His major research interest is user-side technology of smart power