Mining Heterogeneous Urban Data at Multiple Granularity Layers

Size: px
Start display at page:

Download "Mining Heterogeneous Urban Data at Multiple Granularity Layers"

Transcription

1 Mining Heterogeneous Urban Data at Multiple Granularity Layers Antonio Attanasio Supervisor: Prof. Silvia Chiusano Co-supervisor: Prof. Tania Cerquitelli

2 collection Urban data analytics Added value urban services for citizens CO 2 provisioning PM 10 exploitation SERVICES CITIZENS ANALYTICS Credits: realtyplusmag.com feedback processing DATA

3 Urban data analytics Analysis of data collected in the urban context BENEFITS DEFINE BETTER PLANNING POLICIES OPTIMIZE RESOURCES UTILIZATION INCREASE THE WELL-BEING OF CITIZENS ISSUES VOLUME HETEROGENEITY SPARSITY INACCURACY need for Efficient data analytics algorithms Innovative storage and computing infrastructures

4 PhD research activity Credits: realtyplusmag.com URBAN DATA ANALYTICS Heterogeneous urban data management and analysis BUILDINGS ENERGY EFFICIENCY Operational rating with real energy-related data PEOPLE'S PERCEPTION Characterization of relevant topics from social networks Real time predictive analytics Asset rating modeling

5 URBAN DATA ANALYTICS

6 Urban data management and analysis S M A R T C I T Y C O N T E X T Manifold data from heterogeneous sources Different spatio-temporal representations Different integration techniques MULTIPLE SPATIO-TEMPORAL PATTERNS

7 Urban data management and analysis Original contribution Distributed on-the-fly space-time data aggregation with MapReduce Distributed data analysis with MapReduce Fast exploration of different space and time resolutions

8 MuSTLE: Multiple Spatio-Temporal Layers Explorator Data at highest resolution Time-space heterogeneity Test different resolutions on the fly Need multiple time-space resolutions Dynamic datasets STORAGE AGGREGATION ANALYSIS DATA SOURCES Data volume Distributed doc-oriented DB Sharded Cluster Elaboration time MapReduce linear scalability

9 Results of analysis DATA: mobility, weather, air pollutants, urban facilities CORRELATION ANALYSIS REGRESSION ANALYSIS KPIs COMPUTATION (e.g., Total Heat Loss Coefficient)

10 MuSTLE computing performance Computation of KPIs of buildings thermal energy efficiency almost linear speed-up > 2k buildings > 50M records 1 to 5 computing nodes Attanasio, A.; Cerquitelli, T.; Chiusano, S. (2016), Supporting the analysis of urban data through NOSQL technologies. In: The 107th International Conference on Information, Antonio Attanasio Intelligence, - Mining Systems Urban and Data Applications, Chalkidiki, 18/06/2018 Greece, July, pp. 1-6

11 BUILDINGS ENERGY EFFICIENCY

12 Buildings energy efficiency A N A LY T I C S D Descriptive Predictive A T Real energy measures Operational rating Real time predictive analytics A Building features Asset rating modeling

13 Buildings energy efficiency A N A LY T I C S D Descriptive Predictive A T Real energy measures Operational rating Real time predictive analytics A Building features Asset rating modeling

14 Operational rating with real energy data Motivation Support the management of energy systems Help reducing inefficiency and energy waste Optimal sizing of the Heating Distribution Network

15 Context for operational rating MONITORED DATA Credits: wunderground.com Weather: Outdoor temperature, humidity, precipitations, wind speed, etc. WEATHER WEB SERVICES Heating Distribution Network Heating system: Energy, power, water temperature, etc. Environment: Indoor temperature, humidity Credits: withenergy.co.uk DISTRICT HEATING SYSTEM Contextual data: location of Buildings and sensors Building volume, floor area, type, etc.

16 Operational rating with measured data Original contribution KPIs at different space-time granularities computed with the MuSTLE framework Novel algorithm to predict instantaneous power demand values Acquaviva, A.; Apiletti, D.; Attanasio, A.; Baralis, E.; Bottaccioli, L.; Castagnetti, F. B.; Cerquitelli, T.; Chiusano, S.; Macii, E.; Martellacci, D.; Patti, E. (2015), Energy Signature Analysis: Knowledge at Your Fingertips. In: IEEE International Congress on Big Data (BigData Congress) 2015, New York, USA, 27 June July pp A. Acquaviva; D. Apiletti; A. Attanasio; E. Baralis; L. Bottaccioli; S. Chiusano; E. Macii; E. Patti, Forecasting Heating Consumption in Buildings: a Scalable Full-Stack Distributed Engine. In: Expert Systems with Applications 16 Antonio Attanasio (SUBMITTED) - Mining Urban Data 18/06/2018

17 Real time predictive analytics Prediction of buildings thermal power demand buildings have different daily heating cycles during steady state power is highly variable during transient state peak power value is hardly predictable single cycle double cycle triple cycle

18 Real time predictive analytics Approach composed by three algorithms Status and Outlier Detection (SOD): predicts the operational state Peak Detection (PD): predicts the peak power value in the ON-line transient state Power Prediction (PP): predicts the power profile in the ON-line phase OFF-line ON-line transient ON-line steady

19 Power Prediction Assumption: power exchange correlated with surrounding weather conditions Method: building models based on linear regression (Energy Signature) Target variable: power Explanatory variables: temperature, humidity, wind speed, precipitations, pressure One building model for each operational phase Real time approach: building models periodically updated with new measures

20 Power Prediction Model learning Prediction horizon Training window Current time Prediction target Array of weights W = [w 1,, w n ] OUTPUT For linear regression Slot duration t c t j y = n w i x i i=0 Prediction at time t j Predicted power value at t j Forecasted value of weather feature i at t j Weight for weather feature i computed during model learning

21 Experimental results Input data: 12 buildings with different heating cycles Original time granularity: 5 minutes Results: Best time granularity: 30 minutes Best training window: 7 days Overall 1st cycle 2nd cycle 3rd cycle SMAPE 9.62% 10.80% 10.41% 9.51%

22 Buildings energy efficiency A N A LY T I C S D Descriptive Predictive A T Real energy measures Operational rating Real time predictive analytics A Building features Asset rating modeling

23 Asset rating with building features Motivation Building designing: how building features affect the energy efficiency Existing buildings: estimate the improvements of a refurbishment plan

24 Buildings Energy demand modeling Objective of the activity Quantify and explain building efficiency based on data from Energy Performance Certificates Building geometric features Physical features of building envelope Building historical info Energy related variables Estimation of Primary Energy Demand for space heating (PED h ) Description of its relationship with building features

25 Buildings Energy demand modeling Original contribution Methodology: two-layer approach to characterize the energy demand of buildings Analysis: interpretable classification model [8] A. Attanasio, M. Savino Piscitelli, Silvia Chiusano, Alfonso Capozzoli, Tania Cerquitelli, Data mining analysis of a dataset 25 of residential buildings energy certificates: characterization and estimation of heating energy 18/06/2018 demand, In: Energies, (SUBMITTED)

26 Buildings Energy demand modeling Two-layers approach Segment estimation (classification) Local energy demand prediction (regression) Algorithms Artificial Neural Networks Support Vector Machines Reduced Error Pruning Tree Random Forest

27 Buildings Energy demand modeling Segment estimation: classification results ANN REPT RF SVM Accuracy [%]

28 Buildings Energy demand modeling Local energy demand prediction: regression results ANN REPT RF SVM MAPE [%] w.r.t. single layer approach MAPE 16.64% vs 29.82% (REPT over the entire dataset) w.r.t. related work Smaller set of explanatory variables Algorithms trained in few seconds Interpretable models provided to domain experts

29 PEOPLE'S PERCEPTION

30 Analysis of relevant topics from social networks Motivation Understand collective dynamics of people s interests and needs Highly debated topics on social networks Devise targeted actions in the management of a city Characterize patterns of user interests across different cities

31 Analysis of relevant topics from social networks SOCIAL NETWORK SITES User-generated media time place text URLs videos photos

32 Analysis of relevant topics from social networks Objective of the activity Characterization of popular topics from Twitter posts Discovery of text-spatio-temporal patterns of tweets Cohesive clusters of tweets Significant association rules

33 Analysis of relevant topics from social networks Original contribution Extensible framework Different clustering algorithms Different distance measures TASTE distance measure: text, space, and time features Characterization through association rules Xin Xiao, Antonio Attanasio, Silvia Chiusano, Tania Cerquitelli, Twitter data laid almost bare: An insightful exploratory analyser, In Expert Systems with Applications, Volume 90, 2017

34 Analysis of relevant topics from social networks Tweet representation Twitter Streaming API Cleaning (stopwords, ) TF-IDF

35 Analysis of relevant topics from social networks TASTE: Text And Spatio-TEmporal distance d TASTE τ i, τ j = d W (k s e p s d s + k t e p t d t ) TEXT inverse of cosine similarity SPACE Haversine distance TIME Euclidean distance

36 DISTANCE MEASURES Analysis of relevant topics from social networks Comparative analysis of distance measures d TASTE τ i, τ j = d W (k s e p s d s + k t e p t d t ) d Kim τ i, τ j = d s d Arc τ i, τ j = [max (d s, d t )] β d Lee τ i, τ j = d W e ζ d t M d Cun τ i, τ j = w W d W + w t d t + w s d s + w so d so τ i, τ j : tweets representations d s : Space distance d t : d w : Time distance Text distance

37 Analysis of relevant topics from social networks Distance Measure Comparative analysis of distance measures Mean time distance (min) Mean space distance (km) Mean text distance (rad) TASTE Kim 3, Arc Lee Cun

38 Analysis of relevant topics from social networks Clusters characterization with association rules Rules with highest support and lift Different topics in different cities Different evolutions of topics over time in different cities RULE CENTROID TIME CENTROID LOCATION SUPPORT LIFT {uruguay} {england} Perth (UK) 5% 2.4 {suarez,someone} {bite} Rugeley (UK) 3% 26.9 {nigeria} {argentina} Banning (US) 2.1% 7.2

39 Conclusion Urban data management and analytics (MuSTLE) distributed data aggregation and analysis on the fly exploration of multiple combinations of spacetime data granularities almost linear computational scalability Credits: realtyplusmag.com

40 Conclusion Urban services: buildings energy efficiency analysis of real consumption and of building features prediction of power levels and energy demand Actionable knowledge from interpretable models of energy demand Credits: withenergy.co.uk

41 Conclusion People's feedback from social media Extraction of clusters of tweets focused on relevant topics Space-time-text characterization of clusters New distance measure based on space, time and text features

42 Publications CONFERENCES/WORKSHOPS [1] Enhancing energy awareness through the analysis of thermal energy consumption. In: The 4th workshop on Energy Data Management, Brussels, March 27th [2] Energy Signature Analysis: Knowledge at Your Fingertips. In: IEEE International Congress on Big Data (BigData Congress) 2015, New York, USA, 27 June July [3] Fast and Effective Decision Support for Crisis Management by the Analysis of People's Reactions Collected from Twitter. In: 2nd International Workshop on Big Data Applications and Principles (BigDap 2015), Poitiers, September 8-11, 2015 [4] Container-Based Orchestration in Cloud: State of the Art and Challenges. In: 2015 Ninth International Conference on Complex, Intelligent, and Software Intensive Systems, Blumenau, 2015 [5] Supporting the analysis of urban data through NOSQL technologies. In: The 7th International Conference on Information, Intelligence, Systems and Applications, Chalkidiki, Greece, July, pp. 1-6 JOURNALS: [6] Twitter data laid almost bare: An insightful exploratory analyser. In: EXPERT SYSTEMS WITH APPLICATIONS, vol. 90, pp ISSN [7] A Knowledge-Based Multi-Criteria Decision Support System Encompassing Cascading Effects for Disaster Management. In: International Journal of Information Technology & Decision Making (ACCEPTED)