Big Data, Smart Energy, and Predictive Analytics. Dr. Rosaria Silipo Phil Winters


1 Big Data, Smart Energy, and Predictive Analytics Dr. Rosaria Silipo Phil Winters

2 Hot Topics: Telemetry Data; Time Series Analysis; SENSIBLE usages of Big Data; Measurable / Applied to the Business; Use public data please, so we can all learn

3 Industries with these challenges: Manufacturing, Chemical, Life Science, Transportation, Utilities, Automotive, Cyber Security

4 Energy Industry: Complex Networks, Regulation, Green Initiatives, Competition, Smart Meters / Transmission

5 The Irish Energy Trials: Smart Meters; >5000 homes and businesses; Electricity and Gas; Surveys; 1 Year Trial; 176 million rows, 40 GB of Lovely Data

6 Overview: Import Data; Quantify Energy Usage per Meter ID; Build clusters of similar meter IDs; Forecast energy usage per cluster; Error Evaluation (overall; on peaks only). Workflow annotation: "This part would benefit from a big data strategy"

7 Import Data: knime: protocol to read data from a KNIME server, as in knime://knime05/rosaria/timeseries/

8 Transform Data
Energy measures per meter ID:
- Average kW used per hour, day, week, month, year
- Intra-day % electricity usage with respect to full day usage
- Week day % electricity usage with respect to full week usage
- Total kW used in trial time window
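
The energy measures on this slide can be sketched in plain Python. This is only an illustration, not the KNIME workflow from the talk: the input layout (tuples of meter ID, timestamp, usage), the function name `energy_measures`, and the feature names are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def energy_measures(readings):
    """Aggregate (meter_id, timestamp, usage) tuples into per-meter features.

    The tuple layout and the feature names are hypothetical.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    by_hour = defaultdict(lambda: defaultdict(float))
    by_weekday = defaultdict(lambda: defaultdict(float))

    for meter_id, ts, usage in readings:
        total[meter_id] += usage
        count[meter_id] += 1
        by_hour[meter_id][ts.hour] += usage          # hour of day, 0..23
        by_weekday[meter_id][ts.weekday()] += usage  # 0 = Monday

    features = {}
    for meter_id, tot in total.items():
        scale = 100.0 / tot if tot else 0.0
        features[meter_id] = {
            "total": tot,
            "avg_per_reading": tot / count[meter_id],
            # Share of the meter's usage falling on each hour of the day.
            "hour_pct": {h: v * scale for h, v in by_hour[meter_id].items()},
            # Share of the meter's usage falling on each day of the week.
            "weekday_pct": {d: v * scale for d, v in by_weekday[meter_id].items()},
        }
    return features


# Made-up readings for one meter.
readings = [
    ("m1", datetime(2010, 2, 1, 2), 3.0),   # Monday, 02:00
    ("m1", datetime(2010, 2, 1, 14), 1.0),  # Monday, 14:00
    ("m1", datetime(2010, 2, 6, 2), 4.0),   # Saturday, 02:00
]
features = energy_measures(readings)
print(features["m1"]["hour_pct"])
print(features["m1"]["weekday_pct"])
```

Averages per day, week, or month would follow the same pattern with a different grouping key.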

9 Import and Transform Data

10 Clustering Meter IDs: 30 clusters with k-means on average daily, monthly, hourly, ... kW values and percent intra-day and weekday values
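
The talk uses the KNIME k-Means node. For readers who want to see the idea itself, here is a minimal sketch of Lloyd's k-means algorithm in plain Python. The two-column feature vectors (average kW per hour, share of usage at night) and all function names are assumptions for the example; with real features on different scales, normalizing the columns first would usually be advisable.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm; returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as starting centroids
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Made-up feature vectors: (average kW per hour, fraction of usage at night).
meters = [(0.5, 0.20), (0.6, 0.25), (0.4, 0.22),
          (3.0, 0.65), (3.2, 0.60), (2.9, 0.70)]
assignment, centroids = kmeans(meters, k=2)
print(assignment)
```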

11 Clustering Meter IDs: average hourly time series, cluster by cluster (K-Means)

12 Learning from the Clusters
30 clusters, 4 groups:
- Night Owls
- Late Evening People
- All Rounders
- Active at Day only
- small/big businesses and households as sub-groups
Note: Only clusters covering business entities show a differentiated usage of electricity during business days and weekends.

13 Energy and Forecasting Accuracy
An improvement in forecasting accuracy of 1% was estimated to yield a saving in operating costs of approximately 10 million per year (Bunn and Farmer, 1984)
- Hourly: Protection from unexpected Peaks
- Daily: Optimal Scheduling, Allocation
- Weekly: Purchase Policies, Maintenance
- Monthly, Yearly: Strategic Planning and Production

14 Time Series Prediction
Many time series, from Past to Future:
x(t-n), x(t-n+1), ..., x(t-2), x(t-1), x(t)
y(t-n), y(t-n+1), ..., y(t-2), y(t-1), y(t)
...
z(t-n), z(t-n+1), ..., z(t-2), z(t-1), z(t)
Numerical Methods use only one time series: its past values are the Input, its current value is the Target.

15 The Lag Column node: with lag = 3 and lag interval = 4, it builds the columns x(t-3*4), x(t-2*4), x(t-4), x(t)
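
A small plain-Python sketch of what this slide describes: for each time step, collect x(t) together with its copies shifted by multiples of the lag interval. The function name `lag_columns` is made up for the example, and rows without a complete history are simply dropped, which may differ from how the KNIME node handles them.

```python
def lag_columns(series, lag, lag_interval):
    """Return rows [x(t - lag*interval), ..., x(t - interval), x(t)].

    Rows are produced only for time steps t that have a complete history.
    """
    offset = lag * lag_interval
    rows = []
    for t in range(offset, len(series)):
        rows.append([series[t - j * lag_interval] for j in range(lag, -1, -1)])
    return rows


series = list(range(20))  # x(t) = t, so each row shows the time indices used
rows = lag_columns(series, lag=3, lag_interval=4)
print(rows[0])   # [0, 4, 8, 12], i.e. x(t-12), x(t-8), x(t-4), x(t) at t = 12
print(rows[-1])  # [7, 11, 15, 19]
```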

16 Simple Auto-Regressive Model
Workflow annotations:
- Auto-correlations between x(t) and its past
- Select only one time series
- Linear Regression of x(t) on its past
- MSE on test set
- Transforms data from x(t) to: x(t-lag), ..., x(t-1), x(t)
- Polynomial Regression of x(t) on its past

17 Select One Time Series: Column Filter Quickform in the Prepare Data meta-node [Screenshots: configuration window of the metanode; sub-workflow of the metanode]

19 Linear/Polynomial Regression
- Regression Target = cluster(t)
- All other columns, cluster(t-n), give the input values
- Mean Square Error
- 90% training set, 10% test set
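
A rough plain-Python sketch of the linear case: fit x(t) on its past values by least squares, then measure the mean square error on a held-out 10%. It stands in for the KNIME regression nodes and is not taken from the talk's workflow. All function names are hypothetical, and the chronological split (first 90% for training) is an assumption, since the slide only gives the proportions.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(series, lag):
    """Least-squares fit of x(t) = w0 + w1*x(t-1) + ... + w_lag*x(t-lag)."""
    rows = [[1.0] + [series[t - j] for j in range(1, lag + 1)]
            for t in range(lag, len(series))]
    targets = series[lag:]
    n = lag + 1
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    return solve(xtx, xty)


def predict(weights, history):
    """Predict x(t) from the values before it; history[-1] is x(t-1)."""
    lag = len(weights) - 1
    return weights[0] + sum(weights[j] * history[-j] for j in range(1, lag + 1))


def mse_on_test(series, lag, train_fraction=0.9):
    """Fit on the first part of the series, return the MSE on the rest."""
    split = int(len(series) * train_fraction)
    weights = fit_ar(series[:split], lag)
    errors = [(predict(weights, series[:t]) - series[t]) ** 2
              for t in range(split, len(series))]
    return sum(errors) / len(errors)


# Toy series that follows x(t) = x(t-1) - x(t-2) exactly (period 6).
series = [1.0, 2.0]
for _ in range(58):
    series.append(series[-1] - series[-2])

weights = fit_ar(series[:54], lag=2)
test_mse = mse_on_test(series, lag=2)
print(weights, test_mse)
```

The normal equations are used here only to keep the example free of dependencies; a numerical library would be the usual choice for real data.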

20 Seasonality Correction
Seasonality types: Winter vs. Summer; Weekly; 24h
We need a template to just predict the differences:
- First 24h
- Average 24h on training set
- Previous 24h
- Previous Week (24h * 7)

21 Seasonality 24h * 7
- Remove template from signal: x(t) = x(t) - x(t-24*7)
- Simple AR Model
- Add template back into predictions: p(t) = p(t) + x(t-24*7)
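
The two seasonality steps can be illustrated in a few lines of Python, assuming hourly samples so that one week is 24 * 7 = 168 steps. The names `remove_template`, `add_template`, and the differenced signal d(t) are introduced only for this sketch. The prediction model that sits between the two steps is left out, so the example just checks that adding the template back undoes its removal.

```python
PERIOD = 24 * 7  # one week of hourly samples


def remove_template(series, period=PERIOD):
    """d(t) = x(t) - x(t - period), defined for t >= period."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def add_template(predicted_diffs, series, period=PERIOD):
    """p(t) = predicted d(t) + x(t - period), for t = period, period + 1, ..."""
    return [d + series[t - period]
            for d, t in zip(predicted_diffs, range(period, len(series)))]


# Toy signal: a weekly pattern plus a step of 1 from one week to the next.
series = [(t % PERIOD) + t // PERIOD for t in range(3 * PERIOD)]

diffs = remove_template(series)        # the weekly pattern cancels out
restored = add_template(diffs, series)  # using the true differences as "predictions"
print(set(diffs))
print(restored == series[PERIOD:])
```

In practice `predicted_diffs` would come from the model trained on the differenced signal rather than from the true differences.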

22 Neural Networks 24h * 7
- Remove template from signal: x(t) = x(t) - x(t-24*7)
- NN Prediction Model
- MSE
- Add template back into predictions: p(t) = p(t) + x(t-24*7)

23 Cluster 1: The Night Owls

24 Cluster 1

25 Cluster 1
- Business (3 kW/hour, and 77 kW/day)
- Energy Usage reduced on weekends
- Consuming electricity mainly at night (65%)
- Representing 53 meter IDs, a relatively common behavior among businesses

26 Night Owls, Cluster 1 [Plot: kW vs. Date over the trial period; annotations: Christmas 2009, Winter, Summer, Christmas 2010]

27 Cluster 1
- Business (3 kW/hour, and 77 kW/day)
- Energy Usage reduced on weekends
- Consuming electricity mainly at night (65%)
- Representing 53 meter IDs, a relatively common behavior among businesses
- Reduced energy at Christmas and clear winter/summer seasonality

28 On a smaller time window... Cluster 1 [Plot: kW vs. Date, monthly time window (February); days labeled from Monday Feb-01 to Sunday Feb-07]

29 Cluster 1
- Business (3 kW/hour, and 77 kW/day)
- Energy Usage reduced on weekends
- Consuming electricity mainly at night (65%)
- Representing 53 meter IDs, a relatively common behavior among businesses
- Reduced energy at Christmas and clear winter/summer seasonality
- Lower usage on weekends

30 On a smaller time window... Cluster 1 [Plot: kW vs. Date + Hour of day, Monday to Sunday, showing the daily periodicity]

31 Cluster 1
- Business (3 kW/hour, and 77 kW/day)
- Energy Usage reduced on weekends
- Consuming electricity mainly at night (65%)
- Representing 53 meter IDs, a relatively common behavior among businesses
- Reduced energy at Christmas and clear winter/summer seasonality
- Lower usage on weekends
- Night active

32 Auto-correlations, Cluster 1: the view of the Linear Correlation node shows a 24-hour seasonality
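
To make the idea concrete, here is a minimal Python sketch that computes the linear (Pearson) correlation between a series and a lagged copy of itself and looks for the lag with the highest value. The synthetic sine signal with a 24-step period and the function name `autocorrelation` are assumptions for the example. The talk's result comes from the KNIME Linear Correlation node applied to the real cluster data.

```python
import math


def autocorrelation(series, lag):
    """Pearson correlation between x(t) and x(t - lag), for lag >= 1."""
    a = series[lag:]
    b = series[:-lag]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic hourly signal with a 24-hour cycle, two weeks long.
series = [math.sin(2 * math.pi * t / 24) for t in range(24 * 14)]

best_lag = max(range(1, 37), key=lambda k: autocorrelation(series, k))
print(best_lag)  # 24: the lag at which the signal repeats itself
```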

33 Prediction Errors
MSE on cluster 1 test set (mean signal 3.0 kW/h)
[Table: MSE / Lag by model and seasonality template. Rows: Linear AR; Polynomial* AR; Linear AR, First 24h; Linear AR, Previous 24h; Linear AR, 24h * 7; NN, 24h * 7]
*Polynomial Regression with degree = 3
4% error

34 Conclusions on cluster 1
- Pricing offer for medium size businesses in cluster 1
- lower price for night usage
- different prices for different seasons
- Lower price if paired together with cluster 21
- Auto-regressive and previous 24h seasonality correction for prediction of electricity usage
- Only 4% prediction error -> priority in pricing offer?

35 Big Data: Big ETL, Big Analytics, Big Data(bases)

36 Total execution time: ca. 3 days for Import + Aggregate Data (KNIME), on a 4-quad laptop with 8 GB RAM

37 Total execution time: 1h 16min for Import + Aggregate Data (DataRush), on a 4-quad laptop with 8 GB RAM

38 Big Data Conclusions
- Accessing Data: Possibly
- Manipulating Data: Very Likely Beneficial
- Analyzing Data: Don't Bother
- Predictive Mining: Task Based!
- Execution: Possibly
- Real Time: Very Likely Beneficial
With KNIME and RushAnalytics for KNIME, you can mix and match as required!

39 Next Developments
- Winter/Summer seasonality in clustering and prediction
- Include MA in auto-regressive model
- Compare with micro-prediction and global prediction
- Develop a meta-node frame to use any numerical mining algorithm for time series prediction

40 Conclusions
- Clustering to identify energy habits and sizes
- First Frame for Time Series Analysis with any numerical predictive model
- Optimization (seasonality and lag) of time series analysis based on cluster
- SENSIBLE usages of Big Data
- Tailored Contracts based on Clustering
- Scheduling and allocation based on Prediction

41 Where do I find more?
- Whitepaper and workflows:
- Open Source Software: KNIME
- Questions to rosaria.silipo@knime.com