K-Hop Learning: A Network-Based Feature Extraction for Improved River Flow Prediction


1 K-Hop Learning: A Network-Based Feature Extraction for Improved River Flow Prediction
Kartikeya Bhardwaj and Radu Marculescu
Electrical & Computer Engineering, Carnegie Mellon University
3rd International Workshop on Cyber-Physical Systems for Smart Water Networks, CPS Week 2017, Pittsburgh, PA, April 2017

2 Motivation
- Forecasting short-term river flowrate is important for planning and preparation:
  - Run-of-river hydropower
  - River flooding
- River flowrate timeseries shows:
  - Non-stationary behavior
  - Many crests (peaks) and troughs
- Predicting river flow peaks is very hard
- We predict river flowrate 24 hours ahead, with special attention to accurate peak prediction
- Compare with prior art: timeseries-based machine learning models (e.g., autoregression) [Li et al., 2016; Nguyen et al., 2015]
[Figure: image source: U.S. Army Corps of Engineers Digital Visual Library]
[Figure: river flowrate timeseries with PEAKS and TROUGHS labeled; image source: United States Geological Survey (USGS)]

3 Outline
- Motivation
- Approach
  - Data: Spatiotemporal Characteristics
  - River Network Construction
  - K-Hop Learning
- Experimental Setup and Results
- Conclusion

4 Approach: Constructing River Networks and K-Hop Learning

5 Data: Spatiotemporal Characteristics
- Ohio River Basin, USA: 204,000 sq. miles
- 3681 locations (grid points), resolution 11-13 km
- Each grid point has daily timeseries data (from ):
  1. River flowrate
  2. Precipitation
- Data source: NASA NLDAS-2 (Land Data Assimilation System)
[Figure: map of the Ohio River Basin with successive zoom-ins; labels: "WE ARE HERE!", "Zoom-in further"]

6 The River Network
- Nodes: geographical locations
- Links: rivers flowing between nodes
- Fractal characteristics! These complex network characteristics give rise to non-stationarity and hard-to-predict river flow peaks
- This work: use machine learning and networks to capture these hard-to-predict peaks

7 River Network Construction
- If rivers at A and B meet at C: [flowrate at node A + flowrate at node B] must be highly correlated (> 0.99) with the flowrate at node C
- Assuming no more than 3 rivers meet at a place: the problem is to find which combination of the 8 neighbors of node C maximizes this correlation
- Directions are determined by the fact that small rivers combine to form large rivers
[Figure: grid nodes A, B, C, D, E; labels: Node ID: 3187, Node ID: 3108]

8 River Network: Mathematical Formulation
Mathematically, if r_i(t) is the river flowrate at node i at time t and N(i) is the set of neighbors of node i, then for each node i:

\[
\max_{R(i,j)} \; \mathrm{corr}\Big( r_i(t), \; \sum_{j \in N(i)} r_j(t)\, R(i,j) \Big)
\]

subject to

\[
r_i^{\mathrm{avg}} > r_j^{\mathrm{avg}}\, R(i,j), \qquad \sum_{j} R(i,j) \le 3, \qquad R(i,j) \in \{0, 1\}, \quad \forall\, i, j
\]

- Matrix R: adjacency matrix of the river network
- R(i,j) = 1 if nodes i and j are connected, 0 otherwise
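
The per-node search described on this slide can be sketched as a brute-force enumeration. This is a minimal Python illustration, not the authors' MATLAB code: the function names `pearson` and `select_parents`, the toy flow values, and the choice to enumerate every subset of up to 3 neighbors are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation
    return sxy / sqrt(sxx * syy)


def select_parents(flow_i, neighbor_flows, max_parents=3):
    """Pick the neighbor subset whose summed flow best matches node i.

    flow_i: flowrate series at node i.
    neighbor_flows: dict mapping neighbor id -> flowrate series.
    Only neighbors with a smaller average flow than node i are candidates
    (small rivers combine to form large rivers).
    """
    avg_i = sum(flow_i) / len(flow_i)
    candidates = sorted(
        j for j, f in neighbor_flows.items() if sum(f) / len(f) < avg_i
    )
    best_set, best_corr = (), float("-inf")
    for size in range(1, max_parents + 1):
        for subset in combinations(candidates, size):
            summed = [sum(v) for v in zip(*(neighbor_flows[j] for j in subset))]
            c = pearson(flow_i, summed)
            if c > best_corr:
                best_set, best_corr = subset, c
    return best_set, best_corr


# Toy data (hypothetical): rivers at A and B meet at C; E lies downstream of C.
flow_c = [3, 3, 4, 5, 3, 5]
neighbors = {
    "A": [1, 2, 3, 2, 1, 4],
    "B": [2, 1, 1, 3, 2, 1],
    "D": [5, 1, 4, 1, 5, 1],
    "E": [6, 6, 8, 10, 6, 10],
}
parents_of_c, corr_c = select_parents(flow_c, neighbors)
```

With 8 neighbors and at most 3 parents there are only 92 subsets per node, so exhaustive search stays cheap even for a few thousand grid points. Whether a node also needs its best correlation to exceed the 0.99 threshold before links are added is left out of this sketch.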

9 A Quick Validation
- The directed path in the river network that starts at Pittsburgh ends at Cairo and traverses the Ohio River
[Figure: Ohio River from Pittsburgh to Cairo]

10 K-Hop Learning
- Use the river network to extract features that can accurately predict the incoming river flow peaks 24 hours ahead
- The river flow signal experiences a delay at every hop
- Find the optimal number of hops, K, so that the total delay across K hops is 24 hours
[Figure: node i and its K-Hop parents, K hops away; the K-Hop parents detect the incoming peaks 24 hours ahead]

11 K-Hop Learning: Real Data
- K-Hop parents can detect peaks 24 hours ahead
- Note the delay at every hop
[Figure: real-data example; labels: E, C]

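12 K-Hop Learning: Mathematical Formulation
- We want to learn the peaks at node i 24 hours ahead
- The delay at every hop depends on the river's physical features, such as geography or topography, which do not change quickly
- Learn:

\[
\max_{K} \; \mathrm{xcorr}\Big( r_i(t), \; \sum_{j \in P_K(i)} r_j(t) \Big)
\]

subject to

\[
21 \le \tau_{\max \mathrm{xcorr}} \le 25, \qquad 3 \le K \le 12
\]

where P_K(i) is the set of K-Hop parents of node i and \tau_{\max \mathrm{xcorr}} is the delay at which the cross-correlation is largest.
- Constraint 1: the delay across K hops is approximately 24 hours (23±2 hours)
- Constraint 2: the number of hops doesn't exceed 12 for significant rivers (avg. flowrate > 150 m³/s)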
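
A rough sketch of the search over K, assuming hourly flowrate series so that lags are measured in hours. The helper names (`k_hop_parents`, `lag_of_max_xcorr`, `choose_k`), the `parents` dictionary layout, the 48-hour lag search range, and the synthetic chain in which every hop delays the signal by exactly 8 hours are assumptions for illustration, written in Python rather than the MATLAB used by the authors.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def k_hop_parents(parents, node, k):
    """Nodes reached by following upstream (parent) links exactly k times."""
    frontier = {node}
    for _ in range(k):
        frontier = {p for n in frontier for p in parents.get(n, [])}
    return frontier


def lag_of_max_xcorr(target, upstream, max_lag):
    """Lag (in samples) at which the delayed upstream signal best matches target."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag + 1):
        c = pearson(target[lag:], upstream[: len(upstream) - lag])
        if c > best_corr:
            best_lag, best_corr = lag, c
    return best_lag, best_corr


def choose_k(parents, flows, node, k_range=(3, 12), lag_window=(21, 25), max_lag=48):
    """Return (K, lag, xcorr) with the highest xcorr whose lag is in lag_window."""
    best = None
    for k in range(k_range[0], k_range[1] + 1):
        hop_set = k_hop_parents(parents, node, k)
        if not hop_set:
            continue  # the network does not extend k hops upstream
        summed = [sum(v) for v in zip(*(flows[j] for j in sorted(hop_set)))]
        lag, c = lag_of_max_xcorr(flows[node], summed, max_lag)
        if lag_window[0] <= lag <= lag_window[1] and (best is None or c > best[2]):
            best = (k, lag, c)
    return best


# Synthetic chain 5 -> 4 -> 3 -> 2 -> 1 -> 0, each hop delaying the signal by 8 hours.
rng = random.Random(0)
base = [rng.random() for _ in range(400)]
parents = {0: [1], 1: [2], 2: [3], 3: [4], 4: [5]}
flows = {k: base[8 * k: 8 * k + 300] for k in range(6)}
result = choose_k(parents, flows, node=0)
```

On this toy chain only K = 3 produces a delay inside the 21 to 25 hour window, so `choose_k` returns K = 3 with a 24-hour lag. Real rivers will not have a uniform per-hop delay, which is why K has to be learned per node.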

13 River Flowrate Prediction and Comparison
- 24-hour ahead predictions are obtained using a linear model with 2 features
- K-Hop: 24-hour ahead flowrate prediction at node i = f(daily precipitation at node i, sum of flowrates of node i's K-Hop parents)
  - The second feature is the network feature for node i (K-Hop)
  - Although we use a linear model, our features come from the complex river network, which is responsible for non-stationarity and hard-to-predict river flow peaks
- Prior art: timeseries-based machine learning models (Autoregression, SVM-Regression, KNN-Regression, Random Forests)
- Prior work: 24-hour ahead flowrate prediction at node i = h(daily precipitation at node i, daily flowrate timeseries at node i)
  - Only timeseries: NO NETWORK!!
- Recall: non-stationary river flow arises from complex river network characteristics (fractals), which makes predicting river flow peaks very hard
- The complexity of the problem (e.g., peaks) is incorporated in the network-based K-Hop features
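
The two-feature linear model f could look roughly like the following ordinary least-squares sketch. The slide only says the model is linear, so the intercept term, the fitting method, the function names, and the toy numbers are all assumptions, and the code is Python rather than the authors' MATLAB.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular (collinear features).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(features, targets):
    """Least-squares fit of target = c0 + c1 * x1 + c2 * x2 + ... (normal equations)."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, feature):
    """Evaluate the fitted linear model on one feature vector."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], feature))


# Toy training data (hypothetical): each row is
# (daily precipitation at node i, summed flowrate of node i's K-Hop parents).
precip = [0, 1, 0, 3, 2, 0, 1, 4]
parent_sum = [100, 120, 110, 150, 170, 130, 125, 160]
features = list(zip(precip, parent_sum))
# Flowrate at node i 24 hours later, generated from a known linear rule.
targets = [2 + 0.5 * p + 0.9 * s for p, s in features]

coeffs = fit_linear(features, targets)
forecast = predict(coeffs, (2, 140))
```

Because the toy targets are generated from an exact linear rule, the fit recovers the coefficients 2, 0.5, and 0.9. On real data the fit would be approximate, and the slide's point is that the predictive power comes from the K-Hop feature rather than from the model class.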

14 Experimental Setup and Results

15 Experimental Setup
- Experiments conducted on 3 different sized rivers:

  Size          Average Flowrate   Location/River
  Small         150 m³/s           Hurricane Creek near Wayne, West Virginia
  Intermediate  400 m³/s           Youghiogheny River near Greenock, Pennsylvania
  Large         >1000 m³/s         Ohio River near Pittsburgh, Pennsylvania

- 24-hour ahead prediction conducted using a sliding window: 20 days training, test on the next day, move the training set forward by 1 day, test on the next day
- data used for K-Hop Learning, 1997 data for prediction
- All codes implemented in MATLAB
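
The sliding-window protocol can be sketched as a generic walk-forward loop. This Python version is illustrative only: the function name, the `fit`/`predict` callable interface, and the placeholder "window mean" model in the demo are assumptions, and any model (for example a two-feature linear fit) could be plugged in instead.

```python
def sliding_window_predictions(features, targets, fit, predict, window=20):
    """Walk-forward evaluation: train on `window` days, test on the next day.

    features[t] holds the inputs available when forecasting targets[t].
    fit(train_features, train_targets) returns a model object;
    predict(model, feature) returns one forecast.
    """
    predictions = []
    for t in range(window, len(targets)):
        model = fit(features[t - window:t], targets[t - window:t])
        predictions.append(predict(model, features[t]))
        # The next iteration moves the training window forward by one day.
    return predictions


# Demo with a placeholder model that predicts the mean of the training targets.
def mean_fit(train_features, train_targets):
    return sum(train_targets) / len(train_targets)


def mean_predict(model, feature):
    return model


days = 25
toy_features = [None] * days              # unused by the placeholder model
toy_targets = [float(d) for d in range(days)]
preds = sliding_window_predictions(toy_features, toy_targets, mean_fit, mean_predict)
```

With 25 days of data and a 20-day window, the loop produces 5 one-day-ahead forecasts, each trained only on the 20 days that precede the test day.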

16 Results: Large River
- River flow peaks are detected very accurately using K-Hop, which is not the case for timeseries-based models

17 Results: Intermediate Size River
- River flow peaks are detected very accurately using K-Hop, which is not the case for timeseries-based models

18 Results: Small River
- River flow peaks are detected very accurately using K-Hop, which is not the case for timeseries-based models

19 Results: Root Mean Square Error (RMSE)
- Comparison with other models
  [Table: RMSE (in m³/s) of AR, RF, SVM-R, KNN-R, and K-Hop, plus % gain, for Small, Med, and Large rivers]
- Large river (average flowrate = 1000 m³/s):
  - Prior art: predict 39.69within % of the actual flowrate
  - We predict within 1.4% of the actual flowrate
  - Timeseries-based methods are much more likely to inaccurately predict hydropower/floods
- Error scales very well with increasing river sizes
- Impact of varying training set size
  [Figure: impact of varying training set size; axis label: % Improvement]
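
For reference, the metrics on this slide could be computed along the following lines. The slide does not define "% gain" or "within x% of the actual flowrate", so the formulas below (relative RMSE reduction, and RMSE divided by average flowrate) are assumptions, and every number in the demo is made up for illustration rather than taken from the results table.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def percent_gain(baseline_rmse, new_rmse):
    """Assumed definition: relative RMSE reduction over a baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


def percent_of_average_flow(error, average_flowrate):
    """Assumed definition: error expressed as a percentage of the average flowrate."""
    return 100.0 * error / average_flowrate


# Hypothetical values, not results from the slides.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
toy_rmse = rmse(actual, predicted)
toy_gain = percent_gain(50.0, 10.0)
toy_relative = percent_of_average_flow(20.0, 1000.0)
```

Under these assumed definitions, an RMSE of 20 m³/s on a river averaging 1000 m³/s corresponds to predicting within 2% of the average flowrate.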

20 Conclusion
- We have proposed K-Hop Learning: feature extraction from the river network for accurate 24-hour ahead flowrate prediction
- K-Hop Learning significantly outperforms timeseries-based machine learning methods such as SVM Regression
  - 57% to 82% gains in accuracy compared to prior timeseries-based models
  - Very accurate peak prediction, which is particularly important for run-of-river hydropower and flood predictions
- Demonstrates how combining network science methods with machine learning can offer significant advantages in computational sustainability
This research is supported by an NSF CyberSEES Grant

21 Thank You