A Study of Quality Control on Time Series of Water Level in Agricultural Reservoir
Jehong Bang, Jin-Yong Choi, Maga Kim


Ph.D. Course in Department of Rural Systems Engineering, College of Agriculture and Life Sciences, Seoul National University

Contents
Background & Objectives
Conclusion and Future study

Background: Hydrologic Data
Different types of water-related data are measured for water resources management, including precipitation, water level, discharge, flood, and groundwater.
Hydrologic data are essential for analyzing the periodicity and patterns of water resources for management purposes.

Background: Agricultural water-resource
In South Korea, agricultural reservoirs and canals are the main facilities for irrigation water supply.
Water gauges are installed to measure real-time water levels at reservoirs and canal inlets.
Data are gathered every 10 minutes and transmitted to the TOMS at the KRC (Korea Rural Community Corporation) headquarters to monitor water storage and supply conditions.

Background: Data quality
Graphs above represent raw data of agricultural reservoir water level.
Errors and outliers in the data are currently removed by hand, and no Data Quality Assessment (DQA) process has been established.
To achieve sound agricultural water management using the data, it is important to control data quality.
Thus, a proper methodology and procedures for data quality control need to be introduced.

Objectives
Finding the proper methods and processes for quality control of time-series water level data in agricultural reservoirs.
Implementing the quality control methods and evaluating their performance.

Different methods are introduced:
- General outlier detection process
- EPA decision tree
- Own outlier detection process

General outlier detection
- 3-sigma rule of thumb
- Moving average
Outliers are values so markedly different from the rest of the sample that they raise the suspicion that they may be from a different population or that they may be in error, doubts that frequently are hard to clarify (USGS, 2017).
For normally distributed data, the 3-sigma rule of thumb expresses the conventional heuristic that nearly all values lie within three standard deviations of the mean (about 99.7% probability).
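The 3-sigma rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the use of the population standard deviation, and the sample readings are all assumptions made for the example.

```python
from statistics import mean, pstdev

def three_sigma_bounds(values):
    """Return (lower, upper) = mean -/+ 3 standard deviations.

    Assumption: the population standard deviation is used; the slides
    do not say which estimator was applied.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return mu - 3 * sigma, mu + 3 * sigma

def flag_three_sigma(values):
    """Return one boolean per reading: True if it lies outside the bounds."""
    lower, upper = three_sigma_bounds(values)
    return [not (lower <= v <= upper) for v in values]

# Hypothetical readings in metres: a flat level with one spike.
levels = [10.0] * 20 + [25.0]
flags = flag_three_sigma(levels)
```

Because the spike itself inflates the mean and the standard deviation, the rule becomes less sensitive as outliers grow denser.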

EPA decision tree
The EPA decision tree is a categorized table for data quality assessment, used to select a specific statistical method for a given environmental dataset (EPA, 2006).
Reservoir level data are a one-sample time series, and no single distribution can be assumed.

Walsh's Test
Reservoir level data do not seem to be normally distributed.
Walsh's test is thought to be appropriate because it is a nonparametric test and may be used when the data are not normally distributed.

Outlier detection process
Reservoir level data reflect both artificial operation, such as release, and natural processes (precipitation).
Therefore, reservoir level data have different properties from other hydrologic data and need a different method for Quality Control (QC).
In this study, a 2-step QC process was applied to reservoir level fluctuation data.
Raw data: reservoir water level data, 10-minute interval data
1st order filter: outlier threshold estimation, elimination of deviating data
2nd order filter: application of moving average to the filtered dataset, upper/lower bound band creation (±α), extraction of data inside the band from raw data
[Figure labels: EPA tree; Outlier; No outlier; Outlier and missing data; Spike noise]
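As a rough sketch of the first step, the level fluctuation can be taken as the difference between consecutive 10-minute readings, and readings with an extraordinary fluctuation can be dropped. The slides do not give the threshold or the exact definition of fluctuation, so `max_step`, the function name, and the sample data below are assumptions for illustration only.

```python
def first_order_filter(levels, max_step):
    """Keep readings whose change from the previous raw reading is small.

    levels   : list of water levels (m) at a fixed interval, no gaps assumed
    max_step : hypothetical threshold (m) on the absolute fluctuation
    Returns a list of (index, level) pairs for the retained readings.
    """
    kept = []
    for i, level in enumerate(levels):
        if i == 0:
            # No fluctuation is defined for the first reading; keep it.
            kept.append((i, level))
            continue
        fluctuation = level - levels[i - 1]
        if abs(fluctuation) <= max_step:
            kept.append((i, level))
    return kept

# Hypothetical readings with one spike at index 3.
raw = [10.00, 10.01, 10.02, 14.00, 10.03, 10.04]
filtered = first_order_filter(raw, max_step=0.5)
```

In this simple form the reading right after a spike is dropped as well, because the return to the normal level is itself a large fluctuation.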

Outlier detection with 3-sigma rule of thumb
An upper bound (115.70 m) and a lower bound (109.41 m) were determined.
Full level (118.50 m) > upper bound (115.70 m)
The 3-sigma rule of thumb cannot be applied to this case.

Outlier detection with moving average (50)
A moving average over 50 data points is applied.
Some outliers were excluded, but others that are dense and have high deviation remained.

Outlier detection with Walsh's test
An upper bound (26.12 m) was determined, and data exceeding it were classed as outliers.
Upper bound (26.12 m) < full level (28.50 m)
This turned out to be an inappropriate method.
The method should be applied not to the reservoir level itself, but to reservoir level fluctuation data.

Quality control mid-result
Three methods (3-sigma rule of thumb, moving average, EPA decision tree) were applied to the reservoir water level.
The reservoir water level data itself was used for QC, and this turned out to be inappropriate.
The reservoir level changes with time and operation rules, so one static methodology cannot complete the DQA.
Therefore, a 2-step filtration method was proposed.
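The reason both static bounds were rejected is the same simple comparison: the estimated upper bound falls below the full level, so valid readings near full storage would be classed as outliers. A small check using the figures quoted on these slides makes that explicit; the function name and dictionary layout are illustrative assumptions.

```python
def bound_clips_valid_range(upper_bound, full_level):
    """True when readings up to the full level would be flagged as outliers."""
    return upper_bound < full_level

# (upper bound, full level) in metres, as quoted on the slides.
cases = {
    "3-sigma rule": (115.70, 118.50),
    "Walsh's test": (26.12, 28.50),
}

results = {}
for method, (upper, full) in cases.items():
    results[method] = {
        "clips_valid_range": bound_clips_valid_range(upper, full),
        # Height of the valid range that lies above the bound, in metres.
        "gap_m": round(full - upper, 2),
    }
```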

Yellow dots represent the 1st filtered data, which exclude the extraordinary reservoir level fluctuation data.
Most outliers were eliminated, but some remained.
A moving average (50) is calculated from the 1st filtered data, and a band is made by adding and subtracting a parameter (α).
The band is applied to the raw data, and the data inside the band are extracted.
[Figure panels: α = 1.13 m; α = 0.68 m]
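The second step can be sketched as follows: a moving average of the 1st filtered data gives the band centre, the band is the centre ±α, and raw readings inside the band are extracted. The slides do not say whether the window is centred or trailing, or how removed points are handled, so this sketch assumes a centred window that skips removed points (marked `None`). All names and sample values are hypothetical.

```python
def band_center(filtered, window):
    """Centred moving average that ignores removed points (None)."""
    half = window // 2
    centers = []
    for i in range(len(filtered)):
        neighbours = [v for v in filtered[max(0, i - half): i + half + 1]
                      if v is not None]
        centers.append(sum(neighbours) / len(neighbours) if neighbours else None)
    return centers

def second_order_filter(raw, filtered, window, alpha):
    """Return (index, level) for raw readings within centre +/- alpha."""
    centers = band_center(filtered, window)
    return [(i, x) for i, (x, c) in enumerate(zip(raw, centers))
            if c is not None and abs(x - c) <= alpha]

# Hypothetical data: index 3 is a spike, and index 4 was removed by a
# first-step filter although it is a plausible reading.
raw = [10.00, 10.01, 10.02, 14.00, 10.03, 10.04]
filtered = [10.00, 10.01, 10.02, None, None, 10.04]

# The slides use a window of 50; 5 is used here only to suit the tiny sample.
extracted = second_order_filter(raw, filtered, window=5, alpha=0.23)
kept_indices = [i for i, _ in extracted]
```

In this example the band is applied to the raw series rather than the filtered one, so the reading at index 4 is recovered while the spike at index 3 stays excluded.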

[Figure panels: α = 0.45 m; α = 0.23 m]
The black line represents the 1st filtered data, and the red points represent the 2nd filtered data.
Data within the band were extracted, and it seems that outlier detection was conducted properly.

QC was conducted on the Go-Gyoung reservoir and Gang-Chung reservoir data.
Gang-Chung reservoir: most of the noise and outliers were excluded.
Go-Gyoung reservoir: spike noise and outliers still remain.
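To illustrate how the band width controls what is retained, the sketch below counts how many readings fall inside the band for the four α values shown on the slides. The readings and the flat band centre are made-up values, and the function name is an assumption; only the α values come from the source.

```python
def count_inside_band(raw, centers, alpha):
    """Count raw readings whose distance from the band centre is <= alpha."""
    return sum(1 for x, c in zip(raw, centers) if abs(x - c) <= alpha)

# Hypothetical readings (m) drifting away from a flat band centre of 10.0 m.
raw = [10.0, 10.3, 10.6, 11.0, 11.5]
centers = [10.0] * len(raw)

# Alpha values (m) taken from the figure panels.
alphas = [1.13, 0.68, 0.45, 0.23]
retained = {alpha: count_inside_band(raw, centers, alpha) for alpha in alphas}
```

A wide band keeps more spike noise, while a narrow band risks discarding genuine level changes, which is why α has to be tuned for each reservoir.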

Noise still exists but can be removed with a proper value of α.

Conclusion
Quality control processes such as the 3-sigma rule, moving average, and Walsh's test were applied to reservoir level data, but most of the methods turned out to be inappropriate for the water level data itself.
In this study, a 2-step outlier detection method was suggested and applied to reservoir level fluctuation data.
The performance of the quality control process depended on the value of α.
In some cases the suggested method works successfully, but in other cases additive outliers and innovative outliers remained after the outlier detection process.

Future studies
In this study, determining the parameter α and the number of data points n in the moving average is thought to be the key problem governing model performance.
The values of the parameters (α, n) will differ from reservoir to reservoir.
The parameters need to be optimized with certain methods.
α can be optimized with variable application of the rule of thumb.
An optimization method for the parameter n needs to be confirmed.
The full level and dead level of the reservoir can be considered.
Time-series analysis can be considered to exclude outliers.