Flood Frequency Analysis of Tel Basin of Mahanadi River System, India using Annual Maximum and POT Flood Data

Available online at ScienceDirect

Aquatic Procedia 4 (2015)

INTERNATIONAL CONFERENCE ON WATER RESOURCES, COASTAL AND OCEAN ENGINEERING (ICWRCOE 2015)

Nibedita Guru a, Ramakar Jha b
a Research Scholar, Civil Engineering, NIT Rourkela, India, nibeditaguru149@gmail.com
b Professor, Civil Engineering, NIT Rourkela, India, rjha34@gmail.com

Abstract

Flood frequency analysis indicates the catchment characteristics, water availability and possible extreme hydrological conditions, such as floods and droughts, at various locations of a river system. Such studies have been done in the past using long-term annual maximum flood series for early warning, preparedness, mitigation and reduction of disasters. In the present study, Annual Maximum (AM) flood series and Peak over Threshold (POT) flood series were used to carry out flood frequency analysis for the Tel basin of the Mahanadi river system, India. The POT values were selected based on (a) commonly used standard practice and (b) flood values that damage the downstream areas and cause disaster in the Mahanadi river system. Quantile-Quantile plots (Q-Q plots) were used to recognize anomalies in the tail behavior of the flood frequency distribution and to select appropriate flood frequency distributions. The analysis was carried out for flood series data of two gauging stations, Kesinga (upstream) and Kantamal (downstream), of the Tel basin for the years [...]. Fourteen different flood frequency distributions were tried for the AM and POT flood series, covering 31 years for Kesinga and 38 years for Kantamal. The Generalized Pareto (GP) distribution gave the best results for the AM flood series under all goodness-of-fit tests. For the POT flood series, however, the LogNormal (3P) distribution gave the best results, followed by the GP distribution, under all goodness-of-fit tests. The distributions most suitable for the POT data sets are the same as the distributions used globally for flood forecasting.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license. Peer-review under responsibility of organizing committee of ICWRCOE 2015.

Keywords: Annual maximum series; Peak over threshold; flood frequency analysis; probability distribution; Q-Q plots

* Nibedita Guru. E-mail address: nibeditaguru149@gmail.com

doi: /j.aqpro

Nibedita Guru and Ramakar Jha / Aquatic Procedia 4 (2015)

1. Introduction

High flows that exceed danger levels and enter flood plains result from heavy or continuous rainfall exceeding the absorptive capacity of the soil and the flow capacity of the streams. They cause widespread damage to property and life in different parts of the catchment. Despite the achievements of science and technology in the 21st century, floods and droughts continue to hit every generation, bringing suffering, death and material losses. Knowledge of magnitude-frequency relationships can be used in the design of dams, spillways, highways, bridges, culverts, water supply systems and flood control structures.

In the past, flood frequency analysis techniques were developed to relate the magnitude of floods to their frequency of occurrence (Hosking and Wallis, 1997). Such studies have also been done to estimate floods based on catchment characteristics and statistical analysis. It is understood that a minimum of [...] years of records is needed for flood frequency analysis. If the record is too short, regional flood frequency curves together with the at-site mean provide consistent estimates of floods. Some of the studies carried out in the past are discussed below.

In 1868, O'Connell performed one of the earliest regional analyses of stream flows, using simple empirical formulas that attempted to connect discharge to drainage area. The approach was very simple and the proposed formula was

$$Q = C A^{0.5} \tag{1}$$

where Q = maximum discharge; A = drainage area; and C = coefficient related to the region. O'Connell selected a value of 0.5 for the exponent, considering the relationship between discharge and area to be parabolic in the absence of sufficient data.
The application of probability theory in flood estimation procedures was introduced by Fuller (1914), who calculated floods of different return periods for catchments in the U.S. With nearly 50 years of additional data, Fuller (1914) analyzed long records of daily flows and peak flows, particularly data from the United States. He related the average of the maximum floods ($\bar{Q}$) to the drainage area with an exponent of 0.8:

$$\bar{Q} = C A^{0.8} \tag{2}$$

Hazen (1921) revised his own work and found that some data sets plot as curved lines on log-normal probability paper. He suggested using a three-parameter distribution that includes skewness and plotting it on logarithmic probability paper. He noted that "the coefficient of skewness is subjected to the objection that there is a tendency for its value to increase with the number of terms in the series." Foster (1924) introduced the Pearson type III (P3) frequency distribution for describing flood data. Gumbel (1941) brought the basis of analysis to a new level by applying extreme value theory. Using the findings of Fisher and Tippett (1928), Gumbel (1941) introduced the Extreme Value Type I distribution (EV1) to flood frequency analysis. Chow et al. (1988) related the magnitude of such extreme events to their frequency of occurrence through the use of probability distributions. By fitting past observations to selected probability distributions, the probability of future high flow events can be predicted. Cunnane (1988) reviewed twelve different methods of regional flood frequency analysis, including well-known methods such as the USWRC (U.S. Water Resources Council) method, different variants of index flood methods, Bayesian methods and the two-component extreme value (TCEV) method, and rated the index flood method using a regional algorithm based on PWMs as the best one.

POT series are also denoted by some authors as Partial Duration Series (PDS) because the flood peaks can be considered as the maximum flow values during hydrograph periods of variable length. An important advantage of the POT series is that, when the selected base value is sufficiently high, small events that are not really floods are excluded. With the annual series, non-floods in dry years may have an undue influence on the shape of the distribution. In river flood applications, for instance, the U.S. Water Resources Council (1976) considers consecutive peak floods as independent if the inter-event time exceeds a critical time and if the inter-event discharge drops below a critical flow. The dependence between the POT or PDS values is a function of the hydrological independence criterion used to divide the full series into its partial durations, or of the parameters (e.g. threshold level) used to define the particular POT values (e.g. Lang et al., 1999).

This paper discusses the choice of the threshold (the optimal number of upper extremes) in POT analysis based on (a) commonly used standard practice and (b) flood values that damage the downstream areas and cause disaster. An extreme value analysis methodology was used to recognize anomalies in the tail behavior of the flood frequency distribution by means of Quantile-Quantile plots (Q-Q plots).

2. Study area and Data Collection

The Tel River originates in the plain of Koraput district of Odisha, about 32 km to the west of Jorigam (Figure 1). It is the second largest river of Odisha and an important tributary of the Mahanadi River. The river traverses a total length of 296 km to join the Mahanadi River on the right bank, 1.6 km below Sonepur. The total drainage area of the Tel River is about 22,818 km², of which [...] km² lies up to the Kesinga gauging station and [...] km² lies up to the Kantamal gauging station. The Tel sub-basin lies approximately between latitudes 18° and 21° and between longitudes 83° and 86°.
The normal annual rainfall of the entire Mahanadi basin is 1360 mm (16% coefficient of variation, CV), of which about 86%, i.e. 1170 mm, occurs during the monsoon season from June to September (15% CV).

Fig. 1. General location map of the Tel sub-basin

Daily discharge data for the years [...] were collected from the Central Water Commission, Bhubaneswar. Figure 2 shows the daily mean discharge time series from [...] for Kantamal (downstream) and from 1979 to 2009 for Kesinga (upstream) station of the Tel basin. In addition, we fit a non-linear function to the time series using locally weighted scatterplot smoothing (LOWESS). The LOWESS results illustrate that the series does not have major non-stationarities in frequency or variability.
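The LOWESS smoothing mentioned above can be illustrated with a minimal sketch. It is a simplified single-pass version (tricube weights, local linear fit, no robustness iterations) and is not necessarily the procedure the authors used. The function name `lowess`, the `frac` parameter and the sample values are assumptions made for illustration only.

```python
import math

def lowess(x, y, frac=0.5):
    """Single-pass LOWESS sketch: tricube-weighted local linear fit at each x.

    Assumes the x values are distinct (e.g. a time index). No robustness
    iterations are performed, unlike full LOWESS implementations.
    """
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # number of neighbours in each window
    fitted = []
    for xi in x:
        dists = sorted(abs(xj - xi) for xj in x)
        h = dists[k - 1]  # bandwidth: distance to the k-th nearest point
        if h == 0:
            raise ValueError("x values must be distinct")
        # Tricube weights; points at or beyond the bandwidth get weight 0.
        w = [(1 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wi * xj for wi, xj in zip(w, x)) / sw
        my = sum(wi * yj for wi, yj in zip(w, y)) / sw
        sxx = sum(wi * (xj - mx) ** 2 for wi, xj in zip(w, x))
        sxy = sum(wi * (xj - mx) * (yj - my) for wi, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted

# Hypothetical yearly mean discharges (m^3/s), not data from the study.
years = list(range(1979, 1991))
flows = [410.0, 520.0, 390.0, 610.0, 450.0, 480.0,
         530.0, 400.0, 470.0, 560.0, 430.0, 500.0]
trend = lowess(years, flows, frac=0.6)
print([round(v, 1) for v in trend])
```

A smoothed curve that stays roughly level over the record, as reported for the Tel series, would suggest no strong trend in the mean.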

Fig. 2. Discharge time series including linear and non-linear trend of the Tel sub-basin, India

3. Materials and Method

Annual Maximum (AM) and Peak over Threshold (POT) flood series were used in the present study for fitting different distributions. AM analysis is relatively straightforward: it employs only the largest event in each year, even when the second (or third) largest event of that year is greater than the largest events in other years. POT analysis, also known as partial duration series (PDS) analysis, tries to overcome this problem by using all events above a specified threshold. It introduces other issues, however: the definition of an appropriate threshold level and the selection of independent exceedances of that threshold.

In the present work, the POT approach, which includes all independent peaks above a truncation or threshold level, was employed to model the flood series. The POT series may also be termed the partial duration series or basic stage series. The number of floods (K) will generally differ from the number of years of record (N) and depends on the selected threshold discharge. The US Geological Survey (Dalrymple, 1960) recommended that K should equal 3N. If a probability distribution is to be fitted to the POT series, the desirable threshold discharge and the average number of floods per year depend on the type of distribution.

It has been observed that flows with a probability of exceedance below 5% affect the downstream regions and may create disaster. However, the number of such values is much larger than the 3N to 5N suggested earlier. Keeping all the criteria in view, the flood series having a probability of exceedance of less than 5% were considered in the present work for both gauging stations.
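The construction of the two series can be sketched as follows. This is a minimal illustration, not the authors' procedure. The Weibull plotting position m/(n + 1) for the exceedance probability, the `min_gap` independence rule (exceedances fewer than `min_gap` days apart are treated as one event) and all sample values are assumptions introduced here.

```python
def annual_maxima(years, flows):
    """Largest daily flow in each year: {year: maximum flow}."""
    am = {}
    for y, q in zip(years, flows):
        am[y] = max(q, am.get(y, q))
    return am

def threshold_for_exceedance(flows, p=0.05):
    """Smallest flow whose empirical exceedance probability m/(n+1) is below p.

    m is the rank in descending order (1 = largest). The plotting position
    formula is an assumption; the paper does not state which one was used.
    """
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    selected = [q for m, q in enumerate(ordered, start=1) if m / (n + 1) < p]
    if not selected:
        raise ValueError("record too short for the requested probability")
    return selected[-1]

def pot_peaks(flows, threshold, min_gap=5):
    """Peak (day index, flow) of each exceedance event.

    Exceedances separated by fewer than min_gap days are merged into one
    event and only the largest value is kept (hypothetical independence rule).
    """
    peaks = []
    last_exceed_day = None
    for day, q in enumerate(flows):
        if q < threshold:
            continue
        if last_exceed_day is not None and day - last_exceed_day < min_gap:
            if q > peaks[-1][1]:
                peaks[-1] = (day, q)  # same event, higher peak
        else:
            peaks.append((day, q))    # new independent event
        last_exceed_day = day
    return peaks

# Hypothetical daily flows (m^3/s) for two short "years".
demo_flows = [10, 12, 80, 95, 70, 11, 9, 10, 10, 10, 10,
              60, 85, 20, 10, 10, 10, 10, 10, 10, 90, 10]
demo_years = [2001] * 11 + [2002] * 11

print(annual_maxima(demo_years, demo_flows))
print(pot_peaks(demo_flows, threshold=50, min_gap=5))
```

In this toy record the POT series keeps the 85 m³/s event of the second year, which the AM series discards because a larger flood occurred in the same year.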
The Quantile-Quantile plot is a graphical technique for determining whether two data sets come from populations with a common distribution. The methodology included subjecting the data to quality control using mass curves and time series plots to check for outliers, selecting the model type, fitting extreme value distributions, and determining the return periods. Fourteen different distributions were fitted to the annual maximum discharges and to the Peak over Threshold series from each station, and the parameters of these distributions were estimated using the method of maximum likelihood. The best distribution was selected based on goodness-of-fit tests.

3.1 Method of Maximum Likelihood

The method of maximum likelihood has been defined and applied to several probability distributions with known probability density functions (pdf). The method has useful properties: invariance (Mood et al., 1974), asymptotic unbiasedness, sufficiency, consistency and efficiency in large-sample estimation, and applicability to the parameters of complex probability density functions. The likelihood function of N independent random variables is defined as the joint probability density function of the N random variables, viewed as a function of the parameters. If $X_1, \ldots, X_N$ is a random sample from a univariate probability density function, the corresponding likelihood function for the observed sample is

$$L(\theta) = \prod_{i=1}^{N} f(X_i; \theta) \tag{3}$$

where $\theta$ denotes the parameter set and $f(\cdot)$ is the probability density function.
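As a small numerical illustration of maximum likelihood estimation, the sketch below maximises the logarithm of the likelihood for a one-parameter exponential density. The exponential model, the golden-section search, the function names and the sample values are all assumptions chosen to keep the example short. The study itself fitted fourteen distributions and does not state which optimiser was used.

```python
import math

def log_likelihood(pdf, sample, theta):
    """Sum of ln f(x_i; theta), i.e. the log of the product of densities."""
    return sum(math.log(pdf(x, theta)) for x in sample)

def exponential_pdf(x, scale):
    """Exponential density with mean `scale` (illustrative model only)."""
    return math.exp(-x / scale) / scale

def maximise(func, lo, hi, tol=1e-8):
    """Golden-section search for the maximiser of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if func(c) > func(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2

# Hypothetical threshold exceedances (m^3/s), not data from the study.
exceedances = [120.0, 45.0, 300.0, 80.0, 210.0, 15.0, 95.0, 160.0]

scale_hat = maximise(
    lambda s: log_likelihood(exponential_pdf, exceedances, s), 1.0, 1000.0
)
# For the exponential model the estimate has a closed form: the sample mean.
print(scale_hat, sum(exceedances) / len(exceedances))
```

For multi-parameter distributions such as the GP or three-parameter lognormal, the same log-likelihood would be maximised over all parameters jointly, typically with a general-purpose numerical optimiser.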

The logarithmic version of Eq. (3) is:

$$\ln L(\theta) = \sum_{i=1}^{N} \ln f(X_i; \theta) \tag{4}$$

3.2 Goodness-of-fit Test

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between the observed values and the values expected under the model in question. To assess whether a given distribution is suited to a data set, the following tests and their underlying measures of fit can be used: the Kolmogorov-Smirnov test, the Anderson-Darling test and the Chi-square test. In statistics, the Kolmogorov-Smirnov test (K-S test) is a nonparametric test for the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K-S test) or to compare two samples (two-sample K-S test).

4. Results and Discussion

It has been observed that flows with a probability of exceedance below 5% affect the downstream regions and may create disaster in the Mahanadi basin. Keeping this criterion in view, the flood series having a probability of exceedance of less than 5% were considered in the present work for both gauging stations. It is interesting to note that the log-normal plot of the daily discharge data (Figure 3) also indicates an exponential increase in discharge values with a probability of exceedance below 5% at both the Kesinga (upstream) and Kantamal (downstream) stations.

Figure 3. Log-normal plot of daily discharge data of the Tel sub-basin, India

After the AM and POT flood series were obtained (see Figure 3), the data were checked for quality using mass curves and time series plots to identify outliers. Exponential quantile plots were then used to select the model type, fit extreme value distributions and determine return periods. The analysis involved preparing the Q-Q plots and examining the behaviour of the distribution in the upper tail.
On the basis of the results of this analysis, the appropriate distribution was identified. The evaluation of the extreme value distribution using the exponential Q-Q plot is shown in Figures 4 and 5 for the AM and POT flood series respectively. In the exponential Q-Q plots, the upper-tail points tend towards a straight line.
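The coordinates of an exponential Q-Q plot can be computed with a few lines of code. The sketch below pairs each ordered observation with the standard exponential quantile -ln(1 - p), using the plotting position p = i/(n + 1), and fits a least-squares line through the upper-tail points. The plotting position, the number of tail points and the synthetic sample are assumptions made for illustration.

```python
import math

def exponential_qq(sample):
    """(theoretical exponential quantile, observed value) pairs, ascending."""
    xs = sorted(sample)
    n = len(xs)
    return [(-math.log(1 - i / (n + 1)), x) for i, x in enumerate(xs, start=1)]

def tail_slope(points, n_tail):
    """Least-squares slope through the n_tail largest points of the plot."""
    tail = points[-n_tail:]
    m = len(tail)
    mean_t = sum(t for t, _ in tail) / m
    mean_x = sum(x for _, x in tail) / m
    sxx = sum((t - mean_t) ** 2 for t, _ in tail)
    sxy = sum((t - mean_t) * (x - mean_x) for t, x in tail)
    return sxy / sxx

# Synthetic sample lying exactly on an exponential tail with
# hypothetical location 100 and scale 50 (not data from the study).
n = 20
synthetic = [100.0 + 50.0 * -math.log(1 - i / (n + 1)) for i in range(1, n + 1)]

points = exponential_qq(synthetic)
print(round(tail_slope(points, 10), 6))
```

Upper-tail points that follow a straight line indicate an exponential (normal) tail, and the slope of that line estimates the scale of the exceedances. Systematic upward or downward curvature would point to a heavier or lighter tail.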

Figure 4. Exponential Q-Q plot of the AM series for Kesinga and Kantamal

Figure 5. Exponential Q-Q plot of the POT series for Kesinga and Kantamal

The fourteen flood frequency distributions discussed earlier were then tested for their applicability to the AM and POT data sets at both gauging stations. The Generalized Pareto (GP) distribution gave the best results for the Annual Maximum flood series, with parameters estimated by the maximum likelihood method. The GP distribution developed by Pickands (1975) can be written as

$$F(x) = 1 - \left[ 1 + k \, \frac{x - \mu}{\sigma} \right]^{-1/k}, \quad k \neq 0 \tag{5}$$

where $\mu$, $\sigma$ and $k$ are the location, scale and shape parameters respectively. The frequency distribution results are shown in Figure 6.

Figure 6. Generalized Pareto plot of the AM flood series for Kesinga and Kantamal
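To show how a fitted GP distribution is used, the sketch below evaluates the distribution function, inverts it to obtain the flood for a given return period, and computes the Kolmogorov-Smirnov distance between a sample and the fitted curve. It assumes the common parameterisation F(x) = 1 - [1 + k(x - μ)/σ]^(-1/k), with the exponential limit at k = 0. The sign convention for k differs between authors, and the parameter values below are hypothetical rather than the fitted values of the study.

```python
import math

def gp_cdf(x, mu, sigma, k):
    """Generalized Pareto distribution function (assumed parameterisation)."""
    z = (x - mu) / sigma
    if z <= 0:
        return 0.0
    if k == 0:
        return 1 - math.exp(-z)
    base = 1 + k * z
    if base <= 0:          # above the upper bound, which exists when k < 0
        return 1.0
    return 1 - base ** (-1 / k)

def gp_quantile(p, mu, sigma, k):
    """Inverse of gp_cdf for a non-exceedance probability 0 <= p < 1."""
    if k == 0:
        return mu - sigma * math.log(1 - p)
    return mu + sigma / k * ((1 - p) ** (-k) - 1)

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov distance between a sample and a cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical parameters (m^3/s), not the fitted values of the paper.
mu, sigma, k = 500.0, 1500.0, 0.1

# For an annual maximum series, the T-year flood has annual
# non-exceedance probability 1 - 1/T.
for T in (10, 50, 100):
    print(T, round(gp_quantile(1 - 1 / T, mu, sigma, k), 1))

# KS distance of a hypothetical sample against the assumed GP curve.
sample = [gp_quantile((i - 0.5) / 10, mu, sigma, k) for i in range(1, 11)]
print(round(ks_statistic(sample, lambda x: gp_cdf(x, mu, sigma, k)), 3))
```

A small KS distance relative to the critical value for the sample size supports the fitted distribution. The Anderson-Darling statistic gives more weight to the tails, which is often the region of interest in flood studies.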

It is interesting to see that the Log-Normal (3P) distribution gives the best results when the flood series based on POT are considered. The Log-Normal (3P) distribution is written as

$$f(x) = \frac{1}{(x - \gamma)\,\sigma\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{\ln(x - \gamma) - \mu}{\sigma} \right)^{2} \right], \quad x > \gamma \tag{6}$$

where $\mu$ is the scale parameter, $\gamma$ is the location parameter and $\sigma$ is the shape parameter. The frequency distribution results are shown in Figure 7. Further analysis indicates that the GP model also provides very good results for the POT data (rank 2), as shown in Figure 8.

Figure 7. LogNormal (3P) plot of the POT flood series for Kesinga and Kantamal

Figure 8. Generalized Pareto plot of the POT series for Kesinga and Kantamal

The goodness-of-fit tests, including Kolmogorov-Smirnov (KS), Anderson-Darling (AD) and Chi-square, were carried out for all data sets of the Kesinga and Kantamal gauging stations and are shown in Table 1.

Table 1. Goodness-of-fit tests of the AM and POT series for Kesinga and Kantamal
Columns: Station; Best-fit method; Sample size; Kolmogorov-Smirnov (statistic, P-value); Anderson-Darling (statistic); Chi-square (statistic, P-value)
Kesinga (AM): GP [...]
Kantamal (AM): GP [...] NA NA (Reject)
Kesinga (POT): LN(3P) [...]
Kantamal (POT): LN(3P) [...] (Reject)

5. Conclusions

At-site analysis was performed using both the Annual Maximum (AM) and Peak over Threshold (POT) flood series of two stations (Kesinga and Kantamal) of the Tel basin, Mahanadi river system, India. It has been observed that flows with a probability of exceedance below 5% affect the downstream regions and may create disaster in the Mahanadi basin. Keeping this criterion in view, the flood series having a probability of exceedance of less than 5% were taken as the POT values for both gauging stations. The results are very promising and effective. The shape of the distribution's tail was analyzed using quantile-quantile plots to discriminate between distributions based on their tail behaviour. The results indicated that the record values at both stations conform to normal tail behaviour. Out of fourteen frequency distributions, the Generalized Pareto (GP) distribution showed the best results for the AM data sets, whereas the LN (3P) distribution showed the best results for the POT data sets, followed by the GP distribution. The goodness-of-fit tests using the Kolmogorov-Smirnov (KS), Anderson-Darling (AD) and Chi-square methods indicate the suitability of the distribution models for flood prediction.

References

Chow, V.T., Maidment, D.R., Mays, L.W., Applied hydrology. McGraw-Hill, New York.
Cunnane, C., Statistical distributions for flood frequency analysis. World Meteorological Organization, Operational Hydrology Report No. 33, WMO Publ. No. 718, Geneva.
Dalrymple, T., Flood-frequency analyses. Water Supply Paper 1543-A, USGS, Washington, DC.
Foster, H.A., Theoretical frequency curves. ASCE Trans., 87.
Fuller, W.E., Flood flows. ASCE Trans., 77.
Gumbel, E.J., The return period of flood flows. The Annals of Mathematical Statistics, 12(2).
Hazen, A., Discussion of Flood flows by W. E. Fuller.
ASCE Trans., 77.
Greenwood, J.A., Landwehr, J.M., Matalas, N.C., Wallis, J.R., Probability weighted moments: definition and relation to parameters of distributions expressible in inverse form. Water Resources Research, 15(5).
Hosking, J.R.M., Wallis, J.R., Parameter and quantile estimation for the generalized Pareto distribution. Technometrics, 29(3).
Hosking, J.R.M., Wallis, J.R., Some statistics useful in regional frequency analysis. Water Resources Research, 29(2).
Jayasuriya, M.D.A., Mein, R.G., Frequency analysis using the partial series. Proc. 1985 Hydrology and Water Resources Symposium, IEAust. Natl. Conf. Publ. No. 85/2.
Lang, M., Ouarda, T.B.M.J., Bobée, B., Towards operational guidelines for over-threshold modeling. J. Hydrol., 225.
Lettenmaier, D.P., Wallis, J.R., Wood, E.F., Effect of regional heterogeneity on flood frequency estimation. Water Resources Research, 23(2).
Langbein, W.B., Annual floods and the partial duration flood series. Transactions of the American Geophysical Union, 30(6).
Madsen, H., Rasmussen, P.F., Rosbjerg, D., Comparison of annual maximum series and partial duration series methods for modeling extreme hydrologic events. 1. At-site modelling. Water Resources Research, 33(4).
Meng, F., Li, J., Gao, L., ERM-POT model for quantifying operational risk for Chinese commercial banks. Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg, Germany.
Mood, A.M., Graybill, F.A., Boes, D.C., Introduction to the theory of statistics (3rd ed.). Tokyo: McGraw-Hill.
O'Connell, P.P.L., On the relation of the freshwater floods of rivers to the areas and physical features of their basins and on a method of classifying rivers and streams with reference to the magnitude of their floods. Minutes Proc. Inst. Civ. Eng.
Pickands, J., Statistical inference using extreme order statistics. Ann. Stat., 3.
Rosbjerg, D., Madsen, H., Rasmussen, P.F., Prediction in partial duration series with generalized Pareto-distributed exceedances. Water Resources Research, 28(11).
Water Resources Council (WRC), Guidelines for determining flood flow frequency. Bulletin 17 of the Hydrology Subcommittee, Water Resources Council, Washington, DC.