MODELING WATER QUALITY VARIABLES OF THE POTOMAC RIVER AT THE ENTRANCE TO ITS ESTUARY
PHASE I: TREND AND SEASONALITY


Prepared by:
J.T.B. Obeysekera and Vujica Yevjevich
International Water Resources Institute
Civil, Mechanical, and Environmental Engineering Department
The George Washington University

This report is produced under the research grant "Improved Accuracy in Modeling Non-Point Source Water Quality and Flow in Tidal Portions of the Potomac River Basin," awarded to The George Washington University by the D.C. Water Resources Research Center of the University of the District of Columbia, Washington, D.C.

Washington, D.C.
January, 1985

Table of Contents

I. INTRODUCTION
    Background
    Sources of Nutrients
    Detection of Trend and Seasonality
    Research Objectives
    Report Outline

II. RESEARCH DATA ASSEMBLY
    Data Availability
    Data Assembly
    USGS-PES Data Set
    WAQ Data Set
    DES Data Set
    STORET Data Set
    USGS-MD Data Set

III. TREND ANALYSIS
    Background
    Techniques of Trend Analysis
        Regression Models
        Nonparametric Methods
    USGS-PES Data Set
    WAQ Data Set
        Regression Analysis
        Nonparametric Tests
        Summary
    DES Data Set
    STORET Data Set
    Streamflow at Chain Bridge
    Summary of Results

IV. ANALYSIS OF SEASONALITY
    Background
    Detection of Seasonality
        Nonparametric Methods
            The Kruskal-Wallis test
            The k-sample median test
            The van der Waerden test
        Fourier Series Approach
    USGS-PES Data Set
    WAQ Data Set
    DES Data Set
    STORET Data Set
    Summary of Results

REFERENCES

List of Tables

Table 1   Major sub-basins of Potomac river.
Table 2   Grand statistics of USGS-PES data set averaged over each day.
Table 3   Nonparametric trend test results of USGS-PES data set.
Table 4   Grand statistics of WAQ data set.
Table 5   Regression analysis on daily water quality data of WAQ data set.
Table 6   Regression analysis on standardized mean monthly water quality data of WAQ data set, for the period
Table 7   Regression analysis on mean annual water quality data of WAQ data set, for the period
Table 8   Results of the seasonal Kendall trend test applied to mean biweekly water quality series for the period of the WAQ data set.
Table 9   Results of the modified seasonal Kendall trend test applied to mean monthly water quality variables of the WAQ data set.
Table 10  Summary of hypothesis testing by regression models and nonparametric trend tests applied to WAQ data set.
Table 11  Grand statistics of DES data set.
Table 12  Results of the linear regression analysis on water quality data of DES data set.
Table 13  Results of the seasonal Kendall trend test applied to water quality variables of the DES data set.
Table 14  Grand statistics of the STORET data set.
Table 15  Results of the linear regression analysis on water quality data of STORET data set.
Table 16  Results of the seasonal Kendall trend test applied to water quality variables of the STORET data set.
Table 17  Results of the nonparametric tests used for detection of seasonality in the USGS-PES data set.
Table 18  Computed Fourier series coefficients of the means and standard deviations of DISCHRGE, CONDUCT, SUSSED, TPHOS, and NO2NO3 of the USGS-PES data set.
Table 19  Results of the nonparametric tests applied to the more frequently monitored constituents of the WAQ data set.
Table 20  Results of the nonparametric tests applied to the less frequently monitored constituents of the WAQ data set.
Table 21  Computed Fourier series coefficients of the means and standard deviations of selected water quality constituents of the WAQ data set.
Table 22  Results of the nonparametric tests applied to the constituents in the DES data set.
Table 23  Computed Fourier series coefficients of the means and standard deviations of selected water quality constituents of the DES data set.
Table 24  Results of the nonparametric tests applied to the variables in the STORET data set.
Table 25  Computed Fourier series coefficients of the means and standard deviations of selected variables of the STORET data set.
Table 26  Summary of results on detection of seasonality.

List of Figures

Figure 1   The Potomac River Basin
Figure 2   Plot of Mean Turbidity Vs. Date, WAQ Data Set
Figure 3   Plot of Mean Chlorine Demand Vs. Date, WAQ Data Set
Figure 4   Plot of Mean Alkalinity (Methyl Orange) Vs. Date, WAQ Data Set
Figure 5   Plot of Mean Non Carbonate Hardness Vs. Date, WAQ Data Set
Figure 6   Plot of Mean Total Hardness Vs. Date, WAQ Data Set
Figure 7   Plot of Mean pH Vs. Date, WAQ Data Set
Figure 8   Plot of Mean Carbon Dioxide Vs. Date, WAQ Data Set
Figure 9   Plot of Mean Dissolved Oxygen Vs. Date, WAQ Data Set
Figure 10  Plot of Mean Chemical Oxygen Demand Vs. Date, WAQ Data Set
Figure 11  Plot of Mean BOD5 Vs. Date, WAQ Data Set
Figure 12  Plot of Mean Nitrite Vs. Date, WAQ Data Set
Figure 13  Plot of Mean Nitrate Vs. Date, WAQ Data Set
Figure 14  Plot of Mean Annual Alkalinity Vs. Year, WAQ Data Set
Figure 15  Plot of Mean Annual pH Vs. Year, WAQ Data Set
Figure 16  Plot of Mean Annual CO2 Vs. Year, WAQ Data Set
Figure 17  Plot of Mean Annual BOD5 Vs. Year, WAQ Data Set
Figure 18  Plot of Mean Annual Nitrite Vs. Year, WAQ Data Set
Figure 19  Plot of Mean Annual Nitrate Vs. Year, WAQ Data Set
Figure 20  Plot of Temperature Vs. Date, DES Data Set
Figure 21  Plot of Dissolved Oxygen Vs. Date, DES Data Set
Figure 22  Plot of Biochemical Oxygen Demand Vs. Date, DES Data Set
Figure 23  Plot of Turbidity Vs. Date, DES Data Set
Figure 24  Plot of Alkalinity Vs. Date, DES Data Set
Figure 25  Plot of Total Phosphorus Vs. Date, DES Data Set
Figure 26  Plot of Total Kjeldahl Nitrogen Vs. Date, DES Data Set
Figure 27  Plot of Nitrite Vs. Date, DES Data Set
Figure 28  Plot of Nitrate Vs. Date, DES Data Set
Figure 29  Plot of Ammonia Vs. Date, DES Data Set
Figure 30  Plot of Discharge Vs. Date, STORET Data Set
Figure 31  Plot of Conductivity Vs. Date, STORET Data Set
Figure 32  Plot of Total Ammonia Vs. Date, STORET Data Set
Figure 33  Plot of Total Kjeldahl Nitrogen Vs. Date, STORET Data Set
Figure 34  Plot of Total Nitrogen Vs. Date, STORET Data Set
Figure 35  Plot of Total Phosphorus Vs. Date, STORET Data Set
Figure 36  Plot of Total Dissolved Phosphorus Vs. Date, STORET Data Set
Figure 37  Plot of Monthly Discharge at Chain Bridge Vs. Date, USGS-MD Data Set
Figure 38  Cumulative Periodograms of Selected Water Quality Constituents of USGS-PES Data Set in the Case of (a) Mean and (b) Standard Deviation
Figure 39  Mean Monthly Discharge (DISCHRGE) of the USGS Data Set and the Fitted Fourier Function with Two Harmonics
Figure 40  Mean Monthly Conductivity (CONDUCT) of the USGS Data Set and the Fitted Fourier Function with Two Harmonics
Figure 41  Mean Monthly Suspended Sediment (SUSSED) of the USGS Data Set and the Fitted Fourier Function with Two Harmonics
Figure 42  Mean Monthly Total Phosphorus (TPHOS) of the USGS Data Set and the Fitted Fourier Function with Two Harmonics
Figure 43  Mean Monthly Nitrate + Nitrite (NO2NO3) of the USGS Data Set and the Fitted Fourier Function with Two Harmonics
Figure 44  Cumulative Periodograms of TURB, CLODEMAN, METHORN, THARD, and CO2 of WAQ Data Set in the Case of (a) Mean and (b) Standard Deviation
Figure 45  Cumulative Periodograms of DO, COD, BOD5, and NO3 of WAQ Data Set in the Case of (a) Mean and (b) Standard Deviation
Figure 46  Mean Monthly Turbidity (TURB) of the WAQ Data Set and the Fitted Fourier Function
Figure 47  Mean Monthly Chlorine Demand (CLODEMAN) of the WAQ Data Set and the Fitted Fourier Function
Figure 48  Mean Monthly Alkalinity-Methyl Orange (METHORN) of the WAQ Data Set and the Fitted Fourier Function
Figure 49  Mean Monthly Total Hardness (THARD) of the WAQ Data Set and the Fitted Fourier Function
Figure 50  Mean Monthly pH of the WAQ Data Set and the Fitted Fourier Function
Figure 51  Mean Monthly Carbon Dioxide (CO2) of the WAQ Data Set and the Fitted Fourier Function
Figure 52  Mean Monthly Dissolved Oxygen (DO) of the WAQ Data Set and the Fitted Fourier Function
Figure 53  Mean Monthly Chemical Oxygen Demand (COD) of the WAQ Data Set and the Fitted Fourier Function
Figure 54  Mean Monthly Biochemical Oxygen Demand (BOD5) of the WAQ Data Set and the Fitted Fourier Function
Figure 55  Mean Monthly Nitrate (NO3) of the WAQ Data Set and the Fitted Fourier Function
Figure 56  Cumulative Periodograms of TEMP, DO, TURB, ALK, TKN, and NO3 of the DES Data Set in the Case of (a) Mean and (b) Standard Deviation
Figure 57  Mean Monthly Temperature (TEMP) of the DES Data Set and the Fitted Fourier Function with Two Harmonics
Figure 58  Mean Monthly Dissolved Oxygen (DO) of the DES Data Set and the Fitted Fourier Function with Two Harmonics
Figure 59  Mean Monthly Turbidity (TURB) of the DES Data Set and the Fitted Fourier Function with Two Harmonics
Figure 60  Mean Monthly Alkalinity (ALK) of the DES Data Set and the Fitted Fourier Function with Two Harmonics
Figure 61  Mean Monthly Total Kjeldahl Nitrogen (TKN) of the DES Data Set and the Fitted Fourier Function with Two Harmonics
Figure 62  Mean Monthly Nitrate (NO3) of the DES Data Set and the Fitted Fourier Function with Two Harmonics
Figure 63  Cumulative Periodograms of DISCH, CONDUCT, TKN, TOTN, and DPHOS of the STORET Data Set in the Case of (a) Mean and (b) Standard Deviation
Figure 64  Mean Monthly Conductivity (CONDUCT) of the STORET Data Set and the Fitted Fourier Function with Two Harmonics
Figure 65  Mean Monthly Discharge (DISCH) of the STORET Data Set and the Fitted Fourier Function with Two Harmonics
Figure 66  Mean Monthly Total Kjeldahl Nitrogen (TKN) of the STORET Data Set and the Fitted Fourier Function with Two Harmonics
Figure 67  Mean Monthly Total Nitrogen (TOTN) of the STORET Data Set and the Fitted Fourier Function with Two Harmonics
Figure 68  Mean Monthly Dissolved Phosphorus (DPHOS) of the STORET Data Set and the Fitted Fourier Function with Two Harmonics

I. INTRODUCTION

1.1 Background

The Potomac River, with a drainage area of 14,670 square miles (Figure 1), is the second largest tributary to the Chesapeake Bay, one of the largest estuarine systems in the world. The Potomac originates in headwaters on the eastern slopes of the Appalachian Mountains and flows in a southeasterly direction to its Fall Line at Great Falls, Virginia, below which the river is tidal. Above the Fall Line, the drainage area is about 11,500 square miles. The average discharge of the Potomac near Washington, D.C. is cubic meters per second (m3/s), with discharges ranging from a minimum of 3.43 m3/s (September 9, 1966) to a maximum of 13,700 m3/s (March 19, 1936) during the fifty-one year ( ) period of record (Blanchard and Hahl, 1984).

The Potomac estuary below the Fall Line is used constantly for many purposes, including industrial water supply, navigation, recreation, and commercial fishing. The Washington metropolitan area, with its population of about 3 million and its numerous parks and other land uses, has a direct influence on the state of the Potomac estuary in general and on water quality conditions in particular. Beginning in the late 1940's and early 1950's, the use of the estuary has been hampered occasionally by occurrences of low dissolved oxygen (DO) levels and by nuisance blooms of macroscopic and microscopic plants. Occurrences of floating algal mats also have been reported. In the 1960's, the Potomac estuary was already in an advanced state of eutrophication, characterized by massive blue-green algae blooms and frequent low dissolved oxygen levels. It was considered to be an open sewer and a national disgrace (U.S. EPA, 1983).

[Figure 1. The Potomac River Basin]

Any attempt to understand and model the eutrophication phenomena in the Potomac estuary must involve an accounting of the inputs and outputs of nutrients and oxygen demanding material which are physically controllable through various means. The estuary receives point source discharges from municipal wastewater treatment plants around the estuary, combined sewer overflows, and point and nonpoint source inputs from up-basin and estuary tributaries. The overall research study which led to this preliminary report deals with the quantification and modeling of the latter, namely the inputs originating in the Upper Potomac basin. The study reported here involves the analysis of water quality inputs as time series for detecting trends and seasonal patterns.

1.2 Sources of Nutrients

Table 1 lists the major sub-basins of the Potomac River basin (Figure 1) and their drainage areas. The land use in the entire basin is estimated to be 5 percent urban, 55 percent forest, and 40 percent agriculture and pasture land (Lang, 1982). Although the largest percentage of the basin is covered by forests, forested areas normally have lower nutrient concentrations than agricultural areas.

Table 1. Major sub-basins of Potomac river.

    Sub-basin               Drainage area (sq. mi.)
    North Branch            1,328
    South Branch            1,493
    Cacapon River             683
    Conococheague Creek       563
    Opequon Creek             345
    Antietam Creek            292
    Shenandoah River        3,054
    Monocacy River            970

Based on a stream sample network of 40 stations established during the calendar year 1966, Jaworski (1969) reported the following estimates of nutrient loadings from land runoff. The forest areas (6,800 square miles) contributed 2,400 lbs/day of total phosphorus as PO4, 13,600 lbs/day of nitrate + nitrite as N, and 2,720 lbs/day of total Kjeldahl nitrogen as N. The agricultural areas (4,100 square miles) contributed 5,100 lbs/day of total phosphorus as PO4, 24,500 lbs/day of nitrate + nitrite as N, and 2,660 lbs/day of total Kjeldahl nitrogen as N. It was noted that percent of the nutrients from land runoff is from agricultural areas even though over 62 percent of the basin is covered by forest (Jaworski, 1969).

Municipal and industrial wastewater discharges throughout the basin constitute another major source of nutrients. Jaworski (1969) reported that, as of 1968, there were 256 wastewater discharges in the upper Potomac River Basin. It was estimated that about 18,430 lbs/day of total phosphorus and 10,680 lbs/day of total Kjeldahl nitrogen were discharged to the surface waters. Among the nine sub-basins shown in Table 1, the Shenandoah sub-basin was the largest source of nutrients. The amount of nitrate + nitrite in both the industrial and municipal discharges was estimated to be insignificant, suggesting that most of the organic nitrogen comes from land and other sources.

The nutrient loads reaching the Potomac estuary can be very different from the figures quoted above, primarily because of the storage characteristics of the stream channel between the discharge points and the entrance to the estuary. For instance, from a mass balance analysis, Jaworski (1969) found that at six of the eight stations investigated, phosphorus was retained in the stream channel, bound by sediments and aquatic plants.
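The land-runoff loadings quoted from Jaworski (1969) in Section 1.2 can be compared on a per-area basis. The following is a minimal Python sketch of that arithmetic, using only the areas and loads stated in the text. The names `LAND_USE`, `areal_yield`, and `agricultural_share` are illustrative and do not come from the report, and the share is computed over the forest and agricultural classes only, so it is not necessarily the percentage cited by Jaworski.

```python
# Land-runoff loadings quoted from Jaworski (1969) in Section 1.2.
# Areas are in square miles; loads are in lbs/day.
# The dictionary layout and key names are illustrative assumptions.
LAND_USE = {
    "forest": {
        "area": 6800.0,
        "TP_as_PO4": 2400.0,
        "NO2_NO3_as_N": 13600.0,
        "TKN_as_N": 2720.0,
    },
    "agriculture": {
        "area": 4100.0,
        "TP_as_PO4": 5100.0,
        "NO2_NO3_as_N": 24500.0,
        "TKN_as_N": 2660.0,
    },
}


def areal_yield(land_use, constituent):
    """Load per unit area (lbs/day per square mile) for one land-use class."""
    entry = LAND_USE[land_use]
    return entry[constituent] / entry["area"]


def agricultural_share(constituent):
    """Agricultural fraction of the combined forest + agriculture load."""
    ag = LAND_USE["agriculture"][constituent]
    forest = LAND_USE["forest"][constituent]
    return ag / (ag + forest)


if __name__ == "__main__":
    for name in ("TP_as_PO4", "NO2_NO3_as_N", "TKN_as_N"):
        print(
            name,
            round(areal_yield("forest", name), 3),
            round(areal_yield("agriculture", name), 3),
            round(agricultural_share(name), 3),
        )
```

On these figures, the agricultural yield of total phosphorus per square mile is several times the forest yield, which is consistent with the observation that agricultural areas dominate the land-runoff nutrient load despite covering less of the basin.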

1.3 Detection of Trend and Seasonality

Water quality variables, like many others in hydrology, can be conceived in general as deterministic-stochastic processes. They show deterministic variations caused by physical mechanisms, man-made or natural, that occur in a basin, mixed with stochastic variations caused by numerous random factors. The quantification and the modeling of water quality usually require the variables as time series. The detection of trends and seasonality is a prerequisite for many approaches to modeling time series. It has many practical implications as well.

Since the early 1960's, many steps have been taken to reduce the discharges of nutrients and oxygen demanding material into the Potomac River and its estuary. The effectiveness of such measures can be seen only by investigating the changes (trends) in water quality variables analyzed as time series. The trends may be present in the form of jumps and/or continuous changes. Since homogeneity in a sample is a prerequisite for many statistical analyses, these trends must be identified and removed prior to such analyses. Distinguishing between changes due to natural or man-made causes and data inconsistency (systematic errors) is a difficult problem in many applications.

The land use patterns determine the extent of the loads of many water quality variables. Since certain land use categories such as agriculture exhibit seasonal variations, the loads and concentrations of water quality inputs entering the estuary must also show similar seasonal patterns. Even if the concentration is constant within a year, the load exhibits the seasonal variations of stream discharge. The study of seasonality in general is useful for determining the extent of, and the changes necessary in, upstream management practices to control the discharges of nutrients and other constituents. The detection and modeling of seasonality is also a prerequisite to many approaches in time series analysis.

1.4 Research Objectives

The overall objective of the research project is to improve the accuracy in modeling of nonpoint source water quality and flow in the Potomac River basin. The phase of the project which is reported here has the following objectives:

1. Detection of changes (trends, jumps, etc.) in water quality input to the Potomac estuary.
2. Detection and modeling of seasonality in concentrations of water quality input to the Potomac estuary.

1.5 Report Outline

Chapter II presents a description of the research data assembly and the division of the entire collection into separate data sets for purposes of analysis. Chapter III presents the trend analysis applied to the different data sets and a brief description of the parametric and nonparametric techniques used for such analysis. The techniques of analysis of seasonality and their applications to the different data sets are presented in Chapter IV.

II. RESEARCH DATA ASSEMBLY

2.1 Data Availability

The emphasis of this phase of research was to statistically analyze the water quality inputs entering the Potomac estuary. The data requirements for the types of analysis proposed include water quality inputs as short interval time series, preferably daily. Unfortunately, the data availability for stations around the Fall Line (Great Falls, Little Falls, and Chain Bridge) does not meet such requirements. The data available at these stations have been collected by various Federal and State agencies. The data records differ significantly in accuracy, frequency of observation, length and period of records, variables observed, and the units reported. Within most of the data sets, observations are not uniform temporally, making the application of time series models difficult. For instance, one of the U.S. Geological Survey data sets has observations approximately every week, mixed with several observations within certain days. During the observation period of certain data sets, the data collection and laboratory procedures have changed. In these situations, the consistency of the data set needs to be verified. Although many records contain gaps, filling in or extension of records was not attempted because of the differences between and within the data sets. Instead, the data sets are analyzed individually with the intention of combining the results and conclusions of the individual analyses.

2.2 Data Assembly

A considerable length of time was spent assembling the data, with relatively moderate success. After noting that the data availability around the Fall Line is inadequate, attempts were made to collect relevant data from upstream stations which would indirectly indicate the potential loadings at the Fall Line. A strong need was felt for one organization to process and maintain continuously the water quality data for the entire Potomac Estuary and its tributaries. This would considerably lessen the time spent on collection, transmission, and processing of data by the various agencies concerned with the estuary. For this phase of the research project, the data have been made available by the following agencies:

1. United States Geological Survey, both National Center and Maryland District Office.
2. Interstate Commission for Potomac River Basin (ICPRB).
3. Department of Environmental Services, Washington, DC.
4. Washington Aqueduct Division, U.S. Army Corps of Engineers.
5. United States Environmental Protection Agency, STORET system.
6. Council of Governments (COG).

The collection of data has been separated into different data sets for purposes of analysis. A brief description of each data set follows.

2.3 USGS-PES Data Set

This data set has been collected as a part of the Potomac Estuary Study (PES) initiated in August, 1977, and completed in September, Although the entire study involved data collection for approximately 23 stations (Blanchard and Hahl, 1981; Blanchard et al., 1982; Blanchard, 1982; Blanchard, 1983), only the data at Chain Bridge were used for this phase of the research study. The Chain Bridge station is located 1.9 km downstream from Little Falls Dam, where the USGS streamflow station is located. The samples have been collected from a narrow, boulder-strewn part of the channel where flow is turbulent and well mixed. More details regarding the water quality station and the sample collection can be found in Blanchard and Hahl (1984) and in the foregoing references. A summary of the data used for the present study is provided below.

Period of Record: December 1977 to September 1981
Number of Observations: approximately 640
Frequency of Observation: weekly during periods of low flow, three to four times a day during large floods
Variables Included in the Data Set:
    Discharge (cfs)
    Conductivity (micromhos at 25 °C)
    Suspended Sediment (mg/l)
    Total Phosphorus as P (mg/l)
    Dissolved Phosphorus as P (mg/l)
    Total Kjeldahl Nitrogen as N (mg/l)
    Dissolved Kjeldahl Nitrogen as N (mg/l)
    Dissolved Ammonia as N (mg/l)
    Dissolved Nitrate + Nitrite as N (mg/l)

Most observations contain missing values of certain variables. The data set contains more than 390 non-missing observations for all the variables except ammonia, which has only 163 observations. The months of October, November, and December each contain fewer than about 30 observations for the entire record.

2.4 WAQ Data Set

This data set was compiled from the chemical analysis data of samples collected by the Washington Aqueduct Division of the Army Corps of Engineers near the water supply intake at Little Falls. The records at this station are available since December 1963. Weekly observations of dissolved oxygen (DO), biochemical oxygen demand (BOD5), and, among the nutrients, only nitrite (NO2) and nitrate (NO3) are available, with some gaps. Daily observations (with missing values on weekends) are available for some other variables, which include turbidity, alkalinity, hardness, and pH. Although the record includes only a few nutrient parameters, all of the variables were used in the current study. It was assumed that the data were representative of the water quality of the Potomac River at Little Falls. More specific information regarding the data set is given below.

Period of Record: December 1963 to February 1984
Frequency of Observation: daily (except weekends) for turbidity, alkalinity, hardness, pH and CO2, and weekly for DO, COD, BOD5, NO2 and NO3
Variables Included in the Data Set:
    Turbidity
    Chlorine Demand
    Alkalinity (Methyl Orange)
    Hardness
    pH
    Carbon Dioxide (CO2)
    Dissolved Oxygen
    Chemical Oxygen Demand (COD)
    Biochemical Oxygen Demand (BOD5)
    Nitrite (NO2)
    Nitrate (NO3)
Remarks: Considering the scarcity of water quality data in general, this is an excellent data set. Unfortunately, only a few nutrient parameters have been monitored.

2.5 DES Data Set

This data set is a compilation of the data collected by the Department of Environmental Services, Washington, D.C. The samples are collected at Fletcher's Boat House (FBH), which is located a short distance downstream of Chain Bridge. Other pertinent information regarding this data set is given below.

Period of Record: June 1966 to December 1983
Frequency of Observation: weekly
Variables Included in the Data Set:
    Temperature
    Dissolved Oxygen
    Biochemical Oxygen Demand
    Turbidity
    Alkalinity
    Total Phosphorus
    Total Kjeldahl Nitrogen
    Nitrate as N
    Nitrite as N
    Ammonia as N
Remarks: The laboratory practices have changed during the period of record, and therefore the consistency is questionable.

2.6 STORET Data Set

The Environmental Protection Agency's STORET system has water quality data for several stations within the Potomac River Basin. Only the data at Chain Bridge, Washington, D.C. were retrieved for the present study. The parameters to be analyzed are conductivity, streamflow, ammonia, total Kjeldahl nitrogen, organic nitrogen, total nitrogen, and total phosphorus.

2.7 USGS-MD Data Set

This data set was obtained from the United States Geological Survey's Maryland District Office. A brief description of the entire data set is as follows.

1. Water Quality Monitor Data - Potomac River at Chain Bridge ( ) - the variables included are conductivity, dissolved oxygen, pH and temperature.
2. Water Quality Monitor Data - Potomac River at Great Falls ( ) - only conductivity and temperature.
3. Sediment and flow data for Potomac River at Point of Rocks ( ).
4. Sediment data for Monocacy River at Reich Ford ( ) and flow data for Monocacy River at Jug Bridge ( ).
5. Flow data for Potomac River at Washington, D.C. ( ).
6. Water quality data for Potomac River at Point of Rocks ( ), Monocacy River at Reich Ford ( ), and Potomac River at Chain Bridge ( ).

The Potomac River station at Point of Rocks and the Monocacy River stations are located upstream of Great Falls (see Figure 1 for approximate locations). Although this data set contains many parameters as time series, it was received late and only part of it was included in the current study. It is intended to use this data set extensively during the future phases.

III. TREND ANALYSIS

3.1 Background

Trends are conceived as changes in basic parameters, either continuous in time or as jumps at different time epochs, along observed or projected time series of water quality variables. Both systematic errors (inconsistency) and nonhomogeneity cause trends in the parameters of water quality variables. Inconsistency may be viewed as the difference between observed values with systematic errors and the true values in nature, which contain various natural changes. The systematic errors are due to changes in gauge environment, instrumentation and methods of observation, sampling techniques and laboratory procedures, and methods of processing data. Nonhomogeneity is the difference between the observed values with the natural and man-made changes in place and the values that would have resulted without them. The major sources of nonhomogeneity are changes in land use including urbanization, man-made flow and water quality regulation, and sudden occurrences or slow evolutions of disaster phenomena such as volcanic eruptions and desertification. It is with the trends introduced by nonhomogeneity that engineers are ordinarily concerned, but their differentiation from the trends due to inconsistency is often difficult.

The characteristics of the water quality data bases available today are such that they do not satisfy many requirements for the application of standard statistical methods of trend analysis. For instance, short interval water quality and quantity variables in general have non-normal marginal probability distributions, for which many standard statistical methods are not applicable. Moreover, the serial correlation in short-interval data violates the independence assumption often required in such methods. The time series modeling approaches, which may or may not account for non-normality but do consider serial correlation, encounter difficulties due to short samples, gaps in records, nonuniform sampling intervals, censored data, and inconsistencies due to practices such as changing laboratory procedures. These are all common characteristics of water quality data bases.

Fortunately, some nonparametric techniques are available for situations where the above parametric methods are not strictly applicable. Although nonparametric techniques do not account for all the shortcomings stated above, they can often be used for trend analysis, with probably minor effects from characteristics such as serial correlation which are not considered in their development. The weak set of assumptions required for the validity of nonparametric tests has made them popular among scientists and engineers. In spite of their weak assumptions, the nonparametric methods hold up quite well against the parametric methods when the assumptions of the latter are violated (Lehmann and D'Abrera, 1975).

This chapter describes the application of parametric and nonparametric techniques of trend analysis to the different data sets mentioned in Chapter II. First, a brief description of the specific methods employed in this study is given. It is followed by the presentation and discussion of the results obtained by applying these techniques to the different data sets.

3.2 Techniques of Trend Analysis

Regression Models

Although the trend in water quality time series can be present in many of their statistical parameters (mean, standard deviation, skewness, etc.), the regression models used here account only for the trend in the mean. The regression models assume that the trend in the mean can be represented by a polynomial equation of the form

    T_m = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n        (1)

where T_m is the changing mean, t is the current time, and a_0, a_1, a_2, ... are regression parameters. The water quality variable X_t at time t is composed of the trend component T_m and a random term Z_t as

    X_t = T_m + Z_t        (2)

In general, Z_t can be considered to be a stochastic process. Of particular interest are the linear trend of the form

    T_m = a_0 + a_1 t        (3)

and the quadratic form of the trend

    T_m = a_0 + a_1 t + a_2 t^2        (4)

The proper estimation of the parameters a_0, a_1, a_2, etc. of the trend component under non-ideal situations of Z_t is a formidable task. The non-normality, seasonality, and serial correlation in Z_t make the efficient estimation of these parameters from observed water quality data difficult. For this study, it was assumed that Z_t follows

    Z_t = \varepsilon_t + b_1 Z_{t-1} + b_2 Z_{t-2} + \cdots + b_p Z_{t-p}

where \varepsilon_t is a normally and independently distributed random component with zero mean, b_1 through b_p are autoregressive parameters, and p is the order of the autoregressive process. In situations where the serial correlation is known to be small, such as the case of a larger time interval between successive observations, Z_t is assumed to be an independent process identical to \varepsilon_t.

Through the above formulation, the trend analysis is treated as a problem of parameter estimation in regression models where the data are the time series and the error term is, in general, an autoregressive process. In the present study, regression models for trend analysis are employed only when the water quality observations are available as time series for sufficiently long periods. The regression models are applied by using the programs available in the Statistical Analysis System (SAS, 1982) package installed at the George Washington University Computing Center. In particular, the procedures REG and AUTOREG were used for trend analysis.
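The regression formulation above can be illustrated with a small Python sketch. It is not the SAS REG or AUTOREG procedure used in the study: it is a minimal illustration, assuming a linear trend as in equation (3), ordinary least squares for the trend parameters, and a first-order autoregressive error (p = 1) written with an additive sign convention. The function names `fit_linear_trend`, `trend_residuals`, and `ar1_coefficient` are hypothetical.

```python
def fit_linear_trend(t, x):
    """Ordinary least-squares estimates (a0, a1) for the linear trend x = a0 + a1 * t."""
    n = len(t)
    t_mean = sum(t) / n
    x_mean = sum(x) / n
    # Centered sums of squares and cross-products.
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_tx = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
    a1 = s_tx / s_tt
    a0 = x_mean - a1 * t_mean
    return a0, a1


def trend_residuals(t, x, a0, a1):
    """Residual series z[t] = x[t] - (a0 + a1 * t), an estimate of Z_t in equation (2)."""
    return [xi - (a0 + a1 * ti) for ti, xi in zip(t, x)]


def ar1_coefficient(z):
    """Least-squares estimate of b1 in z[t] = b1 * z[t-1] + e[t] (assumed p = 1, no intercept)."""
    num = sum(z[i] * z[i - 1] for i in range(1, len(z)))
    den = sum(z[i - 1] ** 2 for i in range(1, len(z)))
    return num / den


if __name__ == "__main__":
    # Synthetic, purely illustrative series: a rising mean with an alternating disturbance.
    t = list(range(10))
    x = [2.0 + 0.5 * ti + (0.3 if ti % 2 == 0 else -0.3) for ti in t]
    a0, a1 = fit_linear_trend(t, x)
    z = trend_residuals(t, x, a0, a1)
    print(a0, a1, ar1_coefficient(z))
```

A two-step fit of this kind ignores the effect of serial correlation on the standard errors of a_0 and a_1, which is one reason a procedure that estimates the trend and the autoregressive error jointly is preferable for hypothesis testing.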

Nonparametric Methods

The application of nonparametric techniques to detecting trends in water quality variables has been discussed by Lettenmaier (1976), Hirsch et al. (1982), and van Belle and Hughes (1984). The increasing popularity of these techniques for trend analysis is due to the few assumptions they require regarding the structure of water quality data bases. According to van Belle and Hughes (1984), the nonparametric trend tests can be categorized into two broad classes: (a) aligned rank methods and (b) intrablock methods. For a discussion of the distinction between the two classes, the reader is referred to their paper. The intrablock methods as discussed by Hirsch et al. (1982) are more suitable than the aligned rank procedures for non-ideal situations such as missing data and non-uniform sampling intervals, although the latter are more powerful. For this study, a modified intrablock method which accounts for multiple observations in a given season (e.g., month), as reported by van Belle and Hughes (1984), was used. A brief description of this procedure is as follows.

First, it is assumed, without any loss of generality, that the seasons within the year correspond to the calendar months. The water quality constituent record is assumed to consist of multiple observations within a given month in a year, and the number of observations in a month is assumed to vary from one year to another. The data for a month (say January) may be displayed as shown below:


where $C_a = a(a-1)$; $t$ is the number of data points involved in a tie (for instance, if 3 values are equal, then $t = 3$, and so on); and $u$ is the number of time points involved in a tie, i.e., the number of data points in each year for a given month.

(4) Repeat (1), (2) and (3) for all seasons and compute

$$S = S_{JAN} + S_{FEB} + \cdots + S_{DEC} \qquad (8)$$

and

$$\mathrm{var}[S] = \mathrm{var}[S_{JAN}] + \mathrm{var}[S_{FEB}] + \cdots + \mathrm{var}[S_{DEC}] \qquad (9)$$

(5) Now, let $Z$ be the test statistic formed by standardizing $S$ with $\sqrt{\mathrm{var}[S]}$ (Eq. 10), which is a random variable whose distribution is well approximated by the standard normal distribution in the case of no trend in data.

Consequently, a null hypothesis $H_0$ may be advanced as the situation where all values in a given month are distributed independently and identically, meaning no trend from year to year. The alternative hypothesis is that for one or more months the subsample is not distributed identically.
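The seasonal summation in steps (4) and (5) can be sketched in Python as follows. This is a simplified illustration, not the modified procedure of van Belle and Hughes (1984): it assumes a single value per month per year and no tied values, so it uses the no-ties variance n(n-1)(2n+5)/18 of Kendall's S rather than a tie-corrected variance, and it applies no continuity correction. The function names and the sample data are hypothetical.

```python
import math


def kendall_s(values):
    """Kendall's S for one season: sum of sign(x_j - x_i) over pairs i < j."""
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s


def var_s_no_ties(n):
    """Variance of S under the no-trend hypothesis, assuming no ties."""
    return n * (n - 1) * (2 * n + 5) / 18.0


def seasonal_kendall(by_month):
    """by_month maps a month label to that month's values in year order.

    Returns (S, var[S], Z), where S and var[S] are summed over months
    and Z = S / sqrt(var[S]).
    """
    s_total = sum(kendall_s(v) for v in by_month.values())
    var_total = sum(var_s_no_ties(len(v)) for v in by_month.values())
    z = s_total / math.sqrt(var_total)
    return s_total, var_total, z


# Hypothetical record: one value per month for five successive years.
record = {
    "JAN": [1.2, 1.5, 1.9, 2.4, 2.6],
    "FEB": [0.8, 1.1, 1.0, 1.6, 1.9],
}
s, var_s, z = seasonal_kendall(record)
print("S =", s, "var[S] =", var_s, "Z =", z)
```

Because each month is compared only with itself across years, seasonal differences in level do not contribute to S.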

In a two-sided trend test, $H_0$ (no trend) is accepted when $|Z| < z_\alpha$, where $z_\alpha$ is the standard normal variate corresponding to the significance level $\alpha$. If the null hypothesis is rejected, one can proceed further by interpreting a positive value of $S$ as an "upward trend" and a negative value of $S$ as a "downward trend".

One final remark regarding the parametric and nonparametric trend tests is in order. Often, the water quality constituents to which the trend tests are applied are correlated with quantities such as streamflow and conductivity. The historic pattern of these quantities can have a significant influence on the perceived trend in the water quality constituents which have significant correlations with them. For instance, if the streamflows indicate an apparent downward "trend" due to random fluctuations, the water quality constituents may also indicate such a trend if they are positively correlated with streamflow. What is more important, however, is to know whether a perceived trend indicates a real change in the relationship between streamflow and the water quality constituent due to a change in the physical processes by which the constituent is made available to and transported by the river.

Hirsch et al. (1982) proposed a flow adjustment procedure which uses the regression between the streamflow and the water quality constituent prior to the trend analysis on the constituent itself. Specifically, a linear regression between the flow and concentration is found, and the resulting residuals are subjected to the nonparametric trend tests described above. In this study, the seasonality in the regression relationship is accounted for by computing a separate regression for each month.

The applications of the above techniques of trend analysis to each water quality data set are described in the following subsections. Selected

water quality variables of each data set are subjected to trend tests, and the appropriate trend tests for a particular data set are chosen on the basis of its characteristics (e.g., length, uniformity in sampling interval, missing data, availability of data, etc.). Sometimes both parametric and nonparametric techniques are used.

3.3 USGS-PES Data Set

The record length in this data set is short (December 1977 to September 1981), and the observations are nonuniform with respect to time. Consequently, only the nonparametric trend tests are employed. The results of this application indicate only the presence or absence of "trends" in water quality during the period of record, and they should not be used to draw any conclusions regarding long term trends. The trend tests were applied to this short record mainly to investigate their sensitivity to such factors as flow adjustment.

Table 2 presents the basic statistics of the entire USGS-PES data set when the seasonality in the parameters is ignored. The data set was averaged over each day prior to the computation of these statistics in order to retain only a single observation for each day for subsequent trend and seasonality analyses.

The results obtained by applying the nonparametric trend tests to the USGS-PES data set are summarized in Table 3. This table presents the values of $S$, $\mathrm{var}[S]$, and the Z-statistic of Eqs. (8), (9) and (10), respectively, and the correlation coefficient between the flow and the water quality constituents. The trend test results are shown for (a) raw data, (b) data adjusted for correlation between flow and the water quality constituent, and (c)

Table 2. Grand statistics of USGS-PES data set averaged over each day.

Variable   Label                         N    Mean    Standard Deviation    Skewness
DISCHRGE   Discharge at Chain Bridge
CONDUCT    Conductivity at CB
SUSSED     Suspended Sediment at CB
TPHOS      Total Phosphorus at CB
DPHOS      Dissolved Phosphorus at CB
TKJELN     Total KN as Nitrogen at CB
DKJELN     Diss. KN as Nitrogen at CB
NH3        Ammonia as Nitrogen at CB
N02NO3     NO2+NO3 as Nitrogen at CB

data adjusted for seasonally (monthly) varying correlation between flow and the constituent. The seasonal adjustment for correlation was deemed necessary to reduce the influence, if any, of seasonality in parameters on the regression relationship between the flow and the constituent and on the subsequent flow adjusted values. The sign of $S$ in Table 3 indicates the direction (upward or downward) of variation of the particular constituent.

The results for raw data show an apparent downward trend in the particular time history of discharge during the period of record. Among the water quality constituents, only dissolved phosphorus (DPHOS) and ammonia (NH3) show a significant ($\alpha$ = 5%) upward trend during the period of record. Conductivity (CONDUCT) also indicates a significant upward trend. The correlation coefficients shown in Table 3 indicate that certain parameters have high correlations with flow. Thus the significant "trends" indicated by the trend tests on raw data cannot be trusted, since they could be simply due to that particular time history of discharge. For instance,

Table 3. Nonparametric trend test results of USGS-PES data set.

Variable   Raw Data   Flow Adjusted Data   Seasonally Flow Adjusted Data   Correlation Coefficient With Flow

Value of S
DISCHRGE
CONDUCT
SUSSED
TPHOS
DPHOS
TKJELN
DKJELN
NH3
N02NO3

Variance of S (105)
DISCHRGE
CONDUCT
SUSSED
TPHOS
DPHOS
TKJELN
DKJELN
NH3
N02NO3

Z-Statistic
DISCHRGE
CONDUCT
SUSSED
TPHOS
DPHOS
TKJELN
DKJELN
NH3
N02NO3

z_α (α = 5%)

the significance of an upward trend indicated for conductivity disappears when it is adjusted for its negative correlation with flow. The upward trends indicated for dissolved phosphorus and ammonia remain significant at $\alpha$ = 5% even after flow adjustment. This is to be expected since their correlation with flow is very small. Although the total phosphorus (TPHOS) in raw data does not indicate a significant trend, the Z-statistic of TPHOS after flow adjustment is significant at $\alpha$ = 5%. Since the flow has an apparent downward trend and TPHOS has a high positive correlation with flow, any real upward trend in TPHOS could have been masked by this particular time history of discharge.

Another interesting result is shown for the S-value of NO2+NO3 (N02NO3). The value of $S$ after overall flow adjustment is positive, whereas it takes a large negative value when N02NO3 is adjusted for flow seasonally. This is probably due to a strong seasonal variation in the correlation between NO2+NO3 and flow, as indicated by a large negative correlation coefficient during high flow months and a positive correlation coefficient during certain low flow months.

A final remark on the trend results of USGS-PES is in order. On account of the short record length, a conclusive inference regarding the trends in water quality at Chain Bridge cannot be made. However, the results presented above show how sensitive this particular nonparametric method is to correlations with flow (and their seasonality), and that caution should be exercised in making inferences regarding trends in water quality variables or their relationship with flow.

3.4 WAQ Data Set

Table 4 presents the basic statistics of the entire WAQ data set. The

Table 4. Grand statistics of WAQ data set.

Variable   Label                             N    Mean    Standard Deviation    Skewness
TURB       Turbidity at Little Falls
CLODEMAN   Chlorine Demand at LF
METHORN    Alkalinity (Methyl Orange)
NCARBH     Non-Carb. Hardness at LF
THARD      Total Hardness at LF
PH         PH at LF
C02        CO2 at LF
DO         Dissolved Oxygen at LF
COD        Chemical Oxygen Demand at LF
BOD5       Biochemical Oxygen Demand
NO2        Nitrite at LF
NO3        Nitrate at LF

seasonal (e.g., monthly) statistics may vary around the corresponding values of the grand statistics shown. Clearly, the data set can be separated into two groups on the basis of the frequency of nonmissing values, N.

The first group consists of the variables TURB, CLODEMAN, METHORN, NCARBH, THARD, PH and CO2 (see Table 4). Each of these variables may be analyzed as a time series, although there are missing values during weekends and a few weekdays. The entire year 1979 was missing in the original WAQ data set. The data prior to July, 1965 are also very sparse. Consequently, the trend analyses which consider the variable as a time series were performed only for the period July, 1965 to December, 1978. However, the entire data set covering the period December, 1963 to February, 1984 was employed in the nonparametric trend analysis, which poses no problems with missing data.

The second group, consisting of the variables DO, COD, BOD5, NO2, and NO3, has observations approximately every week, with some missing values. Since the intervals between successive observations vary within the data set, the variables in this group were not analyzed as weekly time series.

The skewness values of TURB, CO2, COD, and NO2 are relatively high, indicating nonnormal marginal probability distributions for these variables, probably due to a few outliers in the data. It is to be noted that although the nonparametric tests are robust against nonnormality in probability distributions, certain parametric analyses assume normality. The fact that high skewness in a variable may significantly affect the accuracy of estimated parameters must be considered when interpreting the results of parametric analyses of such a variable.

Visual inspection of the plots of water quality variables with time must be the first step in the identification of trends. Since the variables in the first group contain a large number of observations, they were averaged in each month for all the years before making the time plots. Figures 2 through 13 present these plots for the 12 variables of the WAQ data set (Table 4). A visual inspection of these figures reveals the following observations regarding the trend and other characteristics of the water quality variables.

1. The mean (monthly) turbidity (TURB) is positively skewed, and exhibits no definite trend.
2. The mean chlorine demand (CLODEMAN) exhibits a nonstationary behavior over the period of record. In particular, the data indicate a strong downward "trend" prior to 1973 followed by a strong upward "trend". Whether there were physical reasons for such a behavior remains to be investigated.
3. Mean alkalinity (METHORN) exhibits no definite trend in its time series plot.
4. Noncarbonate hardness (NCARBH) is the difference between the total hardness and alkalinity (carbonate + bicarbonate). The plot of


NCARBH with time indicates a strong non-stationary behavior similar to that of CLODEMAN.
5. Total hardness (THARD) exhibits no definite trend.
6. The time plot of PH indicates a strong downward trend, with low values reported after [...].
7. The record of CO2 extends only until the end of [...]. During this period CO2 exhibits an upward trend.
8. There is no indication of a trend in the time plot of dissolved oxygen (DO).
9. The mean chemical oxygen demand (COD) exhibits a downward trend. A change in the variability of COD with time (a possible trend in variance) is also indicated on the plot of COD versus time.
10. Mean BOD5 does not indicate any possible trend during the period of record.
11. The plot of mean nitrite shows relatively high values after [...]. As a consequence, any trend analysis for this variable would indicate a strong upward trend. While searching for any physical reasons for such a drastic change in NO2, the possibility of errors and/or changes in observational and recording procedures must not be ignored.
12. The nitrate (NO3) also exhibits a strong upward trend.

It is important to note that the visual inspection of time plots is only useful for a preliminary investigation of possible trends in data. Final conclusions regarding trends based on visual inspection alone can be dangerous, since apparent "trends" can be merely stochastic variations in the random variables of water quality. Statistical tests (parametric and/or nonparametric) must be used to investigate the significance of

apparent trends in data.

Regression Analysis

Trends were detected by regressing the water quality variables on time and then testing the regression coefficients for statistical significance. This parametric method assumes that the water quality variable is normally distributed and independent in sequence. These assumptions are often violated in short interval water quality data. Averaging over larger intervals often makes the data closer to normal and reduces dependence. In this study, ordinary linear regression techniques were employed for trend detection in daily, mean monthly, and mean annual water quality variables, although certain underlying assumptions in regression analysis may have been violated in daily and monthly data. The procedure REG available in the statistical package SAS (Statistical Analysis System, 1982) was employed for all regression analyses.

The error term in the regression models of daily and monthly data is often dependent across time. In such situations, the ordinary least squares estimates of the regression coefficients are inefficient and their estimated standard errors are biased, so the usual significance tests are unreliable. In order to correct for this deficiency, the daily and monthly time series data were also analyzed by using regression models which assume the error term to be dependent, or more specifically, to be an autoregressive process. This was done by employing the procedure AUTOREG available in the SAS package.

Table 5 presents the results of the regression analysis on daily water quality variables in the WAQ data set. The procedure AUTOREG was applied only for the period July 1965 to December 1978 to exclude large gaps in data. The observation on a missing day during this period (mostly due to

Table 5. Regression analysis on daily water quality data of WAQ data set.

Variable   Method of Estimation*   Period of Analysis   Intercept a   Slope b   Approx.** Probability
TURB       REG
           AUTOREG
CLODEMAN   REG
           AUTOREG
METHORN    REG
           AUTOREG
NCARBH     REG
           AUTOREG
THARD      REG
           AUTOREG
PH         REG
           AUTOREG
C02        REG
           AUTOREG
DO         REG
COD        REG
BOD5       REG
NO2        REG
NO3        REG

* Procedures available in the statistical package SAS.
** Approximate significance probability of the computed t = b/stderr(b) under the null hypothesis b = 0.

interruptions of sampling on Sundays) was filled by the previous day's observation. The procedure AUTOREG was not applied to the second group of variables, which have only weekly observations. The last column in Table 5 reports the approximate significance probability of the computed Student's t statistic, assuming that t = b/stderr(b) has a Student's t distribution under the null hypothesis that b = 0. The null hypothesis is rejected if this probability is smaller than a specified significance level, say 5%.
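The two computational steps described above, filling a missing day with the previous day's observation and testing the slope through t = b/stderr(b), can be sketched in Python as follows. This is an illustration under stated assumptions, not the SAS REG computation: the function names and the sample values are hypothetical, missing days are assumed to be coded as None, and the significance probability is approximated with the standard normal distribution (reasonable only for long records) instead of the Student's t distribution.

```python
import math
from statistics import NormalDist


def fill_previous(series):
    """Replace each None with the most recent earlier observation.

    A leading None is left unchanged because it has no predecessor.
    """
    filled, last = [], None
    for value in series:
        if value is None:
            value = last
        filled.append(value)
        last = value
    return filled


def slope_test(t, y):
    """OLS fit y = a + b*t; returns (a, b, t_stat, p_approx)."""
    n = len(t)
    mt = sum(t) / n
    my = sum(y) / n
    stt = sum((ti - mt) ** 2 for ti in t)
    b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / stt
    a = my - b * mt
    sse = sum((yi - a - b * ti) ** 2 for ti, yi in zip(t, y))
    stderr_b = math.sqrt(sse / (n - 2) / stt)
    t_stat = b / stderr_b
    # Two-sided probability from the normal approximation to Student's t.
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return a, b, t_stat, p_approx


# Hypothetical daily record with two missing days.
daily = [7.9, 8.1, None, 8.0, 7.8, None, 7.7, 7.6, 7.4, 7.5]
values = fill_previous(daily)
days = list(range(len(values)))
a, b, t_stat, p = slope_test(days, values)
print("slope:", b, "t:", t_stat, "approximate probability:", p)
```

Filling gaps with repeated values tends to increase the serial dependence of the series, which is one reason the significance probabilities from an ordinary regression on daily data should be read with caution.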