Data Mining

As discussed in the previous section, the key objective in the data mining process was to screen out data collected under conditions or factors that might interfere with, or confound, the point source discharge/downstream DO signal. This section presents the six-step, peer-reviewed data mining process designed and implemented to develop the before- and after-CWA data sets to be used in the comparison analysis.

Step 1—Data Selection Rules

The data selection step incorporated three screening rules:

• DO, expressed as a concentration (mg/L), will function as the signal relating municipal and industrial discharges to downstream water quality responses.

• DO data will be extracted only from the July-September (summer season) time period.

• Only surface DO data (DO data collected within 2 meters of the water surface) will be used.

DO Concentration (mg/L) as the Water Quality Indicator

The rationale for selecting DO as the water quality indicator for this study was discussed earlier in this chapter and in Chapter 2. The only question remaining was how this parameter should be expressed in the analysis: by concentration or by percent of DO saturation. The latter measurement has some advantages because it would reduce the noise introduced by changes in temperature. However, DO expressed as a concentration (mg/L) was ultimately selected because it is more intuitive to a broader audience. For example, USEPA has established a DO concentration of 5.0 mg/L as the minimum concentration to be achieved at all times for early life stages of warm-water biota (see Table 1-1). For this reason, this level of DO is used as a benchmark for assessing acceptable versus unacceptable conditions. In contrast, it is somewhat more difficult to judge whether a DO saturation of 50, 60, or 70 percent is protective.

DO from the Time Period of July to September

Summer and early fall (July through September) is usually the best time for evaluating worst-case impacts of wastewater loading on water quality in general and DO in particular in most areas of the continental United States. Typically, this is when water temperatures are highest and flow is the lowest (i.e., lowest oxygen solubility and lowest dilution potential). Selecting DO data from only this time period screens out noise introduced by seasonal variations in temperature, precipitation, and flow. In addition, BOD loadings from nonpoint sources of pollution are reduced during low precipitation periods, thus minimizing this contribution to DO signals.

DO from Surface Waters

In lakes, reservoirs, estuaries, coastal waters, and deep rivers, scientists typically measure DO at several depths in the water column. Often these measurements reveal significant differences between surface and bottom DO concentrations because of thermal stratification and the lack of reaeration of the bottom layer. By limiting DO data selection to the top 2 meters of a waterway, one can screen out much of the noise associated with the physical, chemical, and biological processes that occur in the lower layers and maintain some level of comparability between shallow streams and deeper waters.
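The three selection rules above can be read as a simple record filter. The following is a minimal sketch only, not the study's actual code: the `DOSample` record, its field names, the sample values, and the choice to treat a depth of exactly 2 meters as "surface" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

SUMMER_MONTHS = {7, 8, 9}      # July through September
MAX_SURFACE_DEPTH_M = 2.0      # "within 2 meters of the water surface" (inclusive bound is an assumption)


@dataclass(frozen=True)
class DOSample:
    """Hypothetical record layout for one DO measurement, in mg/L."""
    station_id: str
    sample_date: date
    depth_m: float
    do_mg_per_l: float


def passes_selection_rules(sample: DOSample) -> bool:
    """Return True if the sample is a summer, near-surface DO concentration."""
    in_summer = sample.sample_date.month in SUMMER_MONTHS
    at_surface = sample.depth_m <= MAX_SURFACE_DEPTH_M
    return in_summer and at_surface


def select_samples(samples):
    """Keep only the samples that satisfy the Step 1 screening rules."""
    return [s for s in samples if passes_selection_rules(s)]


# Hypothetical measurements, for illustration only.
example = [
    DOSample("ST-1", date(1963, 8, 14), 0.5, 3.2),   # kept: summer, surface
    DOSample("ST-1", date(1963, 8, 14), 6.0, 0.9),   # dropped: below 2 m
    DOSample("ST-1", date(1963, 12, 2), 0.5, 11.4),  # dropped: winter
]
selected = select_samples(example)
```

Only the first hypothetical record survives the filter; the deep-water and winter samples are screened out before any statistics are computed.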

Step 2—Data Aggregation Rules from a Temporal Perspective

The data aggregation from a temporal perspective step incorporated the following rules:

• 1961-1965 will serve as the time-block to represent conditions before the CWA, and 1986-1990 will serve as the time-block to represent conditions after the CWA.

• To remain eligible for the before- and after-CWA comparison, DO data must come from a station residing in a catalog unit that had at least 1 year classified as dry (streamflow ratio < 0.75) out of the 5 years in each of the before- and after-CWA time-blocks.

An analysis of catalog units revealed that 1,923 (91 percent) of the 2,111 catalog units in the contiguous United States experienced at least one dry summer in the 1961-1965 time-block. Further, a total of 1,475 catalog units (70 percent) experienced at least two dry summers, and 886 catalog units (42 percent) experienced at least three dry summers in the before-CWA time-block. Of the catalog units that remained eligible for the comparison analysis (note that only 188 were screened out), low flow conditions persisted for an average of 2.5 years. In the 1986-1990 time-block, 1,776 (84 percent) of the 2,111 catalog units in the contiguous United States experienced at least one dry summer. A total of 1,420 catalog units (64 percent) experienced at least two dry summers and 1,073 catalog units (51 percent) experienced at least three dry summers in the after-CWA time-block. Of the catalog units that remained eligible for the comparison analysis (335 were screened out), low flow conditions persisted for an average of 2.7 years.
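Tallies such as "at least one, two, or three dry summers" can be produced by counting, for each catalog unit, the years in a time-block whose streamflow ratio falls below 0.75. The sketch below assumes a simple mapping from year to streamflow ratio; the catalog unit labels and ratio values are hypothetical and are not taken from the study.

```python
DRY_RATIO_THRESHOLD = 0.75           # a year is "dry" when its streamflow ratio is below this
BEFORE_CWA = range(1961, 1966)       # 1961-1965
AFTER_CWA = range(1986, 1991)        # 1986-1990


def count_dry_summers(ratios_by_year, block):
    """Count the years of a time-block classified as dry for one catalog unit."""
    return sum(
        1
        for year in block
        if year in ratios_by_year and ratios_by_year[year] < DRY_RATIO_THRESHOLD
    )


def units_with_at_least(units, block, minimum_dry_years):
    """Count catalog units that had at least `minimum_dry_years` dry summers in the block."""
    return sum(
        1
        for ratios in units.values()
        if count_dry_summers(ratios, block) >= minimum_dry_years
    )


# Hypothetical streamflow ratios for three made-up catalog units.
units = {
    "unit-A": {1961: 0.50, 1962: 0.60, 1963: 0.90, 1964: 1.10, 1965: 0.70},
    "unit-B": {1961: 0.80, 1962: 0.95, 1963: 1.20, 1964: 0.75, 1965: 1.05},
    "unit-C": {1961: 1.30, 1962: 0.40, 1963: 0.85, 1964: 0.90, 1965: 1.00},
}
at_least_one = units_with_at_least(units, BEFORE_CWA, 1)
at_least_two = units_with_at_least(units, BEFORE_CWA, 2)
```

In this toy data, "unit-B" has no dry summer (a ratio of exactly 0.75 does not count as dry under a strict "less than" test) and would be screened out of the before-CWA time-block.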

Step 3—Calculation of the Worst-Case DO Summary Statistic Rules

The calculation of the worst-case DO summary statistic step incorporated the following rules:

• For each water quality station, the tenth percentile of the DO data distribution from the before-CWA time-block (July through September, 1961-1965) and the tenth percentile of the DO data distribution from the after-CWA time-block (July through September, 1986-1990) will be used as the DO worst-case statistic for the comparison analysis.

• To remain eligible for the before- and after-CWA comparison, a station must have a minimum of eight DO measurements within each of the 5-year time-blocks.

Typically, the mean or median statistic is used to summarize a distribution of data because it describes the central tendency of the distribution. In this study, however, the emphasis is on worst-case (low) DO. Consequently, a summary statistic describing the lowest DO measurements of the data distribution was needed because these data would inherently carry a sharper point source discharge/downstream water quality signal. In other words, the objective was to characterize the worst of the DO data collected under the worst-case physical conditions (high temperature and low flow).

Because simply choosing the minimum measurement might introduce anomalous results, the tenth percentile, a more robust statistic (i.e., one that conveys information under a variety of conditions and is not overly influenced by data values at the extremes of the data distribution) was selected as the appropriate summary statistic to characterize the worst DO of a station's range of DO measurements within a time-block. An example of how one might interpret a tenth percentile value for a station is to say that 90 percent of the values collected at that station were higher than the tenth percentile value. To minimize statistical errors associated with calculating extreme percentiles, the requirement was added that a station must have a minimum of eight observations within each 5-year time-block to remain eligible for the before- and after-CWA comparison.
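The worst-case statistic and its minimum-sample rule can be sketched as follows. The text does not say which percentile convention the study used, so the linear interpolation between order statistics shown here is an assumption; other conventions give slightly different values for small samples. The function names are illustrative.

```python
import math

MIN_OBSERVATIONS = 8   # minimum DO measurements per station per 5-year time-block


def tenth_percentile(values):
    """Tenth percentile using linear interpolation between sorted values (assumed convention)."""
    ordered = sorted(values)
    rank = 0.10 * (len(ordered) - 1)          # fractional position in the sorted list
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def worst_case_do(values):
    """Return the tenth-percentile DO, or None if the station has too few observations."""
    if len(values) < MIN_OBSERVATIONS:
        return None
    return tenth_percentile(values)


# Hypothetical summer DO measurements (mg/L) for one station and time-block.
measurements = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
statistic = worst_case_do(measurements)   # about 1.9 mg/L under this convention
```

Returning `None` for an under-sampled station keeps the eligibility decision separate from the calculation, so a caller can drop the station from both time-blocks rather than compare a statistic based on too few data.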

Step 4—Spatial Assessment Rules

The spatial assessment step incorporated one screening rule:

• Only water quality stations on portions of streams and rivers affected by point sources will be included in the before- and after-CWA comparison analysis; stations influenced only by nonpoint sources are excluded from the analysis.

The objective was to develop before- and after-CWA data sets whose data inherently carry a response signal linking point source discharges with downstream water quality. Consequently, a screening rule was required to ensure that DO data came from stations located downstream of, rather than upstream from, point sources. As noted in Section A of this chapter, the distance downstream was not relevant for the screening rule; the only requirement was that the station be somewhere in the downstream network.

Although the focus of this book is on effluent loading from POTWs, changes in DO are tied to industrial discharges as well. Estimates of current (ca. 1995) BOD5 loading using the NWPCAM indicate that industrial loads are the dominant component of total point and nonpoint source loading in many catalog units associated with major urban-industrial areas (see Section E in Chapter 2). For this reason, and because industrial and POTW outfalls cannot always be satisfactorily distinguished owing to their close proximity in many areas, this leg of the study defines "point source discharges" to include both industrial and municipal dischargers.

The upstream/downstream relationship between point source discharges and water quality monitoring stations was established using USEPA's Reach File, Version 1 (RF1). RF1 is a computerized network of 64,902 river reaches in the 48 contiguous states, covering 632,552 miles of streams (see Figure 1-2). Using this system, one can traverse stream networks and establish relative positions along the river basin network of both free-flowing and tidally influenced rivers.

A list of point source dischargers was developed from USEPA's Permit Compliance System (PCS), Clean Water Needs Survey (CWNS), and Industrial Facilities Discharge File (IFD). Spatially integrating the dischargers with RF1 resulted in identifying 12,476 reaches that are downstream of point source dischargers (Figure 3-7) (Bondelid et al., 2000). These reaches, in turn, reside in 1,666 out of a total of 2,111 catalog units in the contiguous United States.

Figure 3-7 Reach File version 1 stream reach network of the 48 contiguous states with point source inputs discharging to a reach.
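The idea of flagging every reach that lies downstream of a discharger can be illustrated with a walk along a reach network. This is a simplified sketch, not the RF1 data model: it assumes each reach drains to at most one downstream reach, it treats the reach that receives the discharge as affected, and the reach identifiers and the toy network are hypothetical.

```python
def reaches_downstream_of_dischargers(downstream_of, discharger_reaches):
    """Return the set of reaches at or below any reach that receives a point source discharge.

    downstream_of maps each reach to the next reach downstream (None at the outlet).
    """
    flagged = set()
    for start in discharger_reaches:
        reach = start
        # Walk toward the outlet; stop early once we join an already-flagged path.
        while reach is not None and reach not in flagged:
            flagged.add(reach)
            reach = downstream_of.get(reach)
    return flagged


# Toy network: A and B join at C, C and E join at D, and D is the outlet.
downstream_of = {"A": "C", "B": "C", "C": "D", "E": "D", "D": None}
affected = reaches_downstream_of_dischargers(downstream_of, ["B"])

# Hypothetical stations keyed by the reach on which they sit.
stations = {"station-1": "A", "station-2": "C", "station-3": "E", "station-4": "D"}
eligible_stations = sorted(s for s, reach in stations.items() if reach in affected)
```

With a single discharger on reach B, only the stations on reaches C and D remain eligible; stations on the tributary reaches A and E are upstream of, or unconnected to, the discharge and are excluded regardless of how far downstream the eligible stations sit.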

Example Application of the Screening Rules on DO Data from a Single Water Quality Monitoring Station

Figure 3-8 illustrates how the above screening rules were applied to monitoring station data to obtain worst-case DO data for the before- and after-CWA comparison analysis. A station located on the Upper Mississippi River at Lock and Dam No. 2 at Hastings, Minnesota, is used in this example. Figure 3-8a displays a time series of the entire historical record (225 observations) of raw ambient DO measurements for the station from 1957 to 1997. Note that DO concentrations fluctuate from close to zero to slightly over 15 mg/L. The apparent noise (rapid up and down movement of the DO line) is due to many factors, including annual seasonal changes in streamflow and water temperature. Long-term interannual changes, on the other hand, might be due to persistent dry or wet weather or to changes in pollutant loading from the St. Paul METRO wastewater facility.

In the data selection step, the study authors extracted from the raw data set the surface measurements collected at the station during the summer season (52 observations). Then, in the data aggregation step, they grouped the data in 5-year time-blocks and focused on the data from the before- and after-CWA persistent dry weather time-blocks of 1961-1965 (10 observations) and 1986-1990 (15 observations). Because (1) the catalog unit in which the station resides had at least one dry year (streamflow ratio < 0.75) in each of the before- and after-CWA time-blocks [streamflow ratios: 1961 (0.31); 1964 (0.65); 1987 (0.59); 1988 (0.22); and 1989 (0.40)] and (2) the number of observations for each grouping was confirmed to be at least eight, the groupings remained eligible for the next phase.
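The eligibility check for the Hastings station can be retraced with the figures quoted above. The sketch below uses only the streamflow ratios and observation counts given in the text; ratios for the other years are not reported in this passage and are simply left out. The variable and function names are illustrative.

```python
DRY_RATIO_THRESHOLD = 0.75
MIN_OBSERVATIONS = 8

# Streamflow ratios quoted in the text for the station's catalog unit (dry years only).
quoted_ratios = {1961: 0.31, 1964: 0.65, 1987: 0.59, 1988: 0.22, 1989: 0.40}

# Summer surface observations per time-block, as quoted in the text.
observation_counts = {"before-CWA": 10, "after-CWA": 15}

time_blocks = {"before-CWA": range(1961, 1966), "after-CWA": range(1986, 1991)}


def block_is_eligible(name):
    """Check the dry-year rule and the minimum-observation rule for one time-block."""
    has_dry_year = any(
        year in time_blocks[name] and ratio < DRY_RATIO_THRESHOLD
        for year, ratio in quoted_ratios.items()
    )
    enough_data = observation_counts[name] >= MIN_OBSERVATIONS
    return has_dry_year and enough_data


station_is_eligible = all(block_is_eligible(name) for name in time_blocks)
```

Both time-blocks pass: the before-CWA block has two quoted dry years and 10 observations, and the after-CWA block has three quoted dry years and 15 observations, so the station moves on to the next phase.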

