
CADDIS Volume 4: Data Analysis

Advanced Analyses

Controlling for Natural Variability

In causal assessment, a test site is often compared to other sampled locations to determine whether the test site differs from expected values. However, even in the absence of any environmental stressors, stream communities vary across natural environmental gradients. This variation can occur longitudinally among reaches in a single stream drainage, spatially among stream drainages, and temporally within a single stream reach. Comparisons between test sites and other locations are most informative when this natural variability is minimized or partitioned.

Sources of natural variability

Much of the longitudinal variation in community structure and function among reaches in a stream drainage can be related to some measure of stream size (Vannote et al. 1980). Measures of stream size can include watershed area (i.e., the catchment area above a sampling reach), stream order (Strahler 1957), distance from the source (i.e., linear distance along the stream channel from the most upstream permanent flow source to a sampling reach), stream width, stream depth, the product of stream width and depth, or average discharge. Often, watershed area is the simplest of these variables to measure.

The environmental characteristics of streams can also vary because of natural variation in regional landscapes. Such variation is often related to variation in land-surface form (i.e., physiography and geomorphology), surface geology and soils, potential natural vegetation, or climate. Stream characteristics that can vary at this spatial scale include water chemistry variables, such as alkalinity, hardness, or salinity (if near the estuarine zone); physical variables, such as temperature (particularly in relation to differences in elevation); and stream channel characteristics, such as stream slope and the dominant substrate particle size.

On a temporal scale, stream characteristics can vary with seasonal cycles or even daily cycles. Water temperature and discharge exhibit such seasonal cycles, as they vary in relation to climatic variables like air temperature and precipitation. Dissolved oxygen often exhibits a diurnal cycle with a maximum near the end of daylight and a minimum near the end of the night. Water temperature can also vary along with daily variation in air temperature. In addition to the seasonal variation, discharge can vary more stochastically in relation to individual precipitation events.

Methods for controlling for natural variability

Several approaches can be used to reduce the magnitude of natural variability in the variables of interest (e.g., a biological response or a stressor of interest). Any approach that reduces variability can be useful, but the best approach may differ from one data set to another.

Sampling Design

Particularly for variables that vary seasonally, data may be restricted to a limited period, called the index period, within which temporal variation is minimized. This can also be a period when the variables of interest are most extreme and more likely to affect the biotic community. More stochastic variation can be avoided by not sampling during periods of high discharge related to storm events, unless environmental conditions during those periods are of interest. Automatic equipment or sondes may be used to record the range of variation in variables that vary diurnally.
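
As a rough illustration of restricting data to an index period, the sketch below keeps only samples collected within a chosen range of months and, optionally, below a discharge cutoff standing in for storm-event flows. The function name, the record fields, the July to September window, and the cutoff value are all hypothetical choices made for this example; they are not taken from any particular monitoring program.

```python
from datetime import date

def select_index_period(samples, start_month, end_month, max_discharge=None):
    """Keep samples inside the index period and, optionally, at or below
    a discharge cutoff used here as a stand-in for storm-event flows."""
    kept = []
    for s in samples:
        in_period = start_month <= s["date"].month <= end_month
        below_cutoff = max_discharge is None or s["discharge"] <= max_discharge
        if in_period and below_cutoff:
            kept.append(s)
    return kept

# Hypothetical sample records (discharge in arbitrary units).
samples = [
    {"site": "A", "date": date(2005, 7, 14), "discharge": 0.8},
    {"site": "A", "date": date(2005, 11, 2), "discharge": 1.1},  # outside the period
    {"site": "B", "date": date(2005, 8, 3), "discharge": 9.5},   # high flow
    {"site": "B", "date": date(2005, 9, 20), "discharge": 0.6},
]

# Assumed index period: July through September, discharge cutoff of 5.0.
index_samples = select_index_period(samples, 7, 9, max_discharge=5.0)
```

In practice the index period and any flow cutoff would be chosen from the seasonal and hydrologic patterns of the region under study.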

Mid-Atlantic Ecoregions - source: U.S. EPA, 1998; Omernik, 1987
Figure 1. This map shows an example of ecoregions from the mid-Atlantic region of the United States, as defined by Omernik (1987). Regions with similar topography, climate, and geological substrate are expected to contain streams that are similar with respect to hydrological and chemical conditions. The physical and chemical similarities presumably account for similarities in biota, and similarities in the types and magnitudes of responses to stressors.

Classification

In classification, streams are grouped into ecologically similar classes, and analyses are then performed on these subsets of the data. When classification is effective, the influence of natural variables is reduced, so the differences remaining among locations and among variables are more likely the result of the stressor or stressors of interest. Classification may be based on environmental factors that are directly relevant to the site, such as water temperature, or may adopt existing ecological classifications of streams or regions.

Several geographic classification systems (i.e., ecoregions) have been devised to identify regions that are thought to be relatively homogeneous with respect to aquatic ecosystems. Omernik (1987) delineates ecoregions based on land-surface form, potential natural vegetation, climate, soils, and predominant land use, while Bailey (1983) uses climate and potential natural vegetation. These ecoregions can be used to classify streams into groups in which expected conditions may be more homogeneous (Barbour et al. 1999, Omernik and Bailey 1997).

These classification systems are hierarchical in that more homogeneous, smaller, higher-level ecoregion units (Level IV) are nested within consecutively larger, lower-level ecoregion units up to Level I. Until recently, Omernik’s Level III ecoregions were widely used in field research to group sites for the purpose of controlling natural variability, but Level IV ecoregions are now delineated for most of the United States. The choice of level depends on balancing two considerations. Higher-level ecoregions can be small enough that obtaining data from enough sites for analysis is problematic, but they may have significantly less variability, making it potentially easier to detect the effects of stressors. For example, the Little Miami River basin in Ohio is within the Level III Eastern Corn Belt Plains ecoregion, but is divided among three Level IV ecoregions where the streams differ strongly in channel morphology and nutrient dynamics because of their differing glacial histories (Daniel et al., 2009).

In other cases, Level III ecoregions may not differ significantly in the natural factors affecting stream structure and function [see analysis by Waite et al. (2000) of ecoregions in the mid-Atlantic Highlands, but also compare their results with those of van Sickle and Hughes (2000), McCormick et al. (2000), Rabeni and Doisy (2000), and Feminella (2000)]. In some of these cases, Level III ecoregions might be lumped, or one may consider using Level II ecoregions. Because Level II ecoregions are geographically larger, obtaining data from enough sites for analysis may be easier.
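
One rough way to compare candidate groupings (for example, a Level III versus a Level IV assignment) is to ask how much of the among-site variation in a response each grouping accounts for. The sketch below computes the fraction of the total sum of squares explained by class means, as in a one-way analysis of variance. This is only one possible measure and is not necessarily the method used in the studies cited above; the richness values and class labels are invented for illustration.

```python
from collections import defaultdict

def variance_explained(values, classes):
    """Fraction of the total sum of squares accounted for by class means."""
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        raise ValueError("response has no variation to partition")
    groups = defaultdict(list)
    for v, c in zip(values, classes):
        groups[c].append(v)
    within_ss = 0.0
    for members in groups.values():
        class_mean = sum(members) / len(members)
        within_ss += sum((v - class_mean) ** 2 for v in members)
    return 1.0 - within_ss / total_ss

# Hypothetical taxon richness at eight reference sites.
richness = [12, 14, 13, 15, 22, 24, 23, 25]
coarse = ["A"] * 8                 # one coarse class containing every site
fine = ["A1"] * 4 + ["A2"] * 4     # two finer classes nested within it

coarse_fraction = variance_explained(richness, coarse)
fine_fraction = variance_explained(richness, fine)
```

A finer grouping never explains less variance than the coarser one it is nested within, so any gain has to be weighed against the smaller number of sites available in each class.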

Within a region, streams may be further classified according to other sources of natural variation. Such factors could include temperature regime, stream size and drainage area, stream slope or gradient, the presence of wetlands, geomorphology, or watershed land use. Such classes can be defined using statistical models that identify discontinuities in the values of environmental variables (e.g., classification and regression tree analysis). Standard criteria may also be used. A maximum mean monthly temperature of 20°C is often used to separate coldwater from warmwater streams. The State of Ohio uses classes based on watershed size: headwater streams (< 52 km²), wadeable streams (52 km² to approximately 520 km²), and boatable streams (> approximately 520 km²) (Ohio EPA 1987). U.S. EPA’s Environmental Monitoring and Assessment Program (EMAP) identified wadeable streams, non-wadeable streams, and great rivers (i.e., the mainstems of major rivers, such as the Ohio, Mississippi, and Missouri) (Flotemersch et al. 2006). When biogeographic data describing the distribution of aquatic species are available, they may be used to delineate ecoregions based on factors that are most relevant to freshwater species (Abell et al. 2008).
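
The standard criteria mentioned above can be applied as simple threshold rules. The following sketch is illustrative only: the function names are hypothetical, the source describes the 520 km² boundary as approximate (it is treated as a hard cutoff here), and the handling of values that fall exactly on a boundary is an assumption.

```python
def ohio_size_class(watershed_area_km2):
    """Assign a watershed size class using the Ohio thresholds quoted above.
    Treating 520 km2 as an exact cutoff is an assumption of this sketch."""
    if watershed_area_km2 < 52:
        return "headwater"
    if watershed_area_km2 <= 520:
        return "wadeable"
    return "boatable"

def thermal_class(max_mean_monthly_temp_c, threshold_c=20.0):
    """Separate coldwater from warmwater streams at a temperature threshold.
    Assigning exactly 20 degrees C to 'warmwater' is an assumption."""
    if max_mean_monthly_temp_c < threshold_c:
        return "coldwater"
    return "warmwater"

# Hypothetical sites: (watershed area in km2, maximum mean monthly temp in C)
sites = {"site_1": (10.0, 17.5), "site_2": (140.0, 23.0), "site_3": (900.0, 25.5)}
classes = {
    name: (ohio_size_class(area), thermal_class(temp))
    for name, (area, temp) in sites.items()
}
```

Analyses would then be run separately within each resulting class, provided each class retains enough sites.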

Statistical models of natural variability

Regression models can be used to estimate the variability of a variable (e.g., a biological response or a stressor of interest) that is associated with natural factors. For example, observed fish species richness could be modeled as a function of watershed area. The estimated regression relationship can then be used to predict expected conditions at test sites. This approach is particularly useful when one wishes to estimate expected conditions at a test site from conditions observed at a set of regional reference sites. Prediction intervals on the regression predictions then provide a range of conditions that are consistent with reference expectations.
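
A minimal sketch of this approach follows, assuming a simple linear regression of fish species richness on log10 watershed area fitted to reference sites. The data are invented, the log transformation is an assumption, and the interval uses a normal quantile in place of the t quantile with n - 2 degrees of freedom, so it is somewhat too narrow when there are few reference sites.

```python
import math
from statistics import NormalDist

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    resid_sd = math.sqrt(resid_ss / (n - 2))  # residual standard deviation
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "slope": slope,
            "intercept": intercept, "resid_sd": resid_sd}

def prediction_interval(fit, x_new, level=0.95):
    """Approximate prediction interval for one new observation at x_new.
    Uses a normal quantile (an approximation to the t quantile)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    y_hat = fit["intercept"] + fit["slope"] * x_new
    se = fit["resid_sd"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    return y_hat - z * se, y_hat, y_hat + z * se

# Hypothetical reference sites: watershed area (km2) and fish species richness.
ref_area = [10, 25, 50, 100, 250, 500, 1000]
ref_richness = [8, 11, 13, 16, 19, 22, 24]

fit = fit_line([math.log10(a) for a in ref_area], ref_richness)

# Hypothetical test site draining 100 km2 with 9 species observed.
lower, expected, upper = prediction_interval(fit, math.log10(100))
below_reference_range = 9 < lower
```

An observed value outside the interval suggests that the test site differs from reference expectations for a stream of that size, although it does not by itself identify the cause.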
