CADDIS Volume 4: Data Analysis
Topics in Tests of Significant Difference
Author: L.L. Yuan
Tests of Significant Difference
Tests of significant differences most commonly test whether the difference between two mean values is significantly different from zero (Snedecor and Cochran 1989). However, in causal assessment, a more common question is whether an observation at a test site deviates significantly from the range of conditions observed at a reference site. For example, we may wish to compare total phosphorus (TP) concentrations at a biologically degraded test site with TP at one similar reference site with unimpaired biota to establish whether elevated TP co-occurs with the observed biological degradation (see page on spatial co-occurrence). Consider the case in which 5 measurements of TP collected at different times at the reference site range from 10 to 16 μg/L (Figure 1). At the test site, a single observation of TP = 19 μg/L is available. We would like to know the probability that the TP concentration observed at the test site would have been observed at the reference site. If this probability is low (e.g., less than 5%), then we could conclude that TP concentrations differ at the test site compared to the reference site.
A simple approach for estimating the probability of observing a particular value relative to a set of reference observations is to note that the reference observations define a range of possibilities. That is, N random reference observations at a site divide the range of possible values into N + 1 intervals, and a subsequent observation from the same site is equally likely to fall in any one of them. Therefore, the probability that a subsequent observation is higher than the highest reference value is 1/(N + 1). In the example described above, the probability of observing a value greater than 16 μg/L is 1/(5 + 1), or 17%. Because 17% exceeds the 5% threshold, we cannot conclude from five reference samples that TP at the test site differs from TP at the reference site. More samples at the reference site would increase our ability to assess whether the observation at the test site could come from the reference site (i.e., that the test site is similar to the reference site).
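The rank-based argument above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original analysis: the function names are hypothetical, and the five reference values are invented to span the 10 to 16 μg/L range quoted in the text, since the individual measurements are not listed.

```python
def exceedance_probability(reference_values, test_value):
    """Rank-based probability that a new observation from the reference
    site falls in or above the interval containing test_value.

    N reference values define N + 1 equally likely intervals, so a value
    above the reference maximum has probability 1 / (N + 1).
    """
    n = len(reference_values)
    if n == 0:
        raise ValueError("need at least one reference observation")
    n_at_or_above = sum(1 for v in reference_values if v >= test_value)
    return (n_at_or_above + 1) / (n + 1)


def min_reference_samples(alpha):
    """Smallest N for which 1 / (N + 1) is no larger than alpha."""
    n = 1
    while 1 / (n + 1) > alpha:
        n += 1
    return n


# Hypothetical reference TP values (ug/L); only their range comes from the text.
reference_tp = [10.0, 12.0, 13.0, 15.0, 16.0]
p = exceedance_probability(reference_tp, test_value=19.0)
print(f"p = {p:.2f}")  # p = 0.17
print(min_reference_samples(0.05))  # 19
```

Under these assumptions, at least 19 reference samples are needed before a test value above the reference maximum can be assigned a probability of 5% or less.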
With a larger number of reference samples, one might specify a threshold probability (e.g., p = 0.05) below which a test site value would be declared to be outside the range of reference conditions. Returning to our example, and considering the case in which 50 samples are available (Figure 2), one could state a priori that a test site TP greater than the 95th percentile of the distribution is unlikely to be part of the reference distribution, and therefore, the test site TP is different from reference. In this case, the 95th percentile is 17 μg/L, the observation of TP = 19 μg/L is greater than this threshold, and we declare that TP at the test site differs from reference expectations.
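A non-parametric percentile threshold of this kind might be computed as in the sketch below. The data are a synthetic stand-in for the 50 reference samples of Figure 2 (drawn from a normal distribution with the mean and standard deviation reported for that figure), so the printed threshold will not reproduce the 17 μg/L quoted above exactly. The function names are hypothetical, and the interpolation rule is the default of Python's statistics.quantiles, which is only one of several common percentile definitions.

```python
import random
import statistics


def percentile_95(reference_values):
    """Empirical 95th percentile of the reference sample.

    statistics.quantiles with n=20 returns 19 cut points at 5% steps;
    the last one is the 95th percentile.
    """
    return statistics.quantiles(reference_values, n=20)[-1]


def differs_from_reference(test_value, reference_values):
    """True if the test value exceeds the reference 95th percentile."""
    return test_value > percentile_95(reference_values)


# Synthetic reference data (assumption): 50 draws, mean 13.6, sd 2.05 ug/L.
rng = random.Random(42)
reference_tp = [rng.gauss(13.6, 2.05) for _ in range(50)]

print(f"95th percentile threshold: {percentile_95(reference_tp):.1f} ug/L")
print("test site differs:", differs_from_reference(19.0, reference_tp))
```

Fixing the percentile before looking at the test site value, as the text describes, keeps the decision rule independent of the observation being tested.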
Parametric estimates of prediction intervals can provide more informative comparisons than non-parametric techniques, but they require larger sample sizes, and they require that we assume that the observed values are drawn from a particular statistical distribution (e.g., a normal distribution). TP values shown in Figure 2 appear to be normally distributed, and we calculate the sample mean value (Xmean) as 13.6 μg/L and standard deviation (S) as 2.05 μg/L. A quantile-quantile plot (not shown) confirms that values are nearly normally distributed. We can estimate the 95th percentile of the distribution using the following formula:
$$X_t = X_{mean} + t_{0.95}\, S \sqrt{1 + \frac{1}{N}}$$
where Xt is the threshold associated with the 95th percentile of the distribution, N is the number of reference samples, and t0.95 is the 95th percentile of the t-distribution with N - 1 = 49 degrees of freedom. In this example, the threshold is estimated as 17 μg/L, and our test site with TP = 19 μg/L is declared to be different from the reference distribution.
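The arithmetic behind the 17 μg/L threshold can be checked with a short sketch. It assumes the standard one-sided prediction bound for a single new observation from a normal distribution, Xmean + t0.95 S sqrt(1 + 1/N), and a tabulated t-quantile of about 1.677 for N - 1 = 49 degrees of freedom; using 50 degrees of freedom changes the result only in the third decimal place. The function name is hypothetical.

```python
import math


def upper_prediction_threshold(mean, sd, n, t_quantile):
    """One-sided upper prediction bound for a single new observation,
    assuming the reference values are normally distributed."""
    return mean + t_quantile * sd * math.sqrt(1 + 1 / n)


# Tabulated one-sided 95% t-quantile for 49 degrees of freedom (rounded).
# With SciPy installed, scipy.stats.t.ppf(0.95, df=49) gives the same value.
T_95_DF49 = 1.677

threshold = upper_prediction_threshold(mean=13.6, sd=2.05, n=50,
                                       t_quantile=T_95_DF49)
print(round(threshold, 1))  # 17.1
print(19.0 > threshold)     # True
```

The sqrt(1 + 1/N) factor accounts for uncertainty in the estimated mean; with 50 samples it inflates the bound by only about 1%.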
The discussion here has focused on comparing a single test site measurement with a set of samples collected at different times from a single reference site. The same types of tests could also be applied to a set of samples collected from multiple reference sites, if we were confident that the natural expectations at all of the reference sites were identical. In practice, different streams are likely to vary in terms of natural expectations, and test site comparisons should take these natural variations into account. Quantile regression and prediction intervals from linear regression extend these ideas to situations in which one needs to control for the influence of one or more covarying factors.
The number of samples available at the reference site often limits our ability to determine whether a test site differs from reference expectations. In cases in which the number of reference samples is low (e.g., < 5), pooling samples from other sites to estimate a within-site standard deviation may be necessary.
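One common way to pool samples across sites is the pooled (within-group) variance, sketched below. This is an illustration under stated assumptions rather than the procedure used in the original analysis: it assumes that within-site variability is similar at all pooled sites, and the site names and TP values are invented.

```python
import math
import statistics


def pooled_within_site_sd(samples_by_site):
    """Pooled within-site standard deviation and its degrees of freedom.

    Each site's sample variance is weighted by its degrees of freedom
    (n_i - 1), so differences between site means do not inflate the
    estimate. Sites with a single sample are skipped because they carry
    no information about within-site spread.
    """
    weighted_sum = 0.0
    dof = 0
    for values in samples_by_site.values():
        if len(values) < 2:
            continue
        weighted_sum += (len(values) - 1) * statistics.variance(values)
        dof += len(values) - 1
    if dof == 0:
        raise ValueError("need at least one site with two or more samples")
    return math.sqrt(weighted_sum / dof), dof


# Hypothetical TP samples (ug/L) from three sites.
samples = {
    "site_a": [10.0, 12.0, 14.0],
    "site_b": [20.0, 22.0, 24.0],
    "site_c": [15.0],
}
sd, dof = pooled_within_site_sd(samples)
print(sd, dof)  # 2.0 4
```

The pooled estimate and its degrees of freedom could then stand in for S and N - 1 when a parametric threshold is computed for a reference site with few samples of its own.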
Statistical methods to explicitly formulate the null hypothesis in terms of a range of possible values are available and discussed in Kilgour et al. (1998).