CADDIS Volume 4: Data Analysis

## Exploratory Data Analysis

### Scatterplots

Scatterplots are graphical displays of matched data plotted with one variable on the horizontal axis and the other variable on the vertical axis. Data are usually plotted with measures of an influential parameter on the horizontal axis (independent variable) and measures of an attribute that may respond to the influential parameter on the vertical axis (dependent variable). Scatterplots are a useful first step in any analysis because they help visualize relationships and identify possible issues (e.g., outliers) that can influence subsequent statistical analyses.
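As a minimal sketch of the layout described above, the Python below pairs up two measurement lists and draws them with the influential variable on the horizontal axis. The function names (`prepare_pairs`, `scatterplot`), the variable names, and the data values are all made up for illustration. The plotting function assumes matplotlib is installed.

```python
def prepare_pairs(stressor, response):
    """Return matched (x, y) pairs, dropping any pair with a missing value."""
    if len(stressor) != len(response):
        raise ValueError("scatterplots need matched data: one y for each x")
    return [(x, y) for x, y in zip(stressor, response)
            if x is not None and y is not None]


def scatterplot(pairs, xlabel, ylabel):
    """Draw the pairs with the influential variable on the horizontal axis."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed

    xs, ys = zip(*pairs)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, facecolors="none", edgecolors="black")  # open circles
    ax.set_xlabel(xlabel)  # independent variable
    ax.set_ylabel(ylabel)  # dependent variable
    return fig, ax


# Hypothetical values for illustration only; None marks a missing measurement.
percent_fines = [5, 12, 20, None, 41, 55, 70]
mayfly_richness = [14, 11, 12, 9, 6, None, 2]
pairs = prepare_pairs(percent_fines, mayfly_richness)
```

Calling `scatterplot(pairs, "Percent sand/fines", "Mayfly richness")` would then return a figure that can be shown or saved. Dropping incomplete pairs before plotting is one possible choice here; it keeps every plotted point a true matched observation.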

#### Data issues that can be revealed by scatterplots

Different characteristics of a particular data set are readily apparent from scatterplots. Relationships between variables may be nonlinear (Figure 1a) or the variance about the mean relationship may not be constant (Figure 1b). In both of these cases, simple linear regression may not be appropriate, so identifying these features early can help one select more appropriate analytical techniques. For nonlinear relationships, a different functional form (e.g., quadratic) may be appropriate, and when variances are not constant, one might opt for quantile regression or generalized linear models.

Figure 1. (a) Capture probability of the stonefly Calineuria plotted versus stream temperature. Each open circle shows the capture probability estimated from approximately 20 samples, plotted at the average temperature of those samples. The line shows a nonparametric regression fit to the data. (b) Mayfly richness versus percent substrate sand/fines. Variance in observed richness decreases with increased sediment.

#### How can I use scatterplots in causal analysis?

Scatterplots are useful primarily for data exploration, but they can also inform evaluations of stressor-response from the case. An example of this latter application is available here.