CADDIS Volume 4: Data Analysis

Exploratory Data Analysis


Scatterplots are graphical displays of matched data plotted with one variable on the horizontal axis and the other variable on the vertical axis. Data are usually plotted with measures of an influential parameter on the horizontal axis (independent variable) and measures of an attribute that may respond to the influential parameter on the vertical axis (dependent variable). Scatterplots are a useful first step in any analysis because they help visualize relationships and identify possible issues (e.g., outliers) that can influence subsequent statistical analyses.
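
As a minimal sketch of the "matched data" idea, the example below pairs two measurements by sampling site and keeps only sites that have both before plotting. The site labels, values, and the function names `match_pairs` and `scatterplot` are hypothetical, chosen only for illustration. Plotting assumes matplotlib is installed.

```python
def match_pairs(x_by_site, y_by_site):
    """Return (xs, ys) lists for sites that have both measurements."""
    shared = sorted(set(x_by_site) & set(y_by_site))
    xs = [x_by_site[site] for site in shared]
    ys = [y_by_site[site] for site in shared]
    return xs, ys


def scatterplot(xs, ys, xlabel, ylabel):
    """Plot the independent variable on the horizontal axis and the
    dependent variable on the vertical axis."""
    import matplotlib.pyplot as plt  # imported here so pairing works without it
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, facecolors="none", edgecolors="black")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return fig


# Hypothetical values, for illustration only.
sand_fines = {"A": 5.0, "B": 20.0, "C": 45.0, "D": 80.0}
mayfly_richness = {"A": 14, "B": 11, "C": 6, "E": 9}

# Sites D and E are dropped because each lacks one of the two measurements.
xs, ys = match_pairs(sand_fines, mayfly_richness)
```

Calling `scatterplot(xs, ys, "Percent sand/fines", "Mayfly richness")` would then draw the matched points.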

Data issues that can be revealed by scatterplots

Different characteristics of a particular data set are readily apparent from scatterplots. Relationships between variables may be nonlinear (Figure 1a) or the variance about the mean relationship may not be constant (Figure 1b). In both of these cases, simple linear regression may not be appropriate, so identifying these features early can help one select more appropriate analytical techniques. For nonlinear relationships, a different functional form (e.g., quadratic) may be appropriate, and when variances are not constant, one might opt for quantile regression or generalized linear models.
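
One rough way to follow up on a curved pattern seen in a scatterplot is to compare how well a straight line and a quadratic fit the same data. The sketch below does this with a small least-squares routine written in plain Python. The data are hypothetical (not the values behind Figure 1a), and the helper names are illustrative. In practice a library routine such as `numpy.polyfit` would normally replace the hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.
    Returns coefficients ordered from lowest power to highest."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)]
         for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(a, b)


def residual_sum_of_squares(xs, ys, coeffs):
    """Sum of squared differences between observed and fitted values."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = sum(c * x ** k for k, c in enumerate(coeffs))
        total += (y - predicted) ** 2
    return total


# Hypothetical hump-shaped data: capture probability peaks near 15 degrees C.
temperature = [5.0, 7.5, 10.0, 12.5, 15.0, 17.5, 20.0, 22.5, 25.0]
capture_prob = [0.6 - 0.004 * (t - 15.0) ** 2 for t in temperature]

rss_linear = residual_sum_of_squares(
    temperature, capture_prob, fit_polynomial(temperature, capture_prob, 1))
rss_quadratic = residual_sum_of_squares(
    temperature, capture_prob, fit_polynomial(temperature, capture_prob, 2))
```

A much smaller residual sum of squares for the quadratic than for the straight line is consistent with the curvature seen in the plot. With real, noisy data the comparison should also account for the extra parameter in the quadratic.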

Figure 1. (a) Capture probability of the stonefly Calineuria plotted versus stream temperature. Each open circle shows the capture probability estimate from approximately 20 samples with an average temperature as plotted. Line shows a nonparametric regression fit to the data. (b) Mayfly richness versus percent substrate sand/fines. Variance in observed richness decreases with increased sediment.
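
A pattern like the one described for Figure 1b can be checked numerically by grouping observations into bins along the horizontal axis and comparing the spread of the response within each bin. The sketch below is one simple way to do this. The data values, the bin edges, and the function name `spread_by_bin` are hypothetical.

```python
import statistics


def spread_by_bin(xs, ys, edges):
    """Sample standard deviation of y within each x bin [edges[i], edges[i+1])."""
    spreads = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [y for x, y in zip(xs, ys) if lo <= x < hi]
        # A standard deviation needs at least two observations.
        spreads.append(statistics.stdev(in_bin) if len(in_bin) >= 2 else None)
    return spreads


# Hypothetical data: percent sand/fines and mayfly richness at 15 sites.
sand_fines = [2, 5, 8, 12, 15, 35, 40, 45, 50, 55, 75, 80, 85, 90, 95]
richness = [4, 18, 9, 15, 11, 12, 5, 9, 7, 10, 3, 4, 2, 3, 4]

# Three bins covering low, intermediate, and high percent sand/fines.
spreads = spread_by_bin(sand_fines, richness, edges=[0, 33, 66, 101])
```

Spreads that shrink steadily from the first bin to the last suggest non-constant variance, which is the situation in which quantile regression or generalized linear models may be preferable to simple linear regression.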

How can I use scatterplots in causal analysis?

Scatterplots are useful primarily for data exploration, but they can also inform evaluations of stressor-response relationships from the case. An example of this latter application is available here.

More information

  • A set of scatterplots showing pairwise relationships between several variables can be conveniently displayed as a scatterplot matrix (Figure 2).

    Figure 2. Scatterplot matrix showing relationships between log total nitrogen (log TN), log total phosphorus (log TP), percent substrate sand/fines (SED), and stream temperature in the western United States.

  • One limitation of scatterplots is that each plot shows the relationship between only two variables. In cases in which many different variables interact, multivariate approaches for exploring data may provide greater insights.
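
A scatterplot matrix like the one described above can be sketched as a grid of panels, one for each ordered pair of variables. The example below is a minimal illustration, assuming matplotlib is installed for the drawing step. The function names `panel_layout` and `scatterplot_matrix` and all data values are hypothetical.

```python
from itertools import permutations


def panel_layout(names):
    """Map each off-diagonal (row, col) grid cell to its (y, x) variable pair."""
    index = {name: i for i, name in enumerate(names)}
    return {(index[y], index[x]): (y, x) for y, x in permutations(names, 2)}


def scatterplot_matrix(data):
    """Draw every pairwise scatterplot of the columns in `data` (dict of lists)."""
    import matplotlib.pyplot as plt  # imported here; only needed for drawing
    names = list(data)
    n = len(names)
    fig, axes = plt.subplots(n, n, figsize=(2 * n, 2 * n), squeeze=False)
    for i, name in enumerate(names):
        # Diagonal cells carry the variable name instead of a plot.
        axes[i][i].text(0.5, 0.5, name, ha="center", va="center",
                        transform=axes[i][i].transAxes)
        axes[i][i].set_xticks([])
        axes[i][i].set_yticks([])
    for (row, col), (y, x) in panel_layout(names).items():
        axes[row][col].scatter(data[x], data[y], s=10)
    return fig


# Hypothetical values, for illustration only.
data = {
    "log TN": [2.1, 2.4, 2.6, 2.9, 3.1],
    "log TP": [0.9, 1.2, 1.3, 1.7, 1.8],
    "SED": [5.0, 12.0, 30.0, 41.0, 60.0],
    "Temp": [9.5, 11.0, 14.2, 15.8, 18.1],
}
layout = panel_layout(list(data))
```

Each variable pair appears twice, once with each variable on the horizontal axis, so four variables give twelve panels. Every panel still shows only two variables at a time, which is the limitation noted above.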
