CADDIS Volume 4: Data Analysis
Exploratory Data Analysis
Topics in Scatterplots
- Data issues that can be revealed by scatterplots
- How can I use scatterplots in causal analysis?
- More information
Scatterplots are graphical displays of paired data, with one variable plotted on the horizontal axis and the other on the vertical axis. By convention, measures of an influential parameter (the independent variable) are plotted on the horizontal axis, and measures of an attribute that may respond to that parameter (the dependent variable) are plotted on the vertical axis. Scatterplots are a useful first step in any analysis because they help visualize relationships and identify possible issues (e.g., outliers) that can influence subsequent statistical analyses.
Different characteristics of a particular data set are readily apparent from scatterplots. Relationships between variables may be nonlinear (Figure 1a) or the variance about the mean relationship may not be constant (Figure 1b). In both of these cases, simple linear regression may not be appropriate, so identifying these features early can help one select more appropriate analytical techniques. For nonlinear relationships, a different functional form (e.g., quadratic) may be appropriate, and when variances are not constant, one might opt for quantile regression or generalized linear models.
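The two patterns described above can also be checked numerically once a scatterplot has raised suspicion. The following is a minimal sketch, not a CADDIS procedure: the data are synthetic, the helper names `fit_rss` and `spread_ratio` are hypothetical, and the split-half comparison of residual spread is only a crude screening check for non-constant variance.

```python
import numpy as np


def fit_rss(x, y, degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))


def spread_ratio(x, y):
    """Residual spread in the upper half of x relative to the lower half.

    Residuals come from a straight-line fit; a ratio far from 1
    suggests that the variance is not constant along x.
    """
    coeffs = np.polyfit(x, y, 1)
    residuals = y - np.polyval(coeffs, x)
    order = np.argsort(x)
    half = len(x) // 2
    low = residuals[order[:half]]
    high = residuals[order[half:]]
    return float(np.std(high) / np.std(low))


rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 200)

# Synthetic curved response: quadratic trend with constant noise.
y_curved = 2.0 + 0.5 * x**2 + rng.normal(0.0, 1.0, x.size)

# Synthetic "fan-shaped" response: linear trend, noise grows with x.
y_fan = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)

rss_linear = fit_rss(x, y_curved, 1)
rss_quadratic = fit_rss(x, y_curved, 2)
ratio = spread_ratio(x, y_fan)

print(f"RSS, straight line: {rss_linear:.1f}")
print(f"RSS, quadratic:     {rss_quadratic:.1f}")
print(f"Residual spread ratio (upper/lower half of x): {ratio:.2f}")
```

A much smaller residual sum of squares for the quadratic fit points to a nonlinear relationship, and a spread ratio well above 1 points to variance that increases with the independent variable. Neither number replaces looking at the plot itself.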
Scatterplots are useful primarily for data exploration, but they can also inform evaluations of stressor-response relationships from the case. An example of this latter application is available here.
A set of scatterplots showing pairwise relationships between several variables can be conveniently displayed as a scatterplot matrix (Figure 2).
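As an illustration of how such a matrix can be assembled, here is a minimal sketch using Matplotlib. The function name `scatterplot_matrix`, the variable names, and the synthetic data are all assumptions made for this example. Placing histograms on the diagonal is one common convention and is not necessarily how Figure 2 was drawn.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display required
import matplotlib.pyplot as plt
import numpy as np


def scatterplot_matrix(data, names):
    """Draw pairwise scatterplots of the columns of `data` in a k-by-k grid."""
    k = data.shape[1]
    fig, axes = plt.subplots(k, k, figsize=(2.5 * k, 2.5 * k), squeeze=False)
    for row in range(k):
        for col in range(k):
            ax = axes[row, col]
            if row == col:
                # Diagonal panel: distribution of a single variable.
                ax.hist(data[:, row], bins=15)
            else:
                # Off-diagonal panel: column variable on x, row variable on y.
                ax.scatter(data[:, col], data[:, row], s=8)
            if row == k - 1:
                ax.set_xlabel(names[col])
            if col == 0:
                ax.set_ylabel(names[row])
    fig.tight_layout()
    return fig, axes


# Synthetic, purely illustrative data for three hypothetical variables.
rng = np.random.default_rng(1)
n = 100
stressor_a = rng.uniform(0.0, 5.0, n)
stressor_b = 2.0 * stressor_a + rng.normal(0.0, 1.0, n)
response = 20.0 - 1.5 * stressor_b + rng.normal(0.0, 2.0, n)

data = np.column_stack([stressor_a, stressor_b, response])
names = ["stressor_a", "stressor_b", "response"]

fig, axes = scatterplot_matrix(data, names)
```

Calling `fig.savefig("scatterplot_matrix.png")` afterward writes the figure to a file. Each off-diagonal panel is an ordinary scatterplot, so the checks for nonlinearity, non-constant variance, and outliers apply panel by panel.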
One limitation of scatterplots is that each plot shows the relationship between only two variables at a time. In cases in which many different variables interact, multivariate approaches for exploring data may provide greater insights.