CADDIS Volume 4: Data Analysis

Author: M.B. Griffith

Classification and Regression Tree (CART) Analysis

Classification and regression tree (CART) analysis recursively partitions the observations in a matched data set, consisting of a categorical (for classification trees) or continuous (for regression trees) dependent (response) variable and one or more independent (explanatory) variables, into progressively smaller groups (De'ath and Fabricius 2000, Prasad et al. 2006). Each partition is a binary split based on a single independent variable. A typical output from these analyses is shown in Figure 1.

Figure 1. A tree diagram for relative abundance of lithophilous fish (i.e., fish that broadcast spawn on gravel beds) with respect to % sand and fines (% S&F, a measure of fine bedded sediment), and watershed area (WA). Branches are annotated showing the decision rules (e.g., % sand and fines < 22.3). Nodes are annotated showing the mean of the dependent variable (n = number of observations, x = mean value, MSE = mean squared error). Data set provided by the Minnesota Pollution Control Agency.

How do I run a CART analysis?

A classification and regression tree tool is available in CADStat, and other algorithms for different types of CART analyses are available in R. The CART algorithm available in CADStat can be used for either classification or regression trees, and it handles mixed models containing both categorical and continuous variables. GUIDE, a freestanding (and free) tool for CART analysis, is also available (Chaudhuri and Loh 2002, Loh 2002).

What do CART results mean?

CART analysis constructs a set of decision rules that identify homogeneous groups of the response variable as a function of a set of explanatory variables (Figure 1). During each recursion, splits for each explanatory variable are examined, and the split that maximizes the homogeneity of the two resulting groups with respect to the dependent variable is chosen. To avoid overfitting of the data, algorithms used in CART usually simplify or “prune” the tree that contains all possible splits of the data to an optimal tree that contains a sufficient number of splits to describe the data.
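The split search and recursion described above can be sketched in a few lines of Python. This is a minimal illustration for a regression tree only, assuming squared error as the homogeneity measure and a minimum node size as the stopping rule. It does not prune, and it is not the algorithm used by CADStat, R, or GUIDE. All function names, variable names, and data values are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (lower = more homogeneous)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, response, predictors, min_size):
    """Return (variable, threshold, total_sse) for the best binary split, or None."""
    best = None
    for var in predictors:
        levels = sorted({r[var] for r in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2
            left = [r[response] for r in rows if r[var] < cut]
            right = [r[response] for r in rows if r[var] >= cut]
            if len(left) < min_size or len(right) < min_size:
                continue  # skip splits that create groups that are too small
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (var, cut, total)
    return best


def grow(rows, response, predictors, min_size=2):
    """Recursively partition rows into a nested dict of binary decision rules."""
    values = [r[response] for r in rows]
    node = {"n": len(rows), "mean": mean(values)}
    split = best_split(rows, response, predictors, min_size)
    # Stop when no admissible split exists or the best one does not reduce the error.
    if split is None or split[2] >= sse(values):
        return node
    var, cut, _ = split
    node["variable"], node["threshold"] = var, cut
    node["left"] = grow([r for r in rows if r[var] < cut], response, predictors, min_size)
    node["right"] = grow([r for r in rows if r[var] >= cut], response, predictors, min_size)
    return node


# Hypothetical observations, invented for illustration only.
rows = [
    {"pct_sand_fines": 5.0, "watershed_area": 10.0, "pct_lithophils": 60.0},
    {"pct_sand_fines": 10.0, "watershed_area": 200.0, "pct_lithophils": 58.0},
    {"pct_sand_fines": 15.0, "watershed_area": 50.0, "pct_lithophils": 62.0},
    {"pct_sand_fines": 40.0, "watershed_area": 30.0, "pct_lithophils": 20.0},
    {"pct_sand_fines": 60.0, "watershed_area": 150.0, "pct_lithophils": 22.0},
    {"pct_sand_fines": 80.0, "watershed_area": 90.0, "pct_lithophils": 18.0},
]

tree = grow(rows, "pct_lithophils", ["pct_sand_fines", "watershed_area"])
print(tree)
```

Each internal node stores one decision rule (observations with a value below the threshold go left), and each node records the number of observations and the mean response, mirroring the annotations in Figure 1. A full CART implementation would typically grow a larger tree and then prune it back, which this sketch omits.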

How do I use classification and regression tree (CART) analysis in causal analysis?

In general, CART analysis can be applied to causal analysis in three ways. First, when controlling for natural variability, CART analysis can be used in data exploration to classify systems that differ as a result of natural factors and to develop models that predict environmental conditions as a function of natural factors.

Second, CART analysis can help identify variables that may confound estimates of stressor-response relationships. Third, it can inform efforts to describe stressor-response relationships.
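For the first of these uses, classifying systems by a natural factor, the response is categorical, so a classification split is needed. The sketch below searches for a single threshold on one natural factor, assuming Gini impurity as the homogeneity measure (one common choice, not necessarily the one used by any tool named on this page). The function names, site classes, and watershed areas are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a group: 0 when every label is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_class_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one variable."""
    pairs = list(zip(values, labels))
    n = len(pairs)
    best = None
    levels = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent observed values.
    for lo, hi in zip(levels, levels[1:]):
        cut = (lo + hi) / 2
        left = [lab for v, lab in pairs if v < cut]
        right = [lab for v, lab in pairs if v >= cut]
        # Weight each group's impurity by its share of the observations.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (cut, score)
    return best


# Hypothetical sites: watershed area and an assigned stream class.
watershed_area = [12, 25, 40, 300, 450, 800, 1200]
stream_class = ["headwater", "headwater", "headwater", "river",
                "headwater", "river", "river"]

threshold, impurity = best_class_split(watershed_area, stream_class)
print(f"split at watershed_area < {threshold}, weighted Gini = {impurity:.3f}")
```

Applying the same search recursively within each resulting group, across all candidate natural factors, would yield a classification tree.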

More information

Technical details for CART analysis are available here.
