CADDIS Volume 4: Data Analysis
Author: M.B. Griffith
Classification and Regression Tree (CART) Analysis
Classification and regression tree (CART) analysis recursively partitions the observations in a matched data set into progressively smaller groups (De'ath and Fabricius 2000, Prasad et al. 2006). The data set consists of a dependent (response) variable, which is categorical for classification trees or continuous for regression trees, and one or more independent (explanatory) variables. Each partition is a binary split based on a single independent variable. A typical output from these analyses is shown in Figure 1.
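As a rough illustration of what "a binary split based on a single independent variable" means in practice, the following Python sketch stores a small regression tree as nested splits and routes one observation to its group. The variable names (elevation_m, pct_forest), thresholds, and predicted values are hypothetical and chosen only for illustration; they do not come from CADStat or from Figure 1.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    prediction: float  # mean response of the observations in this group


@dataclass
class Split:
    variable: str      # the single explanatory variable used at this node
    threshold: float   # observations with value <= threshold go left
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def predict(node: Node, obs: Dict[str, float]) -> float:
    """Follow the binary splits until a terminal group (leaf) is reached."""
    while isinstance(node, Split):
        if obs[node.variable] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Hypothetical two-split tree: three terminal groups.
tree = Split(
    "elevation_m", 500.0,
    Leaf(12.0),
    Split("pct_forest", 40.0, Leaf(20.0), Leaf(31.0)),
)

# Hypothetical observations.
high_forested_site = {"elevation_m": 800.0, "pct_forest": 55.0}
low_site = {"elevation_m": 300.0, "pct_forest": 55.0}
```

Here predict(tree, high_forested_site) follows the right branch at both splits, while predict(tree, low_site) stops at the first left branch; the second split is never consulted for low-elevation sites.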
A classification and regression tree tool is available in CADStat, and other algorithms for different types of CART analyses are available in R. The CART algorithm in CADStat can be used for either classification or regression trees, and it handles mixed models containing both categorical and continuous variables. Guide, a freestanding (and free) tool for CART analysis, is also available (Chaudhuri and Loh 2002, Loh 2002).
CART analysis constructs a set of decision rules that identify groups of observations that are homogeneous with respect to the response variable, as a function of a set of explanatory variables (Figure 1). During each recursion, the candidate splits for each explanatory variable are examined, and the split that maximizes the homogeneity of the two resulting groups with respect to the dependent variable is chosen. To avoid overfitting the data, the algorithms used in CART usually simplify, or “prune,” the tree that contains all possible splits of the data to an optimal tree that contains a sufficient number of splits to describe the data.
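The split search described above can be sketched for a single recursion of a regression tree. This minimal Python example assumes that homogeneity is measured by the within-group sum of squared errors (a common choice for continuous responses, but not necessarily the criterion used by CADStat) and that candidate thresholds are midpoints between adjacent observed values. The data and the names x1, x2, and y are made up for illustration, and pruning is not shown.

```python
def sse(values):
    """Sum of squared deviations from the group mean (lower = more homogeneous)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, response, explanatory):
    """Examine every candidate binary split of every explanatory variable and
    return (score, variable, threshold) for the split with the lowest total SSE."""
    best = None
    for var in explanatory:
        levels = sorted({row[var] for row in rows})
        # Candidate thresholds: midpoints between adjacent distinct values.
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [r[response] for r in rows if r[var] <= threshold]
            right = [r[response] for r in rows if r[var] > threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, var, threshold)
    return best


# Hypothetical data: y changes sharply with x1 and is unrelated to x2.
rows = [
    {"x1": 1, "x2": 5, "y": 10.0},
    {"x1": 2, "x2": 3, "y": 11.0},
    {"x1": 3, "x2": 8, "y": 10.5},
    {"x1": 7, "x2": 4, "y": 20.0},
    {"x1": 8, "x2": 6, "y": 21.0},
    {"x1": 9, "x2": 2, "y": 19.5},
]

score, variable, threshold = best_split(rows, "y", ["x1", "x2"])
```

For these data the search selects a split on x1 at 5.0, which separates the three low responses from the three high ones. A full tree would apply the same search recursively to each resulting group and then prune the result.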
In general, CART analysis can be applied effectively to causal analysis in three ways. When controlling for natural variability, CART analysis can be used in data exploration to classify systems that differ as a result of natural factors and to develop models that predict environmental conditions as a function of natural factors.
Technical details for CART analysis are available here.