CADDIS Volume 4: Data Analysis

## Predicting Environmental Conditions from Biological Observations (PECBO) Appendix

- Introduction
- Using Existing Taxon-Environment Relationships
- Estimating Taxon-Environment Relationships
- Computing Inferences
- R Scripts

##### Topics in Estimating Taxon-Environment Relationships

### Non-Parametric Regressions

Many researchers have noted that unimodal relationships are unlikely to hold for all taxa across all gradients (Austin and Meyers 1996, Oksanen and Minchin 2002). To address this issue, a modeling approach is often used that requires only that the modeled function vary smoothly and slowly over the modeled range. Here, the distribution of a given taxon is modeled as follows:

$$\log\left(\frac{p}{1-p}\right) = s_0 + s(x)$$

where *p* is the probability of occurrence of the taxon of interest, *x* is the environmental variable, *s0* is a constant, and *s* represents a nonparametric smooth curve that is fit through the data.
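As a minimal sketch of how such a model yields a probability, the snippet below assumes the logit link that is standard for presence/absence data, so that *p* is recovered by inverting the logit of *s0* + *s*(*x*). The function names are hypothetical, and the value of the smooth curve at *x* is simply passed in as a number.

```python
import math

def inverse_logit(eta):
    """Map a value on the logit scale to a probability in [0, 1]."""
    # Two branches keep exp() from overflowing for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def occurrence_probability(s0, smooth_value):
    """Probability of occurrence given the constant s0 and the value s(x).

    Assumes a logit link: log(p / (1 - p)) = s0 + s(x).
    """
    return inverse_logit(s0 + smooth_value)

# Hypothetical values: constant of -1.0 and a smooth-curve value of 1.0 at x.
p = occurrence_probability(-1.0, 1.0)
```

When the constant and the smooth term cancel, as in this example, the predicted probability of occurrence is one half.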

The locations of the mean responses for each point along a nonparametric curve, *s*, are determined through an iterative procedure that uses data in a local neighborhood around each point. The *local* nature of the fit differs fundamentally from that of a parametric model, which computes a best fit for the entire set of data. Thus, nonparametric responses can capture smaller scale variations in response (Figure 7). Near an edge of the domain, though, fewer data are available on one side of the point of interest, so more of the neighborhood must be drawn from the interior of the sampled range. Therefore, the width of the neighborhood broadens at the extremes of the environmental gradient, and the fit is less local than in the center of the domain (Hastie and Tibshirani 1999).
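The edge effect described above can be illustrated with a deliberately simplified smoother. The sketch below is a non-iterative running mean over the *k* nearest samples, not the iterative procedure used by GAM software, and the data are hypothetical. It reports how far the neighborhood has to reach to collect *k* samples at the center of the gradient and at its edge.

```python
def nearest_neighbors(xs, x0, k):
    """Return the indices of the k samples whose x values lie closest to x0."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return order[:k]

def local_mean(xs, ys, x0, k):
    """Estimate the mean response at x0 from its k nearest samples.

    Returns (estimate, reach), where reach is the distance from x0 to the
    farthest sample used, i.e. how wide the neighborhood had to be.
    """
    idx = nearest_neighbors(xs, x0, k)
    estimate = sum(ys[i] for i in idx) / k
    reach = max(abs(xs[i] - x0) for i in idx)
    return estimate, reach

# Hypothetical data: 21 evenly spaced sites along a gradient; the taxon is
# present (1) only in the middle of the gradient and absent (0) elsewhere.
xs = [float(i) for i in range(21)]
ys = [1 if 7 <= x <= 13 else 0 for x in xs]

center_fit, center_reach = local_mean(xs, ys, 10.0, k=5)
edge_fit, edge_reach = local_mean(xs, ys, 0.0, k=5)
```

With the same number of neighbors, the neighborhood at the edge reaches twice as far as the one at the center, because all five samples must come from one side of the point.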

A commonly used method for fitting nonparametric curves is known as the generalized additive model (GAM) (Hastie and Tibshirani 1999). This method allows for more than one explanatory variable, each associated with its own nonparametric smooth curve.

The flexibility of nonparametric regressions also complicates this method's use because there are no parameters with which the modeled relationship can be represented. Instead, a numerical representation of the entire curve must be stored for further analysis (e.g., to compute inferences).
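One way to store such a numerical representation is as a table of curve values on a grid, with linear interpolation between grid points. The sketch below assumes that approach; the grid, the stored probabilities, and the function name are all hypothetical, and other storage schemes are equally possible.

```python
from bisect import bisect_right

def lookup(grid, values, x):
    """Linearly interpolate a stored curve at x.

    grid must be sorted in increasing order. Values of x outside the grid
    are clamped to the end points rather than extrapolated.
    """
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    j = bisect_right(grid, x)  # now grid[j - 1] <= x < grid[j]
    t = (x - grid[j - 1]) / (grid[j] - grid[j - 1])
    return values[j - 1] + t * (values[j] - values[j - 1])

# Hypothetical stored curve: probability of occurrence at five gradient values.
grid = [0.0, 5.0, 10.0, 15.0, 20.0]
values = [0.05, 0.30, 0.80, 0.40, 0.10]

p_mid = lookup(grid, values, 7.5)    # halfway between the 2nd and 3rd points
p_below = lookup(grid, values, -3.0) # outside the sampled range, clamped
```

Clamping at the ends is a design choice made here to avoid extrapolating the curve beyond the sampled range of the environmental gradient.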

As with parametric regressions, several variables can be modeled simultaneously with nonparametric regressions. Each additional variable can be treated as an independent, additive smooth function, or combinations of variables can be smoothed simultaneously, which allows for interactions between variables (Figure 8).
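The two options can be written out for two environmental variables. The notation here is an assumption for illustration: *x1* and *x2* are the two variables, *s1* and *s2* are separate smooth curves, and a logit link is assumed on the left-hand side. In the additive form, each variable contributes through its own curve:

$$\log\left(\frac{p}{1-p}\right) = s_0 + s_1(x_1) + s_2(x_2)$$

In the interaction form, a single smooth surface is fit over both variables at once, so the effect of one variable can depend on the value of the other:

$$\log\left(\frac{p}{1-p}\right) = s_0 + s(x_1, x_2)$$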

Statistical scripts for computing single-variable nonparametric regressions are available under the R Scripts tab of this section.

#### Biological inference

Once taxon-environment relationships have been estimated using nonparametric regression, the most appropriate method to use for computing inferences is a maximum likelihood approach.
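A maximum likelihood inference can be sketched as a search over candidate environmental values for the one that makes the observed presences and absences most probable. The example below is an illustration under stated assumptions, not the method implemented in the R scripts: the three taxa, their bell-shaped response curves, the candidate grid, and the assumption that taxa occur independently are all hypothetical.

```python
import math

def bump(mu, width, peak=0.9):
    """Return a hypothetical bell-shaped response curve centered on mu."""
    return lambda x: peak * math.exp(-0.5 * ((x - mu) / width) ** 2)

def log_likelihood(x, curves, observed):
    """Log-likelihood of the presence/absence observations at value x.

    curves maps taxon name -> function giving probability of occurrence;
    observed maps taxon name -> True (present) or False (absent).
    Taxa are assumed to occur independently of one another.
    """
    total = 0.0
    for taxon, present in observed.items():
        # Clamp the probability so that log(0) can never be evaluated.
        p = min(max(curves[taxon](x), 1e-9), 1.0 - 1e-9)
        total += math.log(p if present else 1.0 - p)
    return total

def infer_condition(curves, observed, candidates):
    """Return the candidate value with the highest log-likelihood."""
    return max(candidates, key=lambda x: log_likelihood(x, curves, observed))

# Hypothetical taxon-environment relationships and one site's observations.
curves = {"taxon_a": bump(8.0, 3.0), "taxon_b": bump(12.0, 3.0), "taxon_c": bump(18.0, 3.0)}
observed = {"taxon_a": True, "taxon_b": True, "taxon_c": False}

inferred = infer_condition(curves, observed, candidates=range(0, 21))
```

Here the inferred value falls between the optima of the two taxa that were observed and away from the optimum of the taxon that was absent. In practice, each entry in `curves` could be an interpolated lookup of a stored nonparametric curve rather than a closed-form function.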

