CADDIS Volume 4: Data Analysis

## Predicting Environmental Conditions from Biological Observations (PECBO) Appendix

- Introduction

- Using Existing Taxon-Environment

Relationships - Estimating Taxon-Environment

Relationships - Computing Inferences

- R Scripts

##### Topics in R Scripts

### Non-Parametric Regressions: R Script

Nonparametric regressions (Equation 8)
can be computed with a set of commands similar to those of
parametric regressions. In this case,
generalized additive models (`gam`

) are used to fit nonparametric curves
to the data.

First, install the `gam`

library
into R. Type at the R prompt:

install.packages("gam")

You will then need to select a mirror site from the provided list, and the package should install automatically.

Next make sure that you have loaded
the sample biological and environmental data and merged them into a single
data frame called `dfmerge`

.

Also make sure that you have selected the taxa for which you wish to calculate environmental limits and saved them in the
vector `taxa.names`

(information on defining `taxa.names`

).

# Load GAM library library(gam) modlist.gam <- as.list(rep(NA, times = length(taxa.names))) for (i in 1:length(taxa.names)) { # Create a logical vector is true if taxon is # present and false if taxon is absent. resp <- dfmerge[, taxa.names[i]] > 0 # Fit the regression model, specifying two degrees of freedom # to the curve fit. modlist.gam[[i]] <- gam(resp ~ s(temp, df = 2), data = dfmerge, family = "binomial") print(summary(modlist.gam[[i]])) }

To plot model results (similar to those shown in Figure 7) run the following script.

# Specify 3 plots per page par(mfrow = c(1,3), pty = "s") for (i in 1:length(taxa.names)) { # Compute mean predicted probability of occurrence # and standard errors about this predicted probability. predres <- predict(modlist.gam[[i]], type= "link", se.fit = T) # Compute approximate upper and lower 90% confidence limits up.bound.link <- predres$fit + 1.65*predres$se.fit low.bound.link <- predres$fit - 1.65*predres$se.fit mean.resp.link <- predres$fit # Convert from logit transformed values to probability. up.bound <- exp(up.bound.link)/(1+exp(up.bound.link)) low.bound <- exp(low.bound.link)/(1+exp(low.bound.link)) mean.resp <- exp(mean.resp.link)/(1+exp(mean.resp.link)) # Sort the environmental variable. iord <- order(dfmerge$temp) # Define bins to summarize observational data as # probabilities of occurrence nbin <- 20 # Define bin boundaries so each bin has approximately the same # number of observations. cutp <- quantile(dfmerge$temp, probs = seq(from = 0, to = 1, length = 20)) # Compute the midpoint of each bin cutm <- 0.5*(cutp[-1] + cutp[-nbin]) # Assign a factor to each bin cutf <- cut(dfmerge$temp, cutp, include.lowest = T) # Compute the mean of the presence/absence data within each bin. vals <- tapply(dfmerge[, taxa.names[i]] > 0, cutf, mean) # Plot binned observational data as symbols. plot(cutm, vals, xlab = "Temperature", ylab = "Probability of occurrence", ylim = c(0,1), main = taxa.names[i]) # Plot mean fit as a solid line lines(dfmerge$temp[iord], mean.resp[iord]) # Plot confidence limits as dotted lines. lines(dfmerge$temp[iord], up.bound[iord], lty = "dotted") lines(dfmerge$temp[iord], low.bound[iord], lty = "dotted") }