Jump to main content or area navigation.

Contact Us

CADDIS Volume 4: Data Analysis

Predicting Environmental Conditions from Biological Observations (PECBO) Appendix

Non-Parametric Regressions: R Script

Nonparametric regressions (Equation 8) can be computed with a set of commands similar to those of parametric regressions. In this case, generalized additive models (gam) are used to fit nonparametric curves to the data.

First, install the gam library into R. Type at the R prompt:

install.packages("gam")

You will then need to select a mirror site from the provided list, and the package should install automatically.

Next make sure that you have loaded the sample biological and environmental data and merged them into a single data frame called dfmerge.

Also make sure that you have selected the taxa for which you wish to calculate environmental limits and saved them in the vector taxa.names (information on defining taxa.names).

# Load GAM library
library(gam)
modlist.gam <- as.list(rep(NA, times = length(taxa.names)))
for (i in 1:length(taxa.names)) {
  # Create a logical vector is true if taxon is
  #   present and false if taxon is absent.
  resp <- dfmerge[, taxa.names[i]] > 0

  # Fit the regression model, specifying two degrees of freedom
  # to the curve fit.
  modlist.gam[[i]] <- gam(resp ~ s(temp, df = 2), data = dfmerge,
			  family = "binomial")

  print(summary(modlist.gam[[i]]))
}

To plot model results (similar to those shown in Figure 7) run the following script.


# Specify 3 plots per page
par(mfrow = c(1,3), pty = "s")
for (i in 1:length(taxa.names)) {

  # Compute mean predicted probability of occurrence
  # and standard errors about this predicted probability.
  predres <- predict(modlist.gam[[i]], type= "link", se.fit = T)

  # Compute approximate upper and lower 90% confidence limits
  up.bound.link <- predres$fit + 1.65*predres$se.fit
  low.bound.link <- predres$fit - 1.65*predres$se.fit
  mean.resp.link <- predres$fit

  # Convert from logit transformed values to probability.
  up.bound <- exp(up.bound.link)/(1+exp(up.bound.link))
  low.bound <- exp(low.bound.link)/(1+exp(low.bound.link))
  mean.resp <- exp(mean.resp.link)/(1+exp(mean.resp.link))

  # Sort the environmental variable.
  iord <- order(dfmerge$temp)

  # Define bins to summarize observational data as
  # probabilities of occurrence
  nbin <- 20

  # Define bin boundaries so each bin has approximately the same
  # number of observations.
  cutp <- quantile(dfmerge$temp,
                   probs = seq(from = 0, to = 1, length = 20))

  # Compute the midpoint of each bin
  cutm <- 0.5*(cutp[-1] + cutp[-nbin])

  # Assign a factor to each bin
  cutf <- cut(dfmerge$temp, cutp, include.lowest = T)

  # Compute the mean of the presence/absence data within each bin.
  vals <- tapply(dfmerge[, taxa.names[i]] > 0, cutf, mean)

  # Plot binned observational data as symbols.
  plot(cutm, vals, xlab = "Temperature",
       ylab = "Probability of occurrence", ylim = c(0,1),
       main = taxa.names[i])
  # Plot mean fit as a solid line
  lines(dfmerge$temp[iord], mean.resp[iord])
 
  # Plot confidence limits as dotted lines.
  lines(dfmerge$temp[iord], up.bound[iord], lty = "dotted")
  lines(dfmerge$temp[iord], low.bound[iord], lty = "dotted")
}

Top of page


R Scripts:   Overview    Previous    Next

Jump to main content.