CADDIS Volume 4: Data Analysis
Predicting Environmental Conditions from Biological Observations (PECBO) Appendix
Weighted Average Inference: R Script
To compute a weighted average inference, we first need to compute central tendencies for all taxa using the regional EMAP-West data (site.species) and then use those central tendencies to assess test sites in a data set collected from western Oregon (site.species.or). Before beginning, make sure that you have downloaded both EMAP-West data and Oregon data, and merged environmental and biological data.
Next, identify and save the names of taxa that are found in both data sets.
    # Compare taxa names in tolerance value and assessment data.
    # Make sure all taxa names are in capital letters only
    names.tv <- toupper(names(site.species)[-1])
    names.assess <- toupper(names(site.species.or)[-1])

    # Combine taxa names from both datasets in one vector
    # and then find taxa names that are repeated
    names.all <- c(names.tv, names.assess)
    names.match <- names.all[duplicated(names.all)]
    print("Taxa in both databases")
    print(sort(names.match))
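The duplicated() approach above relies on each name appearing only once within each data set. As a minimal sketch of the same matching step, the example below uses base R's intersect() on two small, made-up data frames (toy.ref and toy.test are hypothetical stand-ins for site.species and site.species.or, with the first column holding a site identifier).

```r
# Hypothetical toy data: first column is a site ID, the rest are taxa.
toy.ref <- data.frame(id = 1:2, Baetis = c(3, 0),
                      Epeorus = c(1, 2), Zapada = c(0, 5))
toy.test <- data.frame(id = 1:2, BAETIS = c(4, 1),
                       ZAPADA = c(0, 2), Hydropsyche = c(7, 0))

# Drop the ID column and convert all taxa names to capital letters
ref.names <- toupper(names(toy.ref)[-1])
test.names <- toupper(names(toy.test)[-1])

# intersect() returns each shared name once, even if a name
# happens to be repeated within one of the inputs
shared <- sort(intersect(ref.names, test.names))
print(shared)
```

Only BAETIS and ZAPADA occur in both toy data sets, so those are the two names returned.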
To apply assessment tools, we need to compute central tendencies for as many taxa as possible. To do this, expand the list of taxa to include every taxon that occurs in at least 20 sites in the EMAP-West data set. (The 20-site minimum is imposed to avoid overfitting a model to a rare taxon.)
    # Get names of all taxa in the data set
    taxa.names.init <- names(site.species)[-1]

    # Compute the number of occurrences of each taxon
    getocc <- function(x) sum(x>0)
    numocc <- apply(site.species[, taxa.names.init], 2, getocc)

    # Save all taxa names that occur in at least 20 sites
    taxa.names <- taxa.names.init[numocc >= 20]
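To see how the occurrence filter behaves, the sketch below applies the same logic to a small made-up table. The data frame toy and the threshold min.sites are hypothetical; a threshold of 3 is used only because the toy data has five sites, and colSums() on a logical matrix is used here as an equivalent way to count occurrences.

```r
# Hypothetical toy data: five sites, three taxa, first column is a site ID
toy <- data.frame(id = 1:5,
                  A = c(2, 0, 1, 4, 0),
                  B = c(0, 0, 3, 0, 0),
                  C = c(1, 1, 1, 1, 1))
taxa <- names(toy)[-1]

# Count the sites at which each taxon has abundance greater than zero
n.occ <- colSums(toy[, taxa] > 0)

# Keep taxa that meet the minimum number of occurrences
min.sites <- 3  # hypothetical threshold for this toy example
kept <- taxa[n.occ >= min.sites]

print(n.occ)
print(kept)
```

Taxon B occurs at only one site and is dropped, while A (three sites) and C (five sites) are retained.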
Now, recompute central tendencies for the expanded list of taxa by running the central tendencies script again. Make sure you run the script for all taxon names identified above. Depending on the number of taxa selected, this may take some time.
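The central tendencies script itself is not reproduced in this section. For orientation only, the sketch below shows the standard abundance-weighted average of an environmental variable, which is one common definition of a taxon's central tendency. It may differ in detail from the script referred to above, and all names (temp, abund, wa.toy) are hypothetical.

```r
# Hypothetical toy data: temperature at four sites and
# abundances of two taxa at those sites
temp <- c(10, 12, 16, 20)
abund <- cbind(COLD = c(8, 4, 0, 0),
               WARM = c(0, 1, 3, 6))

# Weighted average for each taxon:
#   sum(abundance * temperature) / sum(abundance)
# abund * temp multiplies each column element-wise by temp
wa.toy <- colSums(abund * temp) / colSums(abund)
print(wa.toy)
```

In this toy case the weighted average for COLD is 128/12 (about 10.7) and for WARM is 18, reflecting where along the temperature gradient each taxon is most abundant.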
Continuous tolerance values (e.g., weighted averages or optima) can be classified into tolerance categories, but they are better used directly in a mean tolerance value metric: the abundance-weighted mean of the tolerance values of the taxa observed at a site. The following script assumes that weighted averages have been computed for all taxa listed in names.match and stored in WA, a vector indexed by taxon name. Other tolerance values can be substituted for WA in the third statement (the one that computes mean.tv) as desired.
    # Only select taxa for which tolerance values
    # have been computed.
    mat1 <- as.matrix(dfmerge.or[, names.match])

    # First get total abundance
    tot.abn <- apply(mat1, 1, sum)

    # Use matrix multiplication to compute the sum of all
    # observed tolerance values, and then divide by total
    # abundance to get the mean tolerance value.
    mean.tv <- (mat1 %*% WA[names.match])/tot.abn

    plot(dfmerge.or$temp, mean.tv, xlab = "Temperature",
         ylab = "Mean tolerance value")
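The arithmetic behind the mean tolerance value can be checked by hand on a small example. The sketch below uses made-up abundances and tolerance values (abund, tv, and mean.tv.toy are hypothetical names, not objects from the scripts above) and repeats the same matrix-multiplication step.

```r
# Hypothetical toy data: abundances of two taxa at two sites
abund <- rbind(site1 = c(COLD = 9, WARM = 1),
               site2 = c(COLD = 2, WARM = 8))

# Hypothetical tolerance values (e.g., weighted averages), named by taxon
tv <- c(COLD = 11, WARM = 18)

# Index both objects by the same taxa names so the columns of the
# abundance matrix line up with the tolerance values
taxa <- colnames(abund)

# Sum of abundance * tolerance value per site, divided by total abundance
mean.tv.toy <- drop(abund[, taxa] %*% tv[taxa]) / rowSums(abund[, taxa])
print(mean.tv.toy)
```

For site1 the result is (9 * 11 + 1 * 18) / 10 = 11.7, and for site2 it is (2 * 11 + 8 * 18) / 10 = 16.6, so the site dominated by the warm-associated taxon receives the higher mean tolerance value. A site with zero total abundance would produce NaN and would need to be excluded beforehand.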