CADDIS Volume 4: Data Analysis

## Download Software

### R Command Line Tutorial

R is a free statistical software package that provides access to a broad array of statistical tools. This page briefly outlines some commands that will help users begin working with it.

#### Workspace

Before beginning any computations, it is helpful to first set up a
working directory.
Using Windows Explorer (or any other comparable method), make a new
folder for storing your work. Then, after launching R, select
*File: Change dir...*

Navigate to the folder that you just created and select it. R will now store your working data in this directory. It also will automatically look in this directory for scripts and data that you wish to import. Example data will be used to demonstrate most of the scripts in this section. These data can be obtained from the sample data section and should be stored in your working directory.
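The working directory can also be set and checked from the R Console instead of the menu. The following is a minimal sketch: the folder name `caddis_work` is hypothetical, and it is created under a temporary directory only so that the example is safe to run on any machine.

```r
# Hypothetical folder name, placed under a temporary directory for safety.
work.dir <- file.path(tempdir(), "caddis_work")
dir.create(work.dir, showWarnings = FALSE)

old.dir <- setwd(work.dir)   # Change the working directory; returns the previous one
getwd()                      # Show the current working directory
list.files()                 # List the files R can see in this directory

setwd(old.dir)               # Restore the original working directory
```

In practice, you would replace `work.dir` with the path to the folder you created for your own work.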

#### Basic syntax

Variable names in R can be composed of letters, numbers, underscores, and periods,
but they must begin with a letter or a period (and a leading period must not be
followed by a number). They are case sensitive, so `x` and `X` are different
variables. *Note that in this and all subsequent sections, R commands can be run
by copying and pasting text directly into the R Console window.*

x, y, X, Y, flow.rate
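As a short illustration of these naming rules, the sketch below shows that names differing only in case refer to different variables, and uses the base R function `make.names()` as one possible way to check whether a string is a valid name. The candidate strings are made up for the example.

```r
x <- 1
X <- 2
x == X              # FALSE: x and X are different variables

flow.rate <- 3.5    # Periods are allowed inside a name
flow_rate <- 3.5    # So are underscores

# make.names() converts a string into a syntactically valid name;
# a string that comes back unchanged was already valid.
candidates <- c("flow.rate", "flow_rate", "2flow")
valid <- make.names(candidates) == candidates
valid               # TRUE TRUE FALSE: a name cannot start with a number
```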

To assign a value to a variable, use the assignment operator, `<-`.

    x <- 1                      # Assign a single value to the variable x
    x <- c(1, 3, 2)             # Assign a vector of numbers to x
    x <- c(TRUE, FALSE, TRUE)   # Assign a vector of logical values to x
    x <- list(colors = c("red", "blue", "black"),
              numbers = c(1, 3))   # Assign a list of dissimilar objects to x

The value of any variable can be examined by typing the variable name, or by using the print command:

    x
    print(x)
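Printing a large or nested object can produce a lot of output, so it is often useful to ask for a summary of its structure instead. The sketch below uses standard base R functions on a small list like the one above.

```r
x <- list(colors = c("red", "blue", "black"), numbers = c(1, 3))

class(x)     # The type of object: "list"
length(x)    # The number of top-level elements: 2
names(x)     # The element names: "colors" "numbers"
str(x)       # A compact description of each element
```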

Simple mathematical and statistical operations can be performed on different numerical vectors.

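    x <- c(1, 3, 2)   # Two example numerical vectors of equal length
    y <- c(4, 6, 5)

    x + y       # Addition (element by element)
    x - y       # Subtraction
    x * y       # Multiplication
    x / y       # Division
    mean(x)     # Arithmetic mean
    var(x)      # Variance
    sum(x)      # The sum of all the elements of x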
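To make these operations concrete, here is a small worked example with made-up values (the names `a` and `b` are chosen only for illustration). It also checks by hand that `var()` returns the sample variance, which divides by n - 1 rather than n.

```r
a <- c(1, 3, 2)
b <- c(4, 6, 5)

a + b        # 5 9 7: arithmetic is applied element by element
a * b        # 4 18 10

# Sample variance computed by hand, for comparison with var()
n <- length(a)
manual.var <- sum((a - mean(a))^2) / (n - 1)
manual.var                      # 1
all.equal(manual.var, var(a))   # TRUE
```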

The most commonly used format for storing data is the data frame, which is a list of objects of the same length. Data frames allow one to combine logical, numerical, and factor data in a single data structure.

    site.name <- factor(c("A", "B", "C", "D"))     # Site labels stored as a factor
    pH <- c(7.6, 6.0, 4.0, 8.2)                    # Site pH stored as a numerical vector
    abund.baetis <- c(103, 204, 602, 301)          # Baetis abundance stored as a numerical vector
    sampled.spring <- c(TRUE, TRUE, FALSE, TRUE)   # Sampling season stored as a logical vector

    # All data combined together as a data frame
    all.data <- data.frame(site.name, pH, abund.baetis, sampled.spring)
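Once a data frame has been built, it is worth checking that each column has the type and size you expect. The sketch below rebuilds a comparable data frame under the hypothetical name `example.df` so that it can be run on its own, then inspects it with standard base R functions.

```r
# A self-contained copy of the example data (example.df is a hypothetical name)
example.df <- data.frame(
  site.name = c("A", "B", "C", "D"),
  pH = c(7.6, 6.0, 4.0, 8.2),
  abund.baetis = c(103, 204, 602, 301),
  sampled.spring = c(TRUE, TRUE, FALSE, TRUE)
)

dim(example.df)       # 4 4: the number of rows and columns
names(example.df)     # The column names
str(example.df)       # The type and first few values of each column
summary(example.df)   # Summary statistics for each column
head(example.df, 2)   # The first two rows
```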

Elements of a vector can be referred to in various ways.

    x[1]                      # The first element of the vector x
    x[1:3]                    # The first three elements of vector x
    x[c(TRUE, TRUE, FALSE)]   # The first two elements of x (assuming that x has three elements)
    x[-1]                     # All of x except for the first element
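Logical indexing is most useful when the logical vector comes from a condition rather than being typed by hand. The following sketch uses a made-up vector `v` to show selection by condition, by position, and replacement of the selected elements.

```r
v <- c(1, 3, 2)

v[v > 1]        # 3 2: the elements of v that are greater than 1
which(v > 1)    # 2 3: the positions of those elements
v[c(3, 1)]      # 2 1: elements picked by position, in the order requested

v[v > 1] <- 0   # Indexing can also be used to replace selected elements
v               # 1 0 0
```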

We can also refer to different subsets of a data frame in various ways.

    all.data$pH        # The element labeled "pH" from the data frame all.data
    all.data[, "pH"]   # The same column labeled "pH"
    all.data[, 2]      # The second column of the data frame
    all.data[1, ]      # The first row of the data frame
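Rows of a data frame can be selected by condition in the same way as elements of a vector. The sketch below is one way to do this; it rebuilds the example data under the hypothetical name `example.df` so that it runs on its own, and the pH thresholds are arbitrary.

```r
# A self-contained copy of the example data (example.df is a hypothetical name)
example.df <- data.frame(
  site.name = c("A", "B", "C", "D"),
  pH = c(7.6, 6.0, 4.0, 8.2),
  abund.baetis = c(103, 204, 602, 301),
  sampled.spring = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Rows where pH is greater than 7, keeping all columns
high.pH <- example.df[example.df$pH > 7, ]
high.pH$site.name    # "A" "D"

# Rows sampled in spring, keeping only two columns
spring <- example.df[example.df$sampled.spring, c("site.name", "abund.baetis")]

# subset() expresses the same kind of selection more compactly
low.pH <- subset(example.df, pH < 7, select = c(site.name, pH))
low.pH$site.name     # "B" "C"
```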

Within R, you can access help pages on a particular command by typing:

    help(<command name>)

For example:

    help(glm)
    help(mean)
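When you do not know the exact name of a command, a few other base R functions can help you find it. The sketch below shows some common options; the search terms are arbitrary examples.

```r
?mean                      # Shorthand for help(mean)
help.search("variance")    # Search the installed help pages for a topic

matches <- apropos("mean") # Names of available objects that contain "mean"
matches

args(var)                  # Show the arguments that a function accepts
```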