Data frames are very common and convenient data structures
Used to store tables:
|       | Age | PhD   | GPA |
|-------|-----|-------|-----|
| Alice | 25  | TRUE  | 3.6 |
| Bob   | 24  | TRUE  | 3.4 |
| Carol | 21  | FALSE | 3.8 |
An R data frame is a list of equal-length vectors
# Warning: df is also the name of an R function
df <- data.frame(age = c(25L, 24L, 21L),
                 PhD = c(TRUE, TRUE, FALSE),
                 GPA = c(3.6, 3.4, 3.8))
print(df)
typeof(df)
class(df)
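The claim that a data frame is a list of equal-length vectors can be checked directly. The sketch below uses a hypothetical data frame called `students` (an illustrative name, chosen so that R's built-in `df()` function is not masked) and also shows that columns of incompatible lengths are rejected.

```r
# Hypothetical example data; `students` is an illustrative name
students <- data.frame(age = c(25L, 24L, 21L),
                       PhD = c(TRUE, TRUE, FALSE),
                       GPA = c(3.6, 3.4, 3.8))

is.list(students)                        # a data frame is stored as a list
col_lengths <- sapply(students, length)  # every column has the same length
col_types   <- sapply(students, typeof)  # but columns may differ in type

# Columns whose lengths cannot be reconciled raise an error
bad <- tryCatch(data.frame(a = 1:3, b = 1:2),
                error = function(e) "error")
print(col_lengths)
print(col_types)
print(bad)
```

Each column keeps its own type, which is what separates a data frame from a matrix.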
Since data frames are lists, we can use list indexing
Can also use matrix indexing (more convenient)
print(df[2,'age'])
print(df[2,])
print(df$GPA)
nrow(df)*ncol(df)
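As a small sketch of the two indexing styles side by side, the example below builds its own hypothetical data frame (`students` is an illustrative name) and extracts the same column in list style and a single cell in matrix style.

```r
# Hypothetical example data; `students` is an illustrative name
students <- data.frame(age = c(25L, 24L, 21L),
                       PhD = c(TRUE, TRUE, FALSE),
                       GPA = c(3.6, 3.4, 3.8))

gpa_vec  <- students[["GPA"]]   # list style: extract the column as a vector
gpa_vec2 <- students[[3]]       # same column, selected by position
gpa_df   <- students["GPA"]     # single brackets keep a one-column data frame
bob_age  <- students[2, "age"]  # matrix style: row 2, column "age"

print(gpa_vec)
print(class(gpa_df))
print(bob_age)
```

Double brackets and `$` return the column itself, while single brackets with one index return a smaller data frame, exactly as with ordinary lists.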
list functions apply as usual
matrix functions are also interpreted intuitively
Useful functions are:
rownames(df) <- c("Alice", "Bob", "Carol")
df[4,1] <- 30L; print(df)
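To illustrate how list and matrix functions behave on a data frame, here is a minimal sketch on hypothetical data (`students` and the added `year` column are illustrative assumptions, not part of the original example).

```r
# Hypothetical example data; `students` is an illustrative name
students <- data.frame(age = c(25L, 24L, 21L),
                       PhD = c(TRUE, TRUE, FALSE),
                       GPA = c(3.6, 3.4, 3.8))

n_cols    <- length(students)        # list view: number of columns
col_names <- names(students)         # list view: column names
shape     <- dim(students)           # matrix view: rows, then columns
col_means <- sapply(students, mean)  # apply a function to every column

# Matrix-style binding: add a (hypothetical) column of study years
with_year <- cbind(students, year = c(3L, 2L, 1L))

print(shape)
print(col_means)
print(with_year)
```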
Many R datasets are data frames
library("datasets")
class(mtcars)
print(head(mtcars)) # Print part of a large object
Tibbles are essentially data frames, with some convenience features
Interact well with the tidyverse package (later)
library(tidyverse)
t_mtcars <- as_tibble(mtcars)
class(t_mtcars)
Tibbles print more nicely than data frames (but see RStudio's View())
print(t_mtcars)
You can reference columns of a tibble as you create it
sin_tb <- tibble(x = seq(-5, 5, 0.1), y = sin(x))
print(sin_tb)
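For comparison, the sketch below contrasts this with base R, where `data.frame()` does not look up columns defined in the same call, so the derived column is added in a second step. The names `sq_tb`, `sq_df`, `u` and `u_squared` are illustrative, and the example assumes the tibble package is installed.

```r
library(tibble)  # tibble() is provided by this tidyverse package

# Inside tibble(), later columns may refer to earlier ones
sq_tb <- tibble(u = 1:5, u_squared = u^2)

# Base R equivalent: create the data frame first, then add the column
sq_df <- data.frame(u = 1:5)
sq_df$u_squared <- sq_df$u^2

print(sq_tb)
print(sq_df)
```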
Factors are categorical variables that take on a finite number of values
student/staff/faculty
A/B/C/F
Useful when a variable can take only a fixed set of values (unlike character strings)
R implements these internally as integer vectors
A factor has two attributes that distinguish it from a regular integer vector:
levels(): specifies the possible values the factor can take, e.g. c("male", "female")
class = "factor": tells R to check for violations
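The integer representation and the two attributes can be inspected directly. This is a minimal sketch on a hypothetical three-element factor (`sex` is an illustrative name).

```r
# Hypothetical example factor
sex <- factor(c("male", "female", "male"))

storage <- typeof(sex)      # underlying storage type
attrs   <- attributes(sex)  # list holding $levels and $class
codes   <- as.integer(sex)  # the integer codes stored internally
lvls    <- levels(sex)      # by default, levels are the sorted unique values

# Indexing the levels by the codes recovers the original labels
labels_back <- lvls[codes]

print(storage)
print(attrs)
print(codes)
print(labels_back)
```

Because the levels are sorted by default, "female" gets code 1 and "male" gets code 2, regardless of the order in which they appear in the data.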
# Character vector for 4 students
grades_bad <- c("a", "a", "b", "f")
# Factor vector for 4 students
grades <- factor(c("a", "a", "b", "f"))
print(grades)
typeof(grades)
class(grades)
levels(grades) # "a" "b" "f": not quite what we wanted, "c" is missing!
grades <- factor(c("a", "a", "b", "f"))
str(grades)
grades[2] <- "c" # Warning: invalid factor level, NA generated
str(grades)
grades <- factor(c("a","a","b","a","f"),
levels = c("a","b","c","f"))
str(grades)
table(grades) # table also works with other data types
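Declaring the levels up front has two visible effects, sketched below on the same grades: unused levels still show up in `table()` with a count of zero, and assigning a declared level no longer produces `NA`. The `droplevels()` call at the end is an optional extra, shown on a hypothetical small factor.

```r
grades <- factor(c("a", "a", "b", "a", "f"),
                 levels = c("a", "b", "c", "f"))

counts <- table(grades)  # "c" is listed even though nobody received it
grades[2] <- "c"         # allowed: "c" is one of the declared levels

# droplevels() removes levels that do not occur in the data
trimmed <- droplevels(factor(c("a", "b"), levels = c("a", "b", "c")))

print(counts)
print(grades)
print(levels(trimmed))
```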
Factors can be ordered:
grades <- factor(c("a","a","b","f"),
levels = c("f","c","b","a"),
ordered = TRUE )
grades
grades[1] > grades[3]
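Once a factor is ordered, comparison, sorting and `max()` all follow the level order rather than alphabetical order. A brief sketch on the same grades (the variable names `best`, `passing` and `sorted` are illustrative):

```r
grades <- factor(c("a", "a", "b", "f"),
                 levels = c("f", "c", "b", "a"),
                 ordered = TRUE)

best    <- max(grades)    # highest level present in the data
passing <- grades >= "c"  # compare each element against a level name
sorted  <- sort(grades)   # sorts from lowest to highest level

print(best)
print(passing)
print(sorted)
```

For an unordered factor, `>` and `max()` are not meaningful, which is why `ordered = TRUE` is needed here.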
gl(): Generate factor levels
Usage (from the R documentation):
gl(n, k, length = n * k, labels = seq_len(n),
ordered = FALSE )
Look at the examples there:
# First control, then treatment:
gl(2, 8, labels = c("Control", "Treat"))
gl(2, 1, 20) # 20 alternating 1s and 2s
gl(2, 2, 20) # alternating pairs of 1s and 2s
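One way to read `gl(n, k)` is as a shortcut for `factor(rep(...))`. The sketch below checks that on a small case and then crosses two `gl()` factors to build a balanced design; the `dose` factor and its labels are hypothetical, added only for illustration.

```r
# n = 2 levels, each repeated k = 3 times
treatment <- gl(2, 3, labels = c("Control", "Treat"))
by_hand   <- factor(rep(c("Control", "Treat"), each = 3),
                    levels = c("Control", "Treat"))

# k = 1 with length = 6 alternates the levels
alternating <- gl(2, 1, 6)

# Crossing two factors: every group/dose combination appears exactly once
design <- data.frame(group = gl(2, 3, labels = c("Control", "Treat")),
                     dose  = gl(3, 1, 6, labels = c("low", "mid", "high")))

print(treatment)
print(alternating)
print(table(design$group, design$dose))
```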