Combining data elements
Data types refer to individual elements of information. Those elements combine into different data structures. Technically, all elements of information (even individual ones) in R are vectors, and there are different forms of vectors.
![](images/data-structures.png)
Atomic vectors are homogeneous, meaning that they contain a single data type. Lists are heterogeneous, meaning that they can contain multiple data types. We will refer to one-dimensional atomic vectors simply as vectors. In terms of lists, we will primarily work with data frames, which are rectangular lists. Tibbles are a special form of data frame.
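The difference between homogeneous vectors and heterogeneous lists can be sketched in a few lines. The object names below (mixed, mixed_list, small_df) are made up for illustration and are not part of the tutorial.

```r
# An atomic vector holds a single type, so mixed inputs are coerced to a common type.
mixed <- c(1, "a", TRUE)
class(mixed)               # "character"

# A list keeps each element's own type.
mixed_list <- list(1, "a", TRUE)
sapply(mixed_list, class)  # "numeric" "character" "logical"

# A data frame is a list of equal-length columns, and the columns may differ in type.
small_df <- data.frame(id = 1:2, name = c("x", "y"))
is.list(small_df)          # TRUE
```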
Vectors
Vectors can include numeric, character, or logical data, but they can only contain a single data type. The simplest way to create a vector is by using the c() function.
Create a vector called cast that includes the words Kenan, Punkie, and Molly in that order.
cast <- c("Kenan", "Punkie", "Molly")
Combine the vectors weekdays and weekend to create a new vector called week that starts with Monday.
weekdays <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
weekend <- c("Saturday", "Sunday")
week <- c(weekdays, weekend)
Sequences
We can create sequences of numbers with seq() or :. To get quartiles, run seq(from = 0, to = 100, by = 25). To get 1, 2, 3, run 1:3. Give these examples a try.
seq(from = 0, to = 1, by = 0.05)
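As a quick check, the two examples from the text can be run side by side. The variable names (quartile_points, one_to_three) are only illustrative.

```r
# Quartile cut points on a 0 to 100 scale
quartile_points <- seq(from = 0, to = 100, by = 25)
quartile_points  # 0 25 50 75 100

# The colon operator counts in steps of 1
one_to_three <- 1:3
one_to_three     # 1 2 3

# Both approaches give the same values when the step is 1
all(one_to_three == seq(from = 1, to = 3, by = 1))  # TRUE
```

The colon operator is the shorter choice when the step is 1, while seq() is needed for any other step size.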
Create a sequence that counts down from 10 to 0, first using seq() (include all argument names), then using :. Replace each _ with your answers.
seq(from = _, to = _, by = _)
_:_
seq(from = 10, to = 0, by = -1)
10:0
Repetitions
Sometimes, you need to create a repetition of values, e.g., when creating a column of experimental conditions. You can use the rep() function to either repeat single values or vectors of values. Vectors can be repeated either as a whole vector (times argument) or each element of the vector can be repeated (each argument).
conditions <- c("Control", "Treatment A", "Treatment B")
rep(conditions, times = 3)
## [1] "Control" "Treatment A" "Treatment B" "Control" "Treatment A"
## [6] "Treatment B" "Control" "Treatment A" "Treatment B"
rep(conditions, each = 3)
## [1] "Control" "Control" "Control" "Treatment A" "Treatment A"
## [6] "Treatment A" "Treatment B" "Treatment B" "Treatment B"
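rep() also works on a single value, and its result can serve directly as a column of conditions. The sketch below assumes a hypothetical design with six participants and two conditions; the names design, participant, and condition are invented for illustration.

```r
# Repeating a single value
rep("Control", times = 3)  # "Control" "Control" "Control"

# A hypothetical design: 6 participants, 3 per condition
design <- data.frame(
  participant = 1:6,
  condition = rep(c("Control", "Treatment"), each = 3)
)
design$condition
```

Using each keeps the participants in a condition together, whereas times would alternate the conditions row by row.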
Repeat the entire myvector 10 times.
myvector <- 1:5
rep(myvector, times = 10)
Repeat each element of myvector 10 times.
myvector <- 1:5
rep(myvector, each = 10)
Dimensions
We can use length() to find the length of a vector and dim() to get the dimensions of a data frame. We can also use nrow() and ncol() to get the number of rows and columns (respectively) for data frames.
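A minimal sketch of these four functions follows, using a made-up vector (scores) and a made-up data frame (roster) that are not part of the tutorial's data.

```r
# A hypothetical vector with four elements
scores <- c(90, 85, 77, 64)
length(scores)  # 4

# A hypothetical data frame with three rows and two columns
roster <- data.frame(name = c("Ana", "Ben", "Cy"), age = c(31, 25, 40))
dim(roster)     # 3 2 (rows first, then columns)
nrow(roster)    # 3
ncol(roster)    # 2
```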
Indexing
Extracting subsets of vector or data frame elements involves using the index operator []. For data frames, the first number represents the row and the second represents the column. For instance, mydf[2, 7] extracts the value from the second row and the seventh column. To extract multiple rows and/or columns, subset with sequences or vectors: mydf[1:2, c(3, 4, 7)]. Leave the row or column index empty to select all rows or all columns: mydf[, 2] returns every row of the second column.
Here, we create a data frame.
mydf <- data.frame(matrix(1:25, ncol = 5))
names(mydf) <- letters[1:5]
mydf
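The indexing patterns described above can be tried on this data frame. Because this mydf has only five columns, the sketch keeps the column indices between 1 and 5. It also rebuilds the cast vector from earlier to show indexing on a plain vector.

```r
# Rebuild the data frame so this block runs on its own
mydf <- data.frame(matrix(1:25, ncol = 5))
names(mydf) <- letters[1:5]

# Single value: second row, fifth column
mydf[2, 5]               # 22

# Multiple rows and columns: rows 1 to 2, columns 3, 4, and 5
mydf[1:2, c(3, 4, 5)]

# Empty row index: every row of the second column, returned as a vector
mydf[, 2]                # 6 7 8 9 10

# Empty column index: every column of the third row, returned as a data frame
mydf[3, ]

# Vectors take a single index
cast <- c("Kenan", "Punkie", "Molly")
cast[2]                  # "Punkie"
cast[c(1, 3)]            # "Kenan" "Molly"
```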
Wrap-up
Congratulations, you finished the tutorial!
To get credit for this assignment, replace my name in the code below with the first name that you submitted in the course introduction form, then click Run Code to generate the text for you to submit to Canvas.
# replace my name below with your first name (surrounded by quotes)
first_name <- "Jeff"
generate_text(first_name)
Assignment complete!
Great! Copy that code into Canvas, and you're all set for this tutorial.