
Data wrangling

Setup

Here, we're focusing on data wrangling using the core functions from {dplyr}. To start, let's load the {dplyr} package along with the {palmerpenguins} package to use the penguins data set.

library(dplyr)
library(palmerpenguins)
head(penguins)

In this lesson, we'll focus on working with columns.

Selecting columns

We'll start by selecting a subset of a data frame's columns. Data frames often have many columns that we don't need or simply don't want to work with. The select() function returns a data frame that contains only the columns you specify.
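Here is a minimal sketch of the basic ways to tell select() which columns to keep. It deliberately uses columns other than the ones in the exercises below, and the object names (by_name, by_range, without_year) are just illustrative.

```r
library(dplyr)
library(palmerpenguins)

# List columns by name; they come back in the order you list them
by_name <- select(penguins, island, sex)

# The : operator selects every column from island through bill_depth_mm
by_range <- select(penguins, island:bill_depth_mm)

# A minus sign drops a column and keeps all the others
without_year <- select(penguins, -year)

names(by_range)
```

Because : works on column positions, it only makes sense for columns that sit next to each other in the data frame.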

For the penguins data set, select only the columns for species, body mass, sex, and year, using the : operator to select the adjacent columns as a range. Replace the ... in the code below with the proper arguments.

select(penguins, ...)
Now exclude only the island column.
select(penguins, ...)
Using a {dplyr} helper function, select only the columns that end in _mm.
select(penguins, ...)
Using a {dplyr} helper function, select only the columns that contain the string length.
select(penguins, ...)
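Helper functions let you select columns by a pattern or property instead of listing them one by one. As a sketch, here are two helpers not used in the exercises above: starts_with(), which matches a prefix of the column name, and where(), which keeps columns whose values pass a test. The object names are illustrative.

```r
library(dplyr)
library(palmerpenguins)

# starts_with() matches columns whose names begin with "bill"
bill_cols <- select(penguins, starts_with("bill"))

# where() applies is.numeric() to each column and keeps those returning TRUE;
# species, island, and sex are factors, so they are dropped
numeric_cols <- select(penguins, where(is.numeric))

names(numeric_cols)
```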

Moving columns

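There are several ways to move columns in a data frame. select() works fine when there are only a few columns, but relocate() is better when there are many, because you name only the columns you want to move and it keeps all the others in their original order.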
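As a minimal sketch of the difference, both calls below move sex to the front. With select(), the everything() helper is needed to bring back the columns you didn't name; relocate() moves sex to the front by default, and its .before and .after arguments control where the column lands instead. The object names are illustrative.

```r
library(dplyr)
library(palmerpenguins)

# select(): name sex first, then everything() adds the remaining columns
sex_first_select <- select(penguins, sex, everything())

# relocate(): name only the column to move; with no .before/.after it goes first
sex_first_relocate <- relocate(penguins, sex)

# .before places the moved column in front of another column
sex_before_island <- relocate(penguins, sex, .before = island)

names(sex_before_island)
```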

Here's a list of the column names:

names(penguins)
## [1] "species"           "island"            "bill_length_mm"   
## [4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
## [7] "sex"               "year"
Use select() and a helper function to move the year column after the island column.
select(penguins, ...)
Now use relocate() to do the same thing.
relocate(penguins, ...)

Renaming columns

Like moving columns, there are multiple ways to rename them. Our trusty select() can do this, but it keeps only the columns you name, so even if you're renaming just a few columns, you have to include all of the others too. The rename() function keeps the rest of the columns intact while renaming only the columns listed.
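A minimal sketch of both approaches, renaming island (a column not used in the exercise below) to island_name. Both use the new_name = old_name form. With select(), species is listed first so island_name stays in its original position, and everything() adds back the columns that weren't named. The object names are illustrative.

```r
library(dplyr)
library(palmerpenguins)

# select(): rename inside the selection, then everything() keeps the rest
renamed_select <- select(penguins, species, island_name = island, everything())

# rename(): mention only the column being renamed; all others stay as they are
renamed_rename <- rename(penguins, island_name = island)

names(renamed_rename)
```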

names(penguins)
## [1] "species"           "island"            "bill_length_mm"   
## [4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
## [7] "sex"               "year"
Use select() to keep the data frame intact but rename body_mass_g to body_mass.
select(penguins, ...)
Now use rename() to do the same thing.
rename(penguins, ...)

Wrap-up

Congratulations, you finished the tutorial!

To get credit for this assignment, replace my name in the code below with the first name you submitted in the course introduction form, then click Run Code to generate the text to submit to Canvas.

# replace my name below with your first name (surrounded by quotes)
first_name <- "Jeff"
generate_text(first_name)

Assignment complete!

Great! Copy that code into Canvas, and you're all set for this tutorial.
