Data validation

Checking that your data look the way you expect is a critical step before you process them. At a minimum, inspect each data set with str() or glimpse(), or look at it in the Environment tab. Beyond inspection, you can build validation rules that check your data with assertions and tests.
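As a minimal sketch of the assertion idea, base R's stopifnot() stops with an error as soon as one of its conditions is false. The specific rules below (expected column names, valid month and day ranges) are illustrative assumptions about the built-in airquality data, not rules prescribed by this tutorial.

```r
# Inspect the structure first: column names, types, and the first few values
str(airquality)

# Assertions: every condition must be TRUE, otherwise stopifnot() raises an error
# (the rules themselves are illustrative assumptions)
stopifnot(
  is.data.frame(airquality),
  all(c("Ozone", "Temp", "Month", "Day") %in% names(airquality)),
  is.numeric(airquality$Temp),
  all(airquality$Month %in% 1:12),
  all(airquality$Day %in% 1:31)
)
```

If every rule holds, stopifnot() returns silently, so a script that reaches the next line has passed all of its checks.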

Summarizing data

There are many ways to summarize data in R. In base R, head() shows the first few rows of a data set (six by default). The tidyverse alternative, glimpse(), prints one line per column with its type and first few values, so wide data sets stay readable. Try both head() and glimpse() on the airquality data set (built into base R).
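The two calls can be sketched as follows. This assumes glimpse() is taken from {dplyr}; the requireNamespace() guard and the str() fallback are additions here so the sketch still runs where {dplyr} is not installed.

```r
# Base R: first six rows by default; n controls how many rows are shown
head(airquality)
head(airquality, n = 3)

# Tidyverse: one line per column, with its type and first values.
# The guard and fallback are additions for illustration.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(airquality)
} else {
  str(airquality)  # closest base R equivalent
}
```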

Now run the summary() function on the airquality data.
summary(airquality)
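For numeric columns, summary() reports the minimum, quartiles, mean, and maximum, plus a count of missing values where there are any. A minimal sketch for pulling out just the missing-value counts, where the name na_counts is introduced here for illustration:

```r
# Count missing values in each column
na_counts <- colSums(is.na(airquality))
na_counts

# Keep only the columns that have at least one missing value
na_counts[na_counts > 0]
```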

Validating with {dataReporter}

The {dataReporter} package generates a data dictionary (which it calls a codebook) and gives you a quick overview of your data at the same time. Install and load {dataReporter}, then run makeCodebook() on toyData to explore this data set.
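One way these steps might look is sketched below. It assumes {dataReporter} is installed from CRAN and that makeCodebook() writes its report to the current working directory. The replace = TRUE argument, assumed here to be passed through to the underlying report function, is an addition so that rerunning the code overwrites an earlier codebook instead of stopping.

```r
# install.packages("dataReporter")  # run once if the package is not installed

# The guard is an addition so the sketch does not fail where the package is missing
if (requireNamespace("dataReporter", quietly = TRUE)) {
  library(dataReporter)

  # toyData is a small example data set that ships with the package
  data(toyData)
  str(toyData)

  # Generate the codebook; replace = TRUE overwrites a previous report (assumption)
  makeCodebook(toyData, replace = TRUE)
}
```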

Wrap-up

Congratulations, you finished the tutorial!

To get credit for this assignment, replace my name in the code below with the first name that you submitted in the course introduction form. Then click Run Code to generate the text to submit to Canvas.

# replace my name below with your first name (surrounded by quotes)
first_name <- "Jeff"
generate_text(first_name)

Assignment complete!

Great! Copy that text into Canvas, and you're all set for this tutorial.

Validating data