Matching strings
Regular expressions
{stringr} uses regular expressions to match characters in strings, which lets us find, extract, and replace strings.
Here are a few regular expression metacharacters that help us build patterns.
Metacharacters | Meaning |
---|---|
. | Wildcard: matches any character |
^ | Matches at beginning of string |
$ | Matches at end of string |
\| | Matches one pattern or another |
() | Matches character group |
\d | Matches numerical digits |
[] | Matches any characters inside brackets |
[^] | Matches any characters not inside brackets |
[a-z] | Matches any lower case letters |
[A-Z] | Matches any upper case letters |
[A-Za-z] | Matches any letters |
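As a rough illustration of the table, the sketch below applies several of these metacharacters with str_detect(). The vector x is a made-up example and not part of the tutorial's data. Note that inside an R string the backslash itself must be escaped, so \d is written as "\\d".

```r
library(stringr)

# Hypothetical example strings, chosen only to illustrate the metacharacters
x <- c("apple", "banana", "Cherry", "kiwi 42")

starts_a   <- str_detect(x, "^a")          # begins with "a"
ends_a     <- str_detect(x, "a$")          # ends with "a"
has_digit  <- str_detect(x, "\\d")         # contains a digit
has_upper  <- str_detect(x, "[A-Z]")       # contains an upper case letter
either     <- str_detect(x, "apple|kiwi")  # contains "apple" or "kiwi"
non_lower  <- str_detect(x, "[^a-z]")      # contains anything other than a-z
wildcard   <- str_detect(x, "k.w")         # "k", any one character, then "w"

starts_a   # TRUE FALSE FALSE FALSE
has_digit  # FALSE FALSE FALSE TRUE
non_lower  # FALSE FALSE TRUE TRUE
```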
To practice with regular expressions, we can use the str_view_all() function, which highlights the parts of each string that match the regular expression passed to it. Note that str_view_all() is deprecated as of stringr 1.5.0 in favor of str_view(). Let's work with the first 20 elements of the fruit data set, which we'll call fruits.
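The tutorial's setup code is not shown, so the following is only a sketch of how fruits could be created. It assumes that fruit refers to the character vector that ships with {stringr}.

```r
library(stringr)

# Assumption: `fruit` is the built-in character vector from {stringr}.
# Keep only its first 20 elements.
fruits <- head(fruit, 20)

length(fruits)
```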
Let's first view the letter "a" in fruits.
str_view_all(fruits, "a")
## Warning: `str_view_all()` was deprecated in stringr 1.5.0.
## ℹ Please use `str_view()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## [1] │ <a>pple
## [2] │ <a>pricot
## [3] │ <a>voc<a>do
## [4] │ b<a>n<a>n<a>
## [5] │ bell pepper
## [6] │ bilberry
## [7] │ bl<a>ckberry
## [8] │ bl<a>ckcurr<a>nt
## [9] │ blood or<a>nge
## [10] │ blueberry
## [11] │ boysenberry
## [12] │ bre<a>dfruit
## [13] │ c<a>n<a>ry melon
## [14] │ c<a>nt<a>loupe
## [15] │ cherimoy<a>
## [16] │ cherry
## [17] │ chili pepper
## [18] │ clementine
## [19] │ cloudberry
## [20] │ coconut
Substrings that match the pattern are surrounded by <> to show where the match occurs. Now let's view each letter "a" together with the character immediately in front of it.
str_view_all(fruits, ".a")
## [1] │ apple
## [2] │ apricot
## [3] │ avo<ca>do
## [4] │ <ba><na><na>
## [5] │ bell pepper
## [6] │ bilberry
## [7] │ b<la>ckberry
## [8] │ b<la>ckcur<ra>nt
## [9] │ blood o<ra>nge
## [10] │ blueberry
## [11] │ boysenberry
## [12] │ br<ea>dfruit
## [13] │ <ca><na>ry melon
## [14] │ <ca>n<ta>loupe
## [15] │ cherimo<ya>
## [16] │ cherry
## [17] │ chili pepper
## [18] │ clementine
## [19] │ cloudberry
## [20] │ coconut
Notice apple and apricot are no longer marked in the view. Why not?
Use str_view_all() to mark all vowels in fruits.
str_view_all(fruits, ...)
Use str_view_all() to mark all consonants in fruits.
str_view_all(fruits, ...)
Use str_view_all() to mark all fruits that end with "nut".
str_view_all(fruits, ...)
Detecting and extracting patterns
We can detect whether a pattern is present in each element of a vector with str_detect(). This returns a logical vector, so we only get a TRUE/FALSE indication for each element, but we can apply our usual tricks with logical vectors.
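To make the "usual tricks" concrete, here is a small sketch using a made-up vector (colors is hypothetical, not from the tutorial). Because TRUE counts as 1 and FALSE as 0, a logical vector can be summed, averaged, or used for indexing.

```r
library(stringr)

# Hypothetical example vector
colors <- c("red", "dark red", "green", "blue", "reddish brown")

has_red <- str_detect(colors, "red")
has_red            # TRUE TRUE FALSE FALSE TRUE

n_red    <- sum(has_red)      # number of elements containing "red": 3
prop_red <- mean(has_red)     # proportion of elements containing "red": 0.6
not_red  <- colors[!has_red]  # elements without "red": "green" "blue"
```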
Use str_detect() to create a logical vector indicating which fruits include the pattern "berry".
str_detect(fruits, ...)
Use str_detect() to determine how many fruits include the pattern "berry".
...
We can also pair str_detect() with filter() to return the observations whose values in a column match a pattern.
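As a minimal sketch of this pairing, the example below filters a small made-up data frame (pets and its columns are hypothetical). It assumes {dplyr} is installed and that the native pipe is available (R 4.1 or later).

```r
library(dplyr)
library(stringr)

# Hypothetical data frame, used only for illustration
pets <- data.frame(
  name    = c("Mr. Whiskers", "Rex", "Dr. Paws", "Bella"),
  species = c("cat", "dog", "cat", "dog")
)

# Keep rows whose name begins with "Mr" or "Dr"
titled <- pets |>
  filter(str_detect(name, "^(Mr|Dr)"))

titled$name   # "Mr. Whiskers" "Dr. Paws"
```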
Filter the starwars data set for the names that include a "-".
starwars |>
...
Often, instead of a logical vector detecting the presence of a pattern, we're interested in extracting the elements of a vector that include a pattern. For this, we use str_subset().
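The sketch below shows str_subset() on a made-up vector (veggies is hypothetical) and compares it with indexing by the result of str_detect(), which gives the same elements here.

```r
library(stringr)

# Hypothetical example vector
veggies <- c("carrot", "sweet potato", "potato", "pea", "sweet corn")

# Keep only the elements that contain "sweet"
sweet <- str_subset(veggies, "sweet")
sweet   # "sweet potato" "sweet corn"

# Same result via logical indexing with str_detect()
same <- identical(sweet, veggies[str_detect(veggies, "sweet")])
```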
fruits
.
str_subset(fruits, ...)
Extract the elements of fruits that have "black" or "blue" in them.
...
Replacing patterns
In addition to detecting and extracting patterns, we may want to replace them (e.g., to correct spelling mistakes). The str_replace() function replaces only the first match within each element: it operates on every element of the vector, but makes at most one replacement per element. The str_replace_all() function replaces every match within every element.
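The difference is easiest to see side by side. This sketch uses a made-up vector (words is hypothetical) and swaps lower case "a" for upper case "A".

```r
library(stringr)

# Hypothetical example vector
words <- c("banana", "papaya", "kiwi")

# Only the first "a" in each element is replaced
first_only <- str_replace(words, "a", "A")
first_only   # "bAnana" "pApaya" "kiwi"

# Every "a" in each element is replaced
every_one <- str_replace_all(words, "a", "A")
every_one    # "bAnAnA" "pApAyA" "kiwi"
```

Elements with no match, such as "kiwi", are returned unchanged by both functions.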
str_replace(fruits, ...)
...(fruits, ...)
...(fruits, ...)
Wrap-up
Congratulations, you finished the tutorial!
To get credit for this assignment, replace my name with the first name that you submitted in the course introduction form in the code below and click Run Code to generate the text for you to submit to Canvas.
# replace my name below with your first name (surrounded by quotes)
first_name <- "Jeff"
generate_text(first_name)
Assignment complete!
Great! Copy that code into Canvas, and you're all set for this tutorial.