
Matching strings

Regular expressions

The {stringr} package uses regular expressions to match characters in strings. This allows us to find, extract, and replace parts of strings.

Here are a few regular expression metacharacters that help us build patterns.

Metacharacter   Meaning
.               Wildcard: any single character
^               Matches at the beginning of the string
$               Matches at the end of the string
|               Matches one pattern or another
()              Groups characters into a single unit
\d              Matches any digit
[]              Matches any one of the characters inside the brackets
[^]             Matches any one character not inside the brackets
[a-z]           Matches any lower case letter
[A-Z]           Matches any upper case letter
[A-Za-z]        Matches any letter
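
As a small illustration of a few of these metacharacters, the sketch below runs them against a made-up vector (codes is hypothetical and not part of the tutorial data). One detail worth noting: inside an R string the backslash itself must be escaped, so the regular expression \d is typed as "\\d".

```r
library(stringr)

# Hypothetical example vector, not part of the tutorial data
codes <- c("cat", "Cab7", "dog", "42")

str_detect(codes, "^c")       # starts with a lower case "c":  TRUE FALSE FALSE FALSE
str_detect(codes, "\\d")      # contains a digit:              FALSE TRUE FALSE TRUE
str_detect(codes, "cat|dog")  # contains "cat" or "dog":       TRUE FALSE TRUE FALSE
str_extract(codes, "[A-Z]")   # first upper case letter:       NA "C" NA NA
str_detect(codes, "[^0-9]")   # has at least one non-digit:    TRUE TRUE TRUE FALSE
```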

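To practice with regular expressions, we can use the str_view_all() function, which highlights the parts of each string that match the regular expression passed to it. (As of stringr 1.5.0, str_view_all() is deprecated in favor of str_view(), which is why the first output below begins with a warning.) Let's work with the first 20 elements of the fruit vector that ships with {stringr}, which we'll call fruits.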
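
The tutorial does not show how fruits was created. One plausible way, assuming it is simply the first 20 elements of the fruit character vector that ships with {stringr}, is sketched here:

```r
library(stringr)

# Assumption: fruits is the first 20 elements of stringr's built-in fruit vector
fruits <- head(fruit, 20)
length(fruits)
```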

Let's first view the letter "a" in fruits.

str_view_all(fruits, "a")
## Warning: `str_view_all()` was deprecated in stringr 1.5.0.
## ℹ Please use `str_view()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
##  [1] │ <a>pple
##  [2] │ <a>pricot
##  [3] │ <a>voc<a>do
##  [4] │ b<a>n<a>n<a>
##  [5] │ bell pepper
##  [6] │ bilberry
##  [7] │ bl<a>ckberry
##  [8] │ bl<a>ckcurr<a>nt
##  [9] │ blood or<a>nge
## [10] │ blueberry
## [11] │ boysenberry
## [12] │ bre<a>dfruit
## [13] │ c<a>n<a>ry melon
## [14] │ c<a>nt<a>loupe
## [15] │ cherimoy<a>
## [16] │ cherry
## [17] │ chili pepper
## [18] │ clementine
## [19] │ cloudberry
## [20] │ coconut

Each match is wrapped in <> to show where the pattern matched. Now let's match each "a" together with the single character immediately before it.

str_view_all(fruits, ".a")
##  [1] │ apple
##  [2] │ apricot
##  [3] │ avo<ca>do
##  [4] │ <ba><na><na>
##  [5] │ bell pepper
##  [6] │ bilberry
##  [7] │ b<la>ckberry
##  [8] │ b<la>ckcur<ra>nt
##  [9] │ blood o<ra>nge
## [10] │ blueberry
## [11] │ boysenberry
## [12] │ br<ea>dfruit
## [13] │ <ca><na>ry melon
## [14] │ <ca>n<ta>loupe
## [15] │ cherimo<ya>
## [16] │ cherry
## [17] │ chili pepper
## [18] │ clementine
## [19] │ cloudberry
## [20] │ coconut

Notice apple and apricot are no longer marked in the view. Why not?

Use str_view_all() to mark all vowels in fruits.
str_view_all(fruits, ...)

Use str_view_all() to mark all consonants in fruits.
str_view_all(fruits, ...)

Use str_view_all() to mark all fruits that end with "nut".
str_view_all(fruits, ...)

Detecting and extracting patterns

We can detect whether a pattern was present in each element of a vector with str_detect(). Note this returns a logical vector, so we only get a TRUE/FALSE indication, but we can do our usual tricks with logical vectors.
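
As a rough sketch of those "usual tricks", the example below uses a made-up vector (shades is hypothetical, not part of the tutorial data). Because TRUE counts as 1 and FALSE as 0, sum() gives the number of matches and mean() gives the proportion.

```r
library(stringr)

# Hypothetical example vector, not part of the tutorial data
shades <- c("red", "dark red", "green", "blue")

has_red <- str_detect(shades, "red")
has_red        # TRUE TRUE FALSE FALSE
sum(has_red)   # number of elements that match: 2
mean(has_red)  # proportion of elements that match: 0.5
```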

Use str_detect() to create a logical vector of fruits that include the pattern "berry".
str_detect(fruits, ...)
Use str_detect() to determine how many fruits include the pattern "berry".
...

We can also pair str_detect() with filter() to return observations that match patterns in columns.
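
A minimal sketch of that pairing, using a made-up data frame (pets and its columns are hypothetical, not the starwars data). It also uses fixed(), which tells {stringr} to treat "." as a literal period rather than as the wildcard.

```r
library(dplyr)
library(stringr)

# Hypothetical data frame, not part of the tutorial data
pets <- tibble(
  name = c("Rex", "Mr. Whiskers", "Bo", "Dr. Paws"),
  species = c("dog", "cat", "dog", "cat")
)

# Keep only the rows whose name contains a literal period
titled <- pets |>
  filter(str_detect(name, fixed(".")))

titled
```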

Return the observations from the starwars data set for the names that include a "-".
starwars |> 
  ...

Often, instead of a logical vector indicating whether a pattern is present, we're interested in extracting the elements of a vector that contain the pattern. For this, we use str_subset().
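
To make the relationship to str_detect() concrete, here is a short sketch on a made-up vector (towns is hypothetical, not part of the tutorial data). When there are no missing values, str_subset() returns the same elements as indexing the vector with the logical result of str_detect().

```r
library(stringr)

# Hypothetical example vector, not part of the tutorial data
towns <- c("Oxford", "Cambridge", "Stratford", "Bath")

str_subset(towns, "ford")         # "Oxford" "Stratford"
towns[str_detect(towns, "ford")]  # same elements, built by hand
```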

Return the berries from fruits.
str_subset(fruits, ...)
Return the fruits that have "black" or "blue" in them.
...

Replacing patterns

In addition to detecting and extracting patterns, we may want to replace them (e.g., to correct spelling mistakes). The str_replace() function replaces only the first match within each element: it is applied to every element of the vector, but makes at most one replacement per element. The str_replace_all() function replaces every match within every element.
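
The difference is easiest to see on a string where the pattern occurs more than once. The sketch below uses a made-up vector (demo_words is hypothetical, not part of the tutorial data).

```r
library(stringr)

# Hypothetical example vector, not part of the tutorial data
demo_words <- c("mississippi", "kiss", "tree")

# Only the first "ss" in each element is replaced
str_replace(demo_words, "ss", "SS")      # "miSSissippi" "kiSS" "tree"

# Every "ss" in each element is replaced
str_replace_all(demo_words, "ss", "SS")  # "miSSiSSippi" "kiSS" "tree"
```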

Make bell pepper and chili pepper one word (bellpepper and chilipepper) while leaving blood orange and canary melon as two words.
str_replace(fruits, ...)

Replace just the first time "an" shows up in a string with "am".
...(fruits, ...)
Replace any instance of "an" with "am".
...(fruits, ...)

Wrap-up

Congratulations, you finished the tutorial!

To get credit for this assignment, replace my name with the first name that you submitted in the course introduction form in the code below and click Run Code to generate the text for you to submit to Canvas.

# replace my name below with your first name (surrounded by quotes)
first_name <- "Jeff"
generate_text(first_name)

Assignment complete!

Great! Copy that code into Canvas, and you're all set for this tutorial.
