Regular Expressions Workshop
This page is at http://brown.edu/go/regex | PDF of handout
Challenges are in increasing difficulty; try them in order. See how far you get in the allotted time.
Part 0: Introduction (5 minutes)
- What are regular expressions?
- Examples of use:
- text mining
Find all the names of characters in Pride and Prejudice (P&P) that are mentioned by other characters
“[^”]*M(s|rs?)\.(\s+[A-Z]\w+)+[^”]*”
- data structure
Find the mistake in my data structure
("type")[^{}]+"type"
(not a good regex, but good enough)
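As a rough illustration of the text-mining example above, here is how that pattern could be tried out in Python. The sample sentence is made up for this sketch (the workshop uses the full novel), and it assumes the text uses curly quotation marks, as the pattern does.

```python
import re

# Hypothetical sample text, not taken from the workshop materials.
text = "“I hope Mr. Bingley will like it, Lizzy,” said her mother. “Nonsense,” he replied."

# The handout's pattern: a curly-quoted quotation that contains
# Mr., Ms. or Mrs. followed by one or more capitalized words.
pattern = re.compile(r"“[^”]*M(s|rs?)\.(\s+[A-Z]\w+)+[^”]*”")

# Collect the full text of every quotation that matches.
quotes = [m.group(0) for m in pattern.finditer(text)]
print(quotes)
```

Only the first quotation is kept, because the second one mentions no title and name.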
Part 1: Literals, symbols, and flags (10 minutes)
- Match all “color” and correct them to the Canadian spelling
- Put a double carriage return between paragraphs
- Find all Rhode Island telephone numbers
- Remove all 2015 entries
- Remove the second column of this spreadsheet
(note that the slash / must be escaped in your regex, i.e. \/)
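The ideas in Part 1 can also be sketched in Python, for anyone who wants to carry them over from an online tester. The sample text here is invented, and the mapping of flags is an approximation: re.sub replaces every match (much like the g flag) and re.IGNORECASE stands in for the i flag. Python patterns are ordinary strings with no / delimiters, so the slash does not need escaping there.

```python
import re

# Hypothetical sample; the workshop supplies its own text.
sample = "The color red.\nEvery Color fades."

# A literal pattern: re.sub replaces every match it finds.
lower_only = re.sub(r"color", "colour", sample)

# re.IGNORECASE plays the role of the i flag, so "Color" matches too
# (the replacement text is always lowercase in this simple version).
any_case = re.sub(r"color", "colour", sample, flags=re.IGNORECASE)

# \n is the newline symbol; doubling it leaves a blank line between paragraphs.
spaced = re.sub(r"\n", "\n\n", sample)

print(lower_only)
print(any_case)
print(spaced)
```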
Part 2: Character classes, boundaries, and disjunction (10 minutes)
- Match all 4-letter words beginning with f (any case). Don’t use the i flag
- Match all 5-letter words beginning with a capital letter
- Remove all entries from Rhode Island or area code 497 in any year other than 2014
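One possible way to approach the first Part 2 challenge, sketched in Python with an invented sentence. It uses only a character class, single-letter classes, and word boundaries (no quantifiers yet), plus a small disjunction example. The years are only illustrative and are not tied to the workshop's data file.

```python
import re

# Hypothetical sample sentence.
sample = "Four fish fled Florida; a fox ate five figs."

# \b is a word boundary; [fF] accepts either case without the i flag;
# each [a-z] stands for exactly one more lowercase letter.
four_letter_f = re.findall(r"\b[fF][a-z][a-z][a-z]\b", sample)

# Disjunction: the | symbol matches either alternative.
years = re.findall(r"2013|2015", "2013, 2014, 2015")

print(four_letter_f)
print(years)
```

"Florida" and "fox" are skipped because the boundaries require exactly four letters.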
Part 3: Quantifiers (10 minutes)
- Match all capitalized words
- Get a list of names that start with “Mr.” or “Mrs.”
- Match all quotations — e.g. I said: “Hello”
- Match all quotations that mention Mr. Bingley
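The Part 3 challenges could be attempted along these lines in Python. This is a sketch on a made-up sentence, assuming curly quotation marks and single-word surnames. The actual workshop text may need looser patterns.

```python
import re

# Hypothetical sample text.
sample = "“Mr. Bingley is here,” said Mrs. Bennet. “Oh!” cried Lydia."

# + means one or more: a capital letter followed by one or more lowercase letters.
capitalized = re.findall(r"\b[A-Z][a-z]+\b", sample)

# ? means zero or one, so Mrs? covers both "Mr" and "Mrs".
titled = re.findall(r"Mrs?\. [A-Z][a-z]+", sample)

# * means zero or more: [^”]* runs up to the closing curly quote.
quotations = re.findall(r"“[^”]*”", sample)

# The same idea, but the quotation must mention Mr. Bingley somewhere inside.
with_bingley = re.findall(r"“[^”]*Mr\. Bingley[^”]*”", sample)

print(capitalized)
print(titled)
print(quotations)
print(with_bingley)
```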
Part 4: Backreferences (10 minutes)
- Find all URLs and make them into HTML links
An example: http://google.com becomes <a href="http://google.com">google.com</a>
Hints:
- For the purpose of this exercise, URLs are of the form: http://aaa/bbb/ccc.ddd
- An HTML link is of the form: <a href="[URL]">[text]</a>
- Regex crossword! (sort of) – how to play
Beatles (easy) | Always remember (intermediate, includes quantifiers) | Yikes!
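For the URL exercise in Part 4, one possible approach is sketched below in Python. It assumes a URL is "http://" followed by any run of non-space characters, which is a simplification of the form given in the hints and would also swallow trailing punctuation. Python writes a backreference in the replacement as \1, while many online testers use $1 instead.

```python
import re

# Hypothetical sample sentences.
sample = "An example: http://google.com is a link."
longer = "See http://aaa/bbb/ccc.ddd now"

# The parentheses capture everything after "http://";
# \1 in the replacement refers back to that captured text.
pattern = r"http://(\S+)"
replacement = r'<a href="http://\1">\1</a>'

linked = re.sub(pattern, replacement, sample)
linked_longer = re.sub(pattern, replacement, longer)

print(linked)
print(linked_longer)
```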