Brown University

Center for Digital Scholarship

Regular Expressions Workshop

This page is at http://brown.edu/go/regex | PDF of handout

Challenges increase in difficulty, so try them in order. See how far you get in the allotted time.

Part 0: Introduction (5 minutes)

  1. What are regular expressions?
  2. Examples of use:
    • text mining
      Find all the names of characters in Pride and Prejudice (P&P) that other characters mention
      “[^”]*M(s|rs?)\.(\s+[A-Z]\w+)+[^”]*”
    • data structure
      Find the mistake in my data structure: a “type” key that appears twice in the same object
      ("type")[^{}]+"type"
      (not a good regex, but good enough)
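Both introductory regexes can be tried with Python's standard re module. This is a minimal sketch: the sample sentences and JSON strings are made up for illustration, and the data-structure regex is written with straight quotes because JSON uses them.

```python
import re

# Hypothetical sample text in the style of Pride and Prejudice (curly quotes).
novel = (
    "“My dear Mr. Bennet,” said his lady to him one day, "
    "“have you heard that Netherfield Park is let at last?” "
    "Mr. Bennet replied that he had not."
)

# Text mining: a curly-quoted passage that contains Ms., Mr., or Mrs.
# followed by one or more capitalized words.
NAME_IN_QUOTE = re.compile(r"“[^”]*M(s|rs?)\.(\s+[A-Z]\w+)+[^”]*”")

# The pattern has capture groups, so re.findall would return only the
# groups; finditer + group(0) returns each whole quotation instead.
quotes = [m.group(0) for m in NAME_IN_QUOTE.finditer(novel)]
print(quotes)

# Data structure check: two "type" keys with no { or } between them,
# i.e. inside the same object. Both JSON strings are hypothetical.
bad = '{"id": 1, "type": "book", "title": "Emma", "type": "novel"}'
good = '[{"type": "book"}, {"type": "novel"}]'
DUPLICATE_TYPE = re.compile(r'("type")[^{}]+"type"')

print(DUPLICATE_TYPE.search(bad) is not None)
print(DUPLICATE_TYPE.search(good) is not None)
```

The quotation outside the first pair of quotes is skipped because [^”]* cannot run past a closing quote, and the well-formed JSON is skipped because [^{}]+ cannot cross an object boundary.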

Part 1: Literals, symbols, and flags (10 minutes)

  1. Match all “color” and correct them to the Canadian spelling
  2. Put a double carriage return between paragraphs
  3. Find all Rhode Island telephone numbers
  4. Remove all 2015 entries
  5. Remove the second column of this spreadsheet
    (note that the slash / must be escaped as \/ in tools that use / as the regex delimiter, as JavaScript does)
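Three of the Part 1 exercises can be sketched in Python as shown below. The sample strings, the one-entry-per-line layout with the year first, and the 401-NNN-NNNN phone format are all assumptions; the workshop's own data is not reproduced here.

```python
import re

# Hypothetical prose sample.
prose = "The color of the sky. Watercolor paints. Every color counts."

# A literal pattern. re.sub replaces every occurrence by default, which is
# what the g (global) flag does in JavaScript-style regex tools.
canadian = re.sub(r"color", "colour", prose)

# Hypothetical entries: year, surname, phone number, one per line.
entries = (
    "2014 Smith 401-555-0134\n"
    "2015 Jones 212-555-0199\n"
    "2015 Brown 401-555-0175\n"
    "2016 Green 617-555-0110\n"
)

# Rhode Island numbers use area code 401; \d matches any single digit.
ri_numbers = re.findall(r"401-\d\d\d-\d\d\d\d", entries)

# The m (multiline) flag lets ^ match at the start of every line, so this
# deletes each whole line that begins with 2015, including its newline.
without_2015 = re.sub(r"^2015.*\n", "", entries, flags=re.MULTILINE)

print(canadian)
print(ri_numbers)
print(without_2015)
```

Note that the literal pattern is case-sensitive, so a capitalized "Color" would be left untouched.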

Part 2: Character classes, boundaries, and disjunction (10 minutes)

  1. Match all 4-letter words beginning with f or F. Don’t use the i (case-insensitive) flag.
  2. Match all 5-letter words beginning with a capital letter
  3. Remove all entries from Rhode Island or with area code 497 in any year other than 2014
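The Part 2 exercises can be approached in Python along these lines. This is a sketch under stated assumptions: the sentences and entries are hypothetical, each entry sits on its own line with the year first, years fall between 2000 and 2019, and "from Rhode Island" is taken to mean area code 401.

```python
import re

# Hypothetical sample sentences.
text = "Five foxes fled from the Fair; a fox is fast."
names = "Elizabeth met Jane and Darcy at Pemberley; Lydia was away."

# 4-letter words starting with f or F, without the i flag: the class [fF]
# handles case, and \b marks a word boundary on both ends.
f_words = re.findall(r"\b[fF][a-zA-Z][a-zA-Z][a-zA-Z]\b", text)

# 5-letter words starting with a capital letter.
capital_five = re.findall(r"\b[A-Z][a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\b", names)

# Hypothetical entries: year, surname, phone number, one per line.
entries = (
    "2014 Smith 401-555-0134\n"
    "2015 Jones 497-555-0199\n"
    "2013 Brown 401-555-0175\n"
    "2016 Green 617-555-0110\n"
)

# 20(0\d|1[0-35-9]) matches every year from 2000 to 2019 except 2014;
# (401|497) is the area-code disjunction. The m flag anchors ^ per line.
pruned = re.sub(r"^20(0\d|1[0-35-9]) .*\b(401|497)-.*\n", "", entries,
                flags=re.MULTILINE)

print(f_words)
print(capital_five)
print(pruned)
```

The class [0-35-9] (0 to 3, then 5 to 9) is how "any year except 2014" is expressed here without lookaheads.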

Part 3: Quantifiers (10 minutes)

  1. Match all capitalized words
  2. Get a list of names that start with “Mr.” or “Mrs.”
  3. Match all quotations — e.g. I said: “Hello”
  4. Match all quotations that mention Mr. Bingley
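One possible set of answers for Part 3, written as a Python sketch against a made-up sentence with curly quotes (the sample text is an assumption, not the workshop's file):

```python
import re

# Hypothetical sample text with curly quotes.
text = ("“I hope Mr. Bingley will like it, Lizzy,” said Mr. Bennet. "
        "Mrs. Long replied: “Hello.”")

# Capitalized words: one capital letter, then zero or more lowercase letters.
capitalized = re.findall(r"\b[A-Z][a-z]*\b", text)

# "Mr." or "Mrs." (the s is optional via ?), then a capitalized name.
titled = re.findall(r"\bMrs?\. [A-Z][a-z]+", text)

# Quotations: an opening quote, any run of non-closing-quote characters,
# then the closing quote.
quotations = re.findall(r"“[^”]*”", text)

# Quotations that mention Mr. Bingley somewhere inside.
bingley = re.findall(r"“[^”]*Mr\. Bingley[^”]*”", text)

print(capitalized)
print(titled)
print(quotations)
print(bingley)
```

Using the negated class [^”]* rather than a greedy .* keeps each match from running past the first closing quote into the next quotation; a lazy “.*?” would also work.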

Part 4: Backreferences (10 minutes)

  1. Find all URLs and make them into HTML links
    For example, http://google.com becomes <a href="http://google.com">google.com</a>
    Hints:
    • For the purpose of this exercise, URLs are of the form: http://aaa/bbb/ccc.ddd
    • An HTML link is of the form: <a href="[URL]">[text]</a>
  2. Regex crossword! (sort of) – how to play
    Beatles (easy) | Always remember (intermediate, includes quantifiers) | Yikes!
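For Part 4, exercise 1, a backreference lets the replacement reuse what the pattern captured. The Python sketch below assumes URLs follow the simplified http://aaa/bbb/ccc.ddd form from the hints and uses the URL without its http:// prefix as the link text, as in the google.com example; the sample sentence is hypothetical.

```python
import re

# Hypothetical sample text.
text = "See http://google.com and http://example.org/docs/page.html for details."

# Group 1 captures everything after http://: word characters or hyphens,
# separated by dots or slashes, so trailing punctuation is not captured.
URL = re.compile(r"http://([\w-]+(?:[./][\w-]+)*)")

# \1 is a backreference to group 1, used twice: once in the href and once
# as the visible link text.
linked = URL.sub(r'<a href="http://\1">\1</a>', text)
print(linked)
```

Because the pattern must end on a word character, a sentence-final period after a URL stays outside the link.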