Regular expressions are a means of describing a set of strings. We will look at a subset of regular expressions used by vi to search a file being edited and by grep to search files. Many other Unix tools also use regular expressions, including lex, sed, and awk. Engineering students taking ECE 251 will cover regular expressions more formally.
The simplest regular expression is a string which contains only letters and numbers, such as net. This trivial regular expression matches the set of strings {net}.
Matching Words
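If you want to match th, but only at the start of a word, you can precede it with \<. For example, \<th matches the th in the but not the th in other. Similarly, \> matches the end of a word.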
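The start-of-word anchor can be tried on a few sample words. This is a minimal sketch that assumes GNU grep (which supports \< and \>); the word list and variable names are made up for illustration.

```shell
# Made-up sample words, one per line.
words='the
other
bathe
north'

# th anywhere in the line: all four words contain it.
anywhere_count=$(printf '%s\n' "$words" | grep -c 'th')

# th only at the start of a word: only "the" qualifies.
at_start=$(printf '%s\n' "$words" | grep '\<th')

echo "lines containing th: $anywhere_count"
echo "lines with th starting a word: $at_start"
```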
Alternation
Suppose, however, we wish to match the words {net, Net}. To allow multiple characters in one location, we can use [ ... ], adding any characters we are interested in between the brackets: [Nn]et.
Other examples of alternation are n[aeiou]t which matches {nat, net, nit, not, nut} or [Nn][aeiou]t which matches {nat, net, nit, not, nut, Nat, Net, Nit, Not, Nut}.
As you can see, you can very quickly describe, with a small number of characters, a large possible collection of strings.
To match [ and ], escape the character with a \.
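The bracket sets described above can be checked against a short word list. The sketch below uses grep on made-up input; the variable names are illustrative only.

```shell
# Made-up sample words, one per line.
words='nat
Net
nut
nxt
Nit'

# Lowercase n, then one vowel, then t.
lower_only=$(printf '%s\n' "$words" | grep 'n[aeiou]t')

# Either N or n, then one vowel, then t.
either_case=$(printf '%s\n' "$words" | grep '[Nn][aeiou]t')

echo "n[aeiou]t matches:"
echo "$lower_only"
echo "[Nn][aeiou]t matches:"
echo "$either_case"
```

Here nxt is never matched because x is not in the set of vowels, and Net and Nit are matched only once N is added to the first set.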
Exercise
Download the file hamlet.txt and run the following commands:
{ecelinux:1} grep be hamlet.txt
{ecelinux:2} grep "[Bb]e" hamlet.txt
{ecelinux:3} grep "\<[Bb]e\>" hamlet.txt
{ecelinux:4} grep -n "\<[Bb]e\>" hamlet.txt
The first will match be anywhere, the second will match be or Be anywhere, and the third will match only the word be or Be, and not, for example, remember. The fourth matches the same lines as the third, but -n prefixes each line with its line number.
Next, edit the file in vi
{ecelinux:5} gvim hamlet.txt
and search for /be. Press n and N a few times each. Next search backward with ?[Bb]e. Again, press n and N a few times. Finally, search for /\<[Bb]e\>.
Ranges of Letters or Numbers
You can specify ranges of letters or numbers in [ and ] using the -, for example, [a-z]. Thus, [a-zA-Z0-9] searches for any letter or number. You are not restricted to ranges: [a-e1-5] matches any letter in {a, b, c, d, e, 1, 2, 3, 4, 5}.
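If you want to add a - to the set of characters you're searching for, either escape it with a \ or place it at the front or end of the set, e.g., [a\-z] or [-az]. Placing it at the front or end works in both vi and grep. Inside brackets, grep treats \ as an ordinary character, so the escaped form is reliable only in vi.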
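As a small illustration of ranges and of a literal hyphen placed first in the set, the following sketch runs grep on made-up one-character lines. The -x option, which requires the whole line to match, is used here only to keep the example short.

```shell
# Made-up sample input: one character per line.
items='c
x
3
9
-'

# Any letter from a to e or any digit from 1 to 5.
in_ranges=$(printf '%s\n' "$items" | grep -x '[a-e1-5]')

# A hyphen placed first in the set is taken literally: matches -, a or z.
hyphen_set=$(printf '%s\n' "$items" | grep -x '[-az]')

echo "[a-e1-5] matches:"
echo "$in_ranges"
echo "[-az] matches:"
echo "$hyphen_set"
```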
Exercise
Find all two-letter words ending with e by using \<[a-z]e\>. Use both gvim and grep.
Matching the Complement
To match everything not in a set, let the first symbol between [ and ] be ^. For example, [^aeiou] matches any single character that is not a lowercase vowel.
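A ^ placed immediately after the opening bracket negates the set. The sketch below, on made-up input, shows that a negated set still has to match exactly one character.

```shell
# Made-up sample words, one per line.
words='bat
bet
b2t
bxt
bt'

# b, then any one character that is not a lowercase vowel, then t.
non_vowel=$(printf '%s\n' "$words" | grep 'b[^aeiou]t')

echo "$non_vowel"
```

The line bt is not matched: there is no character between b and t for the negated set to match.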
Universal Match
You can match any single symbol with a period (.). If you want to match an actual period, escape it with a backslash: \.
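The difference between an unescaped and an escaped period can be seen on a few made-up lines. This is a sketch only; the variable names are illustrative.

```shell
# Made-up sample input, one string per line.
lines='3.14
3x14
314'

# Unescaped period: any single character may sit between 3 and 14.
any_char=$(printf '%s\n' "$lines" | grep '3.14')

# Escaped period: only a literal period may sit between 3 and 14.
literal_dot=$(printf '%s\n' "$lines" | grep '3\.14')

echo "3.14 matches:"
echo "$any_char"
echo "3\\.14 matches:"
echo "$literal_dot"
```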
Exercise
Find all matches of 'd\. and all matches for 'd. using both gvim and grep.
Why doesn't t..\> match all words with t as a third-last letter? How would you fix this?
Matching the Start and Ends of Lines
You can match the start of a line by using the ^. For example, ^To matches To only where it appears at the start of a line. Similarly, $ matches the end of a line. In a C++ program, ;$ would match only those semicolons which appear at the end of a line.
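Both line anchors can be tried on a few made-up lines. The sample text and variable names below are assumptions for illustration.

```shell
# Made-up sample lines, loosely resembling C++ source.
lines='int x = 0;
for (;;) {
To do; later
return x;'

# To at the very start of a line.
starts_with_to=$(printf '%s\n' "$lines" | grep '^To')

# A semicolon as the last character of a line.
ends_with_semicolon=$(printf '%s\n' "$lines" | grep ';$')

echo "^To matches:"
echo "$starts_with_to"
echo ";\$ matches:"
echo "$ends_with_semicolon"
```

The for line contains semicolons but ends with {, so ;$ does not select it.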
Repetition
You can indicate that the preceding regular expression is repeated by using *. For example, a* indicates that you are searching for zero or more instances of the letter a. Likewise, [a-z][a-z]* matches one or more lowercase letters in a row, and so matches every lowercase word in a text.
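Repetition can be tried with the following sketch on made-up input. It assumes a grep that supports the -o option (GNU grep does), which prints each matched piece on its own line; -x requires the whole line to match.

```shell
# Made-up sample input, one string per line.
samples='b
ab
aab
cab'

# Zero or more a's followed by b, matching the whole line.
zero_or_more=$(printf '%s\n' "$samples" | grep -x 'a*b')

# One or more lowercase letters: print each word found in the line.
words=$(printf '%s\n' 'to be, or not' | grep -o '[a-z][a-z]*')

echo "a*b matches:"
echo "$zero_or_more"
echo "words found:"
echo "$words"
```

The line b is matched by a*b because * allows zero repetitions, while cab is rejected because c is neither an a nor the final b.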
Copyright ©2005-2012 by Douglas Wilhelm Harder. All rights reserved.