Advanced Directory Commands

Regular expressions are a means of describing a set of strings. We will look at a subset of regular expressions used by vi to search a file being edited and by grep to search files. Many other Unix tools also use regular expressions, including lex, sed, and awk. Engineering students taking ECE 251 will cover regular expressions more formally.

The simplest regular expression is a string which contains only letters and numbers, such as net. This trivial regular expression matches the set of strings {net}.

Matching Words

If you want to match th, but only at the start of a word, you can precede it with \<, for example, \<th. Similarly, you can match it at the end of a word by following it with \>, for example, th\>. Of course, you can match entire words, e.g., \<the\>.
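The word anchors can be tried on a small word list before moving to a real file. The following is a minimal sketch, assuming a grep that supports \< and \> (GNU grep does); the word list and the variable names are made up for illustration.

```shell
# Hypothetical sample text: one word per line so each match is easy to see.
words=$(printf '%s\n' the then bathe other path)

# \<th matches th only at the start of a word.
starts=$(printf '%s\n' "$words" | grep '\<th')

# th\> matches th only at the end of a word.
ends=$(printf '%s\n' "$words" | grep 'th\>')

# \<the\> matches only the complete word "the".
whole=$(printf '%s\n' "$words" | grep '\<the\>')

echo "start of word:"; printf '%s\n' "$starts"   # the, then
echo "end of word:";   printf '%s\n' "$ends"     # path
echo "whole word:";    printf '%s\n' "$whole"    # the
```

Note that bathe and other contain th but are not selected by either anchored pattern, because the th falls in the middle of the word.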

Alternation

Suppose, however, we wish to match the words {net, Net}. To allow multiple characters in one location, we can use [ ... ], adding any characters we are interested in between the brackets: [Nn]et.

Other examples of alternation are n[aeiou]t which matches {nat, net, nit, not, nut} or [Nn][aeiou]t which matches {nat, net, nit, not, nut, Nat, Net, Nit, Not, Nut}.

As you can see, you can very quickly describe, with a small number of characters, a large possible collection of strings.
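The two bracket patterns above can be checked against a short word list. This is only a sketch; the sample words and variable names are assumptions chosen for illustration.

```shell
# Hypothetical sample words, one per line.
words=$(printf '%s\n' nat net nit not nut nxt Net NET)

# n[aeiou]t: a lowercase n, exactly one lowercase vowel, then t.
lower=$(printf '%s\n' "$words" | grep 'n[aeiou]t')

# [Nn][aeiou]t: the first letter may be N or n.
either=$(printf '%s\n' "$words" | grep '[Nn][aeiou]t')

echo "n[aeiou]t:";    printf '%s\n' "$lower"    # nat net nit not nut
echo "[Nn][aeiou]t:"; printf '%s\n' "$either"   # nat net nit not nut Net
```

Neither pattern selects nxt (x is not in the set) or NET (E is not in the set).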

To match [ and ], escape the character with a \.

Exercise

Download the file hamlet.txt.

{ecelinux:1} grep be hamlet.txt
{ecelinux:2} grep "[Bb]e" hamlet.txt
{ecelinux:3} grep "\<[Bb]e\>" hamlet.txt
{ecelinux:4} grep -n "\<[Bb]e\>" hamlet.txt

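The first will match be anywhere, the second will match be or Be anywhere, and the third will match only the word be or Be, and not, for example, remember. The fourth matches the same lines as the third, but the -n option also prints the line number of each matching line.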

Next, open the file in vi with

{ecelinux:5} gvim hamlet.txt

and search for /be. Press n and N a few times each. Next, search for /[Bb]e. Again, press n and N a few times. Finally, search for /\<[Bb]e\>.

Ranges of Letters or Numbers

You can specify ranges of letters or numbers between [ and ] using the -, for example, [a-z]. Thus, [a-zA-Z0-9] matches any letter or digit. You are not restricted to complete ranges: [a-e1-5] matches any character in {a, b, c, d, e, 1, 2, 3, 4, 5}.

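If you want to add a - to the set of characters you're searching for, place it at the front or end of the set, e.g., [-az]. In vi, you can also escape it with a \, e.g., [a\-z]; in grep, however, a \ between the brackets is taken literally, so only the first form is reliable there.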
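Ranges and a literal hyphen can be compared side by side on a few single characters. The sketch below sets LC_ALL=C so that ranges follow plain ASCII order; that setting, the sample characters, and the variable names are assumptions added for illustration.

```shell
# Hypothetical sample input: one character per line.
chars=$(printf '%s\n' a c e f 3 5 7 - z)

# [a-e1-5] matches any one of a..e or 1..5.
in_ranges=$(printf '%s\n' "$chars" | LC_ALL=C grep '[a-e1-5]')

# [-az] matches a hyphen, an a, or a z; the leading - is not a range.
with_hyphen=$(printf '%s\n' "$chars" | LC_ALL=C grep '[-az]')

echo "[a-e1-5]:"; printf '%s\n' "$in_ranges"     # a c e 3 5
echo "[-az]:";    printf '%s\n' "$with_hyphen"   # a - z
```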

Exercise

Find all two-letter words ending with e by using \<[a-z]e\>. Use both gvim and grep.

Matching the Complement

To match everything not in a set, let the first symbol between [ and ] be ^. For example, [^aeiou] matches any single character which is not a lowercase vowel.
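A complemented set is written with ^ as the first character inside the brackets. The following sketch shows the effect on a made-up word list; the words and the variable name are illustrative assumptions.

```shell
# Hypothetical sample words, one per line.
words=$(printf '%s\n' bat bet bit bot but bxt)

# b[^aeiou]t: b, then any one character that is NOT a lowercase vowel, then t.
non_vowel=$(printf '%s\n' "$words" | grep 'b[^aeiou]t')

printf '%s\n' "$non_vowel"   # bxt
```

Note that [^aeiou] still has to match exactly one character; it does not match an empty position.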

Universal Match

You can match any symbol with a period (.). If you want to match an actual period, escape it with a backslash: \.
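The difference between the unescaped and escaped period shows up clearly on a few short lines. This is a sketch only; the sample lines and variable names are invented for illustration.

```shell
# Hypothetical sample lines: 'd followed by a period, by a comma, and by nothing.
lines=$(printf '%s\n' "liv'd." "liv'd," "liv'd")

# 'd. requires some character after the d, whatever it is.
any_char=$(printf '%s\n' "$lines" | grep "'d.")

# 'd\. requires a literal period after the d.
literal_dot=$(printf '%s\n' "$lines" | grep "'d\.")

echo "'d. matches:";  printf '%s\n' "$any_char"      # liv'd.  liv'd,
echo "'d\. matches:"; printf '%s\n' "$literal_dot"   # liv'd.
```

The last sample line matches neither pattern, because nothing follows the d.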

Exercise

Find all matches of 'd\. and all matches for 'd. using both gvim and grep.

Why doesn't t..\> match all words with t as a third-last letter? How would you fix this?

Matching the Start and Ends of Lines

You can match the start of a line by using the ^, for example, ^To matches all words starting with To which appear at the start of a line. Similarly, $ matches the end of a line. In a C++ program, ;$ would match only those semicolons which appear at the end of a line.
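Both line anchors can be tried on a few lines of mixed text. The sketch below uses made-up sample lines and variable names; the patterns are single-quoted so that the shell does not try to expand the $.

```shell
# Hypothetical sample lines: two of prose and two resembling C++ source.
text=$(printf '%s\n' 'To be, or not to be' 'And To arms' 'int x = 0;' 'for (;;) { }')

# ^To matches To only when it begins the line.
line_start=$(printf '%s\n' "$text" | grep '^To')

# ;$ matches a semicolon only when it is the last character on the line.
line_end=$(printf '%s\n' "$text" | grep ';$')

printf '%s\n' "$line_start"   # To be, or not to be
printf '%s\n' "$line_end"     # int x = 0;
```

The for line contains semicolons, but none of them is the final character, so ;$ does not select it.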

Repetition

You can indicate that a regular expression may be repeated by following it with *. For example, a* matches zero or more instances of the letter a. Combined with a range, [a-z][a-z]* matches one or more lowercase letters in a row, that is, every lowercase word in a text.
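Repetition is easiest to see when grep prints only the matched text. The following sketch assumes a grep that supports the -o option (GNU and BSD grep do) and sets LC_ALL=C so that [a-z] covers only lowercase ASCII letters; the sample input and variable names are illustrative.

```shell
# Hypothetical sample words, one per line.
lines=$(printf '%s\n' bd bad baaad bed)

# ba*d: b, then zero or more a's, then d.
zero_or_more=$(printf '%s\n' "$lines" | grep 'ba*d')

# [a-z][a-z]*: one lowercase letter followed by zero or more lowercase letters.
# -o prints each match on its own line instead of the whole matching line.
runs=$(printf '%s\n' 'To be, or not' | LC_ALL=C grep -o '[a-z][a-z]*')

echo "ba*d:";        printf '%s\n' "$zero_or_more"   # bd bad baaad
echo "[a-z][a-z]*:"; printf '%s\n' "$runs"           # o be or not
```

The capital T in To is not part of any match, which is why only o is printed for that word; a pattern such as [A-Za-z][a-z]* would be needed to include it.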

Copyright ©2005-2008 by Douglas Wilhelm Harder. All rights reserved.
