Contents of the BOYER.DOC file
W. Vann Hall
P.O. Box 10804
Arlington, VA 22210
Boyer-Moore is an attempt to speed up the usually serial text search.
Traditionally, text searches proceed from the first character onward.
(In these examples, "string" refers to string we're trying to locate,
"text" the array of characters we're trying to find the string in.
These all ignore case sensitivity, matching around line boundaries,
String to find: STRING
Text to find it in: HOW YOU SEARCH FOR A STRING
Our StringPointer and TextPointer both start at 1. We compare
String and Text. Since "S" and "H" don't match, we increment the
text pointer and compare again. At TextPointer = 9, the "S" of
"STRING" and the "S" of "SEARCH" match, so we increment TextPointer
*and* StringPointer and compare String and Text. "T" and "E"
don't match, so we reset StringPointer and increment TextPointer again.
And so on. You can see that it takes 22 comparisons before we locate
the correct "S" and a full 27 before we know that we have located
Boyer-Moore, on the other hand, works from the end of the string (but
still the beginning of the text). First, it creates a look-up table of
values corresponding to the distance of the first occurence of each
character from the end of the string. For instance, the table for
"STRING" would read:
A=6 B=6 C=6 D=6 E=6 F=6 G=6 H=6 I=2 J=6 K=6 L=6 M=6
N=1 O=6 P=6 Q=6 R=3 S=5 T=4 U=6 V=6 W=6 X=6 Y=6 Z=6
Note that characters not located in the string are set to
Length(String), as it the final character.
Since a 6-character string can't be located entirely within the first 5
characters, we start by comparing the String -- "G" -- with Text:
"O". They don't match, so we increment TextPointer by the value
associated with "O" in the table; that is, by 6. Our next compare is
"G" with the "R" in "SEARCH". We now increment TextPointer by "R"'s
value, 3, and compare "S" to " ". And so on. Here's a tracing of the
progression; the letter to be compared is highlighted by lower casing:
HOW YoU SEARCH FOR A STRING
HOW YOU SEArCH FOR A STRING
HOW YOU SEARCH^FOR A STRING
HOW YOU SEARCH FOR A STRING
HOW YOU SEARCH FOR A STRINg
HOW YOU SEARCH FOR A STRIng
HOW YOU SEARCH FOR A STRing
HOW YOU SEARCH FOR A STring
HOW YOU SEARCH FOR A String
HOW YOU SEARCH FOR A string
There; 10 comparisons total. You can see how this would speed
searching a long text -- enough to make up for the additional overhead
of compiling the look-up table.