Dec 19, 2017
 
Finds selected strings in file with C source.
File FINDS101.ZIP from The Programmer’s Corner in
Category C Source Code
File Name     File Size (bytes)  Zip Size (bytes)  Zip Type
BMSEAR.C                   3193               716  deflated
BMSEAR.OBJ                  663               469  deflated
FINDSTR.C                  2453               904  deflated
FINDSTR.COM                9445              5427  deflated
FINDSTR.DOC                8273              2972  deflated
FINDSTR.OBJ                1985              1155  deflated


Contents of the FINDSTR.DOC file


FINDSTR.ARC - Find Multiple Strings in Multiple Files


Version 1.01: August 5, 1986
Author: Don A. Williams
Language: Datalight C Developer's Kit, Version 2.03


FINDSTR is a program that can be used to search multiple files
for multiple text strings. It was actually developed as a test
program for a Boyer-Moore string search subroutine. The Boyer-
Moore algorithm is many times faster than the more common string
search algorithms and FINDSTR is several times faster than other
similar programs. It does not provide the complex pattern
matching and action statements of BAWK, but it is 3 to 5 times
faster than BAWK at finding simple strings.


USAGE:

A>FINDSTR [-c] file_spec [file_spec .....]

FINDSTR defaults to a case-insensitive compare on the assumption
that it is better to find too many occurrences than to miss some.
The '-C' command option changes this default to a case-sensitive
compare when that is what is required. The '-C' option can appear
anywhere on the command line; as a result, no file name
specification may begin with a '-'.
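A minimal sketch of how such option handling might look in
modern C follows. The helper name parse_args and its signature
are hypothetical, not taken from FINDSTR.C; it accepts either
'-c' or '-C' and silently ignores any other option, which is an
assumption about behavior the text does not specify.

```c
#include <ctype.h>

/* Hypothetical helper (not from FINDSTR.C): separate the case-sensitivity
   option from the file specifications.  Any argument starting with '-'
   is treated as an option wherever it appears, so a file name can never
   begin with '-'.  Unknown options are ignored (an assumption).
   files[] must have room for at least argc entries. */
int parse_args(int argc, char *argv[], int *case_sensitive, char *files[])
{
    int i, nfiles = 0;

    *case_sensitive = 0;                      /* default: case-insensitive */
    for (i = 1; i < argc; i++) {
        if (argv[i][0] == '-') {
            if (toupper((unsigned char)argv[i][1]) == 'C')
                *case_sensitive = 1;          /* "-c" or "-C" */
        } else {
            files[nfiles++] = argv[i];        /* a file_spec, kept in order */
        }
    }
    return nfiles;                            /* number of file_specs found */
}
```

With argv = {"FINDSTR", "*.c", "-C", "*.doc"}, the sketch
returns 2 file specifications and turns case-sensitive matching
on, even though the option sits between the two names.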

FINDSTR can accept multiple file specifications on the command
line, and each file specification may be a full MS-DOS path name
which, courtesy of Datalight C, may contain "wild cards".
FINDSTR reads the strings to be searched for from STDIN. The
search strings are entered one per line and may contain blanks,
quotes, and other "special" characters, but may not contain
carriage returns or line feeds (in this version). FINDSTR stops
reading search strings when it encounters either a null line
(i.e., a line containing only a carriage return) or an
End-of-File (Control-Z). Since STDIN may be redirected, FINDSTR
can read the search strings from either the console or a file.
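The stopping rule can be sketched as follows. read_pattern is a
hypothetical helper, not FINDSTR's own routine: it strips a
trailing carriage return and/or line feed, keeps blanks and
quotes verbatim, and returns 0 for a null line or at end of
file. It assumes, as MS-DOS text-mode C libraries typically do,
that the runtime already reports Control-Z as end of file, so no
separate test for it is made.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one search string from `in` into buf.
   Returns its length, or 0 when input should stop, either at end of
   file or on a null (empty) line.  The line terminator (LF, or CR LF)
   is removed; everything else is kept as typed. */
size_t read_pattern(FILE *in, char *buf, size_t size)
{
    size_t len;

    if (fgets(buf, (int)size, in) == NULL)
        return 0;                 /* end of file; MS-DOS text-mode runtimes
                                     typically map Control-Z to this */
    len = strcspn(buf, "\r\n");   /* length up to the line terminator */
    buf[len] = '\0';
    return len;                   /* 0 for a line holding only CR/LF */
}
```

A caller would loop until read_pattern returns 0. Lines longer
than the buffer come back in pieces, a limitation this sketch
does not handle.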


OUTPUT:

For each file that it processes, FINDSTR outputs a line
containing the path name of the file, followed by a line for each
occurrence of one of the search strings in the file. Each of these
lines contains the search string found, a blank, the 4-digit
decimal line number of the line in the file, a ':', a blank, and
finally the line itself.
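The layout just described maps naturally onto printf-style
formatting. In this sketch the helper names are hypothetical, and
right-justifying the line number in a space-padded 4-character
field ("%4d") is an assumption, since the exact spacing of the
example output did not survive intact. snprintf is C99, so this
is modern C rather than Datalight C.

```c
#include <stdio.h>

/* Hypothetical helper: per-file header line, e.g. "--findstr.doc--". */
void print_header(FILE *out, const char *path)
{
    fprintf(out, "--%s--\n", path);
}

/* Hypothetical helper: one match line made of the search string, a blank,
   the line number in a 4-digit decimal field (right-justified with spaces,
   an assumption), ':', a blank, and the text line.  Returns the length
   snprintf reports. */
int format_hit(char *buf, size_t size, const char *pat, int lineno,
               const char *line)
{
    return snprintf(buf, size, "%s %4d: %s", pat, lineno, line);
}
```

For example, format_hit(buf, sizeof buf, "the", 12, "Moore
algorithm") fills buf with "the   12: Moore algorithm".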


EXAMPLES:

Since the output lines of FINDSTR will frequently exceed the
length of the lines in this document, they will be truncated in
these examples; a '>' character in the right margin will indicate
that the line has been truncated.

Example 1 - Search this document (to this point) for "the"


C>FINDSTR findstr.doc
FINDSTR Version 1.01: August 5, 1986

Enter search patterns, 1 per line
NULL line terminates
the

--findstr.doc--
the 12: Moore algorithm is many times faster than the more com>
the 13: search algorithms and FINDSTR is several times faster>
the 14: similar programs. It does not provide the compl>
the 23: FINDSTR can accept multiple file specifications on t>
the 26: FINDSTR will read STDIN for the strings to be searche>
the 28: quotes, and other "special" characters but may >
the 30: quit reading search strings when it encounters ei>
the 33: read the search strings from either the console or a f>
the 39: containing the path name of the file followed by line>
the 40: occurrence of one of the search strings in the file. >
the 41: will contain the search string found followed by a bla>
the 42: by the 4-digit decimal line number of the line >
the 43: followed by a ':' followed by a blank followed by>
the 49: Since the output lines of FINDSTR will frequently>
the 50: length of the lines in this document, they will be tr>
the 51: these examples; a '>' character in the right margin wi>
the 52: that the line has been truncated.
the 54: Example 1 - Search this document (to this point) for ">


Example 2 - Search the C source files for this program for "if"
and "for".

C>FINDSTR *.c
FINDSTR Version 1.01: August 5, 1986

Enter search patterns, 1 per line
NULL line terminates
if
for

--BM.C--
for 7: for (i=0; i<256; i++) d[i] = pl;
for 8: for (i=0; i
if 28: if (j < 0) return(io - pl);
--BMSEAR.C--
for 7: for (i=0; i<256; i++) d[i] = pl;
for 8: for (i=0; i
if 30: if (j < 0) return(io - pl);
--FINDSTR.C--
if 35: if (pl == 0) break;
if 37: if (t == NULL) {
for 38: fprintf(stderr, "Insufficient memory>
if 42: if (t->Pattern == 0) {
for 43: fprintf(stderr, "Insufficient memory>
if 47: if (t->d == 0) {
for 48: fprintf(stderr, "Insufficient me>
if 54: if (PatQueue.Head == NULL) PatQueue.Head>
for 58: for (fp=1; fp
if 59: if ((F1 = fopen(argv[fp], "r")) == 0) {
for 69: for (t=PatQueue.Head; t != NULL;>
if 70: if ((p = BMSearch(Line, >

This example could also have been run by creating a file, say
INPUT, containing the search strings, one string to a line as
follows:

if
for

The command line would then be:

C>FINDSTR *.c <input

The output would be the same as shown above. The output could
also have been redirected to a file by the command line:

C>FINDSTR *.C >output

In this case, everything from the first file name onward would
have been written to the file OUTPUT. Input and output redirection
are independent; either may be used without the other.


TECHNICAL CONSIDERATIONS:

The "heart" of this program is the Boyer-Moore string search
algorithm, which, as of this writing, is the fastest known on
average. How it works is somewhat complex to describe; the
algorithm is presented in "Data Structures and Algorithms" by
Niklaus Wirth, pages 66-69, and by its inventors, R. S. Boyer and
J. S. Moore, in "A Fast String Matching Algorithm", Communications
of the ACM, 20, 10 (Oct. 1977), pp. 762-772. The algorithm requires
"compilation" of each search string and is best suited to
conditions where the data to be searched is considerably larger
than the search string. The routine BMCompile in the module
BMSEAR performs the compilation, and the routine BMSearch in the
same module performs the actual search. For each search string,
the algorithm requires an integer array to hold the "compiled"
string as well as space to retain the string itself. I have
chosen to make these arrays 256 entries long, one for each
possible 8-bit character value, so that strings may contain
characters beyond the 128 characters of standard ASCII.
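The compile-then-search split can be illustrated with the
simplified form of Boyer-Moore that uses only a 256-entry
"bad character" shift table (often called Boyer-Moore-Horspool);
the full 1977 algorithm also uses a second, "good suffix" table.
The loop "for (i=0; i<256; i++) d[i] = pl;" visible in the
Example 2 output suggests BMSEAR builds a similar table, but that
is an inference. The names bm_compile and bm_search are
hypothetical, not BMCompile and BMSearch, and the nocase flag,
which folds both pattern and text with toupper() to mimic
FINDSTR's default case-insensitive compare, is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Sketch only: simplified Boyer-Moore (bad-character table only).
   bm_compile/bm_search are hypothetical names, not BMSEAR's routines. */

/* Fold a byte to upper case when case-insensitive matching is wanted. */
static unsigned char fold(unsigned char c, int nocase)
{
    return nocase ? (unsigned char)toupper(c) : c;
}

/* "Compile" pat (length pl >= 1) into d[0..255]: for each byte value,
   how far the pattern may slide when that byte is the text byte aligned
   with the pattern's last position. */
void bm_compile(const char *pat, size_t pl, int nocase, size_t d[256])
{
    size_t i;
    for (i = 0; i < 256; i++)
        d[i] = pl;                        /* byte absent: skip whole pattern */
    for (i = 0; i + 1 < pl; i++)          /* last pattern byte excluded */
        d[fold((unsigned char)pat[i], nocase)] = pl - 1 - i;
}

/* Return the offset of the first match of pat in text, or -1 if none. */
long bm_search(const char *text, size_t tl, const char *pat, size_t pl,
               int nocase, const size_t d[256])
{
    size_t pos = 0;                       /* where the pattern is aligned */

    if (pl == 0 || pl > tl)
        return -1;
    while (pos + pl <= tl) {
        size_t j = pl;                    /* compare right to left */
        while (j > 0 && fold((unsigned char)text[pos + j - 1], nocase)
                        == fold((unsigned char)pat[j - 1], nocase))
            j--;
        if (j == 0)
            return (long)pos;             /* every byte matched */
        /* shift by the table entry for the text byte under the last
           pattern position; every entry is at least 1 */
        pos += d[fold((unsigned char)text[pos + pl - 1], nocase)];
    }
    return -1;
}
```

Because most text bytes do not occur in a short pattern, the
shift is usually the full pattern length, so many text bytes are
never examined at all; that is where the speed comes from, and
why the method pays off when the data is much larger than the
search string.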

FINDSTR forms the input search strings into a simple First-In-
First-Out (FIFO) linked list, acquiring memory for both the
string itself and for its "compiled" array dynamically.
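A sketch of such a FIFO queue follows. The field names Pattern
and d and the name Head echo identifiers visible in the Example 2
listing, but the structure layout, the Tail and Next pointers,
and the function enqueue_pattern are assumptions. The shift
array is only allocated here; filling it is left to a separate
compile step.

```c
#include <stdlib.h>
#include <string.h>

/* One queued search string (layout is an assumption). */
struct PatNode {
    char *Pattern;               /* copy of the search string */
    size_t *d;                   /* 256-entry "compiled" array, filled later */
    struct PatNode *Next;
};

/* FIFO: append at Tail, scan from Head, so input order is preserved. */
struct Queue {
    struct PatNode *Head, *Tail;
};

/* Hypothetical helper: append a copy of pat.
   Returns 0 on success, -1 if memory runs out. */
int enqueue_pattern(struct Queue *q, const char *pat)
{
    struct PatNode *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->Pattern = malloc(strlen(pat) + 1);
    t->d = malloc(256 * sizeof *t->d);
    if (t->Pattern == NULL || t->d == NULL) {
        free(t->Pattern);        /* free(NULL) is harmless */
        free(t->d);
        free(t);
        return -1;
    }
    strcpy(t->Pattern, pat);
    t->Next = NULL;
    if (q->Head == NULL)
        q->Head = t;             /* first string in the queue */
    else
        q->Tail->Next = t;       /* link after the previous last string */
    q->Tail = t;
    return 0;
}
```

A FIFO keeps the strings in the order they were typed, so a
file's matches are checked against the strings in that order.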

The expansion of "wild card" file specifications on the command
line is performed by a proprietary module supplied with the
Datalight C compiler; however, other good C compilers provide
similar facilities. Apart from the "wild card" expansion,
FINDSTR and BMSEAR are very "standard" C and should compile
with any other C compiler.
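As one example of a comparable facility, on POSIX systems the
shell normally expands wild cards before main() runs, but a
specification that arrives quoted can be expanded with the POSIX
glob() function from <glob.h>. This is a swapped-in facility,
not the Datalight module, and expand_spec is a hypothetical
helper.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: call visit() (if non-NULL) for each name that
   spec expands to, and return how many names there were.  GLOB_NOCHECK
   hands back the spec itself when nothing matches, so the caller can
   still report that the file could not be opened. */
size_t expand_spec(const char *spec, void (*visit)(const char *path))
{
    glob_t g;
    size_t i, n = 0;

    if (glob(spec, GLOB_NOCHECK, NULL, &g) == 0) {
        n = g.gl_pathc;
        for (i = 0; i < n; i++)
            if (visit != NULL)
                visit(g.gl_pathv[i]);
    }
    globfree(&g);                /* release the name list glob() built */
    return n;
}
```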

A similar Boyer-Moore search algorithm is also available in Turbo
Pascal.


COPYRIGHT CONSIDERATIONS:

As far as I can determine, everything in FINDSTR and BMSEAR,
except the expansion of "wild card" command line parameters, is
in the public domain and no copyright restrictions apply. Since
the total development time for this program from initial
conception to the production of this document was under 9 hours,
the usual request for "donations" is absurd. The author, of
course, disclaims liability for any damages of any kind
whatsoever arising from the use of this program or any of its
component parts.



