Dec 16, 2017
 
Count word frequency in text.
File WORDFREQ.ZIP from The Programmer’s Corner in
Category Word Processors
File Name     File Size   Zip Size   Zip Type
WDFREQ.COM        15390      10973   deflated
WDFREQ.DOC         7491       2886   deflated
XCL.DIC            1500        557   deflated


Contents of the WDFREQ.DOC file




WdFreq -- Word frequency counter


The .ARC file or diskette you just received should contain
the following files:

WDFREQ.COM The program itself
WDFREQ.DOC This file
XCL.DIC A sample exclusion file


Please read the statement/disclaimer/exhortation at the end of
this document; we have put it there rather than here as a courtesy
to those taking a quick look at this file just to get started.

What the program does:

WdFreq reads an input file of ASCII text and writes an output file
containing a list of the different words found in the text and their
frequency of occurrence. This output file may be sorted using other
programs, input to a style analyzer, or whatever.
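As an illustration only (the author's Turbo Pascal source is not
reproduced here), the counting step might be sketched in Python as
follows. The tokenizing rule, the folding to upper case, the truncation
of long words, and the output layout are all assumptions, not WdFreq's
documented behavior; MAX_WORD_LEN reflects the 24-character word limit
mentioned later in this file.

```python
import re
from collections import Counter

MAX_WORD_LEN = 24  # word-length limit stated later in this document


def count_words(text):
    """Return a Counter mapping each word to its number of occurrences."""
    # Assumption: a word is a run of letters, optionally with one
    # internal apostrophe (DON'T). WdFreq's real rules are not documented.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    # Assumption: words are folded to upper case and cut at MAX_WORD_LEN.
    return Counter(w.upper()[:MAX_WORD_LEN] for w in words)


def format_counts(counts):
    """Hypothetical output layout: padded word, then right-justified count."""
    return [f"{word:<{MAX_WORD_LEN}}{n:>6}" for word, n in counts.items()]
```

For example, count_words("The cat saw the other cat.") counts THE and
CAT twice each and SAW and OTHER once each.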

How to use the program:

Prepare the standard ASCII text file you wish to feed to WdFreq.
Note: Most word-processing programs have an option to
convert their own special file format (usually containing
formatting information and other special codes) to a
straight ASCII text file. Look in your WP manual under
ASCII files or Programming (since .BAT programs, and
the input to most language compilers, must be in normal
ASCII). WdFreq can read WordStar files without conversion
(but not WordStar 2000 files). A good test is to try
TYPEing or PRINTing the file from the DOS prompt; if it
looks OK, WdFreq can probably read it all right.

Type WDFREQ at the DOS prompt.

The program will ask "Infile:" -- answer with the name of the
text file you prepared above. You may include
device:\subdir\subdir .. , up to 64 characters.

The program will ask "Outfile :" -- answer with the name of
the file to write the word counts to. If you just press Enter
-- i.e. answer nothing -- the program will use the
name WF.TMP. Again you may use device:\subdir\ .. if
necessary.

Finally the program will ask "Exclusion List :" -- answer with
the name of a file containing words you don't want included
in the count. A sample file containing pronouns, articles,
etc. (XCL.DIC) is included here. If you just press Enter here, the
program will count everything, including THIS, IS, THE, OF,
etc.
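The effect of the exclusion list can be sketched like this. It is a
rough Python illustration, not the program's actual code: the function
names are hypothetical, and it assumes the exclusion file holds words
separated by whitespace and that matching ignores case. The real layout
of XCL.DIC is not described in this file.

```python
from collections import Counter


def load_exclusions(lines):
    """Build a set of upper-cased words from the lines of an exclusion file."""
    # Assumption: words are separated by spaces and/or line breaks.
    return {word.upper() for line in lines for word in line.split()}


def count_with_exclusions(words, excluded=frozenset()):
    """Count every word that is not in the excluded set."""
    # With the default empty set nothing is skipped, which corresponds
    # to giving no exclusion file at the prompt.
    return Counter(w.upper() for w in words if w.upper() not in excluded)
```

With an exclusion set built from "the of is this", the phrase "this is
the end of the road" yields counts for END and ROAD only.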

Note: If the program can't find the input or exclusion file, it will
complain and offer you the opportunity to either reenter the
filename or quit in disgust by pressing the escape key.
It will NOT complain if the output file you specify
already exists; it will just merrily overwrite the existing
file with its word count -- so keep your head about you!


The program will then read the input files, keeping you advised of
its progress. The "bytes left" figure is how much memory remains
free as the program allocates space for its internal table.

Important note: In the unlikely case that the program runs out of
space for its internal word list, it will complain "Out of
room" and again allow you to abort the program with the escape key.
You should definitely abort under these circumstances; if
the program continues it may lock up your machine, requiring
a reboot. But don't worry too much about this; the program
needed less than 66k of dynamic storage to count the 2000+
different words in a 2800-line input file created from
several shareware .DOC files. If you run it on War & Peace,
however, you may want to break the text into smaller pieces.
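If a text is split into pieces and each piece is counted separately,
the per-piece counts then have to be added together to get totals for
the whole text. A minimal Python sketch of both steps follows; the
helper names and the choice of a fixed number of lines per piece are
illustrative assumptions, not features of WdFreq.

```python
from collections import Counter


def split_lines(lines, chunk_size):
    """Yield successive pieces of at most chunk_size lines each."""
    for start in range(0, len(lines), chunk_size):
        yield lines[start:start + chunk_size]


def merge_counts(per_piece_counts):
    """Add up the word counts obtained from each piece."""
    total = Counter()
    for counts in per_piece_counts:
        total.update(counts)  # update() adds counts, it does not replace
    return total
```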

Performance: Processes about 1200 lines per minute on an 8-MHz
XT (V-20 chip) with a fast hard disk.

Using the output file -- this file, as produced by WdFreq, is
not in any useful order; for study you will probably
want to sort it either alphabetically or in numerical
frequency order. As long as the file is less than 64K bytes
long, the DOS sort filter can be used, although it is
very slow:

Alphabetically:

SORT < wf.tmp (or whatever) > wf.srt (the sorted filename)

By frequency, most frequent first:

SORT/+24/r < wf.tmp > wf.srt

[/r produces the reverse order, /+24 says start
at column 24 in the file. 24 is the maximum
word length for WdFreq, by the way -- German
scholars take note . . .]
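For readers without the DOS SORT filter, the same two orderings can be
sketched in Python. This assumes that each output line ends with the
count as its last whitespace-separated field, which is a guess at the
layout and not something this file specifies. Note one deliberate
difference: SORT compares text starting at the given column, while this
sketch compares the counts as numbers.

```python
def sort_alphabetically(lines):
    """Return the lines in plain text order, like SORT with no switches."""
    return sorted(lines)


def sort_by_frequency(lines):
    """Return the lines with the most frequent word first."""
    def count_of(line):
        # Assumption: the count is the last field on the line.
        return int(line.split()[-1])

    return sorted(lines, key=count_of, reverse=True)
```

Comparing numerically means a count of 10 sorts ahead of a count of 2
even if the counts are not padded to the same width.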

A much faster and more flexible sorting program is Buerg's
FSORT, available on many bulletin boards.

* * * * *

This program was written by Craig Goodrich of PussyCat Systems
(P.O. Box 266, Washington Grove MD 20877, phone 301 869-2297) and is
distributed without any warranty, liability, or guarantee whatever.
It may be redistributed freely provided that no modification is made
to any of the files included with this distribution. No fee beyond
a reasonable media fee or bulletin board subscription fee may be charged
for it except by us PussyCats.

Frequent users of this program may assuage their consciences by
sending a contribution ($10 suggested) to PussyCat Systems at the above
address.

Students of programming may receive the Turbo Pascal source code
to WdFreq by sending a blank, formatted 360K 5.25-inch floppy disk and
stamped, self-addressed mailer to us PussyCats, together with a statement
(preferably signed in blood) affirming that they will use the code for
educational purposes only. The code provides what we think is a very
good example of elementary hash-table and linked-list processing.
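For those unfamiliar with the technique, here is a small Python sketch
of a hash table whose buckets are linked lists, used to count words. It
is not the Turbo Pascal source offered above and makes no claim to
match it; the class names, the bucket count, and the hash function are
all assumptions chosen for illustration.

```python
class Node:
    """One entry in a bucket's linked list."""

    def __init__(self, word, next_node):
        self.word = word
        self.count = 1
        self.next = next_node


class WordTable:
    """Hash table with chaining: each bucket heads a linked list of Nodes."""

    def __init__(self, buckets=101):
        self.buckets = [None] * buckets

    def _index(self, word):
        # Simple multiplicative string hash (an arbitrary choice).
        h = 0
        for ch in word:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def add(self, word):
        """Bump the word's count, or link in a new node if it is unseen."""
        i = self._index(word)
        node = self.buckets[i]
        while node is not None:
            if node.word == word:
                node.count += 1
                return
            node = node.next
        # Not found: the new node becomes the head of this bucket's list.
        self.buckets[i] = Node(word, self.buckets[i])

    def items(self):
        """Yield (word, count) pairs, bucket by bucket."""
        for head in self.buckets:
            node = head
            while node is not None:
                yield node.word, node.count
                node = node.next
```

Walking the buckets in order, as items() does, yields the words in hash
order rather than alphabetical or frequency order.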




************************************

PussyCat Systems provides expert VMS & DOS consulting services

including:
hardware evaluation

systems analysis

communication & networking

user education

programming in
C
Pascal
dBASE (FoxBASE, Clipper, . . )
Datatrieve (VAX or PDP-11)
Assembler


Contact us PussyCats for help solving YOUR mini/micro
problems at Box 266, Washington Grove MD 20877, or call (301)
869-2297 between 5 & 9 pm.


This has been a commercial announcement.



Programs mentioned in this document are the intellectual property of
their authors and/or distributors and their names may be registered trade
or service marks.



