Contents of the READ.ME file
60K Words Documentation
(January 5, 1988)
The following 5 files should be present:
60K-PRT1.ARC (119K bytes)
60K-PRT2.ARC (119K bytes)
APPEND.BAT (22 bytes)
READ.ME (This file)
WORDSTAT.DAT (Summary word list statistics)
I have structured this file as an "arc of arcs" since the file
containing all 60,878 records is over 645K bytes long, and thus will
not fit on a 360K floppy. Each of the first two files listed above
will expand to 323K bytes when "un-arc'd".
The files produced from "unarcing" the two files are named
60K-WORD.PT1 and 60K-WORD.PT2, respectively. To combine the two,
assuming you have enough disk space, you can use the DOS COPY command, but
since the syntax is somewhat awkward, I have included APPEND.BAT to
(hopefully) help. The syntax for APPEND.BAT is:
APPEND 60K-WORD.PT2 60K-WORD.PT1 [outfile]
where "outfile" is optional. If a file name is included, the
combined files will be copied to the designated file name in the
proper order. If no "outfile" is specified, the file 60K-WORD.PT1
will hold its original contents plus the contents of 60K-WORD.PT2
when the operation is complete.
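For readers without DOS at hand, the effect of the combine step can be sketched in Python. This is only an illustration of "part 1 followed by part 2", not the contents of APPEND.BAT; the function name combine_parts and the output name 60K-WORD.ALL are made up for the example.

```python
import shutil

def combine_parts(part1, part2, outfile):
    """Write part1 followed by part2 to outfile, byte for byte."""
    with open(outfile, "wb") as out:
        for name in (part1, part2):
            with open(name, "rb") as src:
                shutil.copyfileobj(src, out)

# Example call (the output file name is hypothetical):
# combine_parts("60K-WORD.PT1", "60K-WORD.PT2", "60K-WORD.ALL")
```

Writing to a separate output file leaves both parts untouched, which corresponds to the case where "outfile" is given.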
ABOUT THE WORD LIST
My theory of spelling checkers is that ideally their word lists
will contain all the words I will ever use, and no more. As more
words are included in the word lists, the likelihood increases that a
"typo" will result in a word which is included in the spell checker's
word list, and thus not be caught when the document is spell checked.
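The trade-off can be made concrete with a small Python sketch. The two word lists and the sample sentence are invented for illustration, and unknown_words is a hypothetical helper, not part of any spell checker named here. The point is only that a typo which happens to be a legitimate but rare word ("fro" for "for") is flagged by the smaller list and slips past the larger one.

```python
import re

def unknown_words(text, word_list):
    """Return the words in text that are not in word_list, in order."""
    known = {w.lower() for w in word_list}
    return [w for w in re.findall(r"[A-Za-z']+", text)
            if w.lower() not in known]

# Invented lists: the larger one differs only by the rare word "fro".
small_list = ["send", "the", "report", "for", "review"]
large_list = small_list + ["fro"]

text = "Send teh report fro review"

print(unknown_words(text, small_list))  # ['teh', 'fro']
print(unknown_words(text, large_list))  # ['teh']
```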
Obviously, a commercial spell checker must contain most of the
words that any of its prospective users will use; thus, most spell
checkers contain 80,000 to 130,000 words. On the other hand, most
shareware spell checkers include only 4,000 to 10,000 words in their
"dictionaries" (some of which are misspelled!), which is woefully
inadequate. Even if they allow the user to add words, it can take
many months before the spell checker is useful since it will likely
stop every 20-50 words to flag a word which is not in its dictionary,
even though it is spelled perfectly.
The 60,000 words contained in this file were selected to serve as
a "core" dictionary to which the user will add those words unique to
his own usage. Therefore, the following guidelines were used:
1. Common (uncapitalized) words.
2. Names of States in the US and their conventional abbreviations
(two-letter postal abbreviation, and 3 or more letter
abbreviation).
3. Common abbreviations used in addresses (St., Blvd., Rd., etc.).
4. Continents and oceans.
5. Major religions.
6. Months of the year, days of the week and standard abbreviations.
7. No hyphenated words.
8. Words which are contained in the spell checking dictionaries
of both Turbo Lightning and WordPerfect 4.2.
There are a few exceptions to the above guidelines, but they are
not significant in my opinion. Rest assured that all the words in the
list are correctly spelled.
During the course of developing the word list I found that it is
not difficult to find additional words; the difficulty is in reducing
the list to include only "common" words while ensuring that all very
common words are included. I am not entirely satisfied with the
result. When I last worked on the list, I added "downcast", "mercy",
and "envy", which strongly suggests that many very common words are
still missing. Nonetheless, since I now intend to expand the list
with words which I myself use, I decided to upload the list before I
"corrupted" it further with the arcane language of engineering,
computer programming, contracts, and management consulting.
I ended up with over 10,000 words in my "exclude" list since they
didn't meet the guidelines listed above. Some of them seem to me to
be fairly common (e.g. "awestruck", "boilerplate", "configure",
"radians", "toggled") but are not in either of the commercial spelling
checkers I used. Some are in one of the commercial checkers, but not
the other. Others are proper names ("Andrew", "Michael", "Douglas",
"Chevrolet", etc.), hyphenated words ("Coca-Cola"), or abbreviations
("IRS", "ASCII", "YMCA", etc.); some are obscure, and some are
probably just misspelled.
I also developed five utility programs to help evaluate word lists:
1. One compares two lists and creates a new file with only those
words in the second list which are not in the first or,
alternatively, a file containing the words common to both lists.
2. A second splits word list files into 1-9 equal-sized files by
either number of records or number of bytes.
3. A third makes a WORDSTAT.DAT file (example included herein).
4. A fourth alphabetizes lists and eliminates leading and trailing
spaces and blank lines.
5. A fifth makes a file containing only those words which are
between x and y characters long (x and y are user-specified).
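The kinds of operations these utilities perform can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the utilities themselves: all function names are hypothetical, lists are held in memory rather than in files, sorting uses Python's default (case-sensitive) order, and the WORDSTAT.DAT generator and the split-by-bytes option are left out because their details are not described here.

```python
def clean_and_sort(lines):
    """Strip leading/trailing spaces, drop blank lines, and alphabetize."""
    stripped = (line.strip() for line in lines)
    return sorted(w for w in stripped if w)

def only_in_second(first, second):
    """Words in the second list which are not in the first."""
    seen = set(first)
    return [w for w in second if w not in seen]

def common_to_both(first, second):
    """Words in the second list which are also in the first."""
    seen = set(first)
    return [w for w in second if w in seen]

def split_by_records(words, parts):
    """Split a list into 1-9 pieces of (nearly) equal record count."""
    if not 1 <= parts <= 9:
        raise ValueError("parts must be between 1 and 9")
    size, extra = divmod(len(words), parts)
    pieces, start = [], 0
    for i in range(parts):
        # The first `extra` pieces each take one leftover record.
        end = start + size + (1 if i < extra else 0)
        pieces.append(words[start:end])
        start = end
    return pieces

def by_length(words, x, y):
    """Words which are between x and y characters long, inclusive."""
    return [w for w in words if x <= len(w) <= y]

# Small invented sample, using words mentioned in this file.
raw = ["  mercy ", "", "envy", "downcast", "toggled"]
words = clean_and_sort(raw)
print(words)                                    # ['downcast', 'envy', 'mercy', 'toggled']
print(only_in_second(["envy", "mercy"], words)) # ['downcast', 'toggled']
print(common_to_both(["envy", "mercy"], words)) # ['envy', 'mercy']
print(split_by_records(words, 3))               # [['downcast', 'envy'], ['mercy'], ['toggled']]
print(by_length(words, 4, 5))                   # ['envy', 'mercy']
```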
If you would like a diskette containing the "excluded" word list
and the five utility programs, send $10.00 to me at the address below.
Otherwise, use the 60,000 words to begin your own word list or to test
your spell checker, but please don't send me a note or otherwise
advise me of all the common words which are missing -- I have neither
the time nor the temperament to act as a clearing house for the
"ultimate word list".
Skene H. Moody
S. H. Moody & Associates
1810 Fair Oaks Ave.
South Pasadena, CA 91030