Dec 06 2017
 
Interesting spell checking program. Uses the idea of "spell filtering".
File SPELLF.ZIP from The Programmer’s Corner in
Category Word Processors
File Name     File Size   Zip Size   Zip Type
$.DIC              8192       5302   deflated
SPELLF.COM        12156       7508   deflated
SPELLF.DOC        19116       7579   deflated


Contents of the SPELLF.DOC file



^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
^                                                                   ^
^                         DOCUMENTATION FOR                         ^
^                                                                   ^
^           SPELLF !Tiny Trainable Spellfilter Ver. 1.01            ^
^                                                                   ^
^         (c) 1989 Kas & Rita Thomas. All rights reserved.          ^
^                                                                   ^
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^

You may distribute SPELLF to your friends. But if you do so, please
register your copy with us today, by sending $10 (in cash, check,
M.O., stamps, whatever) to: Kas & Rita Thomas, 578 Fairfield Ave.,
Stamford, CT 06902.

This program was created using Borland Turbo C (Ver. 2.0). If you
would like fully commented source code for SPELLF, kindly send $25 in
lieu of the registration fee. In other words: For $25, you get source
code AND registration, all in one shot (on disk).

When you write, please tell us about your computer setup. Are you
using SPELLF on a laptop? Which model? (Are you happy with it?) We'd
be delighted to hear from you. Write to us today!





WHAT IS SPELLF? AND: WHAT ISN'T IT?

SPELLF is a spellfilter. We feel this word better describes the spirit
of this utility than the term "spellchecker," which implies a literal
brute-force comparison of words against a dictionary. SPELLF does not
do this, obviously. (The "dictionary" file used by SPELLF is only 8K
in size!) Rather than check literal spellings, SPELLF checks the
"legality" of substrings within each word and reports out any words
that contain illegal letter combinations. In this way, SPELLF acts
more like a logical filter than a mere word-looker-upper. You give a
text file to SPELLF, and the program acts as a sieve, reading the text
file in and passing suspect words to "stdout" (the standard output
device for your computer; usually the console). The output is fully
redirectable, which means you can have it show up on your printer (or
on disk) rather than on your screen.

SPELLF carries out its filtering at lightning speed: On the slowest
8-MHz XT clone we could find, SPELLF took only 10 seconds to spellcheck
a 120K text file! (It adds words to the main dictionary at the
blistering rate of 1,000 English words -- about 6 Kbytes -- per
second.) On a 286 or 386 machine, you may see throughput rates five to
ten times higher, especially if you use redirection. (The biggest
runtime bottleneck for SPELLF is waiting for the screen to update.)

Speed isn't SPELLF's primary virtue, however. Small size (both in
terms of disk image and RAM overhead) is where SPELLF really shines.
The program itself is just 12K, making it perfect for use on floppy-
disk-only systems (laptops, for example) and systems that can't
accommodate the 200K spellcheckers that come with such programs as ...
well, you know the big names. At runtime, SPELLF uses approximately
76K of RAM (12K of core RAM and 64K of "far core" heap
space). For its dictionary, SPELLF uses a disk file only 8 kilobytes
in size. No matter how many words you stuff into this dictionary file
($.DIC), it never gets bigger than precisely 8K.

Another virtue of SPELLF is that it operates in command-line mode,
making it easy to invoke spellchecking from batch files.

Finally, SPELLF's output (a list of suspect words, with their line
locations in the original file) is redirectable. That is, you can make
the suspect-word list show up at your printer, or in a new file on
disk, rather than on the screen. (Again, this approach lends itself to
batch file applications.)





DISCLAIMERS

Whether or not you send us money, no warranties are made with respect
to the program's performance, fitness, hardware compatibility, etc.

To run this program, you need at least 72K of free RAM and 20K of disk
space at all times. Any version of DOS should be fine.

Note: This program is designed to spellcheck standard ASCII text files
only. Nothing disastrous will happen if you attempt to spellcheck a
WordStar file. SPELLF merely ignores high-bit characters. For
this reason, control codes that fall outside the normal range of
alpha-numeric ASCII characters pose no problem whatsoever. But if
your word processor uses /b or /i for font calls, you may see
erroneous exception-word callouts on the screen.

If there is sufficient demand, we will rewrite the program to take
care of format code detection. (Don't hold your breath, however!)





THE $.DIC FILE

Your program should have come bundled with an 8K file called $.DIC.
SPELLF looks for this file at runtime; it contains the information
needed to conduct spellchecking (or spellfiltering) operations. This
file MUST be present in the current working directory for SPELLF to
work properly.

The $.DIC dictionary file that we created for bundling with this
program contains data for 20,000 commonly used English words. If you
did not get a copy of $.DIC, don't panic: SPELLF will let you create
your own $.DIC file, but you will have to supply your own core word
list (of properly spelled words) with which to make the file. To
create a brand-new $.DIC file from scratch, do this:

1. Be sure you are in a directory that contains no $.DIC file. (If a
$.DIC file already exists, rename or delete it; otherwise the existing
file will be UPDATED to contain the new word list.)

2. Run SPELLF with two arguments as follows:


SPELLF {filename of word list} -a [Enter]


Don't forget the "-a" on the command line. This tells SPELLF to ADD
the contents of {filename} to $.DIC.

If $.DIC did not exist before, a new file named $.DIC will show up in
your current directory when you follow the above procedure. You will
then be able to "spellcheck" any file against the $.DIC dictionary.

If a $.DIC file already existed in the current directory, the contents
(vocabulary) of {filename} will simply be ADDED to $.DIC. The previous
contents of $.DIC will be preserved. It is a true ADD operation.






CREATING CUSTOM DICTIONARIES

Obviously, you can use the foregoing procedure to create any number of
custom dictionaries (scientific word lists, lists of foreign words,
computer terms, etc.) for use with the SPELLF program. All you need to
do is supply the name of a file containing a list of scientific terms
(or whatever) on the command line:


SPELLF NEWWORDS.TXT -a [Enter]


In this case, the vocabulary of NEWWORDS.TXT will be added to $.DIC.
(Once again: If $.DIC did not already exist in the current working
directory, it will be created. If it exists, it will be updated.) The
filename does not have to have a .TXT extension; this is merely an
example. Any extension is permissible. (Caution: If you accidentally
supply the name of a spreadsheet or other non-text file, you stand a
chance of corrupting your existing $.DIC file. Always keep a "good"
copy of $.DIC on a backup disk somewhere.)

Note that SPELLF is not merely an English-language spellchecker. If
you happen to have a French word list on disk, you can use it to
create a French-language version of $.DIC, and hence a French-language
spellchecker. The copy of $.DIC that comes bundled with SPELLF happens
to contain 20,000 English words. It could just as easily contain
French, Spanish, German, or Italian words, etc.





SPELLCHECKING

To spellcheck a text file, just type SPELLF at the DOS prompt and
supply the name of the text file you wish to examine. Then hit Enter.
Example:


SPELLF {filename} [Enter]



If you type SPELLF DRAFT.DOC [Enter], the file DRAFT.DOC will be
rigorously spellchecked using the words contained in $.DIC. (Remember
that SPELLF expects the file $.DIC to be in the current working
directory.) A complete list of suspect words and their line locations
will be printed to the screen. If you want, the list of suspect words
can show up in a separate disk file, or be printed out on your
printer. This is called redirection.

To redirect the output of SPELLF to a brand-new file called BADWORDS,
simply type:



SPELLF DRAFT.DOC > BADWORDS [Enter]



During program execution, in this case, nothing will happen on the
screen (but the disk will be active). When the DOS prompt returns, you
should see that there is a new file, BADWORDS, in your current
directory. You can look at this file with the DOS command TYPE, or use
your favorite word processor to open it.

Note that SPELLF operates much faster when redirection to a file is
used. That's because the process of displaying output on the screen
(in SPELLF's default mode) is inherently slow -- slower, at least,
than SPELLF's file I/O operations.

You may wish, occasionally, to redirect output to your printer. Again,
you can use DOS to do this:



SPELLF DRAFT.DOC > PRN [Enter]


This causes the file DRAFT.DOC to be spellchecked and the suspect
words to appear at the printer, rather than at the screen.






TESTING WORDS FOR PRESENCE IN THE DICTIONARY

You can test words for presence or absence in the dictionary.
The procedure is very simple: Just type SPELLF and -t. Like this:



SPELLF -t [Enter]



As soon as the program loads, a query message comes up on the screen
asking you to supply a word. Just type the word; then hit
Enter. SPELLF will consult $.DIC to see if the word you typed is
already in the dictionary. If it is, you'll be told so, and you will
be asked if you want to test another word. (Type 'y' or Enter to
answer in the affirmative.) If the word you typed was NOT in the
dictionary, you'll be told so, and you will be given a chance to add
it. No extra typing is required to add the word: Just hit Enter OR
type 'y' for yes. (You must type 'n' if you do NOT wish to add the
word to $.DIC.)

You may keep checking words in this fashion for as long as you like.
The loop will stop when you answer 'n' (No) to the "Test another
word?" prompt.

"Test" mode thus offers a second way (besides "Add" mode; see
"CREATING CUSTOM DICTIONARIES," above) to update the dictionary. You
can update the $.DIC dictionary one word at a time simply by entering
the test-word loop as above, quitting at any convenient point.

Obviously, when you have more than a handful of words to add to the
dictionary, it makes sense to create a text file containing just the
words in question, and "feed" it to SPELLF in "Add" mode by typing
SPELLF {filename} -a, then Enter.





SPELLCHECKING VS. SPELLFILTERING

Conceptually, there are two ways to attack the problem of
spellchecking: You can approach it in a brute-force manner, comparing
every word in a file with entries in a dictionary . . . or you can
approach it as an "expert system" type of problem, wherein an attempt
is made to catalog all known rules of spelling (and spellcheck on a
logical basis rather than a lookup basis). The first approach -- brute
force lookup of words in an all-encompassing dictionary -- is familiar
to anyone who has ever used a fully featured word processor. This
approach is, in theory, infallible, provided the dictionary is
complete. In the real world, no dictionary is ever large enough to
approach completeness, nor CAN a dictionary ever be foolproof, due to
the changing nature of the language. Proper nouns are always a
problem, and new coinages are always entering the language. Indeed,
the inherent "inventivity" of English is one of its most endearing
characteristics. But this very property dooms all orthodox
spellcheckers to failure.

The second approach -- that of examining the rules governing spelling,
and constructing a spellchecker based on those rules -- happens to
coincide with the approach taken by most hyphenation programs. Most
hyphenators are state machines whose main operations consist of affix
analysis and word-root parsing. The same approach can be extended to
spellchecking. Why not simply examine patterns of letters, and ask
whether the individual patterns are "legal constructs" or "illegal
constructs" based on known rules of English spelling? (Here, by
patterns we mean more than just two-letter combinations.) An
adaptation of this approach is used in SPELLF.

The state-machine approach has many advantages over the brute-force
method. For one thing, it requires no bulky dictionary (which, in many
conventional programs, consumes 200K or more of disk space IN
COMPACTED FORM). For another thing, it means that, in theory anyway,
English words that don't exist yet can be spellchecked properly,
because new coinages will always (presumably) follow the rules of
English-language letter-combining. (That is, it's unlikely anybody
will coin words that contain unpronounceable combinations like "xbjt"
or "tqtpp.") A rule-based program should be able to spellcheck new
coinages that it has not seen before. This is a formidable advantage
over conventional spellchecking systems.

SPELLF incorporates these ideas. Plus, it does so in ways that are
simple, effective, and conducive to rapid program execution.

When we say that SPELLF is a spellfilter, we mean simply that the goal
of the program is to filter out illogical spellings and yield up a
list of words whose letter-combinations don't make sense from the
standpoint of known rules of English spelling.





HOW THE PROGRAM WORKS (A TECHNICAL ASIDE, FOR PROGRAMMERS ONLY)

If you're wondering how 20,000 or more English words can fit into an
8,192-byte dictionary file, consider that in 8,192 bytes there are
65,536 bits, each of which can represent a unique hash code. The $.DIC
file that we've been discussing so far is not really a dictionary file
in the true sense of the word, but a hash table, each one-bit entry of
which represents a 16-bit hash code derived from a four-byte substring
of whatever word is currently being examined. Words fewer than five
letters long correspond to single entries in the hash table; words
with five or more letters are carried in the table as multiple hashes.
A five-letter word has two hash entries; a six-letter word has three
hash entries; etc.
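
For the curious, the bit arithmetic described above can be sketched in
C roughly as follows. This is a minimal illustration, not SPELLF's
actual source: the names table_set, table_test and window_count, and
the choice of which bit within a byte belongs to which hash code, are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_BYTES 8192u              /* size of $.DIC, per the text */
#define TABLE_BITS  (TABLE_BYTES * 8u) /* 65,536 one-bit entries      */

/* Mark a 16-bit hash code as "legal" by setting its bit.
   Assumed layout: bit (hash & 7) of byte (hash >> 3). */
static void table_set(unsigned char *table, unsigned int hash)
{
    assert(table != NULL);
    hash &= 0xFFFFu;
    table[hash >> 3] |= (unsigned char)(1u << (hash & 7u));
}

/* Return 1 if the bit for this hash code is set, else 0. */
static int table_test(const unsigned char *table, unsigned int hash)
{
    assert(table != NULL);
    hash &= 0xFFFFu;
    return (int)((table[hash >> 3] >> (hash & 7u)) & 1u);
}

/* Number of hash entries a word of the given length occupies:
   one for words under five letters, otherwise length minus three. */
static size_t window_count(size_t len)
{
    return (len < 5) ? 1 : len - 3;
}
```

Because every possible 16-bit code already has its own bit, setting
more bits never makes the table any larger, which is consistent with
the fixed 8K file size.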

During runtime, every word in a file is "factored out" into its
constituent hash codes (however many there are), and each
corresponding hash position in the $.DIC file is checked. If the hash
check reveals a "legal" combination of four letters, the check
succeeds; if the hash is illegal (representing a letter combination
not found in the table), it fails and the word is flagged as
"suspect."

A 16-letter word such as "incomprehensible" contains 13 four-letter
substrings and has 13 corresponding hash codes, each of which must be
checked against entries in the $.DIC table. If even one check fails,
the word is flagged as suspect, and printed to "stdout." The word
"incomprehensible" must pass all 13 checks in order to be presumed
correctly spelled.
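
The sliding check just described might look something like the sketch
below. Note the assumptions: demo_hash4 is a stand-in hash (it simply
packs the low four bits of each byte) and is NOT the hash SPELLF uses;
the zero-padding of words shorter than four letters, and the names
add_word and word_is_suspect, are likewise invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_BYTES 8192u

/* Stand-in hash, NOT SPELLF's: packs the low four bits of each of
   four bytes into one 16-bit code. */
static unsigned int demo_hash4(const unsigned char *w)
{
    return ((unsigned int)(w[0] & 15u) << 12) |
           ((unsigned int)(w[1] & 15u) << 8)  |
           ((unsigned int)(w[2] & 15u) << 4)  |
            (unsigned int)(w[3] & 15u);
}

/* Hash the four-letter window starting at word[i]. Words shorter
   than four letters are zero-padded (an assumption). */
static unsigned int window_hash(const char *word, size_t len, size_t i)
{
    unsigned char win[4] = {0, 0, 0, 0};
    size_t avail = (len - i < 4) ? len - i : 4;

    memcpy(win, word + i, avail);
    return demo_hash4(win);
}

/* "Add" one word: set the bit for every four-letter window. */
static void add_word(unsigned char *table, const char *word)
{
    size_t len, n, i;
    unsigned int h;

    assert(table != NULL && word != NULL);
    len = strlen(word);
    n = (len < 5) ? 1 : len - 3;
    for (i = 0; i < n; i++) {
        h = window_hash(word, len, i);
        table[h >> 3] |= (unsigned char)(1u << (h & 7u));
    }
}

/* Return 1 (suspect) as soon as any window's bit is clear,
   0 if every window passes. */
static int word_is_suspect(const unsigned char *table, const char *word)
{
    size_t len, n, i;
    unsigned int h;

    assert(table != NULL && word != NULL);
    len = strlen(word);
    n = (len < 5) ? 1 : len - 3;
    for (i = 0; i < n; i++) {
        h = window_hash(word, len, i);
        if (((table[h >> 3] >> (h & 7u)) & 1u) == 0)
            return 1;
    }
    return 0;
}
```

With an empty table, adding "incomprehensible" sets its 13 window
bits; "incomprehensxble" is then flagged because the window "ensx" was
never added. A string such as "comprehens" passes, since all of its
windows were seen, which is the filter behavior (as opposed to literal
lookup) that the text describes.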

Notice that all checks are done in RAM; the disk is not accessed 13
times when "incomprehensible" is checked against $.DIC, because $.DIC
is captured to RAM at runtime.
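
Capturing an 8K table to RAM takes very little code. The following is
a sketch using only standard C file calls; load_table and save_table
are hypothetical names, and nothing here is taken from SPELLF's own
source.

```c
#include <assert.h>
#include <stdio.h>

#define TABLE_BYTES 8192u

/* Read the whole 8K table into caller-supplied memory.
   Returns 0 on success, -1 if the file is missing or short. */
static int load_table(const char *path, unsigned char *table)
{
    FILE *fp;
    size_t got;

    assert(path != NULL && table != NULL);
    fp = fopen(path, "rb");            /* binary mode matters on DOS */
    if (fp == NULL)
        return -1;
    got = fread(table, 1, TABLE_BYTES, fp);
    fclose(fp);
    return (got == TABLE_BYTES) ? 0 : -1;
}

/* Write the table back out; the file is always exactly 8K. */
static int save_table(const char *path, const unsigned char *table)
{
    FILE *fp;
    size_t put;

    assert(path != NULL && table != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    put = fwrite(table, 1, TABLE_BYTES, fp);
    if (fclose(fp) != 0)
        return -1;
    return (put == TABLE_BYTES) ? 0 : -1;
}
```

Once the table has been read, every lookup is a single memory access,
so the number of four-letter checks per word has no effect on disk
activity.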

The concept of hashing spellcheckers is not new; McIlroy created one
for the PDP-11 (IEEE Trans. Comm. COM-30, Jan. 1982, pp. 91-99), and
another attempt at such a program is discussed in CRAFTING TURBO C
SOFTWARE COMPONENTS & UTILITIES by Richard S. Wiener (1988, Wiley &
Sons). What is new about SPELLF is the small size of the hash table,
and the concept of "spellfiltering" as opposed to conventional
spellchecking from within a text document. No one talks about these
programs as filters. But that's essentially what they are. You run
SPELLF on a file in order to filter out bad words -- "suspect" words.
The spelling is never actually "checked" in the literal sense.

One can argue as to whether a 16-bit hash is adequate for a four-
letter substring. We start from the assumption (which is demonstrably
true) that five bits of information will suffice to encode 26 letters
of the alphabet. We also know that 16 letters can be mapped precisely
into four bits. Something slightly more than four bits is needed to
unambiguously encode 26 letters, but the process of mapping 26 letters
into four bits isn't as dangerous as it sounds, because: (1) the 16
most-frequently-used letters of the alphabet are used with very high
frequency indeed, and (2) the remaining 10 letters can be assigned
hash positions that tend to map over known-good portions of the hash
table. Accordingly, in SPELLF, the ASCII table is renumbered to
reflect frequency of usage and best-fit mapping, so that hash
collisions are seldom "fatal" in the sense of allowing a wrong
spelling to score as "right."
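
To make the renumbering idea concrete, here is one way a 26-letter
alphabet could be squeezed into four-bit codes. Everything specific in
this sketch is an assumption: the frequency order in FREQ_ORDER is a
commonly quoted one, not the table SPELLF uses, and the ten rarest
letters are simply wrapped around onto codes 0 through 9 rather than
being placed by any best-fit analysis.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Letters in an assumed order of decreasing frequency. The first 16
   get unique codes 0..15; the last 10 share codes 0..9. */
static const char FREQ_ORDER[] = "etaoinshrdlucmfwypvbgkqjxz";

/* Four-bit code for a letter (case-insensitive), or -1 if the
   character is not a letter a..z. */
static int letter_code(int c)
{
    const char *p;

    c = tolower((unsigned char)c);
    if (c == '\0')
        return -1;
    p = strchr(FREQ_ORDER, c);
    if (p == NULL)
        return -1;
    return (int)((p - FREQ_ORDER) & 15);
}

/* Pack the codes of up to four letters into a 16-bit hash. Anything
   that is not a letter, or lies past the end of the string, counts
   as code 0 (an assumption). */
static unsigned int hash4(const char *s)
{
    unsigned int h = 0;
    int i, code, ended = 0;

    assert(s != NULL);
    for (i = 0; i < 4; i++) {
        if (!ended && s[i] == '\0')
            ended = 1;
        code = ended ? -1 : letter_code(s[i]);
        if (code < 0)
            code = 0;
        h = (h << 4) | (unsigned int)code;
    }
    return h;
}
```

Under this toy mapping 'y' shares code 0 with 'e', so "tyst" hashes to
the same value as "test". That is exactly the kind of collision the
text refers to; a carefully chosen mapping aims to make such
collisions rarely turn a wrong spelling into an accepted one.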

Still, SPELLF is not perfect. There are times when a misspelled word
is counted as correct. But this happens very infrequently. (If you
want to get a feel for this, run the program in "test" mode and try to
trick the dictionary with various misspellings of common words.)
Please understand that we make no guarantee of SPELLF's spelling
accuracy, nor do we contend that it is untrickable. SPELLF is
imperfect -- like every spellchecker.

What SPELLF lacks in precision, however, it more than makes up for in
speed, ease of use, and compactness (to say nothing of its
adaptability to batch file technique, redirectable output, etc.), and
on balance, we feel the "filter" approach is every bit as useful in
day-to-day usage as the orthodox big-dictionary brute-force approach.
We'd love to know what YOU think; write to us at 578 Fairfield Ave.,
Stamford, CT 06902. And enclose $25 for full Turbo C source code,
ready to compile in the tiny model.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *






