Dec 06 2017
Converts text from ASCII to other formats, etc.

File Name | File Size | Zip Size | Zip Type |
---|---|---|---|
TEXTCON.DOC | 63474 | 16377 | deflated |
TEXTCON.EXE | 22528 | 11671 | deflated |
Download File TEXTCONV.ZIP Here
Contents of the TEXTCON.DOC file
TEXTCON (Version 1.3)
A Program for Conversion of ASCII Files
Between Word Processors
Chris Wolf
November 30, 1986
PURPOSE:
Virtually all word processors will import ASCII files, but
anyone who has tried it knows that the results are often
less than optimal. Documents that are transferred this way
almost always require a great deal of manual "cleaning up"
to get them into the desired format.
TEXTCON is a file "pre-processor" for MSDOS computers that
does much of the necessary conversion automatically. The
ASCII files that it produces are in a form more suitable for
importation to most word processors. TEXTCON will not
eliminate all manual editing, but it makes the job much
easier.
TEXTCON has tremendous power and flexibility that can be
useful for tasks involving data base files, desktop
publishing, and program editing. TEXTCON users have found
the program helpful for many kinds of file manipulations,
such as adding line feeds where only carriage returns are
present, expanding tabs to spaces, removing all blank lines
from a file, etc.
(. . . caution . . . advertisement follows . . .)
And now, those who contribute $25 or more for TEXTCON will
be sent TEXTDCA, which has all the features of TEXTCON, but
will also write files in IBM DCA/RFT format. This allows
ASCII or WordStar files to be imported to your word
processor with spacing characteristics such as margins,
indents, centering, tab stops, etc. intact.
If you use a word processor that accepts DCA/RFT format
files (this includes Word Perfect, Microsoft Word, IBM
Displaywrite, MultiMate, Volkswriter 3, WordStar 2000, and
many others), TEXTDCA is simply the best program available
for importing ASCII files.
This can save you a great deal of time that would otherwise
be spent reformatting the imported file. An additional
TEXTCON File Converter
feature of TEXTDCA is a menu-driven mode (on PC-compatibles
only) which simplifies the selection of processing options.
(. . . end of advertisement . . .)
It's convenient to divide the functions performed by TEXTCON
into five main categories:
1. Removing carriage returns
The most common problem when importing ASCII files into
word processors is that each line from the original file
will end in a "hard" carriage return. In most cases
these have to be removed manually in order to get the
document formatted properly on the new word processor.
TEXTCON uses a sophisticated algorithm to determine
which sections of text constitute "paragraphs", and then
it removes all carriage returns except those at the ends
of paragraphs. (For this purpose, a paragraph is
defined as a block of text where it is desirable for
words to wrap to following or previous lines when edit-
ing or formatting changes are made.)
TEXTCON can cope with almost any paragraph format
including difficult ones like fully indented (nested),
hanging indent, outline style, etc. It does not depend
on double spacing or first-line indentation, although
these are recognized. It will handle print-formatted
files (i.e., those having a left margin of blanks), as
well as the totally unformatted files used as input to
formatters like NROFF (as long as they use "dot"
commands). It is designed to recognize header lines,
tables, etc. and will avoid reformatting them.
(Of course, document formats vary widely, and it really
takes human intelligence to recognize paragraph breaks
with 100% accuracy. TEXTCON will occasionally make
mistakes when dealing with particularly tricky formats.)
2. Adding carriage returns
TEXTCON will also do the opposite process if you wish,
adding carriage returns to files that have them only at
the ends of paragraphs. Or it can deal with special
file formats by substituting carriage returns for some
other special character that is used to represent a
paragraph end.
3. Removing blanks
An ASCII file may have extraneous blanks that cause
problems if they are imported to a word processor.
There may be blanks at the beginnings of lines for a
left margin or for indented or nested text. There may
be extra blanks within lines for justified text or
between columns of a table (where tabs are more
desirable). TEXTCON removes extraneous blanks and,
where appropriate, replaces them with tabs, thus saving
manual editing time.
4. Removing extraneous lines
Some ASCII files have extraneous lines that TEXTCON will
remove for you. Print-formatted files, for example,
sometimes have additional lines inserted solely for
underlining or boldface. TEXTCON will remove these so
you don't have to. TEXTCON recognizes double-spaced
files and converts them to single spacing. Lines that
consist solely of "dot" commands (like WordStar's .PA)
are converted to blank lines. You can also remove (or
add) lines by setting the spacing between paragraphs to
any specific number of blank lines you wish.
Consecutive runs of more than two blank lines are
reduced to two, which may help with files that have been
formatted for a printer. TEXTCON also tries to
recognize page breaks and eliminate all blank lines
between pages if possible. The situation is more
complicated if the file contains headers or footers, but
there is now an option which can, in certain cases,
remove these as well.
5. Removing or converting characters
TEXTCON translates all characters in WordStar files to
their ASCII equivalents. It also removes all non-
printing ASCII characters (except tabs) unless you ask
that they be kept. It has three optional methods for
dealing with line-ending hyphens. TEXTCON does not
alter or remove the IBM extended ASCII characters, used
for math symbols, letters from foreign alphabets, etc.
TEXTCON was designed to be as automatic as possible in its
operation so it can be used by someone with very little
knowledge about the files being converted. Although it has
many options for specialized kinds of conversions, it will
work very well on a wide variety of files without the use of
any of the options.
The options are described in a later section. If TEXTCON
doesn't seem to work quite as you want it to, you may want
to read those descriptions. They include more detailed
information about the kinds of changes TEXTCON makes to a
file and how you can control these changes.
TEXTCON is useful for importing text to many microcomputer
word processors, including Microsoft Word, Word Perfect, and
WordStar, as well as some office automation systems, such as
NBI. If a word processor exhibits problems with "hard
carriage returns" when you import ASCII files, then the
chances are that TEXTCON will help.
Some PC word processors, including Volkswriter, MultiMate,
and PC-Write, actually require the hard returns, and have
trouble with files that do not include them. TEXTCON can
add carriage returns to files so that these word processors
can import them successfully.
If you are working with a word processor that will accept
DCA/RFT format files, the TEXTDCA version of TEXTCON offers
an even higher level of performance than TEXTCON itself.
I am making TEXTCON available for distribution without
charge, but I hope that anyone who uses it on a regular
basis will make a monetary contribution toward the time I
put into developing and supporting it.
USE:
To run the program, use the command form:
TEXTCON [options] infile outfile
where the available options are described in the following
section. (Again, TEXTCON will handle most conversions very
well with no options specified, so most users can ignore all
mention of options here.) The file names can include the
disk-drive identifier and a path name, if appropriate. A
typical command with no options would be as follows:
TEXTCON A:PROPOSAL B:PROPOSAL.TXT
The options are identified by a preceding hyphen as a flag
character, so a command with options might look as follows:
TEXTCON -T5 -B TEXT.DOC B:TEXT.ASC
Multiple options can be combined, using a single hyphen, to
appear as follows:
TEXTCON -T5B TEXT C:\DOCS\TEXT.OUT
You must be careful when combining options this way,
especially if you are using options with numeric "names" or
those with sub-options. For example, if you wanted to use
the options -T3 -2 -KC -B, and you combined them as -T32KCB,
this would be interpreted as -T32 -KCB, which is very
different from what you intended. If, instead, you combined
them as -2BT3KC, they would be interpreted correctly. If there
is any question in your mind about this, keep all of the
options separate on the command line. (The menu system in
TEXTDCA simplifies this quite a bit.)
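The way combined options are read can be sketched in a few
lines of Python. This is only an illustration that reproduces
the two examples above; it is not TEXTCON's own code (which is
written in C), and the names NUMERIC, FLAGS, KEEP_SUBOPTIONS
and split_combined are hypothetical. It assumes that a
numeric option takes every digit that follows it and that -K
takes every following letter that is a valid sub-option.

```python
# Hypothetical option tables, taken from the option list in this document.
NUMERIC = set("MFHZTIPLS")      # options followed by a number (-F, -H may omit it)
FLAGS = set("12BWXY")           # options that take no argument
KEEP_SUBOPTIONS = set("SCRB")   # letters accepted after -K


def split_combined(arg):
    """Split one combined argument such as '-T32KCB' into separate options."""
    text = arg.lstrip("-").upper()
    options = []
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch in NUMERIC:
            # A numeric option swallows every digit that follows it.
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            options.append("-" + ch + text[start:i])
        elif ch == "K":
            # -K swallows every following letter that is a Keep sub-option.
            start = i
            while i < len(text) and text[i] in KEEP_SUBOPTIONS:
                i += 1
            options.append("-K" + text[start:i])
        elif ch in FLAGS:
            options.append("-" + ch)
        else:
            raise ValueError("illegal option: -" + ch)
    return options
```

With these assumptions, split_combined("-T32KCB") gives
['-T32', '-KCB'], while split_combined("-2BT3KC") gives
['-2', '-B', '-T3', '-KC'].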
If you specify an illegal option, such as -Q, the program
will display the legal options.
The file specified as "infile" must be an ASCII file or a
WordStar file; TEXTCON will not work on an internal word
processor file. Some word processors, including PC-Write
and Volkswriter, always keep their text in ASCII files. For
other word processors, such as MultiMate, Word Perfect, or
Microsoft Word, you will have to create an ASCII copy of
your file before TEXTCON will work with it. If you try to
convert an internal file, you may not get an error message
from TEXTCON, but when you load the converted file into
another word processor, it will probably contain gibberish.
OPTIONS:
Before converting a file, TEXTCON analyzes the initial
portion to determine certain overall characteristics of the
document. During the conversion, the program applies a
complex set of rules on a line-by-line and character-by-
character basis to determine localized formatting
information. Because of this, the optional parameters
described here are not usually needed. In any case, you
should certainly try a few conversions before using any of
these options. Unless you notice problems or are simply
curious about the options, you can ignore the following
section.
IMPORTANT: The letters used to select options have changed
substantially in Version 1.3. If you have used options in
earlier versions, be sure to check the new descriptions
below carefully, or at least read the "HISTORY" section at
the end of this document for a summary of the changes. If
you use the old option commands, you may get very strange
results because of their new meanings.
The following describes each of the conversion options
available in the program. Note that some of them are inter-
related or similar in function. As a conceptual aid, they
are organized into two groups. Those in the first group are
descriptive of the input file format. If you can provide
this additional information to TEXTCON, it can do a better
job of conversion. The options in the second group describe
certain types of processing that you want TEXTCON to
perform. These options have a direct effect on the format
of the output file.
The options are shown in upper case, but lower case is
acceptable as well.
INPUT FORMAT DESCRIPTORS
1. -1, -2
TEXTCON is designed to recognize the line spacing
(single or double) used in a file, but in some rare
cases it will make a mistake. This will often happen
when the initial part of the document (the part that
TEXTCON analyzes before starting the conversion) has
different spacing than the rest. When TEXTCON finishes
its analysis of a file, it displays on the screen what
it determined the spacing to be. If this is wrong, you
will have to use the -1 or -2 option to specify that the
input file is single- or double-spaced.
You can also detect an improper spacing decision from
problems in the output file. The usual symptom is that
the converted file either will contain many hard car-
riage returns and be double-spaced, or will have many
paragraphs run together.
If TEXTCON's double-space option is in effect, either
through its own decision or because you specified it,
single occurrences of blank lines are totally ignored,
as if they simply were not in the file. Two consecutive
blank lines are treated as if there were only a single
blank line. Occasionally you may find that this causes
some paragraphs to run together in the converted file.
This would be most likely to happen if single and double
spacing are mixed in the same document, although
normally TEXTCON will handle this correctly.
2. -B
This option tells TEXTCON that your file has only block-
style paragraphs, i.e., there are no paragraphs with
first-line indents or outdents. TEXTCON doesn't need to
know this in order to process a file, but there are some
cases where it can do a better job if it does. This
should be thought of as a little "tweak" for those who
want the absolute best performance. If you use it for a
file that contains non-blocked paragraphs, of course,
performance will be worse.
3. -M#
TEXTCON automatically determines the size of the
document's left margin, but again, it may make a mistake
if the margin becomes smaller toward the end of the
document. If this happens, the conversion will stop at
that point with an error message. It will tell you what
the new, smaller margin value is, and instruct you to
rerun TEXTCON using the -M option with that value. This
is the only case where you should need to use this
option.
4. -F#, -H#
These are two of the trickier options in TEXTCON, and
should be used with caution. Their purpose is to remove
running headers and footers from page-formatted files,
so they don't wind up intermingled with the text. They
have the potential to save a lot of manual editing time
on some files, but they can mistakenly remove text lines
instead. Of course, the original file is not modified
in any case, so if it doesn't work correctly you can
rerun TEXTCON without these options.
The numeric parameter used with these options is the
number of the line on each page that contains the header
or footer. If you don't want to figure this out
yourself, you can omit the number or use a value of
zero, and TEXTCON will try to determine which line(s)
contain the header and/or footer. Thus, -H3 -F64 would
ask TEXTCON to remove the third and sixty-fourth lines
of each page and attempt to join the text across page
boundaries. -F by itself, on the other hand, would
imply there was no running header and that TEXTCON
should determine which line number appears to be a
footer.
These options depend on a number of assumptions:
o that your document either has exactly 66 lines per
page, or has fewer than 66 lines per page and uses
form feed characters to go to a new page (note that
if a file has extra lines without linefeeds for the
purpose of underlining or boldface, these will be
stripped, and don't count toward the 66 lines per
page),
o that the header or footer is only one line long,
o that the header or footer always appears on the same
line of every page, and
o that, if you do not specify the line number(s), a
running header and/or footer occurs within the first
two pages of the file.
If a file meets these criteria, TEXTCON will remove the
desired lines, usually even combining paragraphs across
page boundaries. If a file diverges slightly from that
description, TEXTCON may erroneously delete text lines
from the file. The best advice is to examine closely
any file that has been created using this option.
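The simplest case covered by these assumptions (pages of
exactly 66 lines, with the line numbers given explicitly as
in -H3 -F64) can be sketched in Python as follows. This is an
illustration only, not TEXTCON's code: the function name
drop_header_footer is hypothetical, and the sketch does not
handle form feeds, automatic detection of the header or
footer line, or rejoining paragraphs across page boundaries.

```python
LINES_PER_PAGE = 66  # page length assumed by the -F and -H options


def drop_header_footer(lines, header=None, footer=None,
                       page_len=LINES_PER_PAGE):
    """Remove one header line and/or one footer line from every page.

    header and footer are 1-based line numbers within a page,
    for example header=3, footer=64 for the options -H3 -F64.
    """
    kept = []
    for index, line in enumerate(lines):
        line_on_page = index % page_len + 1
        if line_on_page in (header, footer):
            continue  # this line is the running header or footer
        kept.append(line)
    return kept
```

A file that departs from the fixed page length would make
line_on_page drift, which is one way to see why text lines
can be deleted by mistake.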
5. -W
TEXTCON recognizes WordStar files automatically and
processes them accordingly. When doing this, TEXTCON
assumes that the writer used WordStar "correctly",
taking advantage of all of its formatting abilities.
Unfortunately, many writers use a word processor as if
it were simply a correctable typewriter. This may
include, among other bad habits, using the space bar to
align text or to "nest" paragraphs. TEXTCON will not
perform very well on this type of file, because it is
neither a straight ASCII file nor a true WordStar file.
The -W option tells TEXTCON to treat the input file as a
"semi-formatted" WordStar file, thus correcting for
these sloppy typing habits. If you don't know how to
recognize a poorly done WordStar file, try the
conversion both with and without the -W option and
compare the results. Most WordStar files that I have
tried converted better with the use of this option than
without it.
(For the technically minded, the -W option tells TEXTCON
to convert all soft spaces and soft carriage returns to
hard spaces and hard returns in order to determine the
intended formatting of the file. TEXTCON then strips
out any of the spaces and carriage returns that it
determines are not needed. The most common undesired
side-effect of this is that TEXTCON will occasionally
make a wrong paragraphing decision.)
6. -X, -Y
As described under "POSSIBLE PROBLEMS", below, line-
ending hyphens are normally preserved and a space is
inserted after them, so that you can find each one and
make a decision as to whether it needs to be kept in the
document. If you already know that all hyphens are
required hyphens or that all of them are "soft" hyphens,
you can save some editing time by using the -X or -Y
options.
The -X option indicates that all line-ending hyphens are
required hyphens. TEXTCON will leave them in the text
and will not insert a blank. This is useful if you know
that no "soft hyphenation" has been performed on the
file.
The -Y option indicates that all line-ending hyphens are
"soft" hyphens, and that TEXTCON should remove them
entirely. This is not a very useful option, because it
would be a rare document that you could safely assume
had no line-ending required hyphens.
7. -Z#
This is a very specialized option that would not often
be used on standard document files. It allows you to
specify an alternative character that marks the ends of
"paragraphs" in your file.
The character is specified by means of its decimal ASCII
code, so for example, -Z14 would look for a Ctrl-N to
mark the ends of paragraphs, -Z35 would look for the
symbol #, and -Z236 would look for the infinity symbol
(∞). The only ASCII values not allowed are 0 and 255.
When this option is used, TEXTCON will do two things
differently:
a. treat all carriage returns as soft returns, removing
them from the file, and
b. treat all occurrences of the specified character as
hard returns, removing them from the file and
substituting a carriage-return/line-feed pair.
This option can be extremely useful for certain types of
file transfers, particularly those involving databases,
certain desktop publishing applications, and
manipulations of bulletin board message files.
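The two rules given for -Z# can be sketched in Python as
shown here. This is a minimal illustration, not TEXTCON's
code; the name apply_marker is hypothetical. Two assumptions
are made that the description does not settle: a line feed
that accompanies a removed carriage return is dropped along
with it, and nothing (not even a space) is put in its place.

```python
CR, LF = 13, 10


def apply_marker(data, marker):
    """Treat `marker` (a decimal character code) as the paragraph end.

    Every occurrence of the marker becomes a CR/LF pair; ordinary
    carriage returns and line feeds are treated as soft and dropped.
    """
    if not 0 < marker < 255:
        raise ValueError("marker must be a code from 1 to 254")
    out = bytearray()
    for byte in data:
        if byte == marker:
            out += b"\r\n"      # hard return replaces the marker
        elif byte in (CR, LF):
            continue            # soft return: removed from the file
        else:
            out.append(byte)
    return bytes(out)
```

For example, with marker 35 (the symbol #) the input
b"one \r\ntwo#three\r\n" becomes b"one two\r\nthree".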
SPECIAL PROCESSING DESCRIPTORS
1. -K
As mentioned earlier, one of TEXTCON's major jobs is to
remove certain unneeded elements from your file. In
some cases you may want some of these elements to be
kept; the -K option allows this.
The -K option is a bit different from the other options
in the way it is specified. It has several "sub-
options" represented by additional key letters, which
must immediately follow the -K. If, for example, you
wanted only the S sub-option, the full option descriptor
would be -KS, whereas if you wanted all of the sub-
options, you would use -KSCRB. (You may also use the
full option more than once on the command line, so -KS
-KC -KR -KB would also invoke all of the sub-options.)
The "Keep" sub-options are as follows:
a. S sub-option
The S sub-option of Keep instructs TEXTCON to keep
all spaces in the converted file.
In addition to the substitution of tabs for multiple
spaces (described under the -T# option below),
TEXTCON replaces any set of two or more spaces with
a single space unless it is at the end of a
sentence. At the end of a sentence, it replaces
three or more spaces with two. This helps with
files that have had spaces added to justify the
right margin. TEXTCON also removes all leading and
trailing spaces from each line it processes.
In some special cases this processing may be
undesirable. The S sub-option of Keep overrides
both the substitution of tabs for multiple spaces
and the deletion of spaces, so that all spaces are
kept as found in the original file. You would not
normally want to use this option for a file with a
left margin of spaces, because those spaces would be
incorporated into the paragraphs of text.
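The default space handling described above (the behavior
that the S sub-option turns off) can be sketched in Python.
This is an illustration under stated assumptions, not
TEXTCON's code: the name squeeze_spaces is hypothetical, the
end of a sentence is assumed to mean a period, exclamation
point, or question mark, and the substitution of tabs for
spaces is left out entirely.

```python
import re


def squeeze_spaces(line):
    """Apply the default space rules to one line of text."""
    # Remove leading and trailing spaces.
    line = line.strip(" ")
    # After sentence-ending punctuation, three or more spaces become two.
    line = re.sub(r"(?<=[.!?]) {3,}", "  ", line)
    # Anywhere else, two or more spaces become one. The lookbehind
    # skips runs that directly follow sentence-ending punctuation.
    line = re.sub(r"(?<![.!? ]) {2,}", " ", line)
    return line
```

For example, "  The  quick   fox.    It  ran.  " becomes
"The quick fox.  It ran." with two spaces kept after the
first sentence.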
b. B sub-option
The B sub-option of Keep instructs TEXTCON to keep
all blank lines (except those within double-spaced
paragraphs) in the converted file.
Normally, if TEXTCON encounters more than two
consecutive blank lines (or four in a double-spaced
document) it removes the "extra" ones (in either
case, leaving only two in the converted document).
It also tries to recognize print-image files, i.e.,
ones that contain the actual page breaks in the form
of multiple blank lines or form-feed characters at
the end of one page and beginning of the next. If
it does recognize this, it will remove the page
break entirely and will reconstruct a paragraph
broken between the pages. When TEXTCON's analysis
detects this type of format, it prints a message
describing the file as "page-formatted".
The B sub-option of Keep overrides this blank-line
stripping, so that all blank lines are kept in the
file.
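The basic stripping rule for a single-spaced file (no more
than two consecutive blank lines survive) can be sketched in
Python. This is an illustration only; the name
cap_blank_runs is hypothetical, and the sketch does not
attempt the page-break recognition or the double-spaced case
described above.

```python
def cap_blank_runs(lines, limit=2):
    """Keep at most `limit` consecutive blank lines."""
    out = []
    run = 0  # number of blank lines seen in the current run
    for line in lines:
        if line.strip() == "":
            run += 1
            if run > limit:
                continue  # an "extra" blank line: drop it
        else:
            run = 0
        out.append(line)
    return out
```

A run of four blank lines is reduced to two, while runs of
one or two blank lines pass through unchanged.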
c. R sub-option
The R sub-option of Keep instructs TEXTCON to keep
all carriage returns in the converted file.
Some word processors (including WordStar, Microsoft
Word, SuperWriter, and WordVision) create files that
do not have carriage returns at the ends of lines,
but only at the ends of paragraphs. This greatly
simplifies the job that TEXTCON has to do. TEXTCON
will normally recognize these files, and display the
message "All carriage returns will be preserved."
If it does not recognize such a file, the usual
symptom is that the converted file frequently has
what should be separate paragraphs combined into one
paragraph. In this case you will need to use the R
sub-option of Keep.
This is needed very rarely, however. The most common
use that I have found for this sub-option is to take
advantage of some of TEXTCON's other features, such
as tab insertion or double-to-single-spacing
conversion, without its carriage-return stripping.
Note that the R sub-option does not affect blank
lines. These are still stripped from the file
according to the rules explained above. If you want
to keep all lines intact you must use both the R and
B sub-options.
d. C sub-option
The C sub-option of Keep instructs TEXTCON to keep
all control codes (ASCII characters between 1 and
31).
TEXTCON normally strips all control codes, with the
exception of tab characters. If you want control
codes kept, use the C sub-option.
2. -T#
TEXTCON was designed primarily for importing files to
the more sophisticated word processors, where documents
are often printed with proportional spacing. For this
kind of work, tabs are used extensively to position
items in a document; multiple spaces will not work
correctly. For this reason, TEXTCON preserves tabs
rather than expanding them with blanks. In some cases,
multiple blanks are preferable, so I have provided an
option for this.
The -T option requires a numeric value (e.g., -T4 or
-T0), specifying the number of spaces between tab stops.
The first tab stop is always at column one. When a tab
is found, enough spaces are substituted in the converted
file to position the following character at the next tab
stop. The default, of course, (if the -T option is not
specified at all) is that tabs are preserved, whereas a
value of zero (-T0) means they are removed entirely.
If the -T option is used, TEXTCON's normal behavior of
substituting tabs for multiple spaces is turned off
also. This substitution is normally done in three
circumstances: at the beginning of a paragraph whose
first line is indented; between items in a columnar
table; and between a list-identifying number, letter, or
other symbol and the corresponding list entry (for
example, the item "3. -I#" just below).
This means that if you have a file that does not contain
tabs, and you simply want to suppress TEXTCON's
substitution of tabs for spaces, you can use -T with any
numeric value to accomplish this. The number you use
doesn't really matter here, since it is used only to
determine the number of spaces to substitute when a tab
is found in the original file.
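The tab expansion performed by -T# can be sketched in Python
as follows. This is an illustration of the rule as described
(tab stops every # columns starting at column one, and -T0
removing tabs), not TEXTCON's code; the name expand_tabs is
hypothetical.

```python
def expand_tabs(line, spacing):
    """Replace each tab with spaces up to the next tab stop.

    Tab stops fall every `spacing` columns, starting at column one.
    A spacing of zero removes tabs without substituting anything.
    """
    out = []
    column = 0  # zero-based column of the next character written
    for ch in line:
        if ch == "\t":
            if spacing > 0:
                fill = spacing - column % spacing
                out.append(" " * fill)
                column += fill
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

With a spacing of 4, "ab" followed by a tab and "c" becomes
"ab  c", which puts the "c" in column five. Python's built-in
str.expandtabs follows the same rule.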
3. -I#
As described under the -T# option, TEXTCON normally
substitutes a tab character for multiple spaces at the
beginning of indented paragraphs. The -I# option allows
you to use a specific number of spaces instead, or to
convert indented paragraphs to block-style paragraphs.
This option requires a numeric value indicating how many
spaces are to be used for indentation. If, for example,
you specify -I5, all indented paragraphs in the
converted file will have a first-line indentation of
five spaces. Using -I0 will convert indented paragraphs
to block-style paragraphs (zero indentation). The -I#
parameter has no effect at all on paragraphs that are
already block-style or have hanging indents.
4. -P#
TEXTCON normally leaves paragraphs spaced the same way
they are spaced in the original file. The usual style
for single-spaced documents has one blank line between
paragraphs; double-spaced documents usually have no
extra blank lines. If your original document has one
kind of line spacing and you want to print the new
document with different spacing, you may find that the
paragraph spacing is either too large or too small.
The -P# option lets you change that spacing. For
example, -P0 will eliminate any extra blank lines
between paragraphs, so you might use it if your original
file was single-spaced and you wanted to print the new
copy double-spaced. -P1 will end each paragraph with
exactly one blank line, so you might use it for the
opposite case.
The -P# parameter has no effect on paragraphs that
consist of a single line of text; those are assumed to
be lists or tables whose spacing should be preserved.
5. -L#
TEXTCON automatically determines a "typical" line length
for your document and from this calculates a "cutoff"
length used in its paragraph-determination algorithms.
If a line is shorter than the cutoff length, TEXTCON
assumes that the carriage return at the end of that line
was put there intentionally, and the program will not
delete it.
You can use the -L# option to override TEXTCON and
specify your own cutoff length. Note that the length of
a line is not measured from the very beginning of the
line (column 1), nor is it measured from the first non-
blank character on that particular line. The length is
measured starting at the left margin of the document,
which is determined by the leftmost non-blank character
found anywhere in the document. If, for example, the
left margin of the document were 10 characters (meaning
the leftmost character in any line occurred in position
11) and the cutoff length were 30, a line with 15
leading spaces followed by 20 characters would have a
length of 15+20-10 = 25, and so would be shorter than
the cutoff length.
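The arithmetic in this example can be written out as a short
Python sketch. The function names document_margin and
effective_length are hypothetical, and the sketch assumes
that only space characters count as blanks.

```python
def document_margin(lines):
    """Number of blanks before the leftmost non-blank character
    found anywhere in the document."""
    margins = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip()]
    return min(margins) if margins else 0


def effective_length(line, margin):
    """Length of a line measured from the document's left margin."""
    return len(line.rstrip()) - margin


# The example from the text: a margin of 10, and a line with
# 15 leading spaces followed by 20 characters.
line = " " * 15 + "x" * 20
length = effective_length(line, 10)   # 15 + 20 - 10
```

Here length is 25, which is below a cutoff of 30, so the
carriage return at the end of that line would be kept.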
6. -S#
The -S# option is useful for word processors such as
Volkswriter and PC-Write which require carriage returns
at the end of each line rather than at the end of each
paragraph. It tells TEXTCON to split each paragraph
into lines of a particular length, given by the numeric
parameter. For example, -S65 says that the output file
should contain lines that are approximately 65
characters long.
When you use this option TEXTCON splits lines at the
first space following the specified length. This means
that the lines in the file will, on average, be one word
longer than the length you specify, and some of them may
be as much as 10 or 15 characters longer.
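The splitting rule can be sketched in Python as shown here.
This is an illustration of "split at the first space
following the specified length", not TEXTCON's code; the name
split_paragraph is hypothetical.

```python
def split_paragraph(text, length):
    """Split one long paragraph into lines of roughly `length` characters.

    Each break is made at the first space found at or after
    position `length`, so a line is never shorter than `length`
    unless it is the last one.
    """
    lines = []
    while len(text) > length:
        cut = text.find(" ", length)
        if cut == -1:
            break               # no space left to break at
        lines.append(text[:cut])
        text = text[cut + 1:]   # continue after the space
    lines.append(text)
    return lines
```

Because the break always comes after the requested length has
been reached, every line but the last runs over by whatever
remains of the word in progress.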
This option will only work on files that have very long
lines, that is, those files where TEXTCON will keep all
carriage returns. It will not, for example, allow you
to take a file with paragraphs made up of 80 character
lines and reformat those into paragraphs of 60 character
lines. That would require it to remove some carriage
returns and add others, which it cannot currently do.
Almost any word processor should be capable of that kind
of reformatting.
POSSIBLE PROBLEMS:
Many of TEXTCON's decisions are based on its analysis of the
beginning of your input file. It analyzes approximately two
pages of text, but this will vary from file to file. If
your file has sections that are very distinct in formatting,
the parameters that TEXTCON determines from the beginning of
your file may not be accurate for the rest of the file. In
these cases, TEXTCON will perform better if you subdivide
the input file and process each distinctly formatted section
separately.
Words from the original file that are hyphenated at the end
of a line will remain hyphenated, and an extra space will be
inserted following the hyphen. For example, the word ex-
ample will be converted to ex- ample. You can find these
and convert them fairly easily by searching for "- " (a
hyphen followed by a blank). The program could have been
designed to remove hyphens at the ends of lines, but then it
would also have removed required hyphens, as in ex-
president. You may want to use the -X or -Y options to
change this behavior.
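The three ways of treating a line-ending hyphen can be
sketched in Python. This is an illustration only, not
TEXTCON's code; the function name join_lines and the mode
names "default", "x" and "y" are hypothetical labels for the
normal behavior and the -X and -Y options.

```python
def join_lines(first, second, mode="default"):
    """Join two lines of a paragraph, handling a line-ending hyphen."""
    first = first.rstrip()
    second = second.lstrip()
    if not first.endswith("-"):
        return first + " " + second      # ordinary word wrap
    if mode == "x":
        return first + second            # required hyphen: keep, no blank
    if mode == "y":
        return first[:-1] + second       # soft hyphen: remove it
    return first + " " + second          # default: keep hyphen, add blank
```

With the default mode, "ex-" and "ample" join as "ex- ample";
with "x", "ex-" and "president" join as "ex-president"; with
"y", "ex-" and "ample" join as "example".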
When a converted file is loaded into the new word processor,
tables may have their columns too close together or too far
apart. This is because TEXTCON puts tab characters into
tables, but it cannot set the positions for the tab stops.
As soon as you set the tab stops where you want them, the
columns will line up correctly. The TEXTDCA version of
TEXTCON can also preserve the settings of the tab stops,
thus saving some additional time.
Sometimes TEXTCON will fail to remove the carriage returns
within a nested or fully-indented paragraph. A common
reason for this is that the person who created it started
each line with a tab, rather than using an indent command.
You can get around this by using the -T# option with some
suitable tab value (usually 5 is a good choice).
This problem will also occur if the paragraph is indented a
large amount from the right margin, making the lines shorter
than the cutoff length. Correct this with the -L# option,
using a numeric value that is less than the shortest line.
Be sure to take into account the document margin when
calculating this number.
The program is written in C, using the DeSmet C compiler. I
have tested it only on IBM PC-compatibles, but it should
work on other MSDOS machines. I have heard from one person
who has used it successfully on a DEC Rainbow.
OTHER USES FOR TEXTCON:
TEXTCON users have found some ingenious ways to use the
program - tasks for which the program was not intended, but
which it does quite well. The following examples may
suggest some additional ways TEXTCON can aid in your text
processing work.
1. Use of the Keep Option
TEXTCON's -K option figures prominently in most of these
special uses. If you use -K with all of its sub-options
(-KBCRS), the output file will be identical to the input
file, with a few exceptions. This would seem to be a
pointless thing to do, unless, of course, those
exceptions are important to you. They are as follows:
a. If the input file has lines that end with only a
carriage return, TEXTCON will add a line feed to
each of them. You may occasionally get files of
this type, from certain programs or from other
computers, and you may find that your word processor
will not accept them without the line feeds.
b. If the input file has no detectable formatting,
TEXTCON assumes it was intended for a print
formatter. In this case, TEXTCON will remove "dot"
commands from the file.
c. TEXTCON deletes trailing blanks from each line.
d. WordStar files are always converted to ASCII.
Each of these conversions can be extremely useful for
certain kinds of files, even when you don't need the
carriage-return stripping that is TEXTCON's main
purpose.
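The first of these conversions, adding a line feed after a
carriage return that lacks one, can be sketched in Python.
This is an illustration only; the name add_line_feeds is
hypothetical.

```python
import re


def add_line_feeds(data):
    """Add a line feed after every carriage return that lacks one."""
    # The lookahead leaves existing CR/LF pairs untouched.
    return re.sub(rb"\r(?!\n)", b"\r\n", data)
```

For example, b"one\rtwo\r\nthree\r" becomes
b"one\r\ntwo\r\nthree\r\n".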
2. Adding Carriage Returns
You may sometimes get files from another computer where
a line-feed character, rather than a carriage return, is
used to mark the ends of lines. This causes great
difficulty for some PCDOS software.
TEXTCON can convert these files by use of the -Z#
option. The decimal ASCII code for line feed is 10, so
the full option would be -Z10. You may also want to use
-KBCS to keep other characteristics of the file intact.
The -Z# option overrides the -KR option.
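
The net effect of this conversion on the line endings can be
sketched as follows. This is a hypothetical Python
illustration of the result, not a description of how TEXTCON
implements -Z#; the function name is made up for the example.

```python
import re


def lf_to_crlf(data: bytes) -> bytes:
    """Turn each bare line feed (ASCII 10) into a CR-LF pair."""
    # (?<!\r)\n matches an LF that is not already preceded by a CR,
    # so lines that already end in CR-LF are left alone.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)
```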
3. Removing Blank Lines
TEXTCON removes multiple blank lines by default, but
leaves up to two blank lines separating paragraphs. If
you want to remove all blank lines from a file, use the
-P0 option. One TEXTCON user needed a count of only the
non-blank lines in an ASCII file, but couldn't find a
counting program that would do that. Using TEXTCON with
-P0 and -KR produced a file with only the blank lines
removed.
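
For comparison, the same task can be sketched directly in
Python. This is an illustration under the assumption that a
"blank" line is one containing nothing but whitespace; the
function names are invented here and have nothing to do with
TEXTCON's internals.

```python
def drop_blank_lines(text: str) -> str:
    """Return the text with every blank (whitespace-only) line removed."""
    kept = [line for line in text.splitlines(keepends=True) if line.strip()]
    return "".join(kept)


def count_nonblank_lines(text: str) -> int:
    """Count the lines that contain at least one non-blank character."""
    return sum(1 for line in text.splitlines() if line.strip())
```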
4. Tab Expansion
For certain programs and certain applications it may be
inconvenient to have tabs in a file. TEXTCON can remove
them and expand them to spaces via the -T# option. If
you use this along with the -KBCRS option, the output
file will be nearly identical to the input file, but
with tabs expanded to spaces.
This option can also be useful when dealing with badly
formatted files. Some people create fully indented
paragraphs by inserting a tab at the beginning of every
line of the paragraph rather than by using their word
processor's indent function. This creates a mess if you
have to edit those paragraphs or move them to another
word processor. TEXTCON will interpret them as
individual lines, not as paragraphs. However, if you
use the -T# option, TEXTCON will correctly recognize
them as fully indented paragraphs.
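
Expansion to true tab stops can be sketched as below. This
is a minimal Python illustration, assuming that the tab
value is the distance between evenly spaced tab stops; the
function name and the default of 5 are choices made for the
example, not TEXTCON's actual code.

```python
def expand_tabs(line: str, tab_width: int = 5) -> str:
    """Replace each tab with enough spaces to reach the next tab stop."""
    out = []
    col = 0  # current column, counted from 0
    for ch in line:
        if ch == "\t":
            # Pad up to the next multiple of tab_width.
            pad = tab_width - (col % tab_width)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

Python's built-in str.expandtabs() does the same job; the
loop is written out only to show how the column position
decides the number of spaces inserted.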
GUIDELINES FOR SPECIFIC WORD PROCESSORS:
The following gives some guidelines for preparing documents
for an ASCII file transfer with five different word proces-
sors. It is written with the assumption that you will be
transferring the files to the NBI Oasys 64 system. However,
almost all of the advice also applies to transfers into
WordStar, Word Perfect, and Microsoft Word.
There are certain restrictions imposed by the ASCII transfer
process that apply to all word processors:
1) All character formatting, such as boldface, underline,
subscripting, and superscripting, is lost. The text
that it applies to remains intact, but the formatting
must be redone on the target word processor.
2) All spacing characteristics, such as margins, inden-
tation, centering, and justification are lost. All
paragraphs, including nested ones, will be moved to
start at the left margin. Paragraphs with a first-line
indent (i.e., non-block style) will have a tab inserted
for this purpose.
Microsoft Word
Microsoft Word is probably more compatible with the NBI sys-
tem than is any other word processor. There are only two
rules to remember when using Word. Use tabs rather than
multiple spaces in your original document for maximum
flexibility in reformatting. Don't use the "newline" key
(Shift-Enter), because it does not appear in an ASCII file
as a line-ending character.
Characteristics that you assign to your document using the
Format command (Character, Paragraph, or Division) will be
lost in the transfer. Footnotes entered using Format
Footnote will be transferred, but they will all be grouped
together at the end of the file and there will be no indi-
cation of where they are referenced.
To save your file in ASCII form, you must use the Transfer
Save command with the "Formatted" option set to "No". Be
sure to give the file a different name than you have been
using for your document, or the ASCII file will replace your
formatted document file. (If you make this mistake, you
should immediately exit Word and copy the .BAK file back to
the .DOC file and then try the save again.)
Word Perfect
Use tabs rather than multiple spaces in your original
document for maximum flexibility in reformatting.
Footnotes and endnotes will not be included in the ASCII
file; they will have to be retyped on the NBI.
You should use the Text In/Out command (Extended Features -
Prepare/Protect in version 3) to save your file in ASCII
form. Do not use the Print command to create this file.
When you use the Text In/Out command, do not use the same
file name that you use for your document, or the ASCII file
will replace your formatted document file and there will be
no way to get it back.
Version 4.2 of Word Perfect has a new, additional method of
file saving available under the Text In/Out command. This
creates an ASCII file with only the paragraph-ending
carriage returns included. If you create an ASCII file
using this option, there is no need to use TEXTCON before
importing the file to another word processor, unless you
want to do some special reformatting such as expanding tabs,
changing paragraph spacing, etc.
PC-Write and Volkswriter
Files from these two word processors can be the most diffi-
cult ones to transfer, because their format is so dependent
on the style of the particular writer. Fortunately,
TEXTCON's paragraph-recognition algorithms really shine when
working with files from these word processors, so there are
relatively few rules for you to follow.
Turn the Justify option off, so your text is not right-
justified. TEXTCON will take out extra blanks that are
inserted for justification, but it occasionally makes
mistakes.
PC-Write and Volkswriter convert all tabs to spaces, which
means that tables may not transfer well. TEXTCON tries to
put tabs back in where they are needed, but will not always
do this correctly.
Footnotes will be transferred to the NBI, but they will
appear in the middle of the text that references them.
PC-Write and Volkswriter files are always stored in ASCII
form, so you don't have to do any special type of save
before transferring the file.
WordStar
WordStar files can be converted very effectively by TEXTCON.
The only limitation is that tables may not transfer very
well, because WordStar converts all tabs to spaces. TEXTCON
tries to put tabs back in where needed, but will not always
do this correctly.
WordStar files can be read directly by TEXTCON; you do not
have to do a print to disk. Be sure to try the -W option
when using TEXTCON on WordStar files. It is not required,
but it generally gives a better translation.
HISTORY:
The original version of this program was written to assist
in the transfer of documents from microcomputer word proces-
sors and optical scanners to an NBI Oasys office system.
Because of the wide variety of word processors used by the
people involved, it wasn't practical to try to accommodate
all the different internal file formats. Instead, the
program was designed to reformat standard ASCII files into a
form that could be imported easily into another word
processor. It was very important that the program properly
process as many different varieties of ASCII file formats
and writing styles (indentations, paragraphing, line
spacing, etc.) as possible.
The original program was quite complicated to use and the
algorithm employed was quite simple-minded, so that certain
formats were not handled as well as they could be. This new
program is based on two years' experience examining files
from different writers with different word processors,
talking to secretaries about how they set up different
documents, and identifying the significant patterns of lines
and characters that commonly occur. The program now
analyzes each input file and makes very intelligent
decisions about the type of file and the paragraphing style
used.
Version 1.1
This was the first version to be widely distributed.
Version 1.2
1. Renamed the former -L option to -B.
2. Changed the -T# option to use true tab stops.
3. Added new -L# option, as well as -H, -Y, and -R.
4. Expanded the file analysis stage to determine
additional document characteristics, including
typical line length, standard margin, recognition of
unformatted, formatted, and print-formatted files,
and header and footer locations.
5. Fixed a bug in the table-recognition section.
6. Additional fine-tuning of parameters and algorithms,
particularly in regard to hanging indents, centered
lines, and list items.
Version 1.3
(NOTE: If you used an earlier version and have not sent
a contribution, please consider doing so now.)
1. Renamed many options. -B, -C, and -S became the B,
R, and S sub-options of the new -K (for Keep)
option. -R was split into the -F and -H options,
both of which now accept a line number as a
parameter. -D became -2, and -H became -X. I hated
to change these so substantially, but it really
seemed necessary. I don't think a major change will
be necessary again.
2. Dropped one option. -W was no longer needed because
of improvement in recognition of WordStar files.
However, see the new -W option below.
3. Added new options. -1 option specifies single
spacing. -B specifies that all paragraphs are block
style. -M# specifies the minimum size of the left
margin of the document. -W option specifies
different processing of WordStar files. -Z#
specifies that the original file has a particular
character that always marks paragraph ends. -S#
will split files with long lines into shorter lines.
The C sub-option of the -K option specifies that
control codes are to be kept in the new file.
4. Automatic removal of lines that are added only for
print emphasis. In a file whose lines end in CR-LF
pairs, these are easily recognized because they are
preceded by a line without a line feed.
5. Additional improvement of decision rules and general
fine tuning for better paragraph recognition.
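
The recognition rule in item 4 can be sketched as follows.
This is a hypothetical Python illustration, not TEXTCON's
code: it assumes that the first printing pass holds the real
text and that every pass after a bare carriage return is an
overprint to be dropped. The function name is invented for
the example.

```python
def drop_overprint_lines(data: bytes) -> bytes:
    """Remove overprint passes from a file whose lines end in CR-LF."""
    cleaned = []
    for record in data.split(b"\r\n"):
        # A bare CR inside a record means the printer returned to the
        # left margin without advancing; keep only the first pass.
        cleaned.append(record.split(b"\r", 1)[0])
    return b"\r\n".join(cleaned)
```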
TEXTDCA Version 1.3
Introducing a new program which will be sent to those
contributing $25 or more for TEXTCON. It has two
features not found in TEXTCON:
1. DCA/RFT output format. The -D option specifies that
instead of an ASCII file, the output should be
written in DCA/RFT format. Most of the major PC
word processors now support this format, which
unlike ASCII files, can contain formatting
information such as margins, centering, tab
settings, indents, etc. Now TEXTDCA can pass all
this information on to your word processor, saving a
tremendous amount of reformatting.
2. Menu mode. TEXTDCA permits optional menu-driven
selection of processing options, for those who have
trouble with its normal command-line syntax. The
menu system works only on IBM-PC-compatibles, not on other
MSDOS machines such as the Wang PC, DEC Rainbow, TI
Professional, Tandy 2000, etc.
DISTRIBUTION:
The TEXTCON program described above is Copyright, 1986,
Chris Wolf.
TEXTCON accomplishes its purpose as described above and, if
used carefully, will cause no known damage to a computer
system or its files. All users should maintain backup
copies of their own files and the author bears no responsi-
bility for losses arising from their failure to do so.
TEXTCON may be freely copied and distributed to others, but
no one may charge a fee for such distribution, beyond a
modest disk preparation charge. All copies of TEXTCON must
be accompanied by this documentation file.
I intend to support this program, continue to enhance it,
and fix bugs in it. If you encounter problems with it, or
have questions about it, I would like to hear from you. My
Compuserve ID is 72446,2704.
If you find that this program saves you time in your work
and you use it regularly, please send me a contribution to
help offset the time and resources I have spent developing
and supporting it. If you are using it in an office
environment, with multiple users on multiple computers,
please consider this in determining the size of your
contribution.
Those who contribute $25 or more will be sent the TEXTDCA
program, which is described elsewhere in this documentation.
Chris Wolf
1521 Greenview Ave.
East Lansing, MI 48823
office phone - (517) 353-5017