Dec 09, 2017
 
Convert 2 Column ASCII files to 1 longer column. For OCR applications.
File 2COL.ZIP from The Programmer’s Corner in
Category Word Processors
File Name    File Size    Zip Size    Zip Type
2COL.DOC          4197        1957    deflated
2COL.EXE         44613       31631    deflated
TEST.DOC          6608        2683    deflated


Contents of the 2COL.DOC file


2COL.ARC (c) 1988, GMUtant Software

Files in this archive:

2COL.EXE    the utility program
2COL.DOC    this documentation
TEST.DOC    a sample 2 column file

2COL.EXE is a program developed to help automate the editing
of files produced by an OCR scanner. Having worked for some
time with a PALANTIR CDP, we're constantly trying to develop
software that assists in the post-scan editing of the ASCII
files that device produces. Recently we added 2COL to our
collection of OCR utilities.

2COL answers the need for a program that will move the right
hand column of a 2 column text file down under the left. Some
scanners, e.g., the Kurzweil Discover, require that you run
the page through the scanner twice to pick up both columns.
With 2COL you can handle the page as a single column and let
this program perform the move. Here's what happens:

    --1--   --2--                        --1--
    --1--   --2--                        --1--
    --1--   --2--                        --1--
    --1--   --2--                        --1--
                                         --2--
    A two column page                    --2--
    of text like this                    --2--
                                         --2--

                                 becomes a 1 column
                                 text file.

Why would you want to do that? Depends on your application,
but if you're using the OCR produced file in a PC environment
you'll typically find that you can't display both columns
on a single screen from within most DBMS or text retrieval
software. Moreover, if you decide to edit the file to make
a few corrections (all scanners make mistakes), you'll find
that your word-processor probably has fits moving back and forth
between the columns.

2COL will also pull your left column text flush with the left
margin, but preserve indentations for paragraphs.

* * 2COL DEMO * *

If you want to test it, use TEST.DOC. This page from the
journal LIBRARY SOFTWARE REVIEW was scanned using the
Palantir CDP (Compound Document Processor). The 2 column
file is EXACTLY as produced by the scanner. Running 2COL
over the file creates a 1 column output file.


PROGRAM LOGIC

2COL reads through the file to compute the average line length,
then rewinds the file and reads it again. On the 2nd pass it
checks approximately 20 characters on either side of the average
mid point, looking for a user-defined number of consecutive spaces
(it assumes that's the area between the columns). For each line
the starting location of this run of spaces is stored, and the
stored locations are then averaged. This average is assumed to be
the column break point for the entire page. The program rewinds
the file again and reads through it, writing the characters up to
the break point on each line to a temporary output file (COL1.DAT).
It then reads through once more, writing each line from the break
point position onward to a second temporary file (COL2.DAT).
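
To make the break point search concrete, here is a minimal Python
sketch of the first two passes. It is not the 2COL source (that is
QuickBASIC 4.0). The function name find_break, the gap and window
parameters, and the reading of "average mid point" as half the
average line length are all assumptions made for illustration.

```python
def find_break(lines, gap=3, window=20):
    """Estimate the column where the gutter between the columns starts.

    lines  -- text lines without trailing newlines
    gap    -- number of consecutive spaces to look for (the "#" argument)
    window -- how far to search on either side of the mid point
    """
    if not lines:
        return None

    # Pass 1: average line length, and from it an assumed mid point.
    avg_len = sum(len(line) for line in lines) / len(lines)
    mid = int(avg_len // 2)
    lo = max(0, mid - window)
    hi = mid + window

    # Pass 2: record where a run of `gap` spaces starts near the mid point.
    starts = []
    for line in lines:
        pos = line.find(" " * gap, lo, hi)
        if pos != -1:
            starts.append(pos)

    if not starts:
        return None  # no gutter found; gap is probably set too high

    # The average start position is taken as the break point for the page.
    return round(sum(starts) / len(starts))
```

When no line has a long enough run of spaces near the middle, the
sketch returns None instead of failing, which is the situation where
lowering the number of spaces is the thing to try.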

When finished, COL1.DAT and COL2.DAT are joined to create your
final output file. The temporary files are destroyed.
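
The split and join steps can be sketched in a few lines of Python.
This sketch keeps both columns in memory rather than writing COL1.DAT
and COL2.DAT. The function name split_columns and the break_at
parameter are hypothetical, and trimming trailing blanks is a
convenience that is not part of the description above.

```python
def split_columns(lines, break_at):
    """Return the left-column lines followed by the right-column lines.

    lines    -- text lines without trailing newlines
    break_at -- column position where the gutter starts
    """
    # Everything before the break point (the role COL1.DAT plays).
    col1 = [line[:break_at].rstrip() for line in lines]

    # Everything from the break point on (the role COL2.DAT plays).
    # Leading gutter spaces are kept, as each line starts at the break.
    col2 = [line[break_at:].rstrip() for line in lines]

    # Joining the two pieces puts the right column under the left.
    return col1 + col2
```

For example, split_columns(["aaaa    bbbb", "cccc    dddd"], 4) puts
the two "aaaa"/"cccc" lines first and the "bbbb"/"dddd" lines (still
carrying their gutter spaces) after them.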

TROUBLESHOOTING

The program is not fool-proof, but it does work well most of the
time. If it bombs, it's probably because you didn't set the
number of spaces correctly. Set it too high and the program will
surely crash.

Where it does work, it sure beats the alternative (using the
cut&paste function of a word processor to perform column moves)!

USE IN BATCH FILES

2COL can be run from the command line.

Usage: 2COL InputFileName # OutputFileName.

The # is for the number of blanks to search for between columns. Three
seems to work pretty well. You can raise or lower the # until you get
it just right...

If you're working with OCR files (particularly the Palantir) you may
be interested in some of the other GMUtant utility products. We've
got one, STRIPCAT, that corrects some scan errors, concatenates files,
and stuff like that.

Source code for 2COL is available (QUICKBASIC 4.0) for $10.00.

Questions/Comments?

Wally Grotophorst
Fenwick Library
George Mason University
4400 University Drive
Fairfax, VA 22030

phone:  (703) 323-2317
bitnet: [email protected]
fax:    (703) 323-3582


