Tag file sorter and Duplicate remover.

Tagfile Sorter and Duplicate Remover:
This freeware program will sort files
up to 25000 lines long and remove the
vast majority of duplicate lines.
This can save you disk space, time and
trouble when it comes to files full of

Contents of the TSDR.DOC file

TSDR - Tagfile Sorter and Duplicate Remover v0.02
Copyright(c) 1993 McCormick & McCormick Software

This product is Freeware. Feel free to use it at your own risk. While we
can make no guarantees as to the suitability of this program, we hope that it
is useful. In fact, the only things we can guarantee are that it will take up
disk space on the drive it resides on and that it passed our beta-testing.
But all that means nothing in real-life computing!

By using this program you release McCormick & McCormick Software from any
responsibilities or costs that may be incurred by the use of this program.

I made this thing because I was stupid enough to point out that one of the
TAGLINES (FidoNet) posters had too many dupes in his messages. He came back
with the fact that sorting his 8000+ line file was, to his knowledge,
impossible. And without sorting it he could not easily find and delete
the dupes. But he promised that if I could get it sorted, he would delete
them. As a SysOp I figure we all may save enough in reduced costs if all
the DUPE-TAGGERS in TAGLINES use this! Knowing that I'm helping Fido cost
control (in a VERY VERY small way...) is payment enough for this first try!
Perhaps this is the pseudo guilt I feel over living in SoCal where feeds are
plentiful and free.


TSDR can only work with files of 25000 lines or fewer. Beyond that point the
pre-sorting routine will stop loading. If there are more than 1200 lines
starting with the same first five characters then some duplicates will
survive. So if you have over 1200 "Oxymoron: ..." lines, expect trouble.
The program should also chop off any line over 80 characters. The input and
output must be separate files.
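
For readers who want to see the general idea, here is a rough Python sketch
of a sort-then-dedupe pass with the limits described above. It is not TSDR's
own code. The names sort_and_dedupe, MAX_LINES and MAX_WIDTH are made up for
illustration, "chop off" is assumed to mean cutting the line at 80
characters, and the 1200-line limit on lines sharing the same first five
characters is not modeled at all.

```python
MAX_LINES = 25000   # assumed from the doc: loading stops past this many lines
MAX_WIDTH = 80      # assumed from the doc: longer lines are cut to this width


def sort_and_dedupe(lines):
    """Return the lines sorted, with exact duplicates removed."""
    # Load at most MAX_LINES lines, cutting each one to MAX_WIDTH characters.
    loaded = [line.rstrip("\n")[:MAX_WIDTH] for line in lines[:MAX_LINES]]
    loaded.sort()
    result = []
    for line in loaded:
        # After sorting, identical lines sit next to each other, so it is
        # enough to compare each line with the last one kept.
        if not result or line != result[-1]:
            result.append(line)
    return result


tags = ["FLW: Oops", "Beam me up", "FLW - Oops", "Beam me up"]
print(sort_and_dedupe(tags))
```

In this sketch the repeated "Beam me up" is dropped, while "FLW: Oops" and
"FLW - Oops" both survive because they are not character-for-character
identical.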

If the 80 char cutoff proves to be a real problem, I'll just add a little
code to pass out the 80+ lines to another file so they may be manually
handled by the user (that is unless I can find a real fix)!

You should also note that only true duplicates are deleted. Near duplicates
(such as the difference between "FLW:" and "FLW -") are not deleted.


The following command line is all that you need. It does the rest. Please
be sure the infile exists and that the outfile may be overwritten. There is
no code to check the existence of either. While a bad or missing infile
will just give a minor error, you don't want to accidentally overwrite some
important file!

    TSDR TAGFILE.EXT OUTFILE.EXT
     infile-^           ^-outfile

If no path is given, TAGFILE.EXT must be in the current directory.
If no OUTFILE.EXT is given, the TAGFILE.EXT name will be used with
the last character changed to a 1.
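
The default output name rule can be pictured with a one-line Python sketch.
The function name default_outfile is hypothetical and this is only an
illustration of the rule as described, not the program's own code.

```python
def default_outfile(infile):
    """Build the default output name by changing the last character to '1'."""
    return infile[:-1] + "1"


print(default_outfile("TAGFILE.EXT"))  # TAGFILE.EX1
```

Note that in this sketch a name that already ends in 1 maps onto itself, so
it is safest to name the outfile explicitly in that case.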

TSDR cannot overwrite the original file. Trying to do so will crash the
program. At this time there are no other options.

TSDR DOES contain logic to ignore "... " and will process lines with them as
if the "... " was not there. At the same time, it does not delete the
"... ". This and the EX1 default outfile name are the only real amenities of
this program.
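
One way to picture the "... " handling is as a comparison key that skips the
marker while the stored line keeps it. The Python sketch below is only an
illustration under two assumptions: that the "... " appears at the start of
the line, and that a line with the marker counts as a duplicate of the same
line without it. The names compare_key and sort_ignoring_prefix are
hypothetical.

```python
PREFIX = "... "


def compare_key(line):
    """Return the text used for comparing: the line minus a leading '... '."""
    return line[len(PREFIX):] if line.startswith(PREFIX) else line


def sort_ignoring_prefix(lines):
    """Sort and dedupe by compare_key, leaving the kept lines unchanged."""
    result = []
    for line in sorted(lines, key=compare_key):
        # Compare keys, not raw text, so the marker does not affect matching.
        if not result or compare_key(line) != compare_key(result[-1]):
            result.append(line)
    return result


tags = ["... Zebra crossing", "Apple pie", "... Apple pie", "Moo"]
print(sort_ignoring_prefix(tags))
```

Here "... Zebra crossing" sorts under Z rather than under the dots, and it
is written out with its "... " still attached.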

Comments and questions may be sent electronically to:

Eric McCormick 1:207/[email protected]

or via U.S. Snail... ahem, U.S. Mail to:

McCormick & McCormick Software
700 E. Redlands Blvd. Box 293
Redlands, CA 92373

At this time there are no real plans for any further revisions. We made one
or two little mods to see if we could get it to work faster. That was
promised in v0.0. We also cleaned up a few minor bugs. This resulted in
the ability to sort longer files, reduced duplication chances, etc. If user
messages come in requesting a better version, this may very well change.
Considering the simplicity of this utility and the fact we are not asking
for $$$, don't expect any major renovations in the near, or not so near,
future.

