Dec 192017
Full C source to a simple LEMPEL-ZEV file compressor.
File LZW.ZIP from The Programmer’s Corner in
Category C Source Code
Full C source to a simple LEMPEL-ZEV file compressor.
File Name File Size Zip Size Zip Type
DEBUG.H 256 134 deflated
LZW.DOC 3584 1347 deflated
LZWCOM.C 2688 1034 deflated
LZWCOMM.C 6784 2154 deflated
LZWUNC.C 4224 1537 deflated

Download File LZW.ZIP Here

Contents of the LZW.DOC file

V 2.0 lzwcom.exe and lzwunc.exe


This is a new version of my file compressor/uncompressor
based on the lempel-zev data compression algorithm. An
earlier version is on the Computer Language BBS.

lempel-zev data compression builds a table of strings on the
fly during compression, where each string has the property

If is in the table then
is also in the table.

The compressed file consists, therefore, of the indexes into
the string table, rather than the strings themselves,
resulting in very good compression rates.

Decompression builds the same table from the input data that
the compressor built during compression. There is no data
table transmitted first, as with Huffman encoding. The
algorithm is (I hope) fairly clear in the source code.

Compression rates vary according to data, but are generally
better than Huffman encoding. For example, COMPRESS.LBR is
31K, but the same library after all files are compressed is
only 21K.

Note well, however, that the algorithm stops adapting to
input data once the table is filled. If the nature of the
data changes after the table is filled, compression rates
will suffer. When I compressed the whole COMPRESS.LBR,
rather than each file within it, it came out to 26K. This
is because it 'adapted' to the data at the head of the LBR
file which was C source code, and then tried to compress
the EXE files, that contain entirely different data.


V 2.0 lzwcom.exe and lzwunc.exe

Implementation Notes

For this version, I started embedding crc-16's in the
compressed files every 1K, and checking them when I
uncompress. 2 characters per 1K overhead seemed a small
price to pay for data integrity, and it didn't seem to slow
things down any.

These programs were compiled w/MANX Aztec C86 - they should
be portable across their entire compiler family. They will
also compile under Xenix 286 without change - and probably
any other unix system. If you go to Unix, you might want to
play some games with file permissions, etc., so they get
preserved across compression/uncompression.

Beware of compiling with Lattice - I use regular FILEs and
getc and putc for I/O, and Lattice has some non-portable
open modes that must be dealt with. If you use "rb" instead
of "r" for the input file and "wb" instead of "w".

any questions or comments to
Kent Williams
550 2nd St. S.E.
Cedar Rapids, IA 52401
(319) 369-3131 (work) (319) 338-6053 (home)


 December 19, 2017  Add comments

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>