Contents of the README.DOC file
Programs from "Arithmetic Coding for Data Compression"
By Ian H. Witten, Radford M. Neal, and John G. Cleary
Communications of the ACM
June 1987 Volume 30 Number 6
This article describes a different approach to data compression. The
process of compression is divided into two seperate functions. The
first function is the "model" which predicts the probability that a
given character will appear next in the input. The second function
uses these probabilities to encode the data using a floating point
number representation to achieve the maximum possible compression
(even better than the Huffman method).
Many different models can be used. One is a static model that assigns
fixed probabilities to the characters (based, for example, on their
known frequency in english text). Or you could make a pre-pass over
the file to statically determine the actual frequencies for each
The authors prefer a technique they call the "adaptive model" which
keeps track of the cumulative frequencies as the file is read.
Other more sophisticated models could be used. The bibliography to
the article references two different adaptive models that result in
an average of 2.2 bits/character for mixed-case English text. This is
a compression factor of 363%.
These programs are written for Microsoft C Version 4.00. No
optimizations have been attempted.