# Category : C Source Code

Archive : CSRC2.ZIP

Filename : PERFHASH.DOC

* P e r f e c t H a s h

*

*

* Program to search for Minimal Perfect Hash Functions for

* use in lexical analyzers. C.D. Havener Jan 26 1982

* GenRad Inc. 37 Great Road, Bolton Mass. 01740

* Based on paper "Minimal Perfect Hash Functions Made Simple"

* by Richard J. Cichelli - Comm. of ACM Jan 1980 pp 17-19

*

* Synopsis: The hash function is h = assoc value of 1st

* letter + length of keyword + assoc value of last letter

*

* This program finds the associated values of the letters

* given a list of keywords, 1 per line. It works most of

* the time for up to about 40 keywords but certain

* pathological cases exist. A semi-perfect hash is usually

* found by the program. The user can then tighten the

* default limits for max associated char value or the

* table limit using the -v and -t options. Sometimes the

* presort heuristics actually make the search process

* much more difficult. The user can try his luck at manual

* sorting using the -n option. Since the hash function

* produces such a limited range of numbers it can only work

* for up to about 40 keywords. If a language needs say 80

* keywords just split them up into two tables and let the

* lexical analyzer look in first one then the other, this

* will still be much faster than any other keyword lookup.

*

* This program has run sucessfully on a VAX 11/780 under

* 4.1BSD and VMS (using Vax-11 C and Decus C).

*/

synopsis

perfect [-options]

description

perfect reads a list of keywords from the standard input

and computes a "perfect hash" function for the set.

The following options are defined:

-d Enable debug code.

-n Disable pre-sort of keywords. (See below).

-t

-v

-k

-a

(Unix and Vax-11C/VMS only).

-o file Write a sample keyword lookup routine to the

indicated file.

The program prints a running commentary on the standard

output.

Discussion

This technical note describes an implementation of a

pragmatic algorithm for finding perfect or semi-perfect hash

functions for lists of keywords. The resulting hash function can

be used to speed up lexical analyzers used in translators and

compilers. The algorithm was described by R.J. Cichelli in The

January 1980 issue of Communications of the ACM under the title

"Minimal Perfect Hash Functions made Simple." The article did not

include a computer language description of the algorithm and some

important implementation details were unclear. It is assumed that

the user will know what to do with the output of this program.

The -o option may be used to produce a lexical analyser kernel

from the tables built by this program. Another reference for

those wishing to pursue the topic further is "More On Minimal

Perfect Hash tables" by Curtis Cook and R. Oldehoeft, a Colorado

State University Technical Report.

The program takes a list of keywords and sorts them in such

a way that the search time for a hash function will be

reasonable. Once sorted a recursive trial and error procedure

hunts for a hash function satisfying user supplied bounds

of table size and associated character value limits. The

built-in hash function is

hash = assoc value of first character

+ keyword length

+ assoc value of last character

It is critically important to select a good ordering of the

keywords before searching begins. I ran up 100 hours of VAX time

searching for a hash function with an unordered list, and gave

up. Once the sort heuristics were debugged it found the hash

function in minutes. Typically it will find the function in

minutes or you may as well give up. A status reporting feature is

built into the program on the UNIX system that lets you follow

the progress of the search depth. If it has trouble, you can

tell just which word it can't get past, and take appropriate

action. If it has trouble you can attempt to alter its choice of

pre-ordering by moving troublesome words to the front of the

list. In a sequel to his paper, Cichelli stated that sometimes

the sort heuristics makes the search longer. There is also no

guarantee that a hash function can be found. The program does the

obvious precheck that no two keywords have the same first and

last letter and length in common (e.g. BAK KAB). Nonetheless,

as pointed out by Jaeschke and Osterberg in an overly harsh

criticism, there are many pathological sets of keywords that fail.

(On Cichelli's minimal perfect hash functions method. Comm. ACM

Dec. 1980, pp 728-729) The algorithm also only works for up to

about 40 keywords due to the limited range of numbers the formula

can generate. If you have, say 80 keywords, just make two hash

tables and look in each one. Here are some examples of the

program's output:

Perfect hash function finder, CDH Ver. 2.9

Start time = Mon Oct 17 20:40:40 1983

28 keywords, 19 distinct letters.

The associated char value limit is 19

The table size limit is 100

The search ordering is

else double continue case float struct sizeof

static short extern typedef default register for

char while entry int if return do unsigned switch

union goto auto long break

Success! Associated Char Values Follow:

a =19, b = 4, c = 1, d = 0, e = 0, f = 3, g =17,

h =14, i =17, k =19, l = 6, n = 8, o = 0, r =11,

s = 6, t = 0, u =16, w =13, y =14,

Hash min = 2, max = 30, spread = 29

do 2, else 4, case 5, double 6, default 7,

float 8, continue 9, typedef 10, short 11, struct 12,

static 13, extern 14, sizeof 15, char 16, for 17,

while 18, entry 19, int 20, goto 21, if 22, auto 23,

unsigned 24, return 25, switch 26, long 27, break 28,

union 29, register 30,

Total search() invocations = 15913

Started Mon Oct 17 20:40:40 1983

Finished Mon Oct 17 20:40:42 1983

Perfect hash function finder, CDH Ver. 2.9

Start time = Mon Oct 17 20:41:11 1983

39 keywords, 19 distinct letters.

The associated char value limit is 19

The table size limit is 100

The search ordering is

TEXT RESET TRUE REWRITE READLN SQRT SQR EOLN TRUNC

PUT EXP PAGE CHR CHAR COS SUCC READ ROUND DISPOSE

PRED SIN OUTPUT ORD INPUT INTEGER GET MAXINT REAL

WRITE EOF FALSE NEW WRITELN LN ARCTAN ABS BOOLEAN

PACK UNPACK

Success! Associated Char Values Follow:

A =18, B =11, C =16, D =18, E = 3, F = 3, G = 0,

I = 1, K =19, L = 9, M =18, N =19, O = 8, P = 9,

R = 0, S =14, T = 0, U =13, W =19,

Hash min = 3, max = 45, spread = 43

GET 3, TEXT 4, RESET 5, INPUT 6, TRUE 7,

INTEGER 8, EOF 9, REWRITE 10, FALSE 11, PUT 12,

REAL 13, OUTPUT 14, EXP 15, PAGE 16, SQR 17, SQRT 18,

CHR 19, CHAR 20, TRUNC 21, READ 22, ROUND 23,

MAXINT 24, READLN 25, EOLN 26, WRITE 27, DISPOSE 28,

ORD 29, LN 30, PRED 31, PACK 32, COS 33, SUCC 34,

ABS 35, SIN 36, BOOLEAN 37, UNPACK 38, NEW 41,

ARCTAN 43, WRITELN 45,

Total search() invocations = 149292

Started Mon Oct 17 20:41:11 1983

Finished Mon Oct 17 20:41:35 1983

Usually, the first time you run perf, just let everything

default. The second time,

use the -t option to limit the table size to the first hash

value plus the number of keywords.

On Unix and Vax/VMS (Vax-11 C), The program will accept

SIGTERM signals (CTRL/C on VMS) for an update status report

since it may take quite a while to find the hash function values.

Sample keyword tables

The following tables are known to work correctly.

The first defines the keywords for the C programming

language; the second for a toy computer language.

int char float double struct union long short

unsigned auto extern register typedef static

goto return sizeof break continue if else for

do while switch case default entry

GET TEXT RESET OUTPUT MAXINT INPUT TRUE

INTEGER EOF REWRITE FALSE CHR CHAR TRUNC REAL

SQR SQRT WRITE PUT ORD READ ROUND READLN EXP

PAGE EOLN COS SUCC DISPOSE NEW ABS LN BOOLEAN

WRITELN SIN PACK UNPACK ARCTAN PRED

Very nice! Thank you for this wonderful archive. I wonder why I found it only now. Long live the BBS file archives!

This is so awesome! 😀 I’d be cool if you could download an entire archive of this at once, though.

But one thing that puzzles me is the “mtswslnkmcjklsdlsbdmMICROSOFT” string. There is an article about it here. It is definitely worth a read: http://www.os2museum.com/wp/mtswslnk/