Dec 182017

Computer Language Magazine source code from Sept 90 issue. | |||
---|---|---|---|

File Name | File Size | Zip Size | Zip Type |

3D.PAS | 10741 | 3515 | deflated |

EOF.TXT | 9094 | 3381 | deflated |

GRAY.C | 3399 | 1190 | deflated |

SPLAY.ZIP | 10235 | 9896 | deflated |

# Download File CLM0990.ZIP Here

## Contents of the EOF.TXT file

OPTIMIZE ALGORITHMS, NOT CODE

by David Thielen

Recently, the use of profilers has received increased attention. Using a

profiler properly, and then optimizing the code that the profiler shows as the

bottleneck, can lead to significant performance improvement.

However, too many times, to speed up the bottleneck, a developer will

re-write the code in assembler, implementing the same algorithm. This is

usually NOT the best solution. For example, if a linear search is the

bottleneck in a program, the solution is to use a binary search rather than

writing the linear search in assembler.

The other thing I have noticed over the past 4 years of using profilers is

that basic run-time routines are usually the bottlenecks. Having a highly

optimized run-time library is worth its weight in gold (depending on what a

byte weighs).

Over the past few years I have written my own run-time c library. The

original reason I did this was that I came across some errors in the run-time

library of the c compiler that I was using at the time. Since then I have

continued to use and optimize my own run-time to get the speed I wanted out of

it.

In most cases, my optimization comes not from writing in assembler but from

taking another look at what the basic algorithim truly is. Many times, looking

at the algorithm at its most basic level leads to a more efficient function,

usually with the same or less complexity.

One of the routines I wrote was HbBsearchNext which performs a binary search.

However, if the item desired is not in the array, the next highest item is

returned.

For those of you not familiar with the bsearch library routine in most c

run-time libraries, it is a general purpose function designed to quickly find a

member of a sorted array of any type. The function is declared as:

void *HbBsearchNext (pmKey,pmBase,uNum,uWid,fnCmp)

void const *pmKey, *pmBase;

unsigned uNum, uWid;

int (*fnCmp) (void *pm1,void *pm2);

pmBase points to an array of elements where the array holds uNum elements and

each element is uWid bytes long. pmKey points to an element containing the

values you are looking for in pmBase. fnCmp compares two elements and returns

<0, 0, or >0 based on whether the first element is less than, equal to, or

greater than the second element respectively. strcmp is an example if each

element is a pointer to a string.

A binary search is implemented by first comparing pmKey to the middle element

in pmBase (pmBase + (uNum / 2) * uWid). If the middle element is greater than

pmKey, pmKey is then compared to the element halfway between the beginning of

pmBase and the middle. The opposite is done if the middle is less than pmKey.

In this manner, each compare cuts the elements left to check in half or if

there are 256 elements in pmBase, it will take at most 8 compares (and an

average of 7) to find any element. A linear search (comparing from first to

last in order) on the other hand can require 256 compares and will average 128.

The main part of the basic bsearch algorithm is as follows (some parts have

been omitted for clarity):

iNum = uNum / 2;

pmBuf = pmBase + uWid * iNum;

do

{

iNum /= 2;

if (! (iRtn = fnCmp (pmKey, pmBuf))

return (pmBuf);

if (iRtn < 0)

pmBuf -= uWid * iNum;

else

pmBuf += uWid * iNum;

}

while (iNum > 0);

The above loop has one operation that takes most of the time (assuming that

fnCmp is fairly efficient). That one op, the multiplication (uWid * iNum),

could easily be half the clock cycles in each iteration of the loop. If the

above was converted to assembler, the multiplication would remain.

The trick I used to speed up the bsearch is to replace the multiply with

something faster. As it turns out, this can be done very easily. If you look

at how the bsearch constantly jumps to the middle of the remaining range, you

will realize that the bsearch, unless it keeps jumping toward an end, is

jumping by powers of 2. If we always jump by powers of 2, we can bit shift

rather than multiply.

The bit shift means that we are increasing our range (uNum) to the nearest

integer power of 2 (uNum <= 2**n where n is the smallest integer that fits the

inequality). This means that 257 elements and 512 elements will have the same

average number of compares. However, it turns out that this holds whether we

use the multiply or shift. Therefore, there is no increase in the average

number of compares by this change in the algorithm.

The main part of the basic bsearch algorithm becomes (some parts have been

omitted for clarity):

iShift = HighBit (uNum); /* high bit on number (1 - 8) */

pmBuf = pmBase + (uWid << iShift) - uWid;

pmEnd = pmBase + (uNum - 1) * uWid;

do

{

iShift--;

if (! (iRtn = fnCmp (pmKey, pmBuf))

return (pmBuf);

if (iRtn < 0)

pmBuf -= uWid << iShift;

else

pmBuf += uWid >> iShift;

pmBuf = MinMax (pmBase, pmEnd, pmBuf); /* keep in range */

}

while (iShift > 0);

Some notes about the actual code. First, all of the passed in parameters

must be checked (another reason I write my own run-time; most compilers

run-time will happily copy data to a NULL pointer). Second, if the last

element in the array is less than pmKey, the return is NULL because

HbBsearchNext returns an element >= pmKey. Third, if there are duplicates, it

returns the first element that matches (or is next).

Finally, if the found element is > pmKey, (there are no elements == pmKey)

and the element before the found element is equal to the found element, a

smaller search must be performed to find the first element that is next. Once

again, this is done using HbBsearchNext to find it quickly.

The above algorithm works just as well as the first algorithm using the

multiply yet is significantly faster. For a program that uses a binary search,

this change should cause a noticeable increase in program speed. I like this

example for two reasons.

First, by implementing changes like the above wherever possible throughout

the run-time library, the performance of all programs using the run-time are

improved. Creating a highly optimized run-time library is one of the most

effective ways to get fast programs with little effort.

Second, it didn't take a lot of extra time to come up with this. I just

thought about it for 5 minutes. Because it added no complexity, the coding and

testing was just as fast. To me, this is a good example of how thinking about

a problem in new ways for a few minutes instead of "doing it the way everyone

else is" can pay large dividends.

LISTING 1

#include "hb_tool.h"

#ifdef ERROR

#include "_error.h"

#endif

/*****************************************************************

Function: bSearchNext

Description: Looks for key. Returns a pointer to the

FIRST record == key. If none == key,

returns a pointer to the record closest

to key BUT greater than. If none greater

than, returns NULL.

Dependence: None

*****************************************************************/

PVOID HBFUNC (HbBsearchNext) (pmKey,pmBase,uNum,uWid,fnComp)

PCVOID pmKey, pmBase;

unsigned uNum, uWid;

int (*fnComp) (PCVOID pm1,PCVOID pm2);

{

int iShft, iRtn;

PCBYTE pmBuf, pmBufOn, pmBufMax;

#ifdef ERROR

if ((! pmKey) || (! pmBase) || (! uNum) || (! uWid) || (! fnComp))

_ErrorPrint ("HbBsearchNext

(%lX,%lX,%u,%u,%lX)",pmKey,pmBase,uNum,uWid,fnComp);

#endif

if ((! pmKey) || (! pmBase) || (! uNum) || (! uWid) || (! fnComp))

return ((PVOID) NULL);

pmBufOn = pmBufMax = (PCBYTE) pmBase + (uNum-1) * uWid;

if ((*fnComp) (pmKey, pmBufOn) > 0)

return ((PVOID) NULL);

iShft = HbHighBit (uNum);

pmBuf = (PBYTE) pmBase + (uWid << iShft) - uWid;

/* loop until find a match */

do

{

iShft--;

pmBuf = HbMinMax ((PCBYTE)pmBase, pmBufMax, pmBuf);

/* found a match */

if ((iRtn = (*fnComp) (pmKey,pmBuf)) == 0)

{

/* make sure its the first if dups exist */

if (pmBuf <= pmBase)

return ((PVOID) pmBuf);

if ((*fnComp) (pmKey, pmBuf-uWid))

return ((PVOID) pmBuf);

iRtn = -1;

if (pmBuf < pmBufOn)

pmBufOn = pmBuf;

}

if (iRtn < 0)

{

iRtn = (*fnComp) (pmBuf,pmBufOn);

if ((iRtn < 0) || ((! iRtn) && (pmBuf < pmBufOn)))

pmBufOn = pmBuf;

pmBuf -= uWid << iShft;

}

else

pmBuf += uWid << iShft;

}

while (iShft >= 0);

pmBuf = HbMinMax ((PCBYTE)pmBase, pmBufMax, pmBuf);

/* doesn't exist - return the address of the next highest */

if (pmBufOn <= pmBase)

return ((PVOID) pmBufOn);

if ((*fnComp) (pmBufOn,pmBufOn-uWid))

return ((PVOID) pmBufOn);

uNum = (pmBufOn - (PCBYTE) pmBase) / uWid + 1;

return (HBFUNC (HbBsearchNext) (pmBufOn, pmBase, uNum, uWid, fnComp));

}

by David Thielen

Recently, the use of profilers has received increased attention. Using a

profiler properly, and then optimizing the code that the profiler shows as the

bottleneck, can lead to significant performance improvement.

However, too many times, to speed up the bottleneck, a developer will

re-write the code in assembler, implementing the same algorithm. This is

usually NOT the best solution. For example, if a linear search is the

bottleneck in a program, the solution is to use a binary search rather than

writing the linear search in assembler.

The other thing I have noticed over the past 4 years of using profilers is

that basic run-time routines are usually the bottlenecks. Having a highly

optimized run-time library is worth its weight in gold (depending on what a

byte weighs).

Over the past few years I have written my own run-time c library. The

original reason I did this was that I came across some errors in the run-time

library of the c compiler that I was using at the time. Since then I have

continued to use and optimize my own run-time to get the speed I wanted out of

it.

In most cases, my optimization comes not from writing in assembler but from

taking another look at what the basic algorithim truly is. Many times, looking

at the algorithm at its most basic level leads to a more efficient function,

usually with the same or less complexity.

One of the routines I wrote was HbBsearchNext which performs a binary search.

However, if the item desired is not in the array, the next highest item is

returned.

For those of you not familiar with the bsearch library routine in most c

run-time libraries, it is a general purpose function designed to quickly find a

member of a sorted array of any type. The function is declared as:

void *HbBsearchNext (pmKey,pmBase,uNum,uWid,fnCmp)

void const *pmKey, *pmBase;

unsigned uNum, uWid;

int (*fnCmp) (void *pm1,void *pm2);

pmBase points to an array of elements where the array holds uNum elements and

each element is uWid bytes long. pmKey points to an element containing the

values you are looking for in pmBase. fnCmp compares two elements and returns

<0, 0, or >0 based on whether the first element is less than, equal to, or

greater than the second element respectively. strcmp is an example if each

element is a pointer to a string.

A binary search is implemented by first comparing pmKey to the middle element

in pmBase (pmBase + (uNum / 2) * uWid). If the middle element is greater than

pmKey, pmKey is then compared to the element halfway between the beginning of

pmBase and the middle. The opposite is done if the middle is less than pmKey.

In this manner, each compare cuts the elements left to check in half or if

there are 256 elements in pmBase, it will take at most 8 compares (and an

average of 7) to find any element. A linear search (comparing from first to

last in order) on the other hand can require 256 compares and will average 128.

The main part of the basic bsearch algorithm is as follows (some parts have

been omitted for clarity):

iNum = uNum / 2;

pmBuf = pmBase + uWid * iNum;

do

{

iNum /= 2;

if (! (iRtn = fnCmp (pmKey, pmBuf))

return (pmBuf);

if (iRtn < 0)

pmBuf -= uWid * iNum;

else

pmBuf += uWid * iNum;

}

while (iNum > 0);

The above loop has one operation that takes most of the time (assuming that

fnCmp is fairly efficient). That one op, the multiplication (uWid * iNum),

could easily be half the clock cycles in each iteration of the loop. If the

above was converted to assembler, the multiplication would remain.

The trick I used to speed up the bsearch is to replace the multiply with

something faster. As it turns out, this can be done very easily. If you look

at how the bsearch constantly jumps to the middle of the remaining range, you

will realize that the bsearch, unless it keeps jumping toward an end, is

jumping by powers of 2. If we always jump by powers of 2, we can bit shift

rather than multiply.

The bit shift means that we are increasing our range (uNum) to the nearest

integer power of 2 (uNum <= 2**n where n is the smallest integer that fits the

inequality). This means that 257 elements and 512 elements will have the same

average number of compares. However, it turns out that this holds whether we

use the multiply or shift. Therefore, there is no increase in the average

number of compares by this change in the algorithm.

The main part of the basic bsearch algorithm becomes (some parts have been

omitted for clarity):

iShift = HighBit (uNum); /* high bit on number (1 - 8) */

pmBuf = pmBase + (uWid << iShift) - uWid;

pmEnd = pmBase + (uNum - 1) * uWid;

do

{

iShift--;

if (! (iRtn = fnCmp (pmKey, pmBuf))

return (pmBuf);

if (iRtn < 0)

pmBuf -= uWid << iShift;

else

pmBuf += uWid >> iShift;

pmBuf = MinMax (pmBase, pmEnd, pmBuf); /* keep in range */

}

while (iShift > 0);

Some notes about the actual code. First, all of the passed in parameters

must be checked (another reason I write my own run-time; most compilers

run-time will happily copy data to a NULL pointer). Second, if the last

element in the array is less than pmKey, the return is NULL because

HbBsearchNext returns an element >= pmKey. Third, if there are duplicates, it

returns the first element that matches (or is next).

Finally, if the found element is > pmKey, (there are no elements == pmKey)

and the element before the found element is equal to the found element, a

smaller search must be performed to find the first element that is next. Once

again, this is done using HbBsearchNext to find it quickly.

The above algorithm works just as well as the first algorithm using the

multiply yet is significantly faster. For a program that uses a binary search,

this change should cause a noticeable increase in program speed. I like this

example for two reasons.

First, by implementing changes like the above wherever possible throughout

the run-time library, the performance of all programs using the run-time are

improved. Creating a highly optimized run-time library is one of the most

effective ways to get fast programs with little effort.

Second, it didn't take a lot of extra time to come up with this. I just

thought about it for 5 minutes. Because it added no complexity, the coding and

testing was just as fast. To me, this is a good example of how thinking about

a problem in new ways for a few minutes instead of "doing it the way everyone

else is" can pay large dividends.

LISTING 1

#include "hb_tool.h"

#ifdef ERROR

#include "_error.h"

#endif

/*****************************************************************

Function: bSearchNext

Description: Looks for key. Returns a pointer to the

FIRST record == key. If none == key,

returns a pointer to the record closest

to key BUT greater than. If none greater

than, returns NULL.

Dependence: None

*****************************************************************/

PVOID HBFUNC (HbBsearchNext) (pmKey,pmBase,uNum,uWid,fnComp)

PCVOID pmKey, pmBase;

unsigned uNum, uWid;

int (*fnComp) (PCVOID pm1,PCVOID pm2);

{

int iShft, iRtn;

PCBYTE pmBuf, pmBufOn, pmBufMax;

#ifdef ERROR

if ((! pmKey) || (! pmBase) || (! uNum) || (! uWid) || (! fnComp))

_ErrorPrint ("HbBsearchNext

(%lX,%lX,%u,%u,%lX)",pmKey,pmBase,uNum,uWid,fnComp);

#endif

if ((! pmKey) || (! pmBase) || (! uNum) || (! uWid) || (! fnComp))

return ((PVOID) NULL);

pmBufOn = pmBufMax = (PCBYTE) pmBase + (uNum-1) * uWid;

if ((*fnComp) (pmKey, pmBufOn) > 0)

return ((PVOID) NULL);

iShft = HbHighBit (uNum);

pmBuf = (PBYTE) pmBase + (uWid << iShft) - uWid;

/* loop until find a match */

do

{

iShft--;

pmBuf = HbMinMax ((PCBYTE)pmBase, pmBufMax, pmBuf);

/* found a match */

if ((iRtn = (*fnComp) (pmKey,pmBuf)) == 0)

{

/* make sure its the first if dups exist */

if (pmBuf <= pmBase)

return ((PVOID) pmBuf);

if ((*fnComp) (pmKey, pmBuf-uWid))

return ((PVOID) pmBuf);

iRtn = -1;

if (pmBuf < pmBufOn)

pmBufOn = pmBuf;

}

if (iRtn < 0)

{

iRtn = (*fnComp) (pmBuf,pmBufOn);

if ((iRtn < 0) || ((! iRtn) && (pmBuf < pmBufOn)))

pmBufOn = pmBuf;

pmBuf -= uWid << iShft;

}

else

pmBuf += uWid << iShft;

}

while (iShft >= 0);

pmBuf = HbMinMax ((PCBYTE)pmBase, pmBufMax, pmBuf);

/* doesn't exist - return the address of the next highest */

if (pmBufOn <= pmBase)

return ((PVOID) pmBufOn);

if ((*fnComp) (pmBufOn,pmBufOn-uWid))

return ((PVOID) pmBufOn);

uNum = (pmBufOn - (PCBYTE) pmBase) / uWid + 1;

return (HBFUNC (HbBsearchNext) (pmBufOn, pmBase, uNum, uWid, fnComp));

}

December 18, 2017
Add comments