Dec 182017
 
Computer Language Magazine source code from Sept 90 issue.
File CLM0990.ZIP from The Programmer’s Corner in
Category Files from Magazines
Computer Language Magazine source code from Sept 90 issue.
File Name File Size Zip Size Zip Type
3D.PAS 10741 3515 deflated
EOF.TXT 9094 3381 deflated
GRAY.C 3399 1190 deflated
SPLAY.ZIP 10235 9896 deflated

Download File CLM0990.ZIP Here

Contents of the EOF.TXT file


OPTIMIZE ALGORITHMS, NOT CODE
by David Thielen

Recently, the use of profilers has received increased attention. Using a
profiler properly, and then optimizing the code that the profiler shows as the
bottleneck, can lead to significant performance improvement.
However, too many times, to speed up the bottleneck, a developer will
re-write the code in assembler, implementing the same algorithm. This is
usually NOT the best solution. For example, if a linear search is the
bottleneck in a program, the solution is to use a binary search rather than
writing the linear search in assembler.
The other thing I have noticed over the past 4 years of using profilers is
that basic run-time routines are usually the bottlenecks. Having a highly
optimized run-time library is worth its weight in gold (depending on what a
byte weighs).
Over the past few years I have written my own run-time c library. The
original reason I did this was that I came across some errors in the run-time
library of the c compiler that I was using at the time. Since then I have
continued to use and optimize my own run-time to get the speed I wanted out of
it.
In most cases, my optimization comes not from writing in assembler but from
taking another look at what the basic algorithim truly is. Many times, looking
at the algorithm at its most basic level leads to a more efficient function,
usually with the same or less complexity.
One of the routines I wrote was HbBsearchNext which performs a binary search.
However, if the item desired is not in the array, the next highest item is
returned.
For those of you not familiar with the bsearch library routine in most c
run-time libraries, it is a general purpose function designed to quickly find a
member of a sorted array of any type. The function is declared as:

void *HbBsearchNext (pmKey,pmBase,uNum,uWid,fnCmp)
void const *pmKey, *pmBase;
unsigned uNum, uWid;
int (*fnCmp) (void *pm1,void *pm2);

pmBase points to an array of elements where the array holds uNum elements and
each element is uWid bytes long. pmKey points to an element containing the
values you are looking for in pmBase. fnCmp compares two elements and returns
<0, 0, or >0 based on whether the first element is less than, equal to, or
greater than the second element respectively. strcmp is an example if each
element is a pointer to a string.
A binary search is implemented by first comparing pmKey to the middle element
in pmBase (pmBase + (uNum / 2) * uWid). If the middle element is greater than
pmKey, pmKey is then compared to the element halfway between the beginning of
pmBase and the middle. The opposite is done if the middle is less than pmKey.
In this manner, each compare cuts the elements left to check in half or if
there are 256 elements in pmBase, it will take at most 8 compares (and an
average of 7) to find any element. A linear search (comparing from first to
last in order) on the other hand can require 256 compares and will average 128.
The main part of the basic bsearch algorithm is as follows (some parts have
been omitted for clarity):

iNum = uNum / 2;
pmBuf = pmBase + uWid * iNum;
do
{
iNum /= 2;
if (! (iRtn = fnCmp (pmKey, pmBuf))
return (pmBuf);
if (iRtn < 0)
pmBuf -= uWid * iNum;
else
pmBuf += uWid * iNum;
}
while (iNum > 0);

The above loop has one operation that takes most of the time (assuming that
fnCmp is fairly efficient). That one op, the multiplication (uWid * iNum),
could easily be half the clock cycles in each iteration of the loop. If the
above was converted to assembler, the multiplication would remain.
The trick I used to speed up the bsearch is to replace the multiply with
something faster. As it turns out, this can be done very easily. If you look
at how the bsearch constantly jumps to the middle of the remaining range, you
will realize that the bsearch, unless it keeps jumping toward an end, is
jumping by powers of 2. If we always jump by powers of 2, we can bit shift
rather than multiply.
The bit shift means that we are increasing our range (uNum) to the nearest
integer power of 2 (uNum <= 2**n where n is the smallest integer that fits the
inequality). This means that 257 elements and 512 elements will have the same
average number of compares. However, it turns out that this holds whether we
use the multiply or shift. Therefore, there is no increase in the average
number of compares by this change in the algorithm.
The main part of the basic bsearch algorithm becomes (some parts have been
omitted for clarity):

iShift = HighBit (uNum); /* high bit on number (1 - 8) */
pmBuf = pmBase + (uWid << iShift) - uWid;
pmEnd = pmBase + (uNum - 1) * uWid;
do
{
iShift--;
if (! (iRtn = fnCmp (pmKey, pmBuf))
return (pmBuf);
if (iRtn < 0)
pmBuf -= uWid << iShift;
else
pmBuf += uWid >> iShift;
pmBuf = MinMax (pmBase, pmEnd, pmBuf); /* keep in range */
}
while (iShift > 0);

Some notes about the actual code. First, all of the passed in parameters
must be checked (another reason I write my own run-time; most compilers
run-time will happily copy data to a NULL pointer). Second, if the last
element in the array is less than pmKey, the return is NULL because
HbBsearchNext returns an element >= pmKey. Third, if there are duplicates, it
returns the first element that matches (or is next).
Finally, if the found element is > pmKey, (there are no elements == pmKey)
and the element before the found element is equal to the found element, a
smaller search must be performed to find the first element that is next. Once
again, this is done using HbBsearchNext to find it quickly.
The above algorithm works just as well as the first algorithm using the
multiply yet is significantly faster. For a program that uses a binary search,
this change should cause a noticeable increase in program speed. I like this
example for two reasons.
First, by implementing changes like the above wherever possible throughout
the run-time library, the performance of all programs using the run-time are
improved. Creating a highly optimized run-time library is one of the most
effective ways to get fast programs with little effort.
Second, it didn't take a lot of extra time to come up with this. I just
thought about it for 5 minutes. Because it added no complexity, the coding and
testing was just as fast. To me, this is a good example of how thinking about
a problem in new ways for a few minutes instead of "doing it the way everyone
else is" can pay large dividends.

LISTING 1

#include "hb_tool.h"
#ifdef ERROR
#include "_error.h"
#endif

/*****************************************************************

Function: bSearchNext

Description: Looks for key. Returns a pointer to the
FIRST record == key. If none == key,
returns a pointer to the record closest
to key BUT greater than. If none greater
than, returns NULL.

Dependence: None

*****************************************************************/

PVOID HBFUNC (HbBsearchNext) (pmKey,pmBase,uNum,uWid,fnComp)
PCVOID pmKey, pmBase;
unsigned uNum, uWid;
int (*fnComp) (PCVOID pm1,PCVOID pm2);
{
int iShft, iRtn;
PCBYTE pmBuf, pmBufOn, pmBufMax;

#ifdef ERROR
if ((! pmKey) || (! pmBase) || (! uNum) || (! uWid) || (! fnComp))
_ErrorPrint ("HbBsearchNext
(%lX,%lX,%u,%u,%lX)",pmKey,pmBase,uNum,uWid,fnComp);
#endif

if ((! pmKey) || (! pmBase) || (! uNum) || (! uWid) || (! fnComp))
return ((PVOID) NULL);

pmBufOn = pmBufMax = (PCBYTE) pmBase + (uNum-1) * uWid;
if ((*fnComp) (pmKey, pmBufOn) > 0)
return ((PVOID) NULL);

iShft = HbHighBit (uNum);
pmBuf = (PBYTE) pmBase + (uWid << iShft) - uWid;

/* loop until find a match */
do
{
iShft--;
pmBuf = HbMinMax ((PCBYTE)pmBase, pmBufMax, pmBuf);

/* found a match */
if ((iRtn = (*fnComp) (pmKey,pmBuf)) == 0)
{
/* make sure its the first if dups exist */
if (pmBuf <= pmBase)
return ((PVOID) pmBuf);
if ((*fnComp) (pmKey, pmBuf-uWid))
return ((PVOID) pmBuf);
iRtn = -1;
if (pmBuf < pmBufOn)
pmBufOn = pmBuf;
}

if (iRtn < 0)
{
iRtn = (*fnComp) (pmBuf,pmBufOn);
if ((iRtn < 0) || ((! iRtn) && (pmBuf < pmBufOn)))
pmBufOn = pmBuf;
pmBuf -= uWid << iShft;
}
else
pmBuf += uWid << iShft;

}
while (iShft >= 0);
pmBuf = HbMinMax ((PCBYTE)pmBase, pmBufMax, pmBuf);

/* doesn't exist - return the address of the next highest */
if (pmBufOn <= pmBase)
return ((PVOID) pmBufOn);
if ((*fnComp) (pmBufOn,pmBufOn-uWid))
return ((PVOID) pmBufOn);

uNum = (pmBufOn - (PCBYTE) pmBase) / uWid + 1;
return (HBFUNC (HbBsearchNext) (pmBufOn, pmBase, uNum, uWid, fnComp));
}


 December 18, 2017  Add comments

Leave a Reply