Output of file : PAPER.ASC contained in archive : SORT-OUT.ZIP
SORTING IT OUT
Henning Hansen and Niels Jrgen Christensen,
Technical University of Denmark

Abstract

We present a Forth program for analysis and visualization of sorting
algorithms. The program includes a comprehensive collection of implemented
sorting algorithms, and two new algorithms has been developed. Trade-offs
in the selection of sorting algorithms are illustrated with an example from
computer graphics.

Introduction

Sorting is a classical discipline in computer science, and the study of
sorting algorithms and implementations illustrates different approaches and
strategies for solving a well defined problem. Knowledge of sorting
algorithms may provide inspiration for problem solving in other branches of
computer science, as well as be the basis for selecting the right method for
a specific task. At the end of the paper, this point will be illustrated
with an example from computer graphics, which is the field from where this
work originated.

The paper describes a program, that has been developed for the purpose of
analysis and visualization of sorting algorithms. The program serves a
number of purposes, some of which are described in the following sections.
The program is placed in the public domain.

Algorithm library

About every well known general purpose sorting algorithm has been
implemented as part of the program. It may therefore serve as a source from
which to select working implementations of sorting algorithms, that are in
choice for an application. This makes it easy to try out a number of
different algorithms in the application before deciding which one
to actually use.

All of the algorithms are implemented, so that they sort data in-place, and
only interact with the data through two operations: compare two items to
test which is the greater one, and exchange two items to change the order
of the data. Some algorithms make use of an optional move operation for
better performance, but this can easily be implemented in terms of exchange
operations. These operations make it very easy to interface the sorting
program to new data representations.

Sorting analysis

The algorithms can be tested for a selected number of data, and for
different types of initial order, made up with a random number generator.
Statistics for the number of compare, exchange and move operations are
calculated. One measure of the "sortedness" of data is the number of
inversions in the file, and the program has a procedure, that generates
data that approximates any percentage of inversions selected by the user.
The data generation procedure can also define data with multiple key
values (more items with the same key value).

These methods allow you to test, whether the algorithms perform as they are
theoretically expected to do for differently ordered data, and to see which
one will make a good choice for a possible application, if data are known to
have a certain distribution.

Development of new algorithms

It is very easy to add a new algorithm to the program, and the program will
serve as a good environment for debugging and testing of the implementation.

The simplest way to test a sorting algorithm is by sorting a string into
ascii order, while printing out information that illustrates the execution
of compare and exchange operations, and lists the actual string content
during the sorting process. The program lets you input a string of your own
choice, and either single step through the sort or let it run to an end.
Every compare, exchange or move operation performed on the data is
automatically traced on the screen, since tracing procedures as well as
single stepping are embedded the operations. If the word TRACE is placed
inside a major loop of the program, it will show the contents of the string
every time it is executed. This is the only modification of the
implementation needed to fully support the integration of the sorting
algorithm into the program. This type of analysis is very good for
debugging of a new sorting implementation, because you can follow every
step of the algorithm, and choose examples that test special cases.

The user will find two new-developed sorting algorithms with the program,
called Stacksort (Forth programmers are expected to like that name) and
Splicesort. Stacksort is a recursive variation of Selectionsort, and
Splicesort is based on ideas from Mergesort and Shellsort.

Unlike in other languages, recursion has no execution overhead in Forth.
This makes Stacksort especially well suited for implementation in Forth, and
it is probably the fastest algorithm for sorting almost presorted data in
Forth. The main problem is the depth of recursive calls, which may overflow
either the data or return stack. For almost sorted data, which is the
best-case for Stacksort, the recursion depth comes close to the number of
data elements.

Splicesort works much like Mergesort, except that it creates sorted
subfiles, that are interweaved in a way similar to a Shellsort with
increments equal to the powers of two. Two variations of Splicesort can be
constructed by using local insertion or selection, but implementation gets
rather complicated.

We have not focused on optimization, but some algorithms may be more
efficient than their statistically behaviour indicates, because the code
is simpler compared to others. For example, if some data can be sorted with
either Stacksort or Splicesort with about the same number of operations,
Stacksort will probably run faster because its code is simpler. However, if
implemented in a language that does not allow recursion without much
execution overhead, this may not be true.

Visualization of sorting

When your sorting algorithm has been succesfully implemented, its behaviour
can best be studied and explained by the use of graphics. While the ascii
order of a string of characters does not show any special graphics effect,
it is very easy to tell if a series of colours are ordered into spectral
order. The whole sorting process can be illustrated in one picture by a
sequence of rows, that shows the order of the colours at different times
during the process.

A variation of this visialization technique is to show only colours, when
they are compared or exchanged, so the picture will show which part of the
data are accessed by the algorithm at different stages of the process.

The third type of graphics that are provided by the program, shows a series
of columns, while they are interactively being sorted by height. This does
not depend on colours, and is well suited for black and white illustrations.
A special feature of the column sort is, that columns of equal height can
be colored from left to right before sorting. If the sorting algorithm is
stable, the order of the colours will be maintained within all sets of
columns of equal height. The column sort can be stopped for inspection
at any stage with the single stepping facility, which is also active
during the graphics visualizations.

Example from computer graphics

To illustrate some trade-offs in selecting a sorting algorithm for a real
application, we have chosen an example from computer graphics. Algorithms
for hidden line elimination from the visualization of a three-dimensional
scene often make use of sorted lists of polygons. The scene itself consists
of an unordered (ie. randomly ordered) set of polygons, and during hidden
line elimination an algorithm may proceed back to front through space
overpainting polygons by closer ones, or front to back while clipping
polygons against closer visible ones. In both cases, two sorted lists of
polygons will be useful: one list of the polygons sorted with respect to
the distance from the eyepoint to the nearest point on the polygon, and
another list of the same polygons sorted with respect to the distance to
their furthest point. When proceeding front to back through space, the
hidden line algorithm can now maintain a list of polygons at the current
depth simply by adding new polygons in the order of the first list, when
passing their nearest point, and deleting polygons in the order of the
second list, when the furthest point is reached, and vice versa for back to
front processing of the polygons. Our problem is which sorting algorithms
to choose for creation of the two lists.

The first thing to consider is which operation is slower, comparing two
polygons or interchanging them. Since we do not want to move polygons
around in space, the items to be sorted are pointers to the polygons, and
therefore exchange operations are fast. However, when two polygons need be
compared, the values of their minimal/maximal distance from the eye must be
accessed and compared. Since these values will often be real numbers,
comparing two polygons will be significantly slower than exchanging two
polygon pointers. This excludes all sorting algorithms based on selection,
while good choices would be Quicksort, Shellsort or Splicesort. If the lists
are represented as linked lists, a move operation will be fast, and Binary
Insertionsort or Mergesort are better.

The above considerations were for randomly sorted data. However, when one
of the two lists has been calculated, the other will probably be very much
identical, especially if the sizes of the polygons are small compared to the
distances between them. So if the second list is produced from the already
made first list, we can take advantage of the fact, that the polygons are
almost presorted. Straight Insertionsort will perform very well in this
case, and if the depth of recursion is acceptable, try out Stacksort. Do
not use Quicksort.

Conclusion

The program has proved an efficient workbench for the development and
testing of the sorting algorithms described in this paper. It is well suited
for self study, and the program has been used as a supplementary tool for
teaching about sorting methods at the Technical University of Denmark.

A longer version of this paper, including source code for the sorting
algorithms, will be submitted to the Journal of Forth Applications and
Research. The paper is based on work done under a grant from the Danish
National Research Counsil.

Bibliography

Sorting algorithms (except Splicesort and Stacksort) are described in the
following books:

D. E. Knuth, The Art of Computer Programming, volume 3: Sorting and

H. Lorin, Sorting and Sort Systems, Adddison-Wesley, 1975.

N. Wirth, Algorithms + Data Structures = Programs, Prentice-Hall 1976.