Dec 212017
Text file outlining strategies for conditional indexing in Clipper. | |||
---|---|---|---|
File Name | File Size | Zip Size | Zip Type |
INDEX.TXT | 11494 | 3682 | deflated |
Download File PARTINDX.ZIP Here
Contents of the INDEX.TXT file
PARTIAL INDEXING
Ira Emus
Sep. 19, 1988
BIX irae
CIS 76702,672
By now I assume you've all heard of partial indexing, the
process of creating an index that contains pointers to only
a subset of the records in the database. If not, you might
ask why you would want an index that only contains a subset
of the records and as a not-so-pleasant side-effect will
boot you to DOS if you do not treat it with kid gloves.
There are two answers; Speed; the time needed to create a
partial index may be as little as 2% of the time needed to
create a full index, and Speed; a database with an active
partial index only looks as big as a database which contains
the number of records included in the index, usually a small
subset of the whole file. With the creation time reduced in
this fashion, it becomes quite feasible to create an index
for ad-hoc queries and reports. The simplest partial
indexing routine looks something like this:
INDEX ON partial1() TO temp
FUNCTION partial1
DO WHILE !
GOTO RECNO()+1
ENDDO
RETURN
This is faster if you only extract a small subset of the
file since Clipper only has to create an index on the number
of returned records. This may cut indexing time up to 50%,
depending upon the size of the file and percentage of
records placed into the index. The GOTO recno(), instead of
SKIP, is necessary since during the creation of an index, it
is unclear what order the file is really in, is it in record
number order, the order of the last controlling index, or
the order of the index that is corrently being built? Since
the UDF is executeing in the normal Clipper environment
which is possibly at a different state than the internal
indexing routine, it is important to realize that things may
not be exactly as they seem.
An example that shows a much larger time saving is creating
an index with a relation set. I will use as an example an
order entry system with three files, a name file that
contains buyer and receiver information; an order header
file that contains order info and an entry that points to
the buyer in the name file; and a line item file that
contains the item that was ordered and an entry that points
to the receiver in the name file. It is not too far-fetched
to assume that I might want to generate a report, ordered on
the buyer's name or ZIP code, based on the people who
purchased a particular item. To accomplish this feat in the
past might have required something like this:
SELE 0
USE names INDEX custcode
SELE 0
USE ordrhedr INDEX sender
SET RELATION TO sender INTO names
SELE 0
USE lineitem INDEX orderno
SET RELATION TO orderno INTO ordrhedr
INDEX ON names->zip TO temp
This works great, but it's slow, because for every record in
the lineitem database, 2 SEEKs were performed, one for each
relation. If you only need the index for a report and only
want a subset of the records, a partial index can assure
that the 2 SEEKs are only performed for the records that are
to be included in the report. If the subset to be extracted
is a small percentage of the database, you may see
reductions in indexing time of up to 90%
This is a sample of the code needed to do this type of
index:
SELE 0
USE names INDEX custcode
SELE 0
USE ordrhedr INDEX sender
SELE 0
USE lineitem
INDEX ON partial2() TO temp
FUNCTION partial2
DO WHILE (lineitem->RECNO()) < lineitem->(LASTREC())
SELE lineitem
DO WHILE product # "BASKET";
.AND. RECNO() < LASTREC()
GOTO RECNO() + 1
ENDDO
SELE ordrhedr
SEEK lineitem->orderno
IF FOUND()
SELE names
SEEK ordrhedr->sender
IF FOUND()
EXIT
ENDIF
ENDIF
SELE lineitem
GOTO RECNO() + 1
ENDDO
SELE lineitem
RETURN names->zip
A problem that manifests itself while using partial indexes
is that the last record is always included in the index, a
key must be returned for the last record so that Clipper
knows that it's time to stop creating keys and start
generating the index, and quite that last record is not a
part of the selected set. One solution is to set a filter on
the selection criteria you used to create the index; in the
last example it would be something like this:
SET FILTER TO (lineitem->product) = "BASKET"
This will have an almost unnoticeable effect on the speed of
the application as long as you only keep the filter set
while the partial index is the controlling index, under
these conditions you can never be more than a SKIP away from
a good record. The only possible record in the index that
does not meet the condition is the last record in the file,
and since Clipper considers a database file in index order
when it analyzes a filter, you can never be more than 1
logical record away. The reason we've done all of this is
that we now have an index that contains only the records in
the lineitem file that contain the product code "BASKET"
that is indexed in ZIP code order on the ZIP code field from
a related database.
Some caveats that apply when using partial indexes are:
1> Be very careful with GOTOs, because it is easy to
go to records that are not in the index.
2> Do not change anything that affects the key expression
while the index is open.
3> Do not try to use a partial index any time other than
immediately after its creation, or be aware that it may
point to records that do not really belong in the set you
think it points to.
4> A partial index require the use of a UDF and will bomb if
you try to look at them in DBU unless you are very
careful to never try to go past the ends of the file, or
in some cases, the end of the current screen.
Most of the problems that occur with partial indexes occur
because Clipper considers that an index that does not
contain an entry for every record is corrupt and gets highly
confused while trying to update it or figure out where to go
next from a record that is not contained in the index.
Usually this looks like a "corrupt index" error message, and
soon after, the DOS prompt.
Another use for a partial index is to view a subset of a
file in a DBEDIT window without having to deal with the
problems of trying to limit access to non-requested areas of
the file. Since Clipper is only aware of the records in the
index when looking at a file with an open index, DBEDIT will
only display the records in the partial index. The two
methods shown above will work just fine for creating a
partial index for this use, but if there is already an index
on the whole file that allows access to the subset that we
wish to look at,i.e. if I have an index on ZIP code and I
want to only look at those people who have the ZIP code
90543, there is an even faster way to generate that partial
index. The records I want can be easily located in the
file, and the following piece of code will easily store them
to an array so I can use them later.
USE names INDEX zip
SEEK "90543"
COUNT TO numrecs WHILE zip = "90543"
PRIVATE ary_of_recs[numrecs+1]
SEEK "90543"
FOR i = 1 TO numrecs
ary_of_recs = RECNO()
SKIP
NEXT
Now that I've got this array of record numbers I can create
an index that not only includes only those records I want in
the index, but I can create it by going only to the records
I want in the index. First of all, I sort the array using
ASORT so that all movement in the file is in the forward
direction and so that if by chance the last record in the
file is in the array, the creation of the index does not
abort prematurely. Clipper knows that as soon as it reaches
the last record, it's time to stop scanning the file for
keys and time to start building the index from the keys it's
found. In the following code I use the information in the
array to build the index. The variable "isindexing" is there
so that if the index is evaluated outside the INDEX ON,
problems with the array or the absence or bogus value of i
do not cause problems. I also store the value of LASTREC()
into the array so that Clipper will know it's time to stop
looking for keys and time to start building the index.
PUBLIC isindexing
ary_of_recs[numrecs+1] = LASTREC()
ASORT(ary_of_recs)
isindexing = .T.
i = 1
INDEX ON partial3() to temp
isindexing = .F.
SET FILTER TO zip = "90543"
FUNCTION partial3
IF isindexing
GOTO ary_of_recs[i]
i = i + 1
ENDIF
RETURN names->zip
This last method of creating partial indexes can be so fast,
that using them becomes almost mandatory in those places
where they are useful, things like limiting the scope in a
DBEDIT() window. And don't forget that this method, getting
all of the record numbers I want to put in the index before
I start indexing, could be applied to the example function,
partial2, though in that case the improvement might not be
quite so dramatic due to the time spent looking in other
files. And always remember that these ideas can be applied
in places and conditions other than these.
December 21, 2017
Add comments