Category : Assembly Language Source Code
Archive   : ASMTUT2.ZIP
Filename : BAS2-1.DOC

 

BASIC II - INTERFACING BASIC WITH ASSEMBLER


Have you finished reading all the chapters? If not, go back and
do them, then come back to this when you are done. This chapter
assumes you know about segments, subroutines, and the general
information about linking subroutines to high-level languages.

In order to do this appendix I had to dust off my old QuickBASIC
3.0. If you have QuickBASIC 4.x, some things will have been
updated. If you have TurboBASIC, the subroutine conventions are
different. However, the structure will be the same. You will have
to consult your manual for exact details. If you are trying to do
this with the interpreter that came with DOS, I have a simple
comment -> Forget it! I won't go into the details, but the BASIC
interpreter is so much slower (about 10 times slower) and so much
more difficult to use with assembler that I won't even cover it.
There is no reason not to use one of the compiled BASICs if BASIC
is your language of choice. This material only covers how to deal
with COMPILED BASIC.


In BASIC, all individual numeric data, strings, "static" arrays
and the stack must fit into one 64k segment. The word 'segment'
here has the same meaning as in assembler. Both the DS register
and the SS register are set to this segment, and must stay set to
this segment whenever BASIC has control of the program. "Dynamic"
arrays can be located somewhere else in memory.

You allocate a "static" array with a constant number as a
dimension:

DIM array1! (277), array2% (346), array3$ (500)

and you allocate a "dynamic" array by using a variable to
dimension the array:

length1% = 277
length2% = 346
length3% = 500

DIM array1!(length1%), array2%(length2%), array3$(length3%)

Even though the first and second dimension statements produce the
same size and type arrays, the first ones must be located inside
DS and the second ones can be located outside of DS.

"Static" means that once the array is defined, its length and
number of dimensions cannot be changed for the rest of the
program. It will occupy a specific amount of space for the rest
of the program. "Dynamic" means that you can change the length of
the array whenever you want to. You do this with:

REDIM array1! (495)

______________________

The PC Assembler Tutor - Copyright (C) 1990 Chuck Nelson






BASIC does this by deallocating space for the old array and then
reallocating space for the new array. All the old information is
lost. There are certain restrictions. You cannot change the
number of dimensions in an array (if it starts out with 2
dimensions like DIM A!(47,63), it must always have 2 dimensions).

In order to understand BASIC's memory strategy, we need to look
at strings, the reason for it all.{1} The limit for a single
string is 32,767 bytes. If the total amount of data you can have
in the DS segment is only 65536 bytes, how does BASIC allocate
memory so you can have long strings without running out of space?
It uses only as much space as it needs. Let's define 3 strings
(the dots will indicate a space):

mystring$ = "You.say.either"
yourstring$ = "And.I.say.either"
ourstring$ = "Let's.call.it.off"

After defining these three strings one after the other, memory
will look like this:

17150
|You.say.eitherAnd.I.say.eitherLet's.call.it.off|

(For clarity, the memory image will be between the '|'s and each
row will be 50 bytes long. The next row down would start at
17200). For our example we will assume that this data starts at
memory location 17150.

There is no empty space. How does BASIC know where and how long
mystring$ is? It has something called a string descriptor. This
is a two word (4 byte) block, also in DS, which says exactly
where and how long the string is. The first word is the length
and the second word is the location (offset) in DS.

From BASIC's view, we have:

             STRING DESCRIPTOR
              length:location

mystring$        14:17150   ->  |You.say.either|
yourstring$      16:17164   ->  |And.I.say.either|
ourstring$       17:17180   ->  |Let's.call.it.off|
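
The descriptor table above can be reproduced with a small model.
This sketch is Python, not BASIC, and it only imitates the
bookkeeping. The function name allocate is made up for the
illustration, and 17150 is simply the starting offset assumed in
the example.

```python
# Model of BASIC string space: strings are packed end to end and each
# variable owns a (length, offset) descriptor. All names are illustrative.
START = 17150  # assumed offset of the first string, as in the example


def allocate(strings, start=START):
    """Pack strings back to back and return {name: (length, offset)}."""
    descriptors = {}
    offset = start
    for name, text in strings:
        descriptors[name] = (len(text), offset)
        offset += len(text)
    return descriptors


strings = [
    ("mystring$", "You.say.either"),
    ("yourstring$", "And.I.say.either"),
    ("ourstring$", "Let's.call.it.off"),
]
descriptors = allocate(strings)
for name, (length, offset) in descriptors.items():
    print(f"{name:12} {length}:{offset}")
```

Each offset is the previous offset plus the previous length, which
is why there is no empty space between the strings.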

Now let's change one of the strings:

yourstring$ = "But.oh!,.If.we.call.the.whole.thing.off"

We now have a problem. The current "yourstring$" is only 16 bytes
long, but the new one is 39 bytes long. What does BASIC do? It
(1) deallocates the space for the old "yourstring$", (2) allocates
new space for the new string and (3) updates the string
descriptor. Memory will now look like this:

17150
|You.say.either                Let's.call.it.offBut|
|.oh!,.If.we.call.the.whole.thing.off|

and the descriptors will now look like this:

             STRING DESCRIPTOR
              length:location

mystring$        14:17150   ->  |You.say.either|
yourstring$      39:17197   ->  |But.oh!,.If.we...
ourstring$       17:17180   ->  |Let's.call.it.off|
____________________

1. This is an outline of what BASIC does, but it will not
include the parts of memory management that you will never see.

BASIC is aware that there is an empty block of space and has a
strategy for dealing with empty spaces, though each BASIC has its
own strategy. We don't know exactly WHEN it will take action, but
we do know WHAT action it will take. At some point BASIC will
decide that it has too many empty spaces in memory and will
REORGANIZE the segment. This is known as GARBAGE COLLECTION.
Exactly how this is done is up to the person who wrote the BASIC
compiler/interpreter.

After reorganization, the addresses of ALMOST ALL strings and
MANY dynamic arrays will have changed. The string locations
themselves will have changed, but the string descriptors will
still be in the same place in DS, and they will have been
updated. Here is the new memory:


                        12724
                        |You.say.eitherLet's.call.i|
|t.offBut.oh!,.If.we.call.the.whole.thing.off|

and here are the updated descriptors:

             STRING DESCRIPTOR
              length:location

mystring$        14:12724   ->  |You.say.either|
yourstring$      39:12755   ->  |But.oh!,.If.we...
ourstring$       17:12738   ->  |Let's.call.it.off|
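
The reassignment and the reorganization can be imitated in a few
lines. This is a Python sketch of the idea only. The helper names
assign, compact and descriptors are invented for the illustration,
and the new starting offset 12724 is just the value from the
example. A real BASIC decides for itself when and where to move
things.

```python
# Model of string reassignment followed by garbage collection.
# heap maps a variable name to (text, offset). Names are illustrative.

def assign(heap, next_free, name, text):
    """Abandon the old copy and place the new text at next_free."""
    new_heap = dict(heap)
    new_heap[name] = (text, next_free)
    return new_heap, next_free + len(text)


def compact(heap, new_start):
    """Slide the live strings together, keeping their address order."""
    new_heap = {}
    offset = new_start
    for name, (text, _) in sorted(heap.items(), key=lambda item: item[1][1]):
        new_heap[name] = (text, offset)
        offset += len(text)
    return new_heap


def descriptors(heap):
    """Return {name: (length, offset)}, the view BASIC keeps."""
    return {name: (len(text), offset) for name, (text, offset) in heap.items()}


heap = {
    "mystring$": ("You.say.either", 17150),
    "yourstring$": ("And.I.say.either", 17164),
    "ourstring$": ("Let's.call.it.off", 17180),
}
heap, next_free = assign(heap, 17197, "yourstring$",
                         "But.oh!,.If.we.call.the.whole.thing.off")
before = descriptors(heap)
after = descriptors(compact(heap, 12724))
print(before)
print(after)
```

The variable names never change, but every offset does. That is
the reason an address obtained before the reorganization cannot be
trusted after it.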

The strings have been moved several thousand bytes from where
they were just a second ago. The information that was in the
string descriptors a second ago is no longer valid. Old
information about dynamic arrays is also unreliable. This means
that if you have a subroutine written in assembler, you must get
any address information at the time the subroutine is called.
We'll come back to this later.


Let's go on to data input and output. When you first started
doing BASIC, you did i/o using only:

WRITE #1, my.data!

Perhaps you do it differently now, perhaps not. In any case,
you need to know about i/o speed and how different file i/o
works. Here's the simplest file output:

***********************************
DIM large.array! (10000)

FOR i% = 1 to 10000
large.array! (i%) = 2.1
NEXT i%

OPEN "2-1.doc" for output as # 1
PRINT time$
FOR i% = 1 to 10000
WRITE #1, large.array! (i%)
NEXT i%
PRINT time$
CLOSE #1
***********************************

Of course, to make it a challenge we are going to write an array
of 10,000 numbers. How long does it take?{2} For this output it
took 38 seconds. The same program, inputting the same data with:

INPUT #1, large.array! (i%)

took 49 seconds. These are fairly large amounts of time. But
wait, it gets worse. Let's change one line of the above program:

large.array! (i%) = 2.1678319E+19

This is a different constant which is put into each element of
the array. How long does output take now? 59 seconds. And input?
a whopping 79 seconds! What's going on here?

When you do i/o with INPUT #, WRITE # or PRINT #, it is exactly
like doing i/o to the screen. For output, BASIC converts the
binary numbers into TEXT and then writes the TEXT to the disk.
When it does input, it reads the TEXT from the disk and converts
the TEXT into a binary number. Here is the beginning of the
output file from the first example:

2.1
2.1
2.1
2.1
2.1

Each data item has been converted into "2.1" + CHR$(13) +
CHR$(10). These last two characters are a carriage return and a
line feed, which together end a line on the IBM PC. That's
(5 bytes * 10000 items) plus 1 byte for the end of file marker,
or 50001 bytes:

2-1      DOC     50001   6-29-90  12:39p
____________________

2. All times from now on are with a slower PC with a slower
hard disk, but an 8087. Since these are floating point numbers,
your results should be slower if you don't have an 80x87, while
if you have an 80386 with an 80387 and a fast hard disk, your
times will be much faster.

Here's the beginning of the output file from the second example:

2.167832E+19
2.167832E+19
2.167832E+19
2.167832E+19
2.167832E+19

Each data item has been converted into "2.167832E+19" + CHR$(13)
+ CHR$(10). That's (14 bytes * 10000 items) plus 1 byte for the
end of file marker, or 140001 bytes:

2-1E19   DOC    140001   6-29-90  12:47p
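
The two file sizes follow directly from the length of the text
written for each number. The short Python check below only redoes
that arithmetic. The constant and function names are made up, and
the last figure assumes four bytes per single precision number with
nothing in between.

```python
ITEMS = 10000
LINE_END = 2   # CHR$(13) + CHR$(10) after every number
EOF_MARK = 1   # one byte for the end of file marker


def text_file_size(text, items=ITEMS):
    """Size of a file holding `items` copies of `text`, one per line."""
    return (len(text) + LINE_END) * items + EOF_MARK


small = text_file_size("2.1")
large = text_file_size("2.167832E+19")
binary = 4 * ITEMS  # raw single precision values, no separators
print(small, large, binary)
```

The text file grows with the number of digits printed, while the
binary figure depends only on how many numbers there are.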

These files are unnecessarily large, and i/o is slow: if you
don't have an 8087 and you are doing floating-point i/o, it can
be slower still.

Can we do it faster? Yes. Using GET and PUT, we get a certain
number of bytes from the disk, then transfer them to the array.
Some of you have never used random access i/o, so this is a brief
summary.

When you open a file as text (as we did in the above examples),
BASIC divides the text by looking for carriage returns. When you
open a file as a random access file, you are telling BASIC that
you want to divide the file into distinct blocks of information.
It may be text or it may be something else - BASIC doesn't care.
If you say nothing, BASIC assumes that you want the blocks to be
128 bytes long, but the length can be anything.

In the example that we will do, we will use 1024 byte blocks
because that is exactly 2 disk sectors long, so the disk can read
information easily and efficiently. If we had a block length of 4
bytes, the disk would have to do 10000 disk writes; that would be
very slow and be hard on the disk. Here's how we open the file:

OPEN "packed.doc" for RANDOM as #1 LEN = 1024

This will be a random access file and the block length will be
1024 bytes. When you tell it to read or write, it will do it 1024
bytes at a time. That is getting faster.

Where is the block of data that it is going to write to disk?
Here life starts getting complicated, so I hope you have
understood everything that we have done so far. When you open a
file, BASIC assigns it a BUFFER. The buffer has a fixed length
(either 128 bytes or the length you have designated), and is
located somewhere in the DS data segment along with the numbers
and strings. Like a string, it is relocatable. We need a way to
pin it down. The easy and nice way would be if it were an array
and we could address it like an array:

buffer#1 (45) = 20

We are not that lucky. The only thing you can do is overlay a
template on the buffer, and work from the template. This template
MUST be made up of strings. We make up the template with a FIELD
statement.

FIELD #1, 1024 AS out.string$

The FIELD statement starts out with the file # followed by a list
of strings and the length of each string.

FIELD #1, 100 AS string1$, 200 AS string2$, 300 AS string3$

The total length of the strings may be shorter than the buffer,
but may not be longer than the buffer. What does the FIELD
statement do? The first thing that it does is set the string
descriptor for all of these strings. Let's say that at the moment
file #1 buffer is at 46217:

             STRING DESCRIPTOR
              length:location

string1$        100:46217
string2$        200:46317
string3$        300:46517

The first string starts at the first byte of the buffer. The
second string starts right where the first string ends and the
third string starts right where the second string ends. This is
true for any FIELD statement, no matter how many strings are
defined. Because of the way BASIC does memory management, if it
moves the buffer, it will also update these string descriptors to
point to the same relative places in the buffer. These string
descriptors are on auto pilot.
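
The way FIELD lays strings over the buffer can be modeled as
follows. This is a Python sketch, not BASIC. The function name
field is invented, and the second buffer address (51000) is a
hypothetical value chosen to show what happens after a move.

```python
def field(buffer_offset, buffer_len, *fields):
    """Lay (width, name) pairs end to end over a buffer.

    Returns {name: (length, offset)}, one descriptor per string.
    """
    total = sum(width for width, _ in fields)
    if total > buffer_len:
        raise ValueError("FIELD widths exceed the buffer length")
    layout = {}
    offset = buffer_offset
    for width, name in fields:
        layout[name] = (width, offset)
        offset += width
    return layout


spec = ((100, "string1$"), (200, "string2$"), (300, "string3$"))
layout = field(46217, 1024, *spec)
moved = field(51000, 1024, *spec)  # same template after the buffer moves
print(layout)
print(moved)
```

The lengths stay fixed and only the offsets follow the buffer,
which is what keeping the descriptors pointed at the same relative
places amounts to.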

Suppose now that we have the following string:

"Let's get physical"

and we want to write it to disk as string1$. All we need to do
is:

string1$ = "Let's get physical"

Right? No, that's very, very, very wrong. What you have just done
is alter the string descriptor of string1$ to point to an
entirely different place in memory. The string descriptors are
now:

string1$         18:58902
string2$        200:46317
string3$        300:46517

BASIC deallocated the space for string1$, reallocated it somewhere
else in memory, and changed the string descriptor. Not only is
string1$ in a different place in memory, but BASIC may think that
part of the file #1 buffer is actually empty space, and the next
time it reorganizes memory, who knows what is going to happen.
From the moment you define strings in a FIELD statement until the
time you close the corresponding file, you can NEVER have them on
the left side of an ordinary assignment. Having them on the left
side is sure to change the string descriptor.

How are we going to transfer data to these strings? There are
three special operators in BASIC - LSET, MID$ and RSET. Their job
is to put something into a string without altering the string
length or location (i.e. without altering the string descriptor).

LSET string1$ = "Let's get physical"
MID$ (string1$,17) = "Let's get physical"
RSET string1$ = "Let's get physical"

LSET will insert the string at the very left of string1, RSET
will insert the string at the very right of string1, and MID$
will insert the string starting at the 17th byte of string1.
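
The effect of the three statements on a fixed-length string can be
imitated with a byte array. This is a Python sketch under stated
assumptions: the field is 30 bytes instead of 100 so the result
fits on a line, the helper names are invented, and text that is too
long for the field is assumed to be cut off on the right.

```python
def lset(field, text):
    """Left-justify text in the field, padding with spaces on the right."""
    n = len(field)
    field[:] = text[:n].ljust(n).encode("ascii")


def rset(field, text):
    """Right-justify text in the field, padding with spaces on the left."""
    n = len(field)
    field[:] = text[:n].rjust(n).encode("ascii")


def mid_assign(field, start, text):
    """Overwrite bytes from 1-based position start; the length never changes."""
    room = len(field) - (start - 1)
    data = text[:room].encode("ascii")
    field[start - 1:start - 1 + len(data)] = data


string1 = bytearray(b" " * 30)        # stand-in for a FIELDed string
lset(string1, "Let's get physical")
left = bytes(string1)
rset(string1, "Let's get physical")
right = bytes(string1)
string1[:] = b"." * 30                # dots make the untouched bytes visible
mid_assign(string1, 17, "Let's get physical")
middle = bytes(string1)
print(left, right, middle, sep="\n")
```

In every case the byte array keeps its length and its identity.
Only the contents change, which is the property that keeps the
string descriptor intact.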

This is the strategy for all random access i/o in BASIC. We:

1) open a file as RANDOM and declare a block size.
2) define some "fixed length" strings inside the buffer with
a FIELD statement.
3) insert data in the strings using LSET, RSET or MID$. This
is true whether the data is strings or numbers.

There's only one problem left. For LSET, RSET and MID$, the thing
on the RIGHT side of the equal sign must be a string. You can't
have:

LSET string1$ = number!

It's illegal. To counter this, BASIC has some pseudo-functions.
Let's take integers as an example:

a.string$ = MKI$ (number%)
number% = CVI (a.string$)

MKI$ doesn't actually DO anything. It just tells BASIC that it is
o.k. to move two bytes from "number%" to "a.string$". The bytes
are binary data and are moved unaltered. Similarly, CVI tells
BASIC that it is alright to move two bytes of binary data from
"a.string$" to "number%". We are tricking BASIC into moving
binary data from one data type to another. This is simply data
movement, and there is no data conversion. The forms are:

NUMERIC DATA                    MOVE TO STRING    FROM STRING

integer <-> string                   MKI$             CVI
long integer <-> string              MKL$             CVL
single precision <-> string          MKS$             CVS
double precision <-> string          MKD$             CVD
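
For readers who want to see the byte movement on a modern machine,
the same idea can be sketched in Python with the struct module.
The mapping below is an assumption for illustration: it takes the
numbers to be little-endian, with the floating-point types in the
IEEE format an 8087 uses. A BASIC that stores floating-point
numbers in another binary format would produce different bytes,
but the principle is the same. The names mk and cv are invented.

```python
import struct

# Assumed struct codes for each MKx$/CVx pair ("<" = little-endian).
FORMATS = {"I": "<h", "L": "<i", "S": "<f", "D": "<d"}


def mk(kind, value):
    """Like MKI$/MKL$/MKS$/MKD$: copy the binary value into a byte string."""
    return struct.pack(FORMATS[kind], value)


def cv(kind, data):
    """Like CVI/CVL/CVS/CVD: copy the bytes back into a number."""
    return struct.unpack(FORMATS[kind], data)[0]


samples = {"I": -2, "L": 100000, "S": 0.5, "D": 0.1}
sizes = {kind: len(mk(kind, value)) for kind, value in samples.items()}
round_trip = {kind: cv(kind, mk(kind, value)) for kind, value in samples.items()}
print(sizes)
print(round_trip)
```

Nothing is converted to text at any point. The string is just a
container for 2, 4 or 8 bytes.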

In contrast, the functions STR$ and VAL really do convert. STR$
turns a binary representation into a text representation, and VAL
turns a text representation into a binary representation. This is
the same as what happens with PRINT and INPUT. Here's a program:

**********************************************
number! = 2.1678319E+19
binary.string$ = MKS$ (number!)
text.string$ = STR$ (number!)
PRINT LEN(text.string$), LEN(binary.string$)
PRINT text.string$, binary.string$
**********************************************

and here's the output:

13 4
2.167832E+19 nl _

You probably won't be able to see all of that last output on your
printer. It is four bytes long, and the bytes are:

6E6C965F hex or 110, 108, 150, 95 decimal

The third byte is outside of ASCII 32-126, the printable ASCII
characters.

STR$ gives us the text representation of the number, while MKS$
stuffs the binary representation of a number into a string. In
the opposite direction, VAL gives us the numeric value of a text
string (if it has a numeric representation), while CVS stuffs 4
binary bytes from a string into a single precision number.

STR$ from binary value to text representation
VAL from text representation to binary value

Note that STR$ can convert ANY type of number to a text string
and VAL can convert a text string to ANY type of number, while
CVI, CVL, CVS, CVD, MKI$, MKL$, MKS$, and MKD$ can only stuff a
specific type of number into a string or a string into a specific
type of number.
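
The difference between the two kinds of string can be checked on
any machine with Python. This sketch assumes the IEEE single
precision format, which is what the four bytes shown earlier
correspond to, and it uses a format string to imitate the text
that STR$ produced in the example.

```python
import struct

number = 2.1678319e19

# Text representation: a blank for the sign, then seven significant digits.
text_string = "% .6E" % number

# Binary representation: the four raw bytes, low byte first.
binary_string = struct.pack("<f", number)

print(len(text_string), len(binary_string))
print(repr(text_string), list(binary_string))
```

The text version can be read by a person. The binary version is a
third of the size and needs no conversion when it is read back.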

We want our output program to stuff the binary value from a
single precision number to selected bytes of a string. To stuff a
floating-point number into string1$ above, all we need to do is:

LSET string1$ = MKS$ (number!)

The following program has a single string which is the size of
the entire buffer, and we are going to stuff the single precision
numbers in one at a time with MID$.

************************************************************
number% = 10240
DIM large.array! (number%)

FOR i% = 1 to 10240
   large.array! (i%) = 2.1678319e+19
NEXT i%

OPEN "packed.doc" for RANDOM as #1 LEN = 1024
FIELD #1 , 1024 AS out.string$

PRINT time$
k% = 0                     ' index into large.array!
record.count% = 0          ' record number used by PUT
FOR i% = 1 to 40           ' 40 records of 1024 bytes each
   record.count% = record.count% + 1
   spot% = 1               ' byte position inside out.string$
   FOR j% = 1 to 256       ' 256 numbers * 4 bytes = 1024 bytes
      k% = k% + 1
      MID$ (out.string$,spot%,4) = MKS$ (large.array!(k%))
      spot% = spot% + 4
   NEXT j%
   PUT #1, record.count%
NEXT i%
PRINT time$
CLOSE #1
***********************************************************


The array length has been increased slightly so that we have an
exact number of blocks. We use MID$ to make sure that the string
descriptor for out.string$ does not get changed. Each file write
will be (256 numbers * 4 bytes/number) 1024 bytes long. We start
with the first record and increase the record number by 1 each
time we write. Does this increase the speed any? Well, this takes
11 seconds.

TYPE                     OUTPUT          INPUT

num <-> text             38 - 59 sec     49 - 79 sec
num <-> bin. string      11 sec          11 sec

I didn't show you the equivalent input routines but here are the
times they took. Note that the complexity of the single precision
number has no effect on the last (the binary) routine. Also, the
last routine does not suffer if there is no 8087. If you are
running an 80286 with a fast hard disk, this last routine should
only take a second or two. Here are the file sizes:

2-1      DOC     50001   6-29-90  12:39p
2-1E19   DOC    140001   6-29-90  12:47p
PACKED   DOC     40960   6-29-90   1:08p

The first two are the different sizes depending on whether the
constant was 2.1 or 2.1678319E+19. The last one is for our last
routine. Notice that it is more compact.
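
The packing scheme, and the matching unpacking that an input
routine would have to do, can be sketched in Python. This is not
the BASIC input routine that produced the timings. It works in
memory only, assumes little-endian IEEE single precision values,
and uses invented names throughout.

```python
import struct

RECORD_LEN = 1024
PER_RECORD = RECORD_LEN // 4   # 256 single precision numbers per record


def pack_records(values):
    """Yield byte records, each holding up to 256 four-byte numbers."""
    for start in range(0, len(values), PER_RECORD):
        chunk = values[start:start + PER_RECORD]
        yield struct.pack("<%df" % len(chunk), *chunk)


def unpack_records(records):
    """Inverse of pack_records: rebuild the list of numbers."""
    values = []
    for record in records:
        values.extend(struct.unpack("<%df" % (len(record) // 4), record))
    return values


large_array = [2.1678319e19] * 10240
records = list(pack_records(large_array))
total_bytes = sum(len(record) for record in records)
restored = unpack_records(records)
print(len(records), total_bytes, len(restored))
```

With 10240 numbers every record is exactly full, which is why the
array length was rounded up from 10000.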

Can we do any better than 11 seconds? Yes, but we need to take
over disk i/o and we need to know a few more things before we do
that.







LOCATION OF DATA

BASIC is designed to pass subroutines the location of the data,
not the data itself. This is called passing by reference. Though
it is possible to pass the data itself, there are certain
problems with the stack if you do.{3} We will always pass the
addresses.

All single numeric variables are in the DS segment. BASIC passes
the offset address of these variables in DS (1 word).

All strings are in the DS segment. Their string descriptors are
also in the DS segment. BASIC always passes the offset address of
the STRING DESCRIPTOR. This, in fact, is what we want. We need to
know both where the string is and how long it is. If we write
past the end of the string we may destroy BASIC's memory
management system.

Static arrays are in the DS segment but dynamic arrays can be
anywhere. If we want to write a general purpose routine with
arrays, we need to handle them no matter where they are.

BASIC has a special function called VARPTR that tells you where a
variable is in memory. Here's a program that uses it for a couple
of variables:

***********************************************************
' check out the use of varptr
n% = 5000
p% = 50
DIM b!(800),a!(900)
DIM d!(n%), c!(p%)

mystring$ = "What's up, doc?"
addressA! = varptr (n%)
addressB! = varptr (p%)
address1! = varptr (a!(0))
address2! = varptr (b!(0))
address3! = varptr (c!(0))
address4! = varptr (d!(0))
address5! = varptr (mystring$)
PRINT addressA!, addressB!
PRINT address1!, address2!, address3!, address4!, address5!
***********************************************************

It gives us the addresses of all sorts of things. a!() and b!()
are static arrays, so they should be in the DS segment. c!() and
d!() are dynamic arrays, so they might be anywhere. Remember, the
DS segment is from offset 0 to offset 65535. Let's see where they
are:

6230 6232
9438 6234 87616 67584 13062

The individual numbers are in DS, and the two static arrays are
in DS, but c!() and d!() are outside of DS. These numbers tell us
the address relative to the start of DS, but we don't know where
DS is at the moment. Where exactly are these arrays? It would be
tedious to pass the subroutine these numbers because they are
floating-point numbers and would be very difficult to deal with.
____________________

3. If you make a mistake and pass a single precision number
instead of an integer, you will pass 4 bytes instead of 2. From
that moment on the stack will have 2 extra bytes on it and you
won't know where they came from.

QuickBASIC has a function called PTR86. It is in an external
object file called INT86.OBJ.{4} This object file has the
routines that you need if you want to do interrupts from BASIC
itself. We'll come back to that. PTR86's job is to take the
floating-point number which we got from VARPTR, add the starting
address of the DS segment to get an absolute address in memory,
and then calculate both a segment and an offset for that address
in memory. The segment will always be the highest segment that
contains the first byte of the variable or array and the offset
will always be a number from 0 to 15.
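
The arithmetic just described can be sketched in Python. This only
models the calculation in the paragraph above. It says nothing
about how PTR86 is actually called, the function name normalize is
invented, and the DS value 2000 hex is a hypothetical stand-in for
whatever DS holds when the program runs.

```python
def normalize(ds_segment, varptr_value):
    """Return (segment, offset) for DS:varptr_value with offset in 0..15."""
    linear = ds_segment * 16 + int(varptr_value)  # absolute address
    return linear >> 4, linear & 0x0F


DS = 0x2000  # hypothetical; unknown until run time
examples = {v: normalize(DS, v) for v in (13062.0, 67584.0, 87616.0)}
for value, (segment, offset) in examples.items():
    print(value, hex(segment), offset)
```

Because a segment starts every 16 bytes, dividing the absolute
address by 16 gives the highest segment that contains the byte and
the remainder gives an offset from 0 to 15. This works whether the
variable is inside DS or above it.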

In order to use an object file from inside of QuickBASIC you need
to put it in a library file and then load the library file when
starting QuickBASIC.

Building the library file is quite easy. QuickBASIC comes with a
program called BUILDLIB.EXE which builds the library for you. For
now, you need only INT86.OBJ and PREFIX.OBJ in your library.{5}
Put these two things in every library that you build from now on.
PREFIX.OBJ ensures proper segment ordering in the executable
file.

>buildlib int86+prefix

This will create a library with the default name USERLIB.EXE. To
load a library with this default file name, just put '/l' on the
command line:

>qb /l

If you have given the library a different name like XQRTYF.EXE,
then put that name after the '/l':

>qb /lXQRTYF.EXE

These object files will now be loaded and their subroutines will
be usable from inside BASIC.



____________________

4. PTR86 has been replaced by VARSEG in QuickBASIC 4.0.

5. Both of these object files come with your QuickBASIC.