Output of file : CHAP11-1.DOC contained in archive : ASMTUT3.ZIP





CHAPTER 11 - ADDRESSING MODES AND POINTERS


In this chapter we are going to cover all possible ways of
getting data to and from memory with the different addressing
modes. Read this carefully, since it is likely this is the only
time you will ever see ALL addressing possibilities covered.

The easiest way to move data is if the data has a name and the
data is one or two bytes long. Take the following data:

; -----
variable1 dw 2000
variable2 db -26
variable3 dw -589
; -----

We can write:

mov variable1, ax
mov cl, variable2
mov si, variable3

and the assembler will write the appropriate machine code for
moving the data. What can we do if the data is more than two
bytes long? Here is some more data:

; -----
variable4 db "This is a string of ascii data."
variable5 dd -291578
variable6 dw 600 dup (-11000)
; -----

Variable4 is the address of the first byte of a string of ascii
data. Variable5 is a single piece of data, but it won't fit into
an 8086 register since it is 4 bytes long. Variable6 is a 600
element long array, with each element having the value -11000. In
order to deal with these, we need pointers.

Some of you will be flummoxed at this point, while those who are
used to the C language will feel right at home. A pointer is
simply the address of a variable. We use one of the 8086
registers to hold the address of a variable, and then tell the
8086 that the register contains the address of the variable, not
the variable itself. It "points" to a place in memory to send the
data to or retrieve the data from. If this seems a little
confusing, don't worry; you'll get the hang of it quickly.

As I have said before, the 8086 does not have general purpose
registers. Many instructions (such as LOOP, MUL, IDIV, ROL) work
only with specific registers. The same is true of pointers. You
may use only BX, SI, DI, and BP as pointers. The assembler will
give you an error if you try using a different register as a
pointer.

______________________

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson







There are two ways to put an address in a pointer. For variable4,
we could write either:

lea si, variable4

or:

mov si, offset variable4

Both instructions will put the offset address of variable4 in
SI.{1} SI now 'points' to the first byte (the letter 'T') of
variable4. If we wanted to move the third byte of that array
(the letter 'i') to CL, how would we do it? First, we need to
have SI point to the third byte, not the first. That's easy:

add si, 2

But if we now write:

mov cl, si

we will generate an assembler error because the assembler will
think that we want to move the data in SI (a two byte number) to
CL (one byte). How do we tell the assembler that we are using SI
as a pointer? By enclosing SI in square brackets:

mov cl, [si]

Since CL is one byte, the assembler assumes you want to move one
byte. If you write:

mov cx, [si]

then the assembler assumes that you want to move a word (two
bytes). The whole thing now is:

lea si, variable4
add si, 2
mov cl, [si]

This puts the third byte of the string in CL. Remember, if a
register is in square brackets, then it is holding the ADDRESS of
a variable, and the 8086 will use the register to calculate where
the data is in memory.
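
The same technique reaches data that is too big for one register.
Here is a minimal sketch (not from the original text) that reads
the four bytes of variable5 as two words. It assumes the MASM
simplified segment directives (.model, .stack, .data, .code) and
the DOS terminate call; the label "start" is my own. The 8086
stores the low word at the lower address, so the low word lands
in AX and the high word in DX.

```asm
        .model small
        .stack 100h
        .data
variable5   dd  -291578         ; 4 bytes, too big for one register

        .code
start:
        mov ax, @data           ; assumption: small model, DS -> data
        mov ds, ax

        mov si, offset variable5
        mov ax, [si]            ; low word  (bytes 0 and 1)
        add si, 2               ; step the pointer past that word
        mov dx, [si]            ; high word (bytes 2 and 3)
                                ; DX:AX now holds the whole number

        mov ax, 4C00h           ; DOS: terminate program
        int 21h
        end start
```

Both moves are word moves because AX and DX are word registers,
so the assembler needs no extra size information.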

What if we want to put 0s in all the elements of variable6?
____________________

1 LEA stands for load effective address. Note that with LEA,
we use only the name of the variable, while with:

mov si, offset variable4

we need to use the word 'offset'. The exact difference between
the two will be explained later.





Here's the code:

mov bx, offset variable6
mov ax, 0
mov cx, 600
zero_loop:
mov [bx], ax
add bx, 2
loop zero_loop

We add 2 to BX each time since each element of variable6 is a
word (two bytes) long. There is another way of writing this:

mov bx, offset variable6
mov cx, 600
zero_loop:
mov [bx], 0
add bx, 2
loop zero_loop

Unfortunately, this will generate an assembler error. Why? If the
assembler sees:

mov [bx], ax

it knows that you want to move what is in AX to the address in
BX, and AX is one word (two bytes) long so it generates the
machine code for a word move. If the assembler sees:

mov [bx], al

it knows that you want to move what is in AL to the address in
BX, and AL is one byte long, so it generates the machine code for
a byte move. If the assembler sees:

mov [bx], 0

it doesn't know whether you want a byte move or a word move. The
8086 assembler has implicit sizing. It is the assembler's job to
look at each instruction and decide whether you want to operate
on a byte or a word. Other microprocessors do things differently.
On the Motorola 68000, the assembler uses explicit sizing. Each
instruction must explicitly state whether it is a byte or a
word.{2} On the 68000 you have:

move.b #213, (A1)
move.w #213, (A1)

The first instruction says to move a byte (the number 213) to the
address in register A1 while the second instruction says to move
a word (the number 213) to the address in register A1.{3}

____________________

2 Any of you who use the 68000 assembler know that this is
fudging the facts a little bit.


Back to the 8086. If the 8086 assembler looks at an instruction
and it can't tell whether you want to move a byte or a word, it
generates an error. When you use pointers with constants, you
should explicitly state whether you want a byte or a word. The
proper way to do this is to use the reserved words BYTE PTR or
WORD PTR.

mov [bx], BYTE PTR 213
mov [bx], WORD PTR 213

These stand for byte pointer and word pointer respectively. I
find this terminology exceptionally clumsy, but that's life.
Whenever you are moving a constant with a pointer, you should
specify either BYTE PTR or WORD PTR.
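
To see why the size matters, here is a small sketch of what the
two forms do to memory. It is only an illustration: the name
"buffer" and its contents are made up, the program skeleton
assumes the MASM simplified segment directives, and the two
stores are written in the form this chapter uses.

```asm
        .model small
        .stack 100h
        .data
buffer  db  4 dup (0FFh)            ; hypothetical scratch bytes

        .code
start:
        mov ax, @data               ; assumption: small model
        mov ds, ax

        mov bx, offset buffer
        mov [bx], BYTE PTR 213      ; one byte written:  D5 FF FF FF
        add bx, 2
        mov [bx], WORD PTR 213      ; two bytes written: D5 FF D5 00

        mov ax, 4C00h               ; DOS: terminate program
        int 21h
        end start
```

The byte move changes one byte and leaves its neighbor alone. The
word move writes the low byte (D5 hex) first and then a 0 into the
next byte, so picking the wrong size either leaves half the
variable untouched or overwrites the byte next to it.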

The Microsoft assembler makes some assumptions about the size of
a constant. If the number is 256 or below (either positive or
negative), you MUST explicitly state whether it is a byte or a
word operation. If the number is 257 or above (either positive or
negative), the assembler assumes that you want a word operation.

Here's the previous code rewritten correctly:


mov bx, offset variable6
mov cx, 600
zero_loop:
mov [bx], WORD PTR 0
add bx, 2
loop zero_loop

Let's add 435 to every element in the variable6 array:

mov bx, offset variable6
mov cx, 600
add_loop:
add [bx], WORD PTR 435
add bx, 2
loop add_loop

How about multiplying every element in the array by 12?

mov di, offset variable6
mov cx, 600
mov si, 12
mult_loop:
mov ax, [di]
imul si
mov [di], ax
add di, 2
loop mult_loop

____________________

3 A1 is a 68000 register.





None of these examples did any error checking, so if the result
was too large, the overflow was ignored. This time we used DI for
a change of pace. Remember, we may use BX, SI, DI or BP, but no
others. You will notice that in all these examples, we started at
the beginning of the array and went step by step through the
array. That's fine, and that's what we normally would do, but
what if we wanted to look at individual elements? Here's a sample
program:

; + + + + + START DATA BELOW THIS LINE
;
poem_array db "She walks in Beauty, like the night"
db "Of cloudless climes and starry skies;"
db "And all that's best of dark and bright"
db "Meet in the aspect ratio of 1 to 3.14159"
character_count db 149
; + + + + + END DATA ABOVE THIS LINE

; + + + + + START CODE BELOW THIS LINE

mov bx, offset poem_array
mov dl, character_count

character_loop:
sub ax, ax ; clear ax
call get_unsigned_byte
dec al ; character #1 = array[0]
cmp al, dl ; out of range?
ja character_loop ; then try again
mov si, ax ; move char # to pointer register
mov al, [bx+si] ; character to al
call print_ascii_byte
jmp character_loop

; + + + + + END CODE ABOVE THIS LINE

You enter a number and the program prints the corresponding
character. Before starting, we put the array address in BX and
the maximum character count in DL. After getting the number from
get_unsigned_byte, we decrement AL since the first character is
actually poem_array[0]. The character count has been reduced by 1
to reflect this fact. It also makes 0 an illegal entry. Notice
that the program checks to make sure you don't go past the end of
the poem. This time we use BX to mark the beginning of the array
and SI to count the number of the character.

Once again, there are only specific combinations of pointers that
can be used. They are:

BX with either SI or DI (but not both)
BP with either SI or DI (but not both)

My version of the Microsoft assembler (v5.1) recognizes the forms
[bx+si], [si+bx], [bx][si], [si][bx], [si]+[bx] and [bx]+[si] as
the same thing and produces the same machine code for all six.







We can get even more complicated, but to show that, we need
structures. In databases they are called records. In C they are
called structures; in any case they are the same thing - a group
of different types of data in some standard order. After the
group is defined, we usually make an array with the identical
structure for each element of the array.{4} Let's make a
structure for an address book.

last_name db 15 dup (?)
first_name db 15 dup (?)
age db ?
tel_no db 10 dup (?)

In this case, all the data is bytes, but that is not necessary.
It can be anything. Each separate piece of data is called a
FIELD. We have the last_name field, the first_name field, the age
field, and the tel_no field. Four fields in all. The structure is
41 bytes long. What if we want to have a list of 100 names in our
telephone book? We can allocate memory space with the following
definition:

address_book db 100 dup ( 41 dup (' ')) {5}

Well, that allocates room in memory, but how do we get to
anything? First, we need the array itself:

mov bx, offset address_book

Then we need one specific entry. Let's take entry 29 (which is
address_book[28]). Each entry is 41 bytes long, so:

mov ax, 28 ; entry (less 1)
mov cx, 41 ; entry length
mul cx
mov di, ax ; move to pointer

That gives us the entry, but if we want to get the age, that's
not the first byte of the structure, it's the 31st byte (actually
address_book[28] + 30 since the first byte is at +0). We get it
by writing:

mov dl, [bx+di+30]

This is the most complex thing we have - two pointers plus a
constant. The total code is then:

mov bx, offset address_book
mov ax, 28 ; entry (less 1)
mov cx, 41 ; entry length
mul cx ; entry offset from array[0]
mov di, ax ; move entry offset to pointer
mov dl, [bx+di+30] ; total address

____________________

4 If you don't know about structures or records, now would be
a good time to stop and go to a reference book about them. They
are not actually covered here.

5 Nesting of dup statements is allowed. Rather than having
uninitialized data, this has blanks in all the spaces.

Though the machine code has only one constant in the code, the
assembler will allow you to put a number of constants in the
assembler instruction. It will add them together for you and
resolve them into one number.{6}

Once again, there are a limited number of registers - they are
the same registers as before:

BX with either SI or DI (but not both) plus constant
BP with either SI or DI (but not both) plus constant

We can work with structures on the machine level, but it looks
like it's going to be hard to keep track of where each field is.
Actually, it isn't so bad because of:

OUR FRIEND, THE EQU STATEMENT

The assembler allows you to do substitution. If you write:

somestuff EQU 37 * 44

then every place that the assembler finds the word "somestuff",
it will substitute what is on the right side of the EQU. Is that
a number or text? Sometimes it's a number, sometimes it's text.
Here are four statements which are defined totally in terms of
numbers. This is from the assembler listing. (The assembler lists
how it has evaluated the EQU statement on the left after the
equal sign.)




= 0023 statement1 EQU 5 * 7
= 0025 statement2 EQU statement1 + 6 - 4
= 000F statement3 EQU statement2 - 22
= 001F statement4 EQU statement3 + 16

and the assembler thinks of these as numbers (these numbers are
in hex). Now in the next set, with only a minor change:


= [bp + 3] statement1 EQU [bp + 3]
= [bp + 3] + 6 - 4 statement2 EQU statement1 + 6 - 4
= [bp + 3] + 6 - 4 - 22 statement3 EQU statement2 - 22
= [bp + 3] + 6 - 4 - 22 + 16 statement4 EQU statement3 + 16

the assembler thinks of it as text. Obviously, the fact that it
can be either may cause you some problems along the way. Consult
the assembler manual for ways to avoid the problem.

____________________

6 And it does it quite well. The assembler correctly evaluated
the following:

add ax, (-3*81)+44/8+[si+27]+6+[bx]-7+(43*96)-2

Not bad, huh?


Now we have a tool to deal with structures. Let's look at that
structure again.

last_name db 15 dup (?)
first_name db 15 dup (?)
age db ?
tel_no db 10 dup (?)

We don't actually need a data definition to make the structure;
what we need is equates:

LAST_NAME EQU 0
FIRST_NAME EQU 15
AGE EQU 30
TEL_NO EQU 31

This gives us the offset of each field from the beginning of the
record. If we again define:

address_book db 100 dup ( 41 dup (' '))

then to get the age field of entry 87, we write:

mov bx, offset address_book
mov ax, 86 ; entry (less 1)
mov cx, 41 ; entry length
mul cx ; entry offset from array[0]
mov di, ax ; move entry offset to pointer
mov dl, [bx+di+AGE] ; total address

This is a lot of work for the 8086, but that is normal with
complex structures. The only thing that takes a lot of time is
the multiplication, but if you need it, you need it.{7}
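
The record length can be an equate too, so the number 41 appears
in only one place. What follows is a sketch under my own
assumptions rather than part of the original program: RECORD_LEN
is a hypothetical name, the age value 42 is made up, and the
skeleton uses the MASM simplified segment directives.

```asm
        .model small
        .stack 100h

LAST_NAME   EQU 0
FIRST_NAME  EQU 15
AGE         EQU 30
TEL_NO      EQU 31
RECORD_LEN  EQU 41              ; hypothetical name: bytes per entry

        .data
address_book db 100 dup ( RECORD_LEN dup (' '))

        .code
start:
        mov ax, @data           ; assumption: small model
        mov ds, ax

        mov bx, offset address_book
        mov ax, 86              ; entry 87 (less 1)
        mov cx, RECORD_LEN      ; entry length
        mul cx                  ; AX = entry offset from array[0]
        mov di, ax              ; move entry offset to pointer

        mov al, 42              ; made-up age
        mov [bx+di+AGE], al     ; store the age field
        mov dl, [bx+di+AGE]     ; read the age field back
        mov dh, [bx+di+TEL_NO]  ; first byte of the tel_no field

        mov ax, 4C00h           ; DOS: terminate program
        int 21h
        end start
```

If a field is later added or resized, only the equates change;
the instructions that use AGE and TEL_NO stay as they are.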

____________________

7 You will see more of the EQU statement.

How about a two dimensional array of integers, 60 X 40?

int_array dw 40 dup ( 60 dup ( 0 ))

These are initialized to 0. For our purposes, we'll assume that
the first number is the row number and the second number is the
column number; i.e. array [6,13] is row 6, column 13. We will
have 40 rows of 60 columns. For ease of calculation, the first
array element is int_array [0,0]. (If it is your array, you can
set it up any way you want {8}). Each row is 60 words (120 bytes)
long. To get to int_array [23, 45] we have:

mov ax, 120 ; length of one row in bytes
mov cx, 23 ; row number
mul cx
mov bx, ax ; row offset to bx
mov si, 45 ; column offset
sal si, 1 ; multiply column offset by 2 (for word size)
mov dx, [bx+si] ; integer to dx

Using SAL instead of MUL is about 50 times faster. Since most
arrays you will be working with are either byte, word, or double
word (4 bytes) arrays, you can save a lot of time. Let
ELEMENT_NUMBER be the array number (starting at 0) of the desired
element in a one-dimensional array. For byte arrays, no
multiplication is needed. For a word:

mov di, ELEMENT_NUMBER
sal di,1 ; multiply by 2

and for a double word (4 bytes):

mov di, ELEMENT_NUMBER
sal di, 1
sal di, 1 ; multiply by 4

This means that a one-dimensional array can be accessed very
quickly as long as the element length is a power of 2 - either 2,
4 or 8. Since the standard 8086 data types are all 1, 2, 4, or 8
bytes long, one dimensional arrays are fast. Others are not so
fast.
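
For 8 byte elements you can write SAL three times, or put the
shift count in CL. On the 8086 any shift count other than 1 has
to be in CL. Here is a sketch of the CL form; big_array and the
value given to ELEMENT_NUMBER are made up for the illustration,
and I am not claiming this form is faster than three single
shifts, only shorter to write.

```asm
        .model small
        .stack 100h

ELEMENT_NUMBER  EQU 5               ; hypothetical element to fetch

        .data
big_array       dq  10 dup (0)      ; hypothetical array, 8 bytes each

        .code
start:
        mov ax, @data               ; assumption: small model
        mov ds, ax

        mov bx, offset big_array
        mov di, ELEMENT_NUMBER
        mov cl, 3                   ; shift count goes in CL
        sal di, cl                  ; multiply by 8 (2 * 2 * 2)
        mov dx, [bx+di]             ; lowest word of element 5

        mov ax, 4C00h               ; DOS: terminate program
        int 21h
        end start
```

The other three words of the element are at [bx+di+2], [bx+di+4]
and [bx+di+6].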


As a quick review before going on, these are the legal ways to
address a variable on the 8086:

(1) by name.

mov dx, variable1

It is also possible to have name + constant.

mov dx, variable1 + 27

The assembler will resolve this into a single offset number
and will give the appropriate information to the linker.

(2) with the single pointers BX, SI, DI and BP (which are
enclosed in square brackets).

mov cx, [si]
xor al, [bx]
add [di], cx
sub [bp], dh

____________________

8 Bear in mind that all compiled languages have fixed
formats for arrays. If you want your array to interact with C,
Fortran, Pascal or Basic, you'd better be sure you have the right
format.

(3) with the single pointers BX, SI, DI and BP (which are
enclosed in square brackets) plus a constant.

mov cx, [si+421]
xor al, 18+[bx]
add 93+[di]-7, cx
sub (54/7)+81-3+[bp]-19, dh

(4) with the double pointers [bx+si], [bx+di], [bp+si],
[bp+di] (which are enclosed in square brackets).

mov cx, [bx][si]
xor al, [di][bx]
add [bp]+[di], cx
sub [di+bp], dh

(5) with the double pointers [bx+si], [bx+di], [bp+si],
[bp+di] (which are enclosed in square brackets) plus a
constant.

mov cx, [bx][si+57]
xor al, 45+[di+23][bx+15]-94
add [bp]+[di]-444, cx
sub [6+di+bp]-5, dh

These are ALL the addressing modes allowed on the 8086. As for
the constants, it is the ASSEMBLER'S job to resolve all numbers
in the expression into a single constant. If your expression
won't resolve into a constant, it is between you and the
assembler. It has nothing to do with the 8086 chip.
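
As a recap, here is one small sketch that uses each of the five
forms on the same data. The array "table" and its values are
made up, the skeleton assumes the MASM simplified segment
directives, and only BX and SI are used to keep it short.

```asm
        .model small
        .stack 100h
        .data
table   dw  10, 20, 30, 40, 50, 60  ; hypothetical sample data

        .code
start:
        mov ax, @data               ; assumption: small model
        mov ds, ax

        mov ax, table               ; (1) by name            -> 10
        mov ax, table + 2           ; (1) name + constant    -> 20

        mov bx, offset table
        mov ax, [bx]                ; (2) single pointer     -> 10
        mov ax, [bx+4]              ; (3) pointer + constant -> 30

        mov si, 6
        mov ax, [bx+si]             ; (4) two pointers       -> 40
        mov ax, [bx+si+2]           ; (5) two pointers + constant -> 50

        mov ax, 4C00h               ; DOS: terminate program
        int 21h
        end start
```

Every constant is a byte offset, not an element number, so the
word at table + 2 is the second element of the array.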