Category : Assembly Language Source Code
Archive   : ASMTUT3.ZIP
Filename : CHAP11-2.DOC

 

The PC Assembler Tutor 114
______________________

We can consolidate all this information into the following list:

All the following addressing modes can be used with or
without a constant:

variable_name (+constant)
[bx] (+constant)
[si] (+constant)
[di] (+constant)
[bp] (+constant)
[bx+si] (+constant)
[bx+di] (+constant)
[bp+si] (+constant)
[bp+di] (+constant)

This is a complete list.

Thus, you can access a variable by name or with one of the eight
pointer combinations. There are no other possibilities.


One thing that may confuse you about an addressing statement is
all the plusses and minuses. As an example:

mov cx, -45+27[bx+22]+[-195+di]+23-44

the total address is:

-45+27[bx+22]+[-195+di]+23-44

When the 8086 performs this instruction, it will ADD (1) BX, (2)
DI and (3) a single constant. The assembler folds all the numbers
into that one constant (here -45+27+22-195+23-44 = -212). The
constant can be positive or negative; the 8086 will ADD all three
elements. The '+' in front of 'di' is for convenience of the
assembler only; [-195-di] is illegal and the assembler will
generate an error. If you actually want the negative of what is
in one of the registers, you must negate it before executing the
addressing instruction:

neg di
mov cx, -45+27[bx+22]+[-195+di]+23-44

Once again, the only allowable forms are +[di], [di] or [+di].
Either -[di] or [-di] will generate an assembler error.
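
As a sketch of what the assembler does with all those plusses and
minuses: the first two MOV instructions below should assemble to
the same machine code, since the constants add up to -212. This
is a fragment, not a full program. It assumes that BX and DI
already hold sensible offsets into the data segment.

```asm
; fragment: assumes BX and DI already point somewhere sensible in DS
mov  cx, -45+27[bx+22]+[-195+di]+23-44  ; constants total -212
mov  cx, [bx+di-212]                    ; same address, written plainly

; to subtract DI instead of adding it, negate it first
neg  di                                 ; DI = -DI
mov  cx, [bx+di-212]                    ; now BX - (old DI) - 212
neg  di                                 ; restore DI if it is needed later
```

Writing the single constant yourself is usually easier to read
than leaving the arithmetic scattered through the instruction.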


If you ever see a technical description of the addressing modes,
you will find a list of 24 different machine codes. The reason
for this is that:

[bx]
[bx] + byte constant
[bx] + word constant

are three different machine codes. Here is a listing of the same
machine instruction with the three different styles:





Chapter 11 - Addressing Modes 115
_____________________________


MACHINE CODE              ASSEMBLER INSTRUCTION

03 04                     add ax, [si]
03 44 1B                  add ax, [si+27]
03 44 E5                  add ax, [si-27]
03 84 5BA7                add ax, [si+23463]
03 84 A459                add ax, [si-23463]


(27d = 1Bh, -27d = E5h, 23463d = 5BA7h, -23463d = A459h). The
first byte of code (03) is the add (word) instruction. The second
byte is the addressing code, and the third and fourth bytes (if
any) are the constant (in hex). Addressing code 04 is: (ax,
[si]). Addressing code 44 is: (ax, [si] + byte constant).
Addressing code 84 is: (ax, [si] + word constant). The fact that
there are three different machine codes is of concern to the
assembler, not to you. It is the assembler's job to make the
machine code as efficient as possible. It is your job to write
quality, robust code.
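
To see where the assembler switches between the byte and word
forms, here is a sketch around the limits of a signed byte (-128
to +127). The machine code in the comments is what the encoding
rules predict, written in the same style as the listing above. It
is not copied from an actual assembler run.

```asm
; fragment: SI is assumed to hold a valid offset
add  ax, [si+127]    ; 03 44 7F      byte constant (largest positive)
add  ax, [si+128]    ; 03 84 0080    word constant needed
add  ax, [si-128]    ; 03 44 80      byte constant (most negative)
add  ax, [si-129]    ; 03 84 FF7F    word constant needed
```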


SEGMENT OVERRIDES

So far, we haven't talked about segment registers. You will
remember from the last chapter that the 8086 assumes that a named
variable is in the DS segment:

mov ax, variable1

If it isn't, the Microsoft assembler puts the correct segment
override in the machine code. The segment overrides are:

SEGMENT OVERRIDE          MACHINE CODE (hex)
CS                        2E
DS                        3E
ES                        26
SS                        36

As an example:

MACHINE CODE              ASSEMBLER INSTRUCTIONS

2E: 03 06 0000 R          add ax, variable3
26: 2B 1E 0000 R          sub bx, variable2
    31 36 0000 R          xor variable1, si ; no override
36: 21 3E 00C8 R          and variable4, di

when the different variables were in segments with different
ASSUME statements. If you don't remember this, you should reread
the section on overrides in the last chapter. Remember, the colon
is in the listing only to tell you that we have a segment
override. The colon is not in the machine code.
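
Here is a minimal sketch of a program in which the assembler
inserts an override on its own. The segment names (data_seg,
extra_seg, stack_seg, code_seg) are made up for this example and
are not the ones used in the tutorial's template. Because the
ASSUME statement says that ES holds extra_seg, the assembler
should put the ES override (26h) in front of the instruction that
uses variable2.

```asm
data_seg   segment
variable1  dw 1234h
data_seg   ends

extra_seg  segment
variable2  dw 5678h
extra_seg  ends

stack_seg  segment stack
           dw 64 dup (?)
stack_seg  ends

code_seg   segment
           assume cs:code_seg, ds:data_seg, es:extra_seg, ss:stack_seg
start:     mov  ax, data_seg
           mov  ds, ax              ; make DS match the ASSUME
           mov  ax, extra_seg
           mov  es, ax              ; make ES match the ASSUME

           mov  bx, variable1       ; DS is the natural segment: no override
           sub  bx, variable2       ; assembler adds the ES override

           mov  ax, 4C00h           ; return to DOS
           int  21h
code_seg   ends
           end  start
```

ASSUME only tells the assembler what to expect. Loading DS and ES
with the right segment addresses is still the program's job.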










What about pointers? The natural segment for anything with [bp]
is SS, the stack segment.{1} Everything else has DS as its
natural segment. The natural segments are:

(1) DS

variable + (constant)
[bx] + (constant)
[si] + (constant)
[di] + (constant)
[bx+si] + (constant)
[bx+di] + (constant)


(2) SS

[bp] + (constant)
[bp+si] + (constant)
[bp+di] + (constant)

where the constant is always optional. Can you use segment
overrides? Yes, in all cases.{2} Here is some assembler code
along with the machine code which was generated.


MACHINE CODE              ASSEMBLER INSTRUCTIONS

26: 03 07                 add ax, es:[bx]
2E: 01 05                 add cs:[di], ax
36: 2B 44 11              sub ax, ss:[si+17]
2E: 29 46 00              sub cs:[bp], ax
3E: 33 03                 xor ax, ds:[bp+di]
26: 31 02                 xor es:[bp+si], ax
26: 89 43 16              mov es:[bp+di+22], ax


    03 04                 add ax, [si]
    03 44 1B              add ax, [si+27]
    03 84 A459            add ax, [si-23463]
26: 03 04                 add ax, es:[si]
26: 03 44 1B              add ax, es:[si+27]
26: 03 84 A459            add ax, es:[si-23463]


(17d = 11h, 22d = 16h, 27d = 1Bh, -23463d = 0A459h). The first
number (which is followed by a colon) is the segment override
that the assembler has inserted in the machine code. Remember,
the colon is in the listing to inform you that an override is
involved; it is not in the machine code itself.
____________________

1 We will see why when we look at subroutines. BP is called
the base pointer [bp] and is used in a special way.

2 There are some special instructions for two independent
pointers which we will cover at the end of the book. These allow
segment overrides but force the override to refer to the first
pointer.

Unfortunately, when you use pointers you must put the override
into the assembler instructions yourself. The assembler has no
way of knowing that you want an override. This can cause some
truly gigantic errors (if you reference a pointer seven times and
forget the override once, the 8086 will access the wrong segment
that one time), and those errors are extremely difficult to
detect.

As you can see from above, you put the override in the
instructions by writing the appropriate segment (CS, DS, ES or
SS) followed by a colon. As always, it is your responsibility to
make sure that the segment register holds the address of the
appropriate segment before using an override.
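
A short sketch of what this looks like in practice. It is a
fragment that assumes ES already holds the segment of some table,
BX holds the table's offset, and BP holds an offset that is
meaningful in DS. None of these values come from the tutorial's
programs.

```asm
; fragment: ES:BX is assumed to point at a table of words
mov  ax, es:[bx]       ; reads the word at ES:BX
add  ax, es:[bx+2]     ; reads the word at ES:BX+2
mov  cx, [bx+4]        ; no override: reads DS:BX+4, probably a mistake
mov  dx, ds:[bp+6]     ; override replaces BP's natural segment (SS) with DS
```

The third line assembles without complaint, which is exactly why
a forgotten override is so hard to find.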


We have talked about two different types of constants in the
chapter, a constant which is part of the address:

mov ax, [bx+17]
add [si+2190], dx
and [di-8179], cx

and a constant which is a number to be used for an arithmetical
or logical operation:

add ax, 17
sub dl, 45
add dx, 22187

They are both part of the machine instruction, and are
unchangeable (true constants). This machine code is going to be
difficult to read, so just look for (1) the constant DATA and (2)
the constant in the ADDRESS. All constants in the assembler
instructions are in hex so that they look the same as in the
listing of the machine code. Here's a listing of different
combinations.


1. Pointer + constant as an address:

MACHINE CODE              ASSEMBLER INSTRUCTIONS
01 44 1B                  add [si+1Bh], ax
29 85 0A04                sub [di+0A04h], ax
30 5C 1F                  xor [si+1Fh], bl
20 9E 1FAB                and [bp+1FABh], bl

2. Arithmetic instruction with a constant:

MACHINE CODE              ASSEMBLER INSTRUCTIONS
05 1065                   add ax, 1065h
2D 6771                   sub ax, 6771h
80 F3 37                  xor bl, 37h
80 E3 82                  and bl, 82h

3. Pointer + constant as an address; arithmetic with a constant

MACHINE CODE              ASSEMBLER INSTRUCTIONS
81 44 1B 1065             add [si+1Bh], 1065h
81 AD 0A04 6771           sub [di+0A04h], 6771h
80 74 1F 37               xor [si+1Fh], BYTE PTR 37h
80 A6 1FAB 82             and [bp+1FABh], BYTE PTR 82h


You will notice that the ADD instruction (as well as the other
instructions) changes machine code depending on the complete
format of the instruction (byte or word? to a register or from a
register? what addressing mode? is AX one of the registers?).
That's part of the 8086 machine language encoding, and it makes
the 8086 machine code extremely difficult to decipher without a
table listing all the options.
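
In the third listing above, BYTE PTR tells the assembler the size
of the operation, since neither [si+1Fh] nor 37h has a size of
its own. A more common way to write this, sketched here, puts the
size on the memory operand. The machine code in the comments is
taken from the listing above and assumes that the assembler
encodes both spellings identically.

```asm
; fragment: SI, DI and BP are assumed to hold valid offsets
add  word ptr [si+1Bh],   1065h   ; 81 44 1B 1065
sub  word ptr [di+0A04h], 6771h   ; 81 AD 0A04 6771
xor  byte ptr [si+1Fh],   37h     ; 80 74 1F 37
and  byte ptr [bp+1FABh], 82h     ; 80 A6 1FAB 82  (natural segment is SS)
```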


OFFSET AND SEG

There are two special operators that the assembler has -
offset and seg. For any variable or label, offset gives the
offset from the beginning of the segment, and seg gives the
segment address. If you write:

mov ax, offset variable1

the assembler will calculate the offset of variable1 and put it
in the machine code. It also signals the linker and loader; if
the linker should change the offset during linking, it will also
adjust this number. If you write:

mov dx, seg variable1

The assembler will signal to the linker and the loader that you
want the address of the segment that variable1 is in. The linker
and loader will put it in the machine code at that spot. You
don't need to know the name of the segment. The linker takes care
of that. We will use the seg operator later.
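
As a sketch of how the two operators work together, the fragment
below builds a complete segment:offset pointer to variable1 and
then reads the variable through it. It assumes variable1 is a
word (dw) defined in some data segment. Using ES and BX is just
one possible choice of registers.

```asm
; fragment: variable1 is assumed to be a word defined in some data segment
mov  ax, seg variable1       ; segment address, filled in by linker and loader
mov  es, ax                  ; a segment register cannot be loaded with a constant
mov  bx, offset variable1    ; offset within that segment
mov  ax, es:[bx]             ; AX = contents of variable1, wherever it lives
```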


LEA

LEA (load effective address) is a completely different animal. It
allows you to use any addressing mode to put an address in a
register. One of the addressing modes covered before was for the
following code:

xor dx, 45+[di+23][bx+15]-94

The 8086 added DI, BX and the constant to calculate the address.
It then XOR'ed the variable at that address with DX. If you
write:

lea dx, 45+[di+23][bx+15]-94

the 8086 will add DI, BX and the constant to calculate the
address. It will then put the ADDRESS in DX. LEA can use any
addressing mode to calculate an address. The machine code looks
almost the same:

MACHINE CODE              ASSEMBLER INSTRUCTIONS

33 51 F5                  xor dx, 45+[di+23][bx+15]-94
8D 51 F5                  lea dx, 45+[di+23][bx+15]-94

The first byte of the machine code is the instruction, the second
byte is the addressing code and the third byte is the constant
(45+23+15-94 = -11d = F5h).

You almost never need LEA. To get the address of a simple
variable, it is slower than:

mov dx, offset variable1

However, when the addressing gets complicated (perhaps 1% of the
time), it's nice to have. Remember, it will calculate ANY 8086
addressing mode.
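
A small sketch of what LEA saves you when pointers are involved.
Both halves of this fragment leave BX + DI - 11 in DX. The
register values are arbitrary. The one difference in behavior is
that LEA leaves the flags alone while ADD and SUB change them.

```asm
; fragment: BX and DI may hold any 16-bit values
lea  dx, [bx+di-11]    ; DX = BX + DI - 11, flags unchanged

; the same result without LEA
mov  dx, bx
add  dx, di            ; changes the flags
sub  dx, 11            ; changes the flags
```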

Let's run a program so we can see what actually happens with LEA:

;lea.asm
; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
variable1 dw ?
; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

; + + + + START CODE BELOW THIS LINE
; reg style
mov si_byte, 1 ; signed
lea ax, ax_byte
call set_reg_style

mov bp, 0 ; clear unused registers
mov di, 0

;lea and mov show the two ways to address variable1
lea ax, variable1 ; effective address
mov bx, offset variable1 ; offset
call show_regs_and_wait

lea_loop:
mov si, 0 ; clear registers
mov dx, 0
mov cx, 0
mov bx, 0
mov ax, 0
call show_regs

call get_unsigned ; unsigned for bx
mov bx, ax
mov ax, 0 ; blank ax
call show_regs

call get_signed ; signed for si
mov si, ax

mov ax, 0 ; blank ax

lea cx, [bx+si]+100 ; addresses to cx and dx
lea dx, [si+bx-100]
call show_regs_and_wait

jmp lea_loop
; + + + + END CODE ABOVE THIS LINE

The first part of the program shows that LEA and MOV give the
same offset address. Then we enter the loop. It gets an unsigned
number, puts it in BX, gets a signed number, puts it in SI, then
uses LEA to calculate [bx+si+100] and [bx+si-100]. The plus and
minus 100 is simply to show you a difference of 200 in the two
results. BX and SI could also have contained (1) both signed
numbers or (2) both unsigned numbers. It doesn't make any
difference. This program has a signed and an unsigned number for
variety. Of special interest to you should be when [bx+si] is
within 100 of 65536 (or 0). One of the results will be just above
0 while the other result will be just below 65536. The address
value wraps around from 65535 -> 0. Note that with minor
alteration, this program can be used to look at ANY addressing
mode that uses pointers.
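
If you would rather not type the numbers in, here is a sketch
with one pair of values worked out by hand. The values 65500 and
0 are chosen for illustration only; any pair whose sum is within
100 of 65536 shows the same effect.

```asm
; fragment: values chosen so that BX+SI is within 100 of 65536
mov  bx, 65500         ; 0FFDCh
mov  si, 0
lea  cx, [bx+si]+100   ; 65600 wraps around: CX = 64 (0040h)
lea  dx, [si+bx-100]   ; no wrap:            DX = 65400 (0FF78h)
```

The two results are still 200 apart if you count around the wrap
from 65535 to 0.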

You should make two executable files for this. First:

link lea+asmhelp

and the second:

link asmhelp+lea

Give them different names and run them. Note the offset values
for:

lea ax, variable1
mov bx, offset variable1

With lea+asmhelp you should have an offset of 8 for variable1
since there are 8 bytes in the array (ax_byte, bx_byte, etc.).
This array appears before variable1 in the data segment. When you
link it the other way (asmhelp+lea), all the data for asmhelp.obj
is in front of your data and the offset should be something
completely different for variable1.





















SUMMARY

These are the natural (default) segments of all addressing modes:

(1) DS

variable + (constant)
[bx] + (constant)
[si] + (constant)
[di] + (constant)
[bx+si] + (constant)
[bx+di] + (constant)


(2) SS

[bp] + (constant)
[bp+si] + (constant)
[bp+di] + (constant)

where the constant is optional. Segment overrides may be used.
The segment overrides are:

SEGMENT OVERRIDE          MACHINE CODE (hex)
CS:                       2E
DS:                       3E
ES:                       26
SS:                       36


OFFSET

The reserved word 'offset' tells the assembler to calculate the
offset of the variable from the beginning of the segment.

mov ax, offset variable2


SEG

The reserved word 'seg' tells the assembler, linker and loader to
get the segment address of the segment that the variable is in.

mov ax, seg variable2

LEA

LEA calculates an address using any of the 8086 addressing modes,
then puts the address in a register.

lea cx, [bp+di+27]


