Category : Assembly Language Source Code
Archive   : ASMTUT3.ZIP
Filename : CHAP10-2.DOC

 



Chapter 10 - Templates 89
______________________

Do all three conditions need to be met for the linker to combine
segments into one segment?

1) They have the same name
2) They have the same class name
3) They are both defined PUBLIC

Joe Bob says check it out. Here are two .ASM files which contain
a number of segments. Here's the first file:

;file1.asm
;- - - - - - - - - - - - - - - - - - - -
STACKSEG SEGMENT STACK 'STACK'
dw 100 dup (?)
STACKSEG ENDS
;- - - - - - - - - - - - - - - - - - - -
MORESTUFFA SEGMENT PUBLIC
variable21 dw ?
MORESTUFFA ENDS
;- - - - - - - - - - - - - - - - - - - -
DATASTUFF SEGMENT PUBLIC 'DATA'
variable1 dw ?
DATASTUFF ENDS
;- - - - - - - - - - - - - - - - - - - -
MORESTUFF SEGMENT PUBLIC 'DATA'
variable2 dw ?
MORESTUFF ENDS
;- - - - - - - - - - - - - - - - - - - -
EVENMORESTUFF SEGMENT PUBLIC 'DATA'
variable3 dw ?
EVENMORESTUFF ENDS
;- - - - - - - - - - - - - - - - - - - -
CODESTUFF SEGMENT PUBLIC 'CODE'
ASSUME cs:CODESTUFF, ds:DATASTUFF
ASSUME ds:MORESTUFF, es:MORESTUFFA, ds:EVENMORESTUFF
main proc far
start: push ds
sub ax,ax
push ax
ret
main endp
CODESTUFF ENDS
;- - - - - - - - - - - - - - - - - - - -
END start

Here's the other file:

;file2.asm
; - - - - - - - - - - - - - - - - - - - -
STACKSEG SEGMENT STACK 'STACK'
dw 100 dup (?)
STACKSEG ENDS
; - - - - - - - - - - - - - - - - - - - -
NOTDATASTUFF SEGMENT PUBLIC 'DATA'
variable4 dw ?
NOTDATASTUFF ENDS
; - - - - - - - - - - - - - - - - - - - -




The PC Assembler Tutor 90
______________________

DATASTUFF SEGMENT PUBLIC 'DATA'
variable5 dw ?
DATASTUFF ENDS
; - - - - - - - - - - - - - - - - - - - -
MORESTUFFA SEGMENT PUBLIC
variable61 dw ?
MORESTUFFA ENDS
; - - - - - - - - - - - - - - - - - - - -
MORESTUFF SEGMENT PUBLIC 'CLASSOF68'
variable6 dw ?
MORESTUFF ENDS
; - - - - - - - - - - - - - - - - - - - -
EVENMORESTUFF SEGMENT 'DATA'
variable7 dw ?
EVENMORESTUFF ENDS
; - - - - - - - - - - - - - - - - - - - -
CODESTUFF SEGMENT PUBLIC 'CODE'
ASSUME cs:CODESTUFF, ds:DATASTUFF, ds:NOTDATASTUFF
ASSUME ds:MORESTUFF,ds:MORESTUFFA, ds:EVENMORESTUFF
subroutine proc far
ret
subroutine endp
CODESTUFF ENDS
; - - - - - - - - - - - - - - - - - - - -
END



You will notice that the two CODESTUFFs, the two DATASTUFFs, the
two MORESTUFFAs and the two STACKSEGs each have the same
definitions, but that (1) NOTDATASTUFF has a different name than
DATASTUFF, (2) one MORESTUFF has a different class name from the
other, (3) one EVENMORESTUFF is PUBLIC and the other is not, and
(4) the two MORESTUFFAs have NO class name.

Here's the segment information from file1.lst



N a m e Length Align Combine Class

CODESTUFF . . . . . . . . . . 0005 PARA PUBLIC 'CODE'
DATASTUFF . . . . . . . . . . 0002 PARA PUBLIC 'DATA'
EVENMORESTUFF . . . . . . . . 0002 PARA PUBLIC 'DATA'
MORESTUFF . . . . . . . . . . 0002 PARA PUBLIC 'DATA'
MORESTUFFA . . . . . . . . . . 0002 PARA PUBLIC
STACKSEG . . . . . . . . . . . 00C8 PARA STACK 'STACK'




and from file2.lst


N a m e Length Align Combine Class

CODESTUFF . . . . . . . 0001 PARA PUBLIC 'CODE'
DATASTUFF . . . . . . . 0002 PARA PUBLIC 'DATA'
EVENMORESTUFF . . . . . 0002 PARA NONE 'DATA'
MORESTUFF . . . . . . . 0002 PARA PUBLIC 'CLASSOF68'
MORESTUFFA . . . . . . . 0002 PARA PUBLIC
NOTDATASTUFF . . . . . . 0002 PARA PUBLIC 'DATA'
STACKSEG . . . . . . . . 00C8 PARA STACK 'STACK'



These are in alphabetical order. Before we link them together,
let's think about what should happen if all three conditions must
be met. Both CODESTUFF segments are PUBLIC with the same class
name, so they should merge. Both DATASTUFF segments are PUBLIC
with the same class name so they should merge. EVENMORESTUFF is
PUBLIC in one file but not public in the other, so they should
not merge. MORESTUFF is PUBLIC in both files, but they have
different class names, so they should not merge. What about
STACKSEG? The STACK combine type is similar to PUBLIC{1}, and
they have the same class name, so they should merge. Finally,
there are the MORESTUFFAs. They have the same name and are
PUBLIC, but they have no class name. Will they combine?

Let's see what happens. Here is the .MAP file from the command

C> link file1+file2


Start Stop Length Name Class
00000H 0018FH 00190H STACKSEG STACK
00190H 001A1H 00012H MORESTUFFA
001B0H 001C1H 00012H DATASTUFF DATA
001D0H 001D1H 00002H MORESTUFF DATA
001E0H 001E1H 00002H EVENMORESTUFF DATA
001F0H 001F1H 00002H NOTDATASTUFF DATA
00200H 00201H 00002H EVENMORESTUFF DATA
00210H 00220H 00011H CODESTUFF CODE
00230H 00231H 00002H MORESTUFF CLASSOF68

Program entry point at 0021:0000


STACKSEG, DATASTUFF and CODESTUFF combined. MORESTUFFA combined.
The others are separate. Doesn't this confuse the linker if it
has more than one segment with the same name? No. The linker
knows which variables are in which segments, and the names of the
segments are not relevant.

If you look at the class information from the linker listing, you
will notice that all things in the same class are grouped
together. The linker works from left to right on the command
line, so for the above, it read file1.obj first and then read
file2.obj. It orders things (1) first by class (in the order
encountered), and then (2) by segment (in the order encountered).
For the linker ordering, a segment is like a subclass.
____________________

1 STACK tells the linker to combine any other segments which
have STACK and the class type 'STACK' and it tells the loader to
set the SS register to that segment, and set the SP register to
point to the end of that segment.

Look through the assembler files to check that if you link in the
order file1+file2, the order of encountering classes is 'STACK',
empty, 'DATA', 'CODE', and 'CLASSOF68'. Check the segment
ordering also. What if we link the opposite way?

> link file2+file1

Here's the listing:


Start Stop Length Name Class
00000H 0018FH 00190H STACKSEG STACK
00190H 00191H 00002H NOTDATASTUFF DATA
001A0H 001B1H 00012H DATASTUFF DATA
001C0H 001C1H 00002H EVENMORESTUFF DATA
001D0H 001D1H 00002H MORESTUFF DATA
001E0H 001E1H 00002H EVENMORESTUFF DATA
001F0H 00201H 00012H MORESTUFFA
00210H 00211H 00002H MORESTUFF CLASSOF68
00220H 00234H 00015H CODESTUFF CODE

Program entry point at 0022:0010

Assure yourself that this is the order the classes are
encountered for file2+file1.
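
The ordering rule can be checked with a small model. What follows
is only a sketch in Python, not the linker's real algorithm. It
assumes what the two .MAP files suggest: segments combine when
name, class and combine type all match, PUBLIC pieces are padded
to a paragraph (16 bytes) before the next piece is added, STACK
pieces are simply added together, and every output segment starts
on a paragraph boundary. The names Seg, FILE1, FILE2, align16 and
link are made up for this illustration.

```python
from collections import namedtuple

# One record per SEGMENT definition, in the order it appears in the .asm file.
Seg = namedtuple("Seg", "name combine cls length")

FILE1 = [
    Seg("STACKSEG", "STACK", "STACK", 0xC8),
    Seg("MORESTUFFA", "PUBLIC", "", 2),
    Seg("DATASTUFF", "PUBLIC", "DATA", 2),
    Seg("MORESTUFF", "PUBLIC", "DATA", 2),
    Seg("EVENMORESTUFF", "PUBLIC", "DATA", 2),
    Seg("CODESTUFF", "PUBLIC", "CODE", 5),
]
FILE2 = [
    Seg("STACKSEG", "STACK", "STACK", 0xC8),
    Seg("NOTDATASTUFF", "PUBLIC", "DATA", 2),
    Seg("DATASTUFF", "PUBLIC", "DATA", 2),
    Seg("MORESTUFFA", "PUBLIC", "", 2),
    Seg("MORESTUFF", "PUBLIC", "CLASSOF68", 2),
    Seg("EVENMORESTUFF", "NONE", "DATA", 2),
    Seg("CODESTUFF", "PUBLIC", "CODE", 1),
]

def align16(n):
    """Round n up to the next paragraph (16 byte) boundary."""
    return (n + 15) & ~15

def link(*files):
    merged = []   # one dict per output segment, in order first encountered
    classes = []  # class names, in order first encountered
    for segs in files:
        for s in segs:
            if s.cls not in classes:
                classes.append(s.cls)
            match = None
            if s.combine in ("PUBLIC", "STACK"):
                for m in merged:
                    same = (m["name"], m["cls"], m["combine"])
                    if same == (s.name, s.cls, s.combine):
                        match = m
                        break
            if match is None:
                merged.append({"name": s.name, "cls": s.cls,
                               "combine": s.combine, "length": s.length})
            elif s.combine == "STACK":
                match["length"] += s.length          # assumed: no padding
            else:
                match["length"] = align16(match["length"]) + s.length
    # Stable sort: by class first, segment order inside a class is kept.
    merged.sort(key=lambda m: classes.index(m["cls"]))
    rows, addr = [], 0
    for m in merged:
        start = align16(addr)
        rows.append((start, start + m["length"] - 1, m["length"],
                     m["name"], m["cls"]))
        addr = start + m["length"]
    return rows

for row in link(FILE1, FILE2):
    print("%05XH %05XH %05XH %-14s %s" % row)
```

Calling link(FILE2, FILE1) instead models the second command
line. If the assumptions hold, both results should agree with the
two .MAP listings above, line for line.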


Before we go on, let's summarize what we have so far.

1) In an .asm file, each segment starts with a name followed
by the word SEGMENT.

2) Each segment ends with the name followed by the word ENDS
(end of segment).

This is the minimal segment definition:

; - - - - -
SEG_A SEGMENT

SEG_A ENDS
; - - - - -

In addition, if you want to combine a segment with segments from
other files in order to make one large segment, then all the
segments to be combined must:

1) have the same name.
2) have the same class name (type)
3) be declared PUBLIC








ASSUME

The next thing from the template file is the word ASSUME. Who is
assuming what?

ASSUME cs:CODESTUFF, ds:DATASTUFF

This is for the assembler. It says that whenever you are working
in the CODESTUFF segment, CS will be set to the segment address
of the CODESTUFF segment. Whenever you are working in the
DATASTUFF segment, DS will be set to the segment address of the
DATASTUFF segment. The CS register takes care of itself, but it
is your responsibility to make sure that DS actually points to
the proper segment.


If you just move a word from memory to a register:

mov cx, variable1

the 8086 automatically thinks that it is in the DS segment. But
it doesn't have to be that way. The 8086 has something called
segment overrides. Here is the list:

SEGMENT HEX VALUE
CS 2E
DS 3E
ES 26
SS 36

An override is a 1 byte machine instruction that tells the 8086
that for the next instruction, the memory location will not
reference the natural segment register; what it will reference is
the segment register named in the override - CS if it is 2Eh, DS
if it is 3Eh, ES if it is 26h, and SS if it is 36h.

We could plug these in ourselves, but that is a lot of work.
Fortunately, the assembler takes care of this for us. Let's look
at the code from the very beginning of the chapter.

;***********************************
; segs.asm

; - - - - - - - - - - - - -
STACKSEG SEGMENT STACK 'STACK'

variable4 dw 4444h
dw 100h dup (?)

STACKSEG ENDS
; - - - - - - - - - - - - -
MORESTUFF SEGMENT PUBLIC 'HOKUM'

variable2 dw 2222h

MORESTUFF ENDS
; - - - - - - - - - - - - -
DATASTUFF SEGMENT PUBLIC 'DATA'

variable1 dw 1111h

DATASTUFF ENDS
; - - - - - - - - - - - - -
CODESTUFF SEGMENT PUBLIC 'CODE'

EXTRN print_num:NEAR , get_num:NEAR

ASSUME cs:CODESTUFF,ds:DATASTUFF
ASSUME es:MORESTUFF,ss:STACKSEG

variable3 dw 3333h

main proc far
start: push ds
sub ax,ax
push ax

mov ax, DATASTUFF
mov ds,ax
mov ax, MORESTUFF
mov es, ax

mov cx, variable1
mov variable1, cx

ret

main endp


CODESTUFF ENDS
; - - - - - - - - - - - -

END start
;***************************

For the ASSUME statement we have:

ASSUME cs:CODESTUFF,ds:DATASTUFF
ASSUME es:MORESTUFF,ss:STACKSEG

What we want to look at is this section of code:

mov cx, variable1
mov variable1, cx

Here is the listing of the offset address and machine code:


000E 8E C0 mov es,ax

0010 8B 0E 0000 R mov cx, variable1
0014 89 0E 0000 R mov variable1, cx

0018 CB ret

Variable1 is in DATASTUFF (ASSUME ds:DATASTUFF), and DS is the
natural segment for variables. Now let's change the code to:

mov cx, variable2
mov variable2, cx

This is the ONLY change in the file. Variable2 is in MORESTUFF
and we have - ASSUME es:MORESTUFF. Here's the listing when we
assemble the modified file.

000E 8E C0 mov es,ax

0010 26: 8B 0E 0000 R mov cx, variable2
0015 26: 89 0E 0000 R mov variable2, cx

001A CB ret

The assembler has put 26h as a segment override. When the 8086
looks at the machine code, it knows that those two instructions
reference the es, not the ds, segment register. Also note that
the code is now two bytes longer - one byte for each segment
override. The "ret" instruction is at 1Ah (26d) instead of 18h
(24d).
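
The byte counts can be reproduced with a few lines of Python.
This is only a sketch that rebuilds the machine code shown in the
two listings; the function names are made up for the
illustration, and the starting offset 0010h is taken from the
listing.

```python
# Override prefix bytes, from the table in the text.
OVERRIDE = {"cs": 0x2E, "ds": 0x3E, "es": 0x26, "ss": 0x36}

def word(n):
    """A 16 bit value as two bytes, low byte first (8086 order)."""
    return bytes([n & 0xFF, (n >> 8) & 0xFF])

def mov_cx_from_mem(offset, seg=None):
    """Bytes for 'mov cx, variable': 8B 0E followed by the offset."""
    prefix = bytes([OVERRIDE[seg]]) if seg else b""
    return prefix + bytes([0x8B, 0x0E]) + word(offset)

def mov_mem_from_cx(offset, seg=None):
    """Bytes for 'mov variable, cx': 89 0E followed by the offset."""
    prefix = bytes([OVERRIDE[seg]]) if seg else b""
    return prefix + bytes([0x89, 0x0E]) + word(offset)

def ret_offset(seg=None, start=0x10):
    """Offset of 'ret' when the two moves start at 'start'."""
    return (start + len(mov_cx_from_mem(0, seg))
                  + len(mov_mem_from_cx(0, seg)))

print(mov_cx_from_mem(0, "es").hex())   # 268b0e0000
print(hex(ret_offset()))                # 0x18
print(hex(ret_offset("es")))            # 0x1a
```

Each override adds exactly one byte in front of the instruction,
which is why 'ret' moves from 18h to 1Ah.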

Let's try it with:

mov cx, variable3
mov variable3, cx

Variable3 is in CODESTUFF and we have - ASSUME cs:CODESTUFF.
Here's the listing:

000E 8E C0 mov es,ax

0010 2E: 8B 0E 0000 R mov cx, variable3
0015 2E: 89 0E 0000 R mov variable3, cx

001A CB ret

The assembler put in the CS segment override. Now the 8086 knows
that variable3 is in the CS segment. Finally:

mov cx, variable4
mov variable4, cx

Variable4 is in STACKSEG and we have - ASSUME ss:STACKSEG. Here's
the listing:

000E 8E C0 mov es,ax

0010 36: 8B 0E 0000 R mov cx, variable4
0015 36: 89 0E 0000 R mov variable4, cx

001A CB ret






Once again, the assembler put in a segment override. This time it
was the SS override.

That's nifty. We simply tell the assembler which segment register
we will use for each segment and it does all the work. We will do
more with segment overrides in the chapter on addressing modes.

Remember, though, that it is your responsibility to see that at
the time this code is used, the segment register actually
contains the appropriate segment address.
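
As a rough picture of the bookkeeping, here is a sketch in Python
of the lookup the assembler is doing for segs.asm. It is a model
of the four experiments above and not of the assembler's
internals; the table names ASSUME and HOME and the function
override_for are made up for the illustration.

```python
# Override prefix bytes, from the table in the text.
OVERRIDE = {"cs": 0x2E, "ds": 0x3E, "es": 0x26, "ss": 0x36}

# What the two ASSUME statements in segs.asm promise.
ASSUME = {"CODESTUFF": "cs", "DATASTUFF": "ds",
          "MORESTUFF": "es", "STACKSEG": "ss"}

# Which segment each variable was defined in.
HOME = {"variable1": "DATASTUFF", "variable2": "MORESTUFF",
        "variable3": "CODESTUFF", "variable4": "STACKSEG"}

def override_for(variable):
    """Prefix byte needed to reach 'variable', or None if DS will do."""
    register = ASSUME[HOME[variable]]
    if register == "ds":      # DS is the natural register for variables
        return None
    return OVERRIDE[register]

for name in sorted(HOME):
    prefix = override_for(name)
    print(name, "none" if prefix is None else "%02Xh" % prefix)
```

The model only decides which prefix to emit. Nothing in it loads
a segment register, and the same is true of ASSUME.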

Is this ASSUME definition unique? That is, must there be a one to
one correspondence between segments and registers, with each
segment having its own register? No, not at all. Here are two
ASSUME statements, both of which are legal:

ASSUME cs:COMSEG, ds:COMSEG, es:COMSEG, SS:COMSEG

All four registers contain the address of the same segment. In
fact, we will meet this statement when we talk about COM files.
This is the only appropriate statement for a .COM file.

ASSUME ds:SEG_A, ds:SEG_B, es:SEG_C, es:SEG_D, es:SEG_A

Four different segments, two of which are referenced by DS and
three of which are referenced by ES. Remember, ASSUME tells the
assembler that whenever you access something in that segment, the
named register will be set to the starting segment address. What
exactly does this mean to the assembler? Let's rearrange this a
little:

SEG_A ds, es
SEG_B ds
SEG_C es
SEG_D es

This is the list from the assembler's viewpoint. Suppose it has a
variable that is in SEG_C. Does it need an override? Yes, it
needs an ES override. Suppose it has a variable in SEG_A. Does
it need an override? No, because DS is set to that segment.



SUBROUTINES

In assembler parlance, subroutines are called procedures. Why?
You got me. In any case, whenever I say subroutine, process,
subprogram, or anything like that, I mean a procedure. A
procedure can have any name you want. You start a procedure by
giving the name, using the reserved word 'proc' and then
defining it as either near or far.

my_procedure proc near

is a near procedure with the name my_procedure. You end a
procedure by giving the name and following it with the reserved
word 'endp' (for end of procedure).






my_procedure endp

What is a near procedure? It is one which is ALWAYS in the same
segment as the calling program. When you call a near procedure,
the value in CS stays the same, but IP (the instruction pointer)
changes to the offset of the first byte of the procedure. The
next instruction executed will be the one that starts at the
first byte of the procedure.

If a procedure is called even once from a different segment, then
it MUST be a far procedure.

my_procedure proc far

my_procedure endp

When you call a far procedure, the CS register is changed to the
segment of the called procedure and IP (the instruction pointer)
is set to the first byte of the procedure. This will be covered
in the chapter on subroutines.

How does the loader know where to start the program? The
assembler tells the linker which tells the loader. How does the
assembler know? You tell it. The last line of the file is the
single word 'END'. That tells the assembler that you are done
with the assembler code. If there is a word after the word 'END'
(on the same line), then the assembler assumes that this word is
the name of the label where the program starts. The first
instruction executed will be whatever immediately follows that
label. In the template files we have:

END start

so the label 'start:' indicates where the first instruction is.
For an .EXE file, this can be anywhere at all, but we have it at
the beginning. The label 'start:' is used for clarity, but we
could just as easily have had:

END zzyx4

The assembler would then look for the label 'zzyx4:' as the place
to start the program. If you look at the link .MAP file from our
file1+file2 example you will see:

Program entry point at 0021:0000

That says that the starting address is CS = 0021h, IP = 0000h.
Note that both CS and IP are different for the file2+file1
example:

Program entry point at 0022:0010

where CS = 0022h and IP = 0010h. The initial offset was given to
the linker by the assembler. The linker did any adjustment to the
offset if it moved the code, and then it calculated the segment
address itself.
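
The two entry points can be tied back to the .MAP listings with
the usual 8086 address arithmetic: the segment value times 16,
plus the offset. The one-line Python function below is only an
illustration, and its name is made up.

```python
def physical(segment, offset):
    """20 bit 8086 address for segment:offset."""
    return ((segment << 4) + offset) & 0xFFFFF

print("%05XH" % physical(0x0021, 0x0000))   # file1+file2 entry point
print("%05XH" % physical(0x0022, 0x0010))   # file2+file1 entry point
```

For file1+file2 the result is 00210H, the start of CODESTUFF in
the first .MAP file. For file2+file1 it is 00230H, which is 10h
bytes past the start of CODESTUFF (00220H) because the code from
file2 now comes first in the combined segment.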







RET

When the loader loads the program, it puts the segment of the
starting address in CS and the offset of the starting address in
IP. This gives control to your program. When your program is
done, how does it get back to the operating system? Good
question.

When the loader loads the program, it creates something called
the PSP (program segment prefix). This is a 100h (256d) byte
block of information and code. At the very start of this block
(offset 0000) is an 8086 instruction, INT 20h, for an orderly
exit from a program. What we need to do is set CS to the PSP
segment and set IP to 0000. Then the next instruction executed
will be the orderly exit code.

In talking about procedures, I said that when you call a far
procedure, the 8086 puts the procedure's segment in CS and the
procedure's offset in IP. But before that, it does two things:

push CS ; these are the old CS and IP
push IP ; this is not a real 8086 instruction {2}

When you have a RET (return) instruction in a far procedure, the
8086 does the following:

pop IP ; this is not a real 8086 instruction
pop CS ; put back the old CS and IP

so RET resets CS and IP to go back where it came from. That is
its job.

What has been pushed on the stack before starting your program?
NOTHING. That's right. That means that if you execute

ret

at the end of your program, the 8086 will pop two pieces of
garbage into IP and CS.

Fortunately, when setting up a program, the loader ALWAYS puts
the segment address of the PSP in DS, the data segment register. All we
need to do is PUSH DS (the PSP) and then PUSH 0 (offset 0000) and
we have the address of our orderly exit code. If we then execute
RET, it will POP these two items into IP and CS, sending us to
our orderly exit code. That is what is at the beginning of the
code section of the template file. We cannot PUSH a constant, so
we manufacture a 0 with 'sub ax, ax'. The code is:

push ds ; PSP segment
sub ax, ax ; manufacture a 0
push ax ; offset = 0000

and the program is set up for the return.
____________________

2 This is not actual 8086 code. You have no direct access to
IP. This is, however, what the 8086 effectively does.

That's a lot of things together, so let's review. To exit a
procedure we use RET, but for the starting procedure we need to
return to the operating system. The PSP has the code for an
orderly return at offset 0000. At load time, the loader puts the
segment address of the PSP in DS. We push the PSP segment address
and offset 0000 for later use by the RET instruction. We do this
with:

push ds ; PSP segment
sub ax, ax ; manufacture a 0
push ax ; offset = 0000

These should be the first instructions in the program.
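
To see why these three instructions work, here is a toy model of
the stack in Python. It is a sketch and not an emulator: the
class Cpu is made up for the illustration, and the segment values
1234h and 1266h are hypothetical, since the real ones are chosen
by the loader.

```python
class Cpu:
    """Just enough of an 8086 to follow CS, IP, DS, AX and the stack."""

    def __init__(self, cs, ip, ds):
        self.cs, self.ip, self.ds, self.ax = cs, ip, ds, 0
        self.stack = []

    def push(self, value):
        self.stack.append(value & 0xFFFF)

    def pop(self):
        return self.stack.pop()

    def far_call(self, segment, offset, return_ip):
        self.push(self.cs)        # old CS
        self.push(return_ip)      # old IP
        self.cs, self.ip = segment, offset

    def far_ret(self):
        self.ip = self.pop()      # IP comes off first
        self.cs = self.pop()      # then CS

PSP = 0x1234                      # hypothetical PSP segment address
cpu = Cpu(cs=0x1266, ip=0x0000, ds=PSP)

cpu.push(cpu.ds)                  # push ds     ; PSP segment
cpu.ax = (cpu.ax - cpu.ax) & 0xFFFF   # sub ax, ax  ; manufacture a 0
cpu.push(cpu.ax)                  # push ax     ; offset = 0000
cpu.far_ret()                     # ret (far)

print("%04X:%04X" % (cpu.cs, cpu.ip))   # 1234:0000
```

After the far return, CS:IP is PSP:0000, which is where the
orderly exit code lives.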

Now that you have stored the PSP, DS is free for other use. You
can now use DS to hold the segment address of your data. DS is
used because that is the segment register that the 8086 expects
unless told otherwise. You can't move a constant to a segment
register, so this is a two step process:

mov ax, DATASTUFF
mov ds, ax


EXTRN

Finally, an EXTRN statement tells the assembler that the
procedure or data is in another file and you did not forget it.
For a procedure, you need to say whether it is NEAR (push old IP
and put in new IP) or FAR (push old CS and IP; put in new CS and
IP). Here is the assembler listing for five calls:

E8 09CA R call near_routine
9A 15EE ---- R call far_routine
E8 0000 E call near_external_routine
9A 0000 ---- E call far_external_routine
E8 0000 E call get_unsigned

The first two are in the same file, the next two are in an
external file, and we have our friend 'get_unsigned'. 'R' means
that the value is relocatable and may be changed by the linker;
'E' means that it is external and will be filled in by the
linker. The names of the first four say whether they are near or
far. 'get_unsigned' is a near procedure. Notice
that E8 is the near call while 9A is the far call. Also notice
that the assembler reserves one word for the new IP in the near
calls. If the call is in the same file, the assembler fills in
this number, but if it is external the assembler sets it to 0. In
the far calls the assembler reserves two words instead of one.
The first word is again the new IP, which is either filled in or
set to zero. The second word is for the segment address, and will
be set by the linker.
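
The two call formats can be sketched in Python as follows. One
detail goes beyond the listing above: in the finished machine
code, the word after E8 holds the distance from the next
instruction to the target rather than the target offset itself.
The function names and all the addresses used here are
hypothetical.

```python
def word(n):
    """A 16 bit value as two bytes, low byte first."""
    return bytes([n & 0xFF, (n >> 8) & 0xFF])

def near_call(next_ip, target):
    """E8 plus one word: distance from the next instruction to target."""
    return bytes([0xE8]) + word(target - next_ip)

def far_call(segment, offset):
    """9A plus two words: the new IP, then the new CS."""
    return bytes([0x9A]) + word(offset) + word(segment)

# A near call at 0100h is 3 bytes long, so the next instruction is at 0103h.
print(near_call(0x0103, 0x0200).hex())   # e8fd00
print(far_call(0x1234, 0x15EE).hex())    # 9aee153412
```

The near call is three bytes and the far call is five, which
matches the one word and two words reserved in the listing.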







Whew!!! It sure took a long time to go through all that and you
still probably are unsure about some of this. Read the summary,
and if you don't feel good about it, leave it for a day or two
and reread it then.

At the end of the book I will show you how you can simplify a lot
of these things by using standardized segment names and some
other standardized instructions. For now, you need to get used to
what the structure of programs is, and we will continue using the
same type of templates.{3}

____________________

3 Just think of me as the computer equivalent of a woodshop
teacher who forces you to use hand tools to make a coffee table
rather than allowing you to use what you really want to be
using - a chainsaw.






SUMMARY

SEGMENTS

Segments are defined by giving a name followed by the word
SEGMENT. The end of a segment is signalled by the segment name,
followed by the word ENDS (end of segment).

; - - - - -
SOME_NAME SEGMENT

SOME_NAME ENDS
; - - - - -

(As always, anything after a semicolon is a comment and is ignored
by the assembler). In addition, if you want to combine a segment
with other segments, then all the segments to be combined must:

1) have the same name.
2) have the same type (class)
3) be declared PUBLIC


THE STACK SEGMENT

The stack segment may have any name you want, but should be
declared " SEGMENT STACK 'STACK' ". This forces the loader to do
certain initialization for you. If you don't declare it this way,
you have to do the initialization yourself.

ANY_NAME SEGMENT STACK 'STACK'


EXTRN

For procedures, an EXTRN statement tells the assembler that the
procedure that you want to call is in a different file, that you
didn't forget it. Procedures which are EXTRN must be declared
either NEAR or FAR. The grammar is name colon NEAR or name colon
FAR.

EXTRN procedure1:NEAR, procedure2:FAR

You may declare as many things on one line as will fit, but you
need to separate them with commas. There can be no comma at the
end.


ASSUME

An ASSUME statement tells the assembler that when a statement
references that particular segment, the corresponding segment
register will be set to that segment address.

ASSUME es:MORESTUFF






tells the assembler that no matter what you do in other parts of
the program, every time a variable in MORESTUFF is referenced, es
will have the segment address of MORESTUFF. This is for the
purpose of correct coding of segment overrides.


SEGMENT OVERRIDES

Normally, when the 8086 accesses a variable in memory, it does so
via the DS segment register. This can be changed with a segment
override. The assembler puts the correct segment override code in
front of the instruction and the 8086 will use that segment
register to access the data in memory. The override codes are:

SEGMENT HEX VALUE
CS 2E
DS 3E
ES 26
SS 36


CS

CS is the code segment. When the 8086 processes machine code, it
ALWAYS uses CS. There is no override.


IP

IP, the instruction pointer, gives the offset in CS of the next
instruction to be processed. When the 8086 processes an
instruction, it looks at IP, gets the next instruction and
updates IP. This is totally automatic and internal to the 8086.
You have no direct access to IP.


PROCEDURES

A procedure is declared by giving a name followed by the word
'proc' followed by either NEAR or FAR. A procedure is ended by
giving the name, followed by 'endp' (end of procedure).

; - - - - -
square_root proc far

square_root endp
; - - - - -

The words NEAR and FAR tell the assembler and the linker whether
CALL, the subroutine call, and RET, the return statement, should
change just IP or both IP and CS.


RET

The assembler codes a near or a far return depending on whether
you have declared a near or a far procedure. A NEAR return POPs
IP off of the stack while a FAR return POPs IP then POPs CS.
Thus, a NEAR return stays in the same segment but a FAR return
gets a new segment address in CS.{4}


END

The word END signals to the assembler that you are done with
code. The assembler will ignore all following lines, whether they
are blank or contain code.

If the line with END has a name after the word END, then the
assembler assumes that this is the name of a label where
execution will begin at run time. That means that the instruction
at 'label:' will be the first instruction executed in the
program.


SETUP

In order to set up the program in the beginning you need to (1)
PUSH the segment address of the PSP (which is in DS), then (2)
PUSH 0 (the offset of the orderly return code). Following this
you need to (3) put the segment address of the data segment in
DS. The code for all of this is:

push ds ; PSP seg address is in ds
sub ax, ax ; 0
push ax ; push 0000 offset

mov ax, DATA_SEG ; data segment address to ds
mov ds, ax

____________________

4 Of course, it is possible for CS to keep the same value if
the calling procedure is in the same segment.


