CHASM (tm)
Cheap Assembler
for the IBM Personal Computer
(1983) by David Whitman
Version 1.9
Table of Contents
Why Chasm?
What can Chasm do?
What WON'T it do?
Syntax
Operands
Resolution of Ambiguities
Pseudo-Operations
Outside the Program Segment
Running Chasm
Error and Diagnostic Messages
Execution of Assembled Programs
Notes for Those Upgrading to This Version of Chasm
Miscellaneous and A Word From Our Sponsor
I. Why Chasm?
Why go to the trouble to write an assembler, when
one already exists? The IBM Macro Assembler is a very
powerful software tool available off the shelf. It
supports features such as macros, definition of
multiple segments, and linking to external procedures.
Unfortunately, all of this power doesn't fit into a
64K machine, and even when using the small subset
version, 64K users are limited by memory to only very
small programs. The macro assembler is also very
complex, hard to understand, and costs a hundred bucks.
Even though the price of memory keeps dropping, I
suspect that the majority of the IBM PCs out there have
no more than 64K installed. Also, I suspect that most
end-user assembly language programmers are like myself,
and are not interested in writing huge, complicated
programs in assembler. I want to write short
subroutines to call from BASIC, small patches to
existing assembler programs (such as DOS), and perhaps
some games. For such uses, I think the combination of
the Macro Assembler and a tub full of extra memory
represents an incredible overkill. Chasm is, I hope, a
more reasonable compromise between power and
accessibility (both in cost and complexity).
II. What can Chasm do?
Chasm takes a text file, consisting of mnemonics,
user-defined symbols, numbers, and pseudo-ops, and
produces a file of corresponding machine language for
the 8088 processor. Chasm allows you to define labels
for branching, rather than requiring you to figure out
offsets or addresses to jump to. It allows you to
represent with a name any constants you want to use,
making your programs easier to understand. Most
importantly, it translates mnemonics to their machine
language equivalents freeing you from the task of hand
translation.
III. What WON'T it do?
In the interest of simplicity, Chasm has a number
of restrictions:
1. Statement syntax is not quite as free as in the
macro assembler.
2. The number of pseudo-ops is severely cut down
from the macro assembler.
3. Macros are not supported. (Note that the IBM
assembler doesn't support macros in systems
smaller than 96K)
4. Expressions (such as BUFFER - 2 ) are not
supported, at least in version 1.0.
5. Multiple segment definitions are not allowed, at
least in version 1.0. Chasm assumes that your
entire program fits in one segment, that the cs,
ds, and es registers all point to this same
segment, and that the ss register points to a
valid stack area.
6. External linking is not supported, at least in
version 1.0.
IV. Syntax
Chasm accepts a standard DOS text file for input. Lines
may be any combination of upper and lower case
characters. Chasm does not distinguish between the two
cases: everything except single quoted strings is
automatically converted to upper case during the
parsing process. Thus, BUFFER, Buffer, buffer, and
bUFFer all refer to the same symbol.
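As an illustration only (Chasm itself is written in BASIC, and this is not its actual code), the case-folding rule can be sketched in Python. The function name fold_case is hypothetical, and the sketch assumes quotes are always balanced and never appear inside comments:

```python
def fold_case(line: str) -> str:
    """Upper-case everything outside single-quoted strings."""
    out = []
    in_string = False
    for ch in line:
        if ch == "'":
            # A single quote opens or closes a string; keep it as-is.
            in_string = not in_string
            out.append(ch)
        elif in_string:
            out.append(ch)          # string contents keep their case
        else:
            out.append(ch.upper())  # everything else is folded
    return "".join(out)


if __name__ == "__main__":
    print(fold_case("stg db 'A string operand'"))
```

Under these assumptions, Buffer, buffer, and bUFFer all fold to BUFFER, while the text between the quotes is left alone.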
The characters blank ( ), comma (,), single quote (')
and semi-colon (;) are reserved, and have special
meaning to Chasm (see below).
Note also that Chasm is written in BASIC, and use of
the double quote character (") may cause bizarre
effects by confusing the interpreter into breaking down
input lines in an undesirable way. The double quote
may be safely represented in a text message by
declaring a byte containing its ASCII value. Thus, to
represent the text:
        Use ASCII value for "double quotes"
You may use:
        DB 'Use ASCII value for ' 22H 'double quotes' 22H
Each line must be less than 256 characters long and
have the following format:
Label Operation Operand(s) ;comment
The different fields of an input line are separated
by the delimiters blank ( ) or comma (,). Any number of
either delimiter may be used to separate fields.
Explanation of Fields:
Label: A label is a string of characters, beginning in
column 1. Depending on the operation field, the label
might represent a program location for branching, a
memory location, or a numeric constant. Note that
anything beginning in column 1, except a comment, is
considered a label.
Operation: Either a pseudo-op (see section VII) or an
instruction mnemonic as defined in "The 8086 Book" by
Rector and Alexy.
Note 1: Except as modified below, "The 8086 Book" is
the definitive reference for use with Chasm.
Note 2: There are several ways to resolve some
ambiguities in 8086 assembly language. Please read
page 3-285 of The 8086 Book, and section VI of
this document.
Operand(s): A list of one or more operands, as defined
in section V, separated by delimiters.
Comment: Any string of characters, beginning with a
semicolon (;). Anything to the right of a semicolon
will be ignored by Chasm.
Note that except for the case of an operation which
requires operands, or the EQU pseudo-op which requires
a label, all of the fields are optional. The fields
MUST appear in the order shown.
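The field rules above can be sketched as a small Python function. This is a hedged illustration, not Chasm's own parser: the name split_fields is hypothetical, and the sketch assumes that a semicolon or delimiter inside a single-quoted string does not count.

```python
def split_fields(line: str):
    """Return (label, operation, operands, comment) for one source line."""
    # The comment starts at the first semicolon outside a quoted string.
    in_string = False
    code, comment = line, ""
    for i, ch in enumerate(line):
        if ch == "'":
            in_string = not in_string
        elif ch == ";" and not in_string:
            code, comment = line[:i], line[i:]
            break

    # Split on any run of blanks or commas, keeping quoted strings whole.
    tokens, current, in_string = [], "", False
    for ch in code:
        if ch == "'":
            in_string = not in_string
            current += ch
        elif ch in " ," and not in_string:
            if current:
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)

    # Anything beginning in column 1 (other than a comment) is a label.
    has_label = bool(code) and code[0] not in " ,"
    label = tokens.pop(0) if has_label and tokens else ""
    operation = tokens.pop(0) if tokens else ""
    return label, operation, tokens, comment


if __name__ == "__main__":
    print(split_fields("MASK EQU 10H ;bit mask"))
    print(split_fields("        ADD DL FDH"))
```

Note how the second line has no label only because it does not start in column 1.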
V. Operands
The following operand types are allowed.
1. Immediate data: A number, stored as part of the
program's object code. Immediate data are classified
as either byte, expressible as an 8 bit binary
integer; or word, expressible as a 16 bit binary
integer. If context requires it, CHASM will left-pad
byte values with zeroes to convert them to word values.
Attempts to use a word value where only a byte will
fit will cause an error message to be printed.
Immediate data may be represented in 5 ways:
A. An optionally signed decimal number in the range
-32768 to 32767. Examples:
        MOV AL,21
        MOV BX,-6300
B. A series of up to 4 hex digits, followed by the
letter H. Examples:
        ADD CX,1234H
        ADD DL FDH
C. A symbol representing types A or B above,
defined using the EQU pseudo-op. Examples:
MASK    EQU 10H
MAX     EQU 1000
        AND CL,MASK
        SUB AX,MAX
D. The offset of a label or storage location returned
by the OFFSET operator. OFFSET always returns a
word value. OFFSET is used to get the address
of a named memory location, rather than its contents.
Example:
        MOV DI,OFFSET(BUFFER)
BUFFER  DS FFH
E. The ASCII value of a printable character,
represented by the character enclosed in single
quotes ('). Thus, the following lines will
generate the same object code:
        MOV AL,41H    ;ascii code for 'A'
        MOV AL,'A'
2. Register Operands: One of the 8088's internal
registers.
A. An 8 bit register from the following list:
AH AL
BH BL
CH CL
DH DL
B. A 16 bit register from the following list:
AX BX CX DX SP BP SI DI
C. A segment register from the following list:
CS SS DS ES
3. Memory Operands: The contents of a memory
location addressed by one of the following
methods. Note that none of the memory
addressing options specifies whether a
byte or word operand is being referenced.
See section VI for more on this topic.
A. Direct address.
1. A number, or symbol representing a
number, enclosed in brackets, indicating
an offset into the data segment. Example:
BUFFER  EQU 5A5AH
        MOV BH,[BUFFER]
        MOV [80H],DI
2. A symbol, defined to be a variable (i.e.
a named memory location) using the EQU
pseudo-op. Example:
FCB     EQU [80H]
        MOV DI,FCB
3. A symbol, defined to be a variable by its
use on a storage defining pseudo-op.
Examples:
        MOV AX,FLAG
        MOV DATE,BX
FLAG    DS 1
DATE    DB 31
B. Indirect Address: The address of the operand is
the sum of the contents of the indicated
register(s) and a displacement. The register, or
sum of registers, is enclosed in square
brackets: []
The displacement is optional, and takes the
form of an immediate operand, placed without
intervening delimiters to the left of the first
bracket. Displacements in the range -128 to 127
(i.e. hex 0 - 7F, FF80 - FFFF) are interpreted as
signed 8 bit quantities. All other displacements
are interpreted as unsigned 16 bit quantities.
(Note that although the 8088 supports unsigned 16
bit displacements up to hex FFFF for indirect
addressing, Chasm isn't smart enough to distinguish
between -1 and FFFFH.)
The following indirect modes are allowed:
1. Indirect through a base register (BX or BP).
Examples:
ENTRYLENGTH EQU 6
            MOV AX, ENTRYLENGTH[BP]
            MOV DL, -2[BX]
            MOV CX, [BP]
            MOV 9A9AH[BX], AX
2. Indirect through an index register (DI or SI).
Examples:
        MOV [DI], CX
        MOV CX, -5[SI]
3. Indirect through the sum of one base register
and one index register. Examples:
        MOV [BP+DI], SP       ;note that no spaces are
        MOV BX, 10H[BX+SI]    ;allowed within the
        MOV CL, [BP+SI]       ;brackets.
        MOV DH, -2[BX+DI]
4. Labels
A label on a machine instruction may be used as
an operand for call and jump instructions.
Examples:
START    PROC NEAR
         CALL GETINPUT
         JMPS START
         ENDP
GETINPUT PROC NEAR
5. Strings
A string is any sequence of characters (including
delimiters) surrounded by single quotes (').
Example:
        DB 'Copyright May 15,1982'
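To make the immediate forms and the displacement rule of this section concrete, here is a hedged Python sketch. It is not Chasm's implementation; the names parse_immediate and displacement_size are hypothetical, and the symbol table is assumed to be a plain dictionary filled in from EQU lines.

```python
def parse_immediate(text: str, symbols=None) -> int:
    """Return the value of an immediate operand (forms A, B, C and E)."""
    symbols = symbols or {}
    if len(text) == 3 and text[0] == "'" and text[2] == "'":
        return ord(text[1])                  # form E: 'A' is 41H
    token = text.upper()
    if token in symbols:
        return symbols[token]                # form C: symbol from EQU
    if token.endswith("H") and 1 < len(token) <= 5:
        return int(token[:-1], 16)           # form B: up to 4 hex digits + H
    value = int(token, 10)                   # form A: signed decimal
    if not -32768 <= value <= 32767:
        raise ValueError("decimal value out of range: " + text)
    return value


def displacement_size(value: int) -> int:
    """Return 1 if a displacement is taken as a signed byte, else 2."""
    value &= 0xFFFF          # after this, -1 and FFFFH are the same number
    if value <= 0x7F or value >= 0xFF80:
        return 1             # hex 0 - 7F and FF80 - FFFF: signed 8 bit
    return 2                 # everything else: unsigned 16 bit


if __name__ == "__main__":
    print(parse_immediate("FDH"), displacement_size(-2))
```

The masking step in displacement_size shows why -1 and FFFFH cannot be told apart: both reduce to the same 16 bit pattern.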
VI. Resolution of Ambiguities.
The language defined in "The 8086 Book" contains a
number of ambiguities which must be resolved by an
assembler. This is discussed throughout the book, but
pages 3-285 and 3-286 specifically cover this topic.
Chasm's solutions of these problems are discussed in
this section.
A. Memory references:
When one specifies the address of a memory
location, it is unclear how large an operand is being
referenced. An operand might be a byte, or a word.
1. If a register is present as an operand, it is
assumed that the memory operand matches the
register in size. The exception to this rule is
the shift and rotate instructions, where the CL
register is used as a counter, and has nothing
to do with the size of the other operand. Examples:
        MOV MASK,AX    ;mask is a word
        MOV DH,[BX]    ;BX points to a byte
        NEG [SI]       ;error, operand of unknown size
        SHR FLAG,CL    ;error, flag is of unknown size
2. If no register is present, (or if the only
register is CL being used as a counter) the size
of the memory operand is specified by adding the
suffix "B" or "W" to the instruction mnemonic.
Examples:
        NEGB [SI]          ;SI points to a byte
        SHRW FLAG,CL       ;flag is a word
        MOVW MASK,0AH      ;mask is a word
        MOVB MASK,0AH      ;mask is a byte
        MOVW MASK,9A9AH    ;must specify size even though
                           ;immediate operand implies word
        MOVB DH,[BX]       ;error(!), register already
                           ;specifies size
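The two size rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not Chasm's code: operand_size is a hypothetical name, the caller is assumed to have already separated the "B" or "W" suffix from the base mnemonic, and the list of shift and rotate mnemonics is my own.

```python
BYTE_REGS = {"AH", "AL", "BH", "BL", "CH", "CL", "DH", "DL"}
WORD_REGS = {"AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI",
             "CS", "SS", "DS", "ES"}
# Assumed set of shift/rotate mnemonics whose second operand is a count.
SHIFTS = {"SHL", "SHR", "SAL", "SAR", "ROL", "ROR", "RCL", "RCR"}


def operand_size(base: str, suffix: str, operands: list) -> str:
    """Return 'byte' or 'word' for an instruction with a memory operand.

    base is the mnemonic without a size suffix; suffix is '', 'B' or 'W'.
    """
    # For shifts and rotates the count (CL) says nothing about size.
    sized = operands[:1] if base in SHIFTS else operands
    regs = [op for op in sized if op in BYTE_REGS or op in WORD_REGS]
    if regs:
        if suffix:
            raise ValueError("register already specifies size")
        return "byte" if regs[0] in BYTE_REGS else "word"
    if suffix == "B":
        return "byte"
    if suffix == "W":
        return "word"
    raise ValueError("operand of unknown size")


if __name__ == "__main__":
    print(operand_size("MOV", "", ["MASK", "AX"]))
    print(operand_size("SHR", "W", ["FLAG", "CL"]))
```

A register named inside brackets, such as [BX], is an address and not a register operand, so it is deliberately not matched by the register sets.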
B. Indirect Branching.
The 8088 supports two flavors of indirect
branching: intra, and inter segment. A register is set
to point at a memory location which contains a new
value for the program counter, and in the case of
intersegment branching, a new value for the CS register
as well.
The syntax of "The 8086 Book" does not specify
which flavor of branch is being invoked. Chasm adds
the suffixes "N" (for near, or intrasegment) and "F"
(for far, or intersegment) to the indirect CALL and JMP
mnemonics. Examples:
        CALLN [BX]    ;intrasegment call
        JMPF [DI]     ;intersegment jump
        JMP [BP]      ;error, unspecified flavor
C. Long and Short Jumps
Two types of relative jumps are supported by the
8088: short (specified by a signed 8 bit displacement)
and long (specified by an unsigned 16 bit displacement).
Both are implemented in Chasm as a jump to a label.
The short jump is specified by mnemonic JMPS, in accord
with the IBM disassembler, but not with The 8086 Book, which
uses JMP. Since one of the displacement bits is used as a sign
bit, only seven are left to express the magnitude of jump. JMPS
(and similarly, all the jump on condition instructions) is thus
limited to branching to labels within a range of -128 to +127
bytes.
Chasm reserves mnemonic JMP for the long jump. Note that JMP
may only be used to jump in the forward direction, since the
sixteen bit displacement is unsigned, and assumed to be positive.
Examples:
START   PROC NEAR
        JMPS START    ;short jump
        JMPS END      ;short jump
        JMP END       ;long jump
        JMP START     ;error: reverse long jump
END     ENDP
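The range checks described in this section might look like the following Python sketch. It is hedged: jump_displacement is a hypothetical name, and I am assuming the displacement is measured from the end of the jump instruction (2 bytes for JMPS, 3 bytes for JMP), which is how the 8088 itself counts it.

```python
def jump_displacement(mnemonic: str, location: int, target: int) -> int:
    """Return the encoded displacement for a jump at `location` to `target`."""
    if mnemonic == "JMPS":
        disp = target - (location + 2)   # opcode + 1 displacement byte
        if not -128 <= disp <= 127:
            raise ValueError("short jump out of range")
        return disp & 0xFF               # signed 8 bit, two's complement
    if mnemonic == "JMP":
        disp = target - (location + 3)   # opcode + 2 displacement bytes
        if disp < 0:
            # Chasm treats the 16 bit displacement as unsigned.
            raise ValueError("reverse long jump")
        return disp & 0xFFFF
    raise ValueError("not a relative jump: " + mnemonic)


if __name__ == "__main__":
    # JMPS back to its own address encodes a displacement of -2 (FEH).
    print(hex(jump_displacement("JMPS", 0x100, 0x100)))
```

With START at 100H, the first JMPS in the example above passes the check, while JMP START fails it because the target lies behind the instruction.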
D. Instruction Prefixes.
The 8088 supports three instruction prefixes:
1. SEG: segment override. An alternate segment
register is specified for a reference to memory.
2. REP, REPE, REPNE, REPZ, REPNZ: repeat. A string
primitive is repeated until a condition is met.
3. LOCK: Turns on the LOCK signal. Only useful in
multiprocessor situations.
Chasm implements these prefixes as separate
instructions, rather than prefixes to another
instruction. They appear on a separate line,
immediately before the instruction which they modify.
This is in accord with the output of the IBM
disassembler, but not with the IBM macro assembler.
Examples:
        SEG ES
        MOV AX,FLAG    ;flag is in the extra segment
        REP
        MOVSB          ;move bytes until CX decremented to 0
VII. Pseudo-Operations
The following pseudo-ops are implemented:
A. DB: Declare Bytes
Memory locations are filled with values from the
operand list. Any number of operands may appear,
but all must fit on one line. Acceptable operands
are numbers between 0 and FFH (0-255 decimal), or
strings enclosed in single quotes ('). If a label
appears, it is considered a variable, and the
location may be referred to using the label, rather
than an address. Examples:
        MOV AX,MASK
MASK    DB 00H,01H
STG     DB 'A string operand'
B. DS: Declare Storage
Used to declare large blocks of identically
initialized storage. The first operand is
required, a number specifying how many bytes
are declared. If a second operand in the form
of a number 0-FFH appears, the locations will
all be initialized to this value. If the second
operand is not present, locations are initialized
to 0. As with DB, any label is considered a
variable. To save space, the object code does not
appear on the listing. Examples:
        DS 10          ;10 locs initialized to 0
        DS 100H,1AH    ;256 locs initialized to 1AH
C. ENDP: End of Procedure
See PROC (below) for details.
D. EQU: Equate
Used to equate a symbolic name with a number.
The symbol may then be used anywhere the
number would be used. Use of symbols makes
programs more understandable, and simplifies
modification.
An alternate form of EQU encloses the number
in square brackets: []. The symbol is then
interpreted as a variable, and may be used as an
address for memory access. This version is
provided to allow symbolic reference to locations
outside the program segment.
Warning: Difficult-to-debug errors may result
from using a symbol prior to its being defined
by EQU. I strongly urge that all equates be
grouped together at the beginning of programs,
before any machine instructions. See "Phase Error"
in section IX. Examples:
MOFFSET    EQU B000H
MONOCHROME EQU [0000H]
E. ORG: Origin
Allows direct manipulation of the location
counter during assembly. By default, Chasm
assembles code to start at offset 100H, the
origin expected by COMMAND.COM for .COM
programs. Using ORG you may override this
default. Example:
        ORG 0    ;Code will be assembled for starting
                 ;offset of 0
F. PROC ...ENDP: Procedure Definition
Declares a procedure. One operand is required on
PROC, either the word NEAR, or the word FAR.
This pseudo-op warns Chasm whether to assemble
returns as intra (near) or intersegment (far).
Procedures called from within your program should
be declared NEAR. All others should be FAR.
ENDP terminates the procedure, and requires no
operands. If a RET is encountered outside of a
declared procedure, an error occurs. Procedures
may be nested, up to 10 deep. Example:
MAIN    PROC FAR
        ...
        ...           ;body of procedure
        ENDP
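To show what the DB and DS pseudo-ops described in this section place in memory, here is a hedged Python sketch. The names emit_db and emit_ds are hypothetical, the operands are assumed to be already split into a list, and only numbers and single-quoted strings are handled (no symbols).

```python
def emit_db(operands: list) -> bytes:
    """Return the bytes declared by a DB operand list."""
    out = bytearray()
    for op in operands:
        if len(op) >= 2 and op[0] == "'" and op[-1] == "'":
            out += op[1:-1].encode("ascii")       # string operand
        else:
            if op.upper().endswith("H"):
                value = int(op[:-1], 16)          # hex number, e.g. 22H
            else:
                value = int(op)                   # decimal number
            if not 0 <= value <= 0xFF:
                raise ValueError("DB operand does not fit in a byte: " + op)
            out.append(value)
    return bytes(out)


def emit_ds(count: int, fill: int = 0) -> bytes:
    """Return `count` bytes of storage, all set to `fill` (default 0)."""
    if not 0 <= fill <= 0xFF:
        raise ValueError("DS fill value must be 0-FFH")
    return bytes([fill]) * count


if __name__ == "__main__":
    text = emit_db(["'Use ASCII value for '", "22H", "'double quotes'", "22H"])
    print(text.decode("ascii"))
    print(len(emit_ds(0x100, 0x1A)))
```

The first call reproduces the double quote workaround from the Syntax section: the two 22H operands become the quote characters in the assembled text.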
VIII. Outside the Program Segment
As mentioned previously, Chasm does not support
multiple segment definitions. Provision is made for
limited access outside of the program segment, however.
A. Memory References
To access memory outside the program segment, you
simply move a new segment address into the DS
register, then address using offsets in the new
segment. The memory option of the EQU pseudo-op
allows you to give a variable name to offsets in
other segments. For example, to access the graphics
character table in ROM:
BIOS      EQU F000H
CHARTABLE EQU [FA6EH]
          MOV AX,BIOS         ;can't move immed. to DS
          MOV DS,AX
          MOV AL,CHARTABLE    ;1st byte of char. table
B. Code Branching
Chasm supports 4 instructions for branching
outside the program segment.
1. Direct CALL and JMP
New values for the PC and CS registers are
included in the instruction as two immediate
operands. Example:
BIOS       EQU F000H    ;RAM bios segment
DISKETE_IO EQU EC59H    ;disk handler
           JMP DISKETE_IO,BIOS
2. Indirect CALLF and JMPF
Four consecutive bytes in memory are
initialized with new values for the PC and CS
registers. The CALLF or JMPF then references
the address of the new values. Example:
BIOS       EQU F000H    ;RAM bios segment
PRINTER_IO EQU EFD2H    ;printer routine
           MOVW [DI],PRINTER_IO
           MOVW 2[DI],BIOS
           CALLF [DI]
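The four bytes read by an indirect CALLF or JMPF can be pictured with a short Python sketch. The function names far_pointer and physical_address are hypothetical; the layout itself (offset word first, segment word second, each stored low byte first) matches the example above, and the 20 bit address formula is the standard 8088 segment arithmetic.

```python
import struct


def far_pointer(offset: int, segment: int) -> bytes:
    """Return the 4 bytes an indirect CALLF/JMPF reads: new PC, then new CS."""
    # "<HH" packs two unsigned 16 bit words, least significant byte first.
    return struct.pack("<HH", offset, segment)


def physical_address(segment: int, offset: int) -> int:
    """Return the 20 bit address the 8088 forms from segment:offset."""
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # PRINTER_IO (EFD2H) in the BIOS segment (F000H), as in the example.
    print(far_pointer(0xEFD2, 0xF000).hex())
    # The character table at offset FA6EH of the same segment.
    print(hex(physical_address(0xF000, 0xFA6E)))
```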