* DISASMBL a disassembler for 8086,87,88 .EXE and .COM files *
* formerly ASMGEN.COM by J. Gersbach and J. Damke (Ver. 2.01) *
* A program to generate cross-referenced assembly language code *
* from any executable file. *
* Uploaded to PCanada by Mark Magner November 23, 1983 *
* PREFACE *
This program will generate 8086/87/88 assembly code text that is compatible
with the IBM Personal Computer Macro Assembler from any executable file.
The output can be routed to the console or to disk. A reference list may
be generated separately or embedded at the appropriate instruction counter
address in the assembly code.
You will have to edit the .ASM code before assembly, but nearly all the
typing is done for you by DISASMBL and anything questionable is marked with
"??".
A file of sequential instructions must be resident on the same disk to
indicate to DISASMBL which addresses contain code, bytes, words or strings.
This file may also include instructions to assume segment register values
or toggle the output of assembly code text, generation of the reference
table, 8087 mnemonics and the inclusion of embedded reference information
in the assembly file.
DEBUG should be used to browse through the executable file to determine the
starting locations of code and data to develop the sequential instruction
file. It is important to properly specify these locations for an accurate
reference table and minimum touching up of the .ASM output text. DISASMBL
can be used to create a file which will help you decide if a data item is
referenced as a byte or as a word by the executable program.
The number of references within the file determines the amount of memory
required since a reference table is built in memory during the first pass.
Disassembly is done from disk and only one file sector is in memory at any
given time. Therefore memory size does not limit the size of the file to
be disassembled. 48K bytes of memory will be enough for most programs but
a few will need 64K or 128K. One diskette drive is sufficient but two (or
a hard drive) are more convenient.
* STARTING DISASMBL *
There are two ways to work with DISASMBL, either by using the command menu or
by calling DISASMBL with parameters. Following are the descriptions of both
methods.
* USING THE DISASMBL MENU *
The program is invoked by typing: DISASMBL
You are then prompted for a file specification. Respond with the name of
the executable file from which you wish to generate the assembly code. The
executable file must have an extension of .EXE or .COM. DISASMBL will
check this file spec for validity and then respond with a prompt that
includes a summary of the command letters indicating that you may give it a
command. The executable file contents are not checked for valid code and
DISASMBL will try to disassemble text or compressed BASIC files and produce
unintelligible assembly code.
The commands are:
X filespec This file spec replaces any previous executable file spec.
The file extension must be .COM or .EXE
EXAMPLE: X DATE.COM
A filespec The assembly code text is generated and
routed to the specified file. The usual file extension is
.ASM. If the filespec is omitted, the output will default to
the console. The NUL device may also be used.
EXAMPLE: A DATE.ASM
R filespec The reference table is generated and routed to the specified file. The usual
file extension is .TBL. If the filespec is omitted, the
output will default to the console.
EXAMPLE: R DATE.TBL
Q The program is terminated and control returned to DOS.
Each time a command has been executed, DISASMBL waits with a one line
prompt for the next command.
The default filespec for each command is shown in brackets.
Enter the next command of your choice as described above.
* USING DISASMBL WITH PARAMETER CALLS *
Up to three file specifications may be included when DISASMBL is first
called from DOS. The executable file's name is given first, followed by
specifications for the assembly and reference table files.
EXAMPLE: DISASMBL DATE.COM, DATE.ASM, DATE.TBL
If a semicolon follows the last filespec, DISASMBL will exit to DOS when
the command has been executed. If no semicolon is entered, DISASMBL will
display the menu options described above and wait for further input after
executing the command.
EXAMPLE: DISASMBL DATE.COM, DATE.ASM; will exit to DOS.
If the filespec for the .ASM file and/or .TBL file is omitted, DISASMBL
will generate first the .ASM file, then a .TBL file using the filename of
the first filespec.
EXAMPLE: DISASMBL DATE.COM,,; creates DATE.ASM and DATE.TBL and
exits to DOS.
If only the reference table is desired, the device name NUL should be
entered in place of an .ASM filespec.
EXAMPLE: DISASMBL DATE.COM, NUL, DATE.TBL
If only one filespec is given when the program is called, the reference
table is built in memory and then the menu options are displayed for
further input.
EXAMPLE: DISASMBL DATE.COM
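The calling conventions above can be summarized in a short sketch. The
Python below is for illustration only and is not part of DISASMBL; the
function name parse_call and the dictionary it returns are hypothetical.
It assumes that an empty field after a comma takes the default name, and
that a field with no comma at all is simply not requested.

```python
import os

def parse_call(args):
    """Split a DISASMBL-style argument string into its parts.

    Assumption: an empty field after a comma gets the default name built
    from the executable's filename; a field that is absent altogether
    (no comma) is left as None, meaning "not requested".
    """
    text = args.strip()
    exit_to_dos = text.endswith(";")        # trailing semicolon: return to DOS
    if exit_to_dos:
        text = text[:-1]
    fields = [f.strip() for f in text.split(",")]
    exe = fields[0]
    base = os.path.splitext(exe)[0]
    defaults = [base + ".ASM", base + ".TBL"]
    outputs = []
    for i in range(2):
        if i + 1 < len(fields):
            outputs.append(fields[i + 1] or defaults[i])
        else:
            outputs.append(None)
    return {"exe": exe, "asm": outputs[0], "tbl": outputs[1], "exit": exit_to_dos}
```

For instance, parse_call("DATE.COM, NUL, DATE.TBL") keeps NUL as the
assembly destination, so only the reference table would be written.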
* PROGRAM EXECUTION *
The disassembly is done in two passes through the executable file. On pass
#1, the reference table is built in memory. The .ASM output is generated
during pass #2. Once the reference table is established, it remains in
memory until an X or Q command is issued, and subsequent A and R command
executions skip pass #1. This saves a lot of time when the executable file
is large.
Three contiguous data areas are built dynamically in memory during pass #1.
First is the compressed sequential instruction list. Second is a list of
pointers for .EXE files that point to the locations of all relocatable
variables in the program, also arranged in numerical order. These are
established before reading any code. Third, the reference table is then
built in a higher area of memory as pass #1 progresses.
If all available memory in the program segment is filled before the first
two data areas are completed, DISASMBL will abort to the command prompt.
After the reference table is started, a shortage of memory will produce the
message "Reference Table Incomplete Due to Insufficient Memory" but
DISASMBL will continue.
Ctrl-Break may be used at any time to interrupt a command in progress. It
may take a couple of seconds to be recognized.
* READING THE ASSEMBLY CODE FILE (.ASM) *
This file begins with a title taken from the executable file's name and
original date followed by the current date (in brackets).
If not inhibited by the /M switch in the .SEQ file (explained later), the
macro library will appear next in the file.
Next will be a .RADIX 16 pseudo-op which tells the macro assembler that all
numbers are in hexadecimal form. Carriage return and line feed are then
defined by equates.
Then comes a header that indicates a starting value for the code segment,
stack segment, instruction pointer and the stack pointer. The stack
pointer is usually set to FFFF for .COM files but may be somewhat less
depending on available memory. These values are obtained from the header
for .EXE files.
The first ASSUME statement might come next. There is one generated for
each segment that begins with code. All segment registers are designated
according to the current set of ASSUMEs. They will sometimes be incorrect,
so all ASSUME statements should be checked prior to assembly.
The disassembled output follows, terminated by an END statement and the
execution address. An ORG pseudo-op is included if required.
The text is compatible with MASM and the format is the same except for
RETurns. To avoid the need for PROCedure titles, special mnemonics are
provided for all RET instructions. These are defined in the macro library
at the beginning of the file. Only macros that are needed for the current
file are produced. The optional embedded commands that make up the
reference table enhance the readability of the file. For very large files,
this is sometimes undesirable and a separate reference table is best.
When invalid instructions are encountered in code areas, they are
reproduced as byte values followed by "??". If a near jump is defined
previously in the code, and it is within range of a short jump, a NOP
instruction is inserted after the jump. The executable file created with
this .ASM file and the Macro Assembler and Linker will then be the same
length as the original file. This makes it less important to differentiate
between labels and numeric constants since the label values and their
offsets within the file will be the same. The fundamental problem of
disassembly is in knowing if the original assembly code defined a number as
a label which changes as a function of its position or as a number that
always remains the same. However, if you make changes in the assembly
code, you must properly specify all values. You might as well remove all
NOPs at the same time.
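The NOP padding rule can be pictured with the sketch below. It is a
guess at the condition, written in Python for illustration, and the
function name needs_nop is hypothetical. It assumes the original jump
occupied three bytes (near form) and that the assembler would pick the
two-byte short form when the target is already defined and within reach.

```python
def needs_nop(jump_addr, target):
    """Return True if a 3-byte near JMP would reassemble as a 2-byte short JMP.

    Assumption: the jump is shortened only when the target is already
    defined (at or before the jump) and the displacement, measured from
    the end of the 2-byte short form, fits in a signed byte.
    """
    if target > jump_addr:          # forward reference: not shortened in this sketch
        return False
    displacement = target - (jump_addr + 2)
    return -128 <= displacement <= 127
```

When the function returns True, one NOP after the jump restores the
original three-byte length, so every later address stays where it was.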
Labels are five characters long and begin with "L". Segment labels begin
with "S". The remaining characters are the current instruction counter in
hex form, thus making each label unique and showing its location in the
original file. The instruction counter is continuous throughout the
assembly code without resetting at segment boundaries. The segment labels
are then in byte as opposed to paragraph form. In those cases where a
label value is modified by an ASSUME statement, the original value is
included as a comment in the referencing instruction so that it may be
easily changed back if it was not intended as a location.
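The label naming scheme is simple enough to sketch. The Python below is
illustrative only; the function name make_label is hypothetical, and the
use of upper-case hex digits is an assumption.

```python
def make_label(counter, segment=False):
    """Build a five-character label from the instruction counter.

    "L" marks an ordinary label and "S" a segment label; the other four
    characters are the counter in hex (upper case is assumed here).
    """
    if not 0 <= counter <= 0xFFFF:
        raise ValueError("instruction counter must fit in 16 bits")
    return ("S" if segment else "L") + format(counter, "04X")
```

Because the counter is part of the name, a label can always be traced
back to its position in the original file.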
The word "Reloc"atable is printed at the end of any line that contains an
absolute paragraph value. These are values that DOS modifies after loading
but before executing a program. They are used for loading segment registers
that are sensitive to the program location in memory. Relocatable values
are not modified by ASSUMEs. DISASMBL converts these numbers from
paragraph to byte values by multiplying them by sixteen so that they will
fit within the 16-bit instruction counter field. When the paragraph value
is negative or exceeds 0FFFH, it is left unchanged and a warning (??) is
issued on that line. When a program larger than 64K bytes is being
disassembled, you may need to divide it into smaller files.
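The paragraph-to-byte conversion described above amounts to the
following sketch. It is written in Python for illustration; the function
name reloc_to_bytes and the (value, warn) return shape are hypothetical.

```python
def reloc_to_bytes(paragraph):
    """Convert a relocatable paragraph value to a byte value.

    Returns (value, warn). Values outside 0..0FFFH are left unchanged
    and flagged, mirroring the "??" warning described above.
    """
    if 0 <= paragraph <= 0x0FFF:
        return paragraph * 16, False
    return paragraph, True
```

A paragraph value of 0040 therefore becomes 0400, while 1000 would
overflow the 16-bit field and is passed through with a warning.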
All words are produced as labels, except when the "L" switch has been
enacted in the .SEQ file (explained later). The label name indicates its
numeric value and, if it does not occur on an instruction boundary, the
name indicates its position relative to the current instruction pointer
and it is followed by an EQU statement. Therefore the Macro Assembler will
assume that it is a location, but it is easily changed to a constant since
the value is given in the label name. The word OFFSET precedes a label
whenever it is questionable whether it is a label or an immediate value.
You must decide which of the labels should be constants and which of the
constants should be labels, and change them accordingly. When changing
labels to numbers, be sure to append an "H" if the number ends with a "D"
or a "B" since the Macro Assembler will otherwise assume that it is decimal
or binary.
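As an aid when editing by hand, the label-to-number rule can be sketched
as follows. The Python is illustrative and the function name
label_to_constant is hypothetical. Adding a leading "0" to values that
begin with a letter is an extra assumption about the assembler's number
syntax and is not stated in this text.

```python
def label_to_constant(label):
    """Turn a generated label such as "L123D" into a hex constant.

    An "H" is appended when the digits end in "D" or "B". The leading
    "0" for values that start with a letter is an added assumption about
    the assembler's number syntax, not something stated in this text.
    """
    digits = label[1:].upper()
    if digits[-1] in "DB":
        digits += "H"
    if digits[0] in "ABCDEF":
        digits = "0" + digits
    return digits
```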
Bytes are always treated as constants. An optional switch may be included
in the .SEQ file (explained later) which enables numbers instead of labels
if all references to the value are data segment and immediate operations.
An effective procedure to follow in attempting to understand the assembly
code file is to look first for the message text area, the input commands,
and the simpler subroutines. Then add label names to addresses in the .SEQ
file (explained later) that remind you of their purpose. Add comments
to the labels. If these names are well chosen, the larger routines
eventually will become clear. The embedded references are produced as
labels so they will retain their meanings as they are changed.
It is also helpful to spend some time studying the structure of data areas.
Vector tables, which are frequently used to control the program's flow,
reveal the program's structure very quickly. If some routines do not have
labels at the beginning, it is usually because the code or tables that
reference them (or the segment register assumptions) are not properly
defined in the .SEQ file.
* READING THE REFERENCE TABLE (.TBL) *
A referencEE is defined as a number that is referenced somewhere in the
program. It may be a program location or a numeric constant.
A referencOR is defined as the address in the program from which a
reference is made to the referencEE.
Each entry is composed of a referencEE followed by a list of referencORs.
If more than one line is needed, additional lines are indented to the first
referencOR position. The referencEE is followed by an "S" if it includes
references to the beginning of a segment. The referencOR is followed by
two letters, the first of which represents the segment register that is
implied or prefixed in the referencing instruction. The second letter
indicates the type of operation on the referencEE. When the reference
entries are embedded in the assembly code, all values are preceded with the
  1st letter    |  2nd letter
  SEG REGISTER  |  TYPE OF OPERATION
  C  code       |  J  jump     M  modify - INC, ADD, etc.
  S  stack      |  C  call     I  immediate - value or offset
  D  data       |  R  read     T  test or compare
  E  extra      |  W  write    ?  unknown or ESC instruction
                |  P  port
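When reading a long table, the two-letter suffix can be expanded
mechanically. The sketch below is illustrative Python; the names
SEGMENTS, OPERATIONS and decode_reference are hypothetical, and the
meanings are copied from the table above.

```python
SEGMENTS = {"C": "code", "S": "stack", "D": "data", "E": "extra"}
OPERATIONS = {
    "J": "jump", "C": "call", "R": "read", "W": "write", "P": "port",
    "M": "modify", "I": "immediate", "T": "test or compare",
    "?": "unknown or ESC instruction",
}

def decode_reference(code):
    """Expand a two-letter referencOR suffix such as "DW"."""
    segment, operation = code[0].upper(), code[1].upper()
    return SEGMENTS[segment], OPERATIONS[operation]
```

For example, a referencOR marked "DW" writes to the referencEE through
the data segment, and one marked "CJ" jumps to it within the code segment.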
* WRITING/READING THE SEQUENTIAL INSTRUCTION FILE (.SEQ) *
The sequential instruction file is a list of special instructions to
DISASMBL which the user creates. The file takes the form of a list of
hexadecimal addresses and single-letter instructions or generation
switches. The .SEQ file must be on the same disk as the source file and
have the same name as the source file with an extension of .SEQ. Each
instruction in the file must be in one of the following formats:
addr command ;comment
addr command label .ASM-comment
addr command label .ASM-comment ;comment
"addr" represents the instruction pointer value. All addr values must be
in numerical sequence in the file.
"command" may be either a toggle switch or a generation instruction.
"label" is optional and replaces the label generated for this address with
this non-blank string.
".ASM-comment" is optional and must be preceded by "label" unless the dummy
label "." is used. Everything following "label" is treated as an address
comment and will be printed in the .ASM file behind the generated
instruction. The address comment may be up to 255 characters in length and
should not contain a semicolon.
";comment" is optional. Anything following a semicolon in the .SEQ file
instructions is considered as a comment in the .SEQ file only and is not
added to the generated .ASM file.
"label" and ".ASM-comment" are not allowed when a generation switch is
coded, but a ";comment" may be used to help clarify the .SEQ file.
The .SEQ file is read into memory before the first pass starts. The
addresses and commands will be compressed, but "label" and ".ASM-comment"
will be held in memory one to one. An effect of this is that memory space
required for disassembly increases with each "label" and ".ASM-comment"
added to the .SEQ file.
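The line format described above can be illustrated with a small parser.
This is a sketch in Python, not the code DISASMBL uses; the function
name parse_seq_line and the dictionary keys are hypothetical, and
continuation lines beginning with "&" are deliberately left out.

```python
def parse_seq_line(line):
    """Split one .SEQ line into its fields.

    Returns None for blank or comment-only lines. Continuation lines
    (those starting with "&") are not handled in this sketch.
    """
    body = line.split(";", 1)[0].strip()      # drop the .SEQ-only comment
    if not body:
        return None
    parts = body.split(None, 3)
    if len(parts) < 2:
        raise ValueError("expected an address and a command")
    addr_text = parts[0].upper().rstrip("H")  # an "H" may follow the address
    entry = {
        "addr": int(addr_text, 16),
        "command": parts[1],
        "label": None,
        "asm_comment": None,
    }
    if len(parts) > 2 and parts[2] != ".":    # "." is the dummy label
        entry["label"] = parts[2]
    if len(parts) > 3:
        entry["asm_comment"] = parts[3]
    return entry
```

A caller could also compare each "addr" with the previous one to confirm
that the addresses are in numerical sequence.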
* DESCRIPTION OF GENERATION SWITCHES *
THE VARIOUS TOGGLE SWITCHES ARE SET "ON" BY DEFAULT. Switches may be
toggled on and off at any point in the .SEQ file/disassembly.
All option switches except /M and /H can be either toggled or directly set
by the user. A suffix of "+" turns the switch ON, and a suffix of "-"
turns the switch OFF. Switches encountered in the file that have neither
of these suffixes are toggled to the opposite of their current state; ON
switches are turned OFF and OFF switches are turned ON. For that reason,
it is recommended that you use the "+" and "-" suffixes to be sure the
switch is set to the value you wish.
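The difference between toggling and setting a switch is shown in the
sketch below. It is illustrative Python only; the function name
apply_switch and the dictionary of switch states are hypothetical, and
/M and /H are left out because they are set once rather than toggled.

```python
def apply_switch(state, token):
    """Update the switch table for a token such as "/B", "/B+" or "/B-".

    "+" forces the switch ON, "-" forces it OFF, and no suffix toggles.
    /M and /H are not covered; they are set once rather than toggled.
    """
    name = token[1].upper()
    suffix = token[2:]
    if suffix == "+":
        state[name] = True
    elif suffix == "-":
        state[name] = False
    else:
        state[name] = not state[name]
    return state

# Toggle switches start ON, as the text above says.
switches = {name: True for name in "BEFLORTZ"}
```

Applying "/O" twice returns the switch to its starting state, whereas
"/O-" twice leaves it OFF, which is why the suffixed forms are safer.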
/B - generate byte references
When ON, both byte and word references are included in the reference table.
When OFF, only word references are generated.
/E - embedded references in ASM file
When ON, reference table entries are inserted in the text just before the
referencEE's definition statement. When OFF, these entries are not
included with the disassembled text. The entire reference table can be
printed with the "R" command.
/F - 8087 mnemonics
When ON, ESC instructions are produced. When OFF, ESC instructions are
assumed to be 8087 instructions and 8087 mnemonics are produced.
/H - append hex "H"
When this switch appears at any point in the .SEQ file, an "H" is appended
to all hex numbers. This does not, of course, apply to the labels which
are hex values preceded by the letter "L". Also, the .RADIX 16 pseudo-op
is omitted which allows the assembler's radix to default to decimal. This
switch defaults to NO H APPEND. Note that it will be set only once. It
retains its value until the next .SEQ file is read.
/L - generate label or number
When ON, all word references are treated as labels. When OFF, a word
reference is treated as a constant if all referencORs are data immediate
/M - suppress macro library
When this switch appears at any point in the .SEQ file, no macro library is
included in the text output. The DEFAULT IS THAT THE MACRO LIBRARY WILL BE
INCLUDED. Note that this switch will be set only once. It retains its
value until the next .SEQ file is read.
/O - control ASM output
When ON, DISASMBL will output the generated text. When OFF, output will be
suppressed.
/R - control TBL output
When ON, DISASMBL will output the generated reference data. When OFF, the
reference table is not printed, and THERE WILL BE NO LABELS in the .ASM
file.
/T - control trace output
When ON, up to 16 bytes of object code are included as comments in each
line of the assembly code file. When OFF, object code is not included.
/Z - generate a help file
When OFF, no help file will be generated. When ON, a file with the name of
the executable file and an extension of ".S_Q" will be generated on the
second pass. This file is written to the same drive as the executable file
is located on. It contains the address of the data item and either a "W"
or a "B" to indicate that the program referred to the data item as either a
byte type or as a word type. More information on the help file may be
found in the file SESSION.DOC. In some programs, data is referred to both
as byte type and as word type, in which case you will have to manually
modify the .ASM code to allow the code to compile without errors. My
preferred method for accomplishing this is to define the data item as a
byte type and then to insert the following on the line immediately
preceding the byte definition:
Wxxxx LABEL WORD
This will be followed by:
Bxxxx DB ? ;these 2 bytes are referred to as a word in the prog.
Lxxxx DB ? ;the "x"s refer to the data item's address.
All references to the data item which expect a word type must refer to
"Wxxxx". All references to the data item which expect a byte type must
refer to "Bxxxx".
There is a BASIC file included in the .ARC file which removes duplicate
entries in the .S_Q file. Before you run it, you must first
sort the file entries by address. The resulting file is of much aid in
deciding what entries are required in the .SEQ file.
* DESCRIPTION OF .SEQ FILE COMMANDS *
A - assume
The following lines contain ASSUMptions for segment register values. They
become effective at the address specified by this instruction and may be
modified anywhere in the disassembly. The required format for assumptions
is:
& 0400 DS
The ampersand indicates a continuation of the A instruction.
In this example, a data segment beginning at an instruction pointer value of
400 will be assumed until another A instruction changes it. CS, ES, and SS
are also supported. The segment assumptions are used for effective address
calculations only. The code segment assumption does not affect the
instruction pointer value.
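One plausible reading of how an assumption feeds the effective address
calculation is sketched below. The Python is illustrative and the
function name effective_label is hypothetical. It assumes that the
location referenced is the assumed segment's starting instruction
pointer value plus the operand's offset.

```python
def effective_label(offset, segment_base):
    """Map an operand offset to a label using the assumed segment start.

    Assumption: the referenced location is the assumed segment's
    starting instruction-pointer value plus the operand's offset,
    kept within the 16-bit instruction counter.
    """
    address = (segment_base + offset) & 0xFFFF
    return "L" + format(address, "04X")
```

With DS assumed at 0400, an operand offset of 0012 would be shown under
this reading as label L0412.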
B - bytes
The data encountered in the source file are assumed to have meaning as
single byte values. That is, they are defined as "DB".
C - code
The bytes encountered in the source file are assumed to be valid 8088
machine language instructions.
D - generate data operand
The operand of the instructions is changed to immediate data. Subsequent
bytes are interpreted as "C" (code follows).
I - initial value for IP
The hexadecimal value on this line overrides the instruction pointer value
at the beginning of the file - not to be confused with the address at which
execution begins. The default values are 0000 for .EXE files and 0100H for
.COM files. The execution address following the END statement is omitted if
this option is invoked.
S - strings
The bytes encountered in the source file are assumed to form text. Quoted
text is produced for valid ASCII characters and byte values for others.
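The mix of quoted text and byte values can be pictured with the sketch
below. It is illustrative Python; the function name format_string_bytes
is hypothetical and the exact layout DISASMBL produces is not shown in
this text. It assumes that "valid ASCII" means the printable range
20H-7EH and writes other bytes as hex with an "H" suffix.

```python
def format_string_bytes(data):
    """Render bytes as a DB line: quoted runs of text, hex for the rest.

    Assumption: "valid ASCII" means printable characters 20H-7EH, and the
    quote character itself is emitted as a byte value.
    """
    items, run = [], ""
    for value in data:
        ch = chr(value)
        if 0x20 <= value <= 0x7E and ch != "'":
            run += ch
        else:
            if run:
                items.append("'" + run + "'")
                run = ""
            text = format(value, "02X")
            if text[0] in "ABCDEF":      # keep a leading digit on hex values
                text = "0" + text
            items.append(text + "H")
    if run:
        items.append("'" + run + "'")
    return "DB " + ",".join(items)
```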
# - defined length strings
The first byte encountered in the source file contains the length of the
character string which begins with the next encountered character. This
length value may be overridden by a subsequent .SEQ file instruction.
$ - defined length strings
The first byte encountered in the source file contains the length of the
character string which begins with the next encountered character plus the
length byte itself. This length value may be overridden by a subsequent
.SEQ file instruction.
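The only difference between "#" and "$" is whether the length byte
counts itself. The sketch below shows both cases in Python for
illustration; the function name read_counted_string and its parameters
are hypothetical.

```python
def read_counted_string(data, pos, counts_length_byte):
    """Read one length-prefixed string starting at data[pos].

    counts_length_byte=False models "#": the length covers the text only.
    counts_length_byte=True models "$": the length also covers the
    length byte itself.
    Returns (text_bytes, next_pos).
    """
    length = data[pos]
    text_len = length - 1 if counts_length_byte else length
    text_len = max(text_len, 0)          # guard against a zero length byte
    start = pos + 1
    return data[start:start + text_len], start + text_len
```

A three-character string is therefore stored with a length byte of 03
under "#" and 04 under "$".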
W - words
Pairs of bytes encountered in the source file are assumed to have meaning
as word values. That is, they are defined as "DW".
X - repeating data structure
A cyclic data structure is assumed to begin at the specified instruction
pointer value. The structure definition may follow and is prefixed by an
ampersand (&) to indicate the continuation of this instruction. If the
definition does not follow, then the most recent definition is used, except
that when no structure is yet defined, then an error message is displayed.
The following elements may be used to define the structure:
& NNNN S - The next NNNN bytes are defined as string characters
& NNNN B - The next NNNN bytes are defined as byte values
& NNNN W - The next NNNN WORDS are defined as word values (WORDS, not BYTES)
& XXNN $ - The next sequence of bytes is defined as NN fields. Each
field consists of a length byte and a string of characters.
The length of each field is contained in the first encountered
byte. The high byte (XX), if non-zero, is a bit mask of the
length field within the byte. The length field is right
justified within the byte after the byte value is sent to the
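The masked length field can be illustrated as follows. This Python
sketch rests on an assumption about the step described above: the first
byte is ANDed with the mask and then shifted right until the mask's
lowest set bit reaches bit 0. The function name field_length is
hypothetical.

```python
def field_length(first_byte, mask):
    """Extract a right-justified length field from the first byte.

    Assumption: the byte is ANDed with the mask and shifted right until
    the mask's lowest set bit reaches bit 0. A zero mask means the
    whole byte is the length.
    """
    if mask == 0:
        return first_byte
    shift = (mask & -mask).bit_length() - 1   # position of lowest set bit
    return (first_byte & mask) >> shift
```

With a mask of 0E (bits 3, 2 and 1), a first byte of 06 yields a length
of 3 under this reading.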
* EXAMPLES OF .SEQ COMMANDS *
This example .SEQ file shows all the possible instructions in the
;All switches are on at the beginning.
0 /Z ;do not generate a help file
0 /T ;no object code as comments in output
0 /M ;no macro library in output
0 /H ;append "H" to all numbers
00H A ;ASSUME the following segment values
;Note that the ampersand (&) indicates the extended ASSUME
& 380 DS ;the data segment starts at 380 hex
& 380 ES ;the extra segment starts at 380 hex
0200 I ;initialize the instruction pointer to 200
0200 /F- ;introduce 8087 mnemonics (not ESC)
0200 /E ;no embedded references
0200 C ;code begins at 200
0203H W ;words are at 203
0207 C ;more code starting here
220 X ;complex data structure begins here
& 3 W ;3 words
& 1 B ;byte
& 0E02 $ ;2 strings starting with the 2nd byte follow
;bits 3,2,1 of the first byte contain the length of the
;string including the length byte.
;the high byte (0E) is the mask.
;see also # in summary below
& 1 B ;byte
;the structure repeats until 351
351 B ;bytes
358 C ;more code
380 S ;strings - list of messages
421 W ;words
4FD /B ;no further byte references
502 /R ;garbage here - turn off reference generation
502 /O ;and .ASM output
600H /O+ ;valid code - turn output back on
1A60 /O- ;output file about to fill diskette - turn output
;off but keep scanning for references. Another run
;will be needed to get the remaining code.
1B00 /D ;treat operand as immediate data
1DFD /B+ ;continue with byte references
1F45 W user_prt ;user provided labels will translate
2256 S $MSG ;to upper case
Comments may be included if preceded by a semicolon. Alphabetic characters
may be either upper or lower case. An "H" may follow the hex address.
* SAMPLE SESSION *
A tutorial sample session wherein DISASMBL.COM is disassembled may be found
in the file SESSION.DOC.