Contents of the ASMGEN.DOC file
* ASMGEN.COM - by J. Gersbach and J. Damke (Ver. 2.01) *
* A program to generate cross-referenced assembly language code *
* from any executable file. *
* Uploaded to PCanada by Mark Magner November 23, 1983 *
* PREFACE *
This program will generate 8086/87/88 assembly code text that is
compatible with the IBM Personal Computer Macro Assembler from
any executable diskette file up to 65,535 bytes. The output can
be routed to the console or a diskette file. A reference list
may be generated separately or embedded at the appropriate
instruction counter address in the assembly code.
Some manual touch up will be required before reassembly, but
nearly all the typing is done for you by ASMGEN and anything
questionable is marked with "??".
A file of sequential instructions may be resident on the same
diskette to indicate to ASMGEN which addresses contain code,
bytes, words, or strings. This file may also include
instructions to assume segment register values or to toggle the
output of assembly code text, generation of the reference
table, 8087 mnemonics, or the inclusion of embedded reference
information in the assembly file.
DEBUG may be used to browse through the executable file to
determine the starting locations of code and data to develop the
sequential instruction file. It is important to accurately
specify these locations for an accurate reference table and
minimum touching up of the ASM output text.
The number of references within the file determines the amount
of memory required since a reference table is built in memory
during the first pass. Disassembly is done from disk and only
one file sector is in memory at any given time. Therefore memory
size does not limit the size of the file to be disassembled. 48K
bytes of memory will be enough for most programs but a few will
need 64K or 128K. One diskette drive is sufficient but two is
* STARTING ASMGEN *
There are two ways to work with ASMGEN: either by using the
command menu or by calling ASMGEN with parameters. Following are
the descriptions of both options.
* USING THE ASMGEN MENU *
The program is invoked by typing: ASMGEN
You are then prompted for a file specification. Respond with the
name of the executable file from which you wish to generate the
assembly code. The executable file will normally have an
extension of .EXE or .COM. ASMGEN will check this file spec for
validity and then respond with a prompt that includes a summary
of the command letters indicating that you may give it a
command. The executable file contents are not checked for valid
code, so ASMGEN will try to disassemble text or compressed
BASIC files and produce unintelligible assembly code.
The commands are:
X filespec  This file spec replaces any previous executable
file spec. The usual file extension is .COM
EXAMPLE: X DATE.COM
A The executable file is disassembled and the assem-
bly code is routed to the specified file. The
usual file extension is .ASM. If the filespec is
omitted, the output will default to the console.
EXAMPLE: A DATE.ASM
R The reference table is sent to the file specified.
The usual file extension is .TBL. If the filespec
is omitted, the output will default to the console.
EXAMPLE: R DATE.TBL
Q The program is terminated and control is returned to DOS.
Each time a command has been executed, ASMGEN waits with a one
line prompt for the next command.
X , A , R or Q ?
The default filespec for each command is shown in brackets.
Enter the next command of your choice as described above.
* USING ASMGEN WITH PARAMETER CALLS *
Up to three file specifications may be included when ASMGEN is
first called from DOS. The executable file's name is given
first, followed by specifications for the assembly and reference
table files.
EXAMPLE: ASMGEN DATE.COM, DATE.ASM, DATE.TBL
If a semicolon follows the last filespec, ASMGEN will exit to
DOS when the command has been executed. If no semicolon is
entered, ASMGEN will display the menu options described above
and wait for further input after executing the command.
EXAMPLE: ASMGEN DATE.COM, DATE.ASM;
If the filespec for the .ASM file and/or the .TBL file is omitted,
ASMGEN will generate first the .ASM file and then the .TBL file,
using the filename of the first filespec for any omitted name.
EXAMPLE: ASMGEN DATE.COM,,; creates DATE.ASM and DATE.TBL and
exits to DOS.
If only the reference table is desired, the dummy name NUL
should be entered in place of an .ASM filespec.
EXAMPLE: ASMGEN DATE.COM, NUL, DATE.TBL
If only one filespec is given when the program is called, the
reference table is built in memory and then the menu options are
displayed for further commands.
EXAMPLE: ASMGEN DATE.COM
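The filespec rules above can be summarized in a short Python sketch. It is only an illustration, not ASMGEN's own code, and the function name parse_command_tail is hypothetical. The text does not say what happens when a trailing field is left out entirely (as in "DATE.COM, DATE.ASM;"), so the sketch assumes that only a field left empty between commas receives a default name, while a missing field means no output.

```python
from typing import Optional, Tuple

def parse_command_tail(tail: str) -> Tuple[str, Optional[str], Optional[str], bool]:
    """Split an ASMGEN-style command tail into (exe, asm, tbl, exit_to_dos).

    Hypothetical helper. Assumption: a field that is present but empty
    (as in "DATE.COM,,;") gets a default name derived from the executable's
    base name, while a field that is absent is returned as None.
    """
    tail = tail.strip()
    exit_to_dos = tail.endswith(";")       # trailing ';' means return to DOS
    if exit_to_dos:
        tail = tail[:-1]
    fields = [f.strip() for f in tail.split(",")]
    exe = fields[0]
    base = exe.rsplit(".", 1)[0]
    defaults = (base + ".ASM", base + ".TBL")
    outputs = []
    for i, default in enumerate(defaults):
        if i + 1 >= len(fields):
            outputs.append(None)            # not given at all
        elif fields[i + 1] == "":
            outputs.append(default)         # empty field: derive the name
        else:
            outputs.append(fields[i + 1])   # explicit name, including NUL
    return exe, outputs[0], outputs[1], exit_to_dos
```

With only one filespec the sketch returns no output names, which matches the case where ASMGEN builds the reference table and then shows the menu.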
* PROGRAM EXECUTION *
The disassembly is done in two passes through the source file.
On pass #1, the reference table is built in memory and the
actual output is generated during pass #2. Once the reference
table is established, it remains in memory until an X or Q
command is issued, and subsequent A and R command executions
skip pass #1. This saves a lot of time when the executable file
is large.
Three contiguous data areas are built dynamically in memory
during pass #1. First is the compressed sequential instruction
list. Second is a list of pointers for .EXE files that point to
the locations of all relocatable variables in the program, also
arranged in numerical order. These are established before
reading any code. Third, the reference table is then built in a
higher area of memory as pass #1 progresses.
If all available memory in the program segment is filled before
the first two data areas are completed, ASMGEN will abort to the
command prompt. After the reference table is started, a shortage
of memory will produce the message "Reference Table Incomplete
Due to Insufficient Memory" and continue.
Ctrl-Break may be used at any time to interrupt a command in
progress.
* READING THE ASSEMBLY CODE FILE (.ASM) *
This file begins with a title taken from the executable file's
name and date followed by the current date (in brackets).
If not inhibited by the M switch in a .SEQ file (explained
later), the macro library will appear next in the file.
Next will be a .RADIX 16 pseudo-op which tells the macro
assembler that all numbers are in hexadecimal form.
Then comes a header that indicates a starting value for the code
segment, stack segment, instruction pointer and the stack
pointer. The stack pointer is usually set to FFFF for .COM files
but may be somewhat less depending on available memory. These
values are passed by the linker for .EXE files.
The first ASSUME statement might come next. There is one
generated for each segment that begins with code. All segment
registers are designated according to the current set of
ASSUMEs. They will sometimes be incorrect, so all ASSUME
statements should be checked prior to re-assembly.
The disassembled output follows, terminated by an END statement
and the execution address. An ORG pseudo-op is included if
The text is compatible with the IBM Macro Assembler and the
format is the same except for RETurns. To avoid the need for
PROCedure titles, special mnemonics are provided for all RET
instructions. These are defined in the macro library at the
beginning of the file. Only macros that are needed for the
current file are produced. The optional embedded reference table
entries enhance the readability of the file. For very large
files this is sometimes undesirable, and a separate reference
table is best.
When invalid instructions are encountered in code areas, they
are reproduced as byte values followed by "??". If a near jump
targets a label defined earlier in the code, and the target is
within range of a short jump, a NOP instruction is inserted after
the jump. The executable file created from this .ASM file with
the Macro Assembler and Linker will then be the same length as
the original file. This makes it less important to differentiate
between labels and numeric constants, since the label values and
their offsets within the file will be the same. The fundamental
problem of disassembly is knowing whether the original assembly
code defined a number as a label, which changes as a function of
its position, or as a number that always remains the same. If
you make changes in the assembly code, however, you must properly
specify all values. You might as well remove all NOPs at the
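The NOP rule can be expressed as a small check. This Python sketch is an illustration, not ASMGEN's code; the function name needs_nop_pad is hypothetical, and it assumes the rule concerns a 3-byte near JMP whose target was already defined earlier in the file, because the assembler would then choose the 2-byte short form.

```python
def needs_nop_pad(jump_addr: int, target: int, target_defined_earlier: bool) -> bool:
    """Return True if a 3-byte near JMP at jump_addr should be followed by a NOP.

    Hypothetical helper: when the target is already known and a 2-byte short
    JMP could reach it, the assembler picks the short form, so one NOP keeps
    the reassembled file the same length as the original.
    """
    if not target_defined_earlier:
        return False  # forward reference: the assembler reserves the near form
    # A short JMP displacement is measured from the end of the 2-byte instruction.
    displacement = target - (jump_addr + 2)
    return -128 <= displacement <= 127
```

For a jump at 0200 the short form reaches back as far as 0182, so a near jump to 0182 would be padded while one to 0181 would not.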
Labels are five characters long and begin with "L". Segment
labels begin with "S". The remaining characters are the current
instruction counter in hex form, thus making each label unique
and showing its location in the original file. The instruction
counter is continuous throughout the assembly code without
resetting at segment boundaries. The segment labels are then in
byte as opposed to paragraph form. In those cases where a label
value is modified by an ASSUME statement, the original value is
included as a comment in the referencing instruction so that it
may be easily changed back if it was not intended as a location.
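A minimal Python sketch of the naming scheme described above, assuming upper-case hex digits; make_label is a hypothetical name. Because the instruction counter runs continuously in bytes, a segment starting at paragraph 38H of the file gets the label S0380.

```python
def make_label(counter: int, segment_start: bool = False) -> str:
    """Build a five-character label from a 16-bit instruction counter.

    'S' marks the start of a segment, 'L' any other location; the remaining
    four characters are the counter in hex (upper case assumed here).
    """
    if not 0 <= counter <= 0xFFFF:
        raise ValueError("instruction counter must fit in 16 bits")
    return ("S" if segment_start else "L") + format(counter, "04X")
```

For example, make_label(0x100) gives L0100 and make_label(0x38 * 16, segment_start=True) gives S0380.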
The word "Relocatable" is printed at the end of any line that
contains an absolute paragraph value. These are values that DOS
modifies after loading but before executing a program. They are
used for loading segment registers that are sensitive to the
program location in memory. Relocatable values are not modified
by ASSUMEs. ASMGEN converts these numbers from paragraph to byte
values by multiplying them by sixteen so that they will fit
within the 16-bit instruction counter field. When the paragraph
value is negative or exceeds 0FFFH, it is left unchanged and a
warning (??) is issued on that line. When a program larger than
64K bytes is being disassembled, it should be divided into
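The paragraph-to-byte conversion for relocatable values can be sketched in Python as follows. The function name relocatable_to_byte is hypothetical, and the sketch assumes the paragraph value has already been read as a signed integer. The 0FFFH limit follows from the 16-bit counter: 0FFFH times 16 is 0FFF0H, while 1000H times 16 would overflow.

```python
from typing import Tuple

def relocatable_to_byte(paragraph: int) -> Tuple[int, bool]:
    """Convert a relocatable paragraph value to byte form.

    Returns (value, warn). Values from 0 to 0FFFH are multiplied by 16,
    which keeps the result within 16 bits. Negative or larger values are
    returned unchanged with warn=True, mirroring the "??" marker.
    """
    if paragraph < 0 or paragraph > 0x0FFF:
        return paragraph, True
    return paragraph * 16, False
```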
All words are produced as labels, except when the /L switch has
been turned off in the .SEQ file (explained later). The label name
indicates its numeric value. If the value does not fall on an
instruction boundary, its position relative to the current
instruction pointer is given by an EQU statement. Therefore the
Macro Assembler will assume that it is a location, but it is
easily changed to a constant since the value is given in the
label name. The word OFFSET precedes a label whenever it
is questionable whether it is a label or an immediate value. You
must decide which of the labels should be constants and which of
the constants should be labels, and change them accordingly.
When changing labels to numbers, be sure to append an "H" if the
number ends with a "D" or a "B" since the Macro Assembler will
otherwise assume that it is decimal or binary.
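Changing a label back into a number can be sketched in Python. The helper label_to_constant is hypothetical. Besides the "B"/"D" rule above, it applies the general MASM rule that a numeric constant must begin with a digit (which is why this file writes 0FFFF rather than FFFF); that second rule is not stated in ASMGEN's own text.

```python
def label_to_constant(label: str, always_append_h: bool = False) -> str:
    """Rewrite a label such as 'L1B0D' as a hex constant MASM will accept.

    Under .RADIX 16 a trailing 'B' or 'D' would be read as a binary or
    decimal suffix, so 'H' is appended in those cases (or always, when
    always_append_h is set). A leading zero is added when the first digit
    is A-F, because MASM treats a token starting with a letter as a name.
    """
    if len(label) != 5 or label[0].upper() not in "LS":
        raise ValueError("expected a five-character L or S label")
    digits = label[1:].upper()
    if digits[0] in "ABCDEF":
        digits = "0" + digits
    if always_append_h or digits[-1] in "BD":
        digits += "H"
    return digits
```

So L1B0D becomes 1B0DH and LFFFB becomes 0FFFBH, while L0100 stays 0100.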
Bytes are always treated as constants. An optional switch may be
included in the .SEQ file (explained later) which enables
numbers instead of labels if all references to the value are
data segment and immediate operation types.
An effective procedure to follow in attempting to understand the
assembly code file is to look first for the message text area,
the input commands, and the simpler subroutines. Then add label
names to addresses in the .SEQ file (explained later) that
remind you of their purpose. Add comments to the labels. If
these names are well chosen, the larger routines eventually will
become clear. The embedded references are produced as labels so
they will retain their meanings as they are changed.
It is also helpful to spend some time studying the structure of
data areas. Vector tables, which are frequently used to control
the program's flow, reveal the program's structure very quickly.
If some routines do not have labels at the beginning, it is
usually because the code or tables that reference them (or the
segment register assumptions) are not properly defined in the
.SEQ file.
* READING THE REFERENCE TABLE (.TBL) *
A referencee is defined as a number that is referenced somewhere
in the program. It may be a program location or a numeric
constant.
A referencor is defined as the address in the program from
which a reference is made to the referencee.
Each entry is composed of a referencEE followed by a list of
referencors. If more than one line is needed, additional lines
are indented to the first referencor position. The referencEE is
followed by an "S" if it includes references to the beginning of
a segment. The referencor is followed by two letters, the first of
which represents the segment register that is implied or
prefixed in the referencing instruction. The second letter
indicates the type of operation on the referencEE. When the
reference entries are embedded in the assembly code, all values
are preceded with the letter "L".
     1st letter                 2nd letter
    SEG REGISTER            TYPE OF OPERATION
    C  code          J  jump      M  modify - INC, ADD, etc.
    S  stack         C  call      I  immediate - value or offset
    D  data          R  read      T  test or compare
    E  extra         W  write     ?  unknown or ESC instruction
                     P  port
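Decoding a referencor's two letters can be sketched with two lookup tables taken from the table above. The exact text layout of a table entry is not specified here, so this Python sketch assumes a token made of the hex address immediately followed by the segment letter and the operation letter; decode_referencor is a hypothetical name. Position decides what "C" means: code as the first letter, call as the second.

```python
from typing import Tuple

SEG_REGISTERS = {"C": "code", "S": "stack", "D": "data", "E": "extra"}
OPERATIONS = {
    "J": "jump", "C": "call", "R": "read", "W": "write", "P": "port",
    "M": "modify", "I": "immediate", "T": "test or compare",
    "?": "unknown or ESC instruction",
}

def decode_referencor(token: str) -> Tuple[int, str, str]:
    """Decode a referencor such as '0200DR' or 'L0200DR'.

    Assumed layout: hex address, then the segment-register letter, then the
    operation letter. A leading 'L' (used for embedded entries) is dropped.
    """
    token = token.upper()
    if token.startswith("L"):
        token = token[1:]
    addr, seg, op = token[:-2], token[-2], token[-1]
    return int(addr, 16), SEG_REGISTERS[seg], OPERATIONS[op]
```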
* WRITING/READING THE SEQUENTIAL INSTRUCTION FILE (.SEQ) *
The sequential instruction file is a list of special
instructions to ASMGEN which the user creates. The file takes
the form of a list of hexadecimal addresses and single-letter
instructions or generation switches. If used, the .SEQ file must
be on the same diskette as the source file and have the same
name as the source file with an extension of .SEQ. Each
instruction in the file must be in one of the following formats:
"addr" represents the instruction pointer value. All addr values
must be in numerical sequence in the file.
"command" may be either a toggle switch or a generation
"label" is optional and replaces the label generated for this
address with this non-blank string.
"comment" is optional and must be preceded by "label" unless the
dummy label "." is used. Everything following "label" is treated
as an address comment and will be printed in the ASM file behind
the generated instruction. The address comment may be up to 255
characters in length and should not contain a semi-colon.
";comment" is optional. Anything following a semi-colon in the
.SEQ file instructions is considered as a comment in the .SEQ
file only and is not added to the generated .ASM file.
"label" and "comment" are not allowed when a generation switch
is coded, but a ";comment" may be used to help clarify the .SEQ
file.
The .SEQ file is read into memory before the first pass starts.
The addresses and commands will be compressed, but "label" and
"comment" will be held in memory one to one. An effect of this
is that memory space required for disassembly increases with
each "label" and "comment" added to the .SEQ file.
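The line format described above can be approximated by a small parser. This Python sketch is a reading of the rules, not ASMGEN's actual parser; the names SeqLine, parse_seq_line and in_sequence are hypothetical, and "&" continuation lines are not handled. It drops the .SEQ-only ";comment", upper-cases labels, treats "." as the dummy label, rejects a label after a switch, and checks that addresses never decrease (several lines may share one address, as in the example file).

```python
import re
from typing import List, NamedTuple, Optional

class SeqLine(NamedTuple):
    addr: int
    command: str            # e.g. "C", "W", "#", or a switch such as "/O-"
    label: Optional[str]
    comment: Optional[str]

# hex address (optional trailing H), then a one-letter command or a /switch
_SEQ_LINE = re.compile(r"^([0-9A-F]+)H?\s*(/[A-Z][+-]?|[A-Z#$])\s*(.*)$", re.IGNORECASE)

def parse_seq_line(line: str) -> Optional[SeqLine]:
    """Parse one address line of a .SEQ file; None for blank/comment lines."""
    text = line.split(";", 1)[0].strip()      # ';comment' stays in the .SEQ file
    if not text:
        return None
    m = _SEQ_LINE.match(text)
    if m is None:
        raise ValueError(f"unrecognized .SEQ line: {line!r}")
    addr, command, rest = m.groups()
    command = command.upper()
    label = comment = None
    if rest:
        if command.startswith("/"):
            raise ValueError("label/comment not allowed after a switch")
        parts = rest.split(None, 1)
        label = None if parts[0] == "." else parts[0].upper()  # "." is the dummy label
        comment = parts[1] if len(parts) > 1 else None
    return SeqLine(int(addr, 16), command, label, comment)

def in_sequence(lines: List[SeqLine]) -> bool:
    """Addresses must not decrease; several lines may share one address."""
    return all(a.addr <= b.addr for a, b in zip(lines, lines[1:]))
```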
* DESCRIPTION OF GENERATION SWITCHES *
THE VARIOUS TOGGLE SWITCHES ARE SET TO ON BY DEFAULT. Switches
may be toggled on and off at any point in the .SEQ file.
All option switches except /M and /H can be either toggled or
directly set by the user. A suffix of "+" turns the switch ON,
and a suffix of "-" turns the switch OFF. Switches encountered
in the file that have neither of these suffixes are toggled to
the opposite of their state at the time; ON switches are turned
OFF and OFF switches are turned ON.
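The set/clear/toggle rules can be sketched as a small state update in Python. The dictionary layout and the names default_switches and apply_switch are hypothetical. /M and /H are modelled as one-shot flags that start OFF (macro library included, no "H" appended) and are set the first time they appear.

```python
from typing import Dict

def default_switches() -> Dict[str, bool]:
    """Toggle switches start ON; /M and /H start OFF (macros kept, no 'H')."""
    state = {name: True for name in "BEFLORT"}
    state.update(M=False, H=False)
    return state

def apply_switch(state: Dict[str, bool], token: str) -> None:
    """Apply a switch token such as '/T', '/O+' or '/B-' to state in place."""
    name, suffix = token[1].upper(), token[2:]
    if name in ("M", "H"):
        state[name] = True             # set once; no '+', '-' or toggling back
    elif suffix == "+":
        state[name] = True
    elif suffix == "-":
        state[name] = False
    else:
        state[name] = not state[name]  # no suffix: flip the current state
```

Applying "/O-" twice leaves output OFF, whereas applying "/O" twice would return it to its original state.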
/B - generate byte references
When ON, byte and word references are included in the reference
table. When OFF, only word references are generated.
/E - embedded references in ASM file
When ON, reference table entries are inserted in the text just
before the referencee's definition statement. When OFF, these
entries are not included with the disassembled text. The entire
reference table can be printed with the "R" command.
/F - 8087 mnemonics
When ON, ESC instructions are produced. When OFF, ESC
instructions are assumed to be 8087 instructions and 8087
mnemonics are produced.
/H - append hex "H"
When this switch appears at any point in the .SEQ file, an "H"
is appended to all hex numbers. This does not, of course, apply
to the labels which are hex values preceded by the letter "L".
The .RADIX 16 pseudo-op is omitted which allows the assembler's
radix to default to decimal. This switch defaults to NO H
APPEND. Note that it will be set only once. It retains its
value until the next .SEQ file is read.
/L - generate label or number
When ON, all word references are treated as labels. When OFF, a
word reference is treated as a constant if all referencors are
data immediate types.
/M - suppress macro library
When this switch appears at any point in the .SEQ file, no macro
library is included in the text output. The DEFAULT IS THAT THE
MACRO LIBRARY WILL BE INCLUDED. Note that this switch will be
set only once. It retains its value until the next .SEQ file is
read.
/O - control ASM output
When ON, ASMGEN will output the generated text. When OFF, output
will be suppressed.
/R - control TBL output
When ON, ASMGEN will output the generated reference data. When
OFF, the reference table is not printed.
/T - control trace output
When ON, up to 16 bytes of object code are included as comments
in each line of the assembly code file. When OFF, object code
is not included.
* DESCRIPTION OF .SEQ FILE COMMANDS *
A - assume
The following lines contain ASSUMptions for segment register
values. They become effective at the address specified by this
instruction and may be modified anywhere in the disassembly.
The required format for assumptions is:
& 0400 DS
The ampersand indicates a continuation of the A instruction.
In this example, a data segment beginning at an instruction
pointer value of 400 will be assumed until another A instruction
changes it. CS, ES, and SS are also supported. The segment
assumptions are used for effective address calculations only.
The code segment assumption does not affect the instruction
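How an assumption turns a segment-relative offset into a label can be sketched in Python. The name assumed_label and the exact comment text are hypothetical; the idea of keeping the original value as a comment comes from the description of labels modified by ASSUME statements. With "& 0400 DS" in effect, a DS-relative operand at offset 0010 refers to instruction counter 0410.

```python
from typing import Tuple

def assumed_label(offset: int, segment_start: int) -> Tuple[str, str]:
    """Resolve a segment-relative offset under an ASSUME such as '& 0400 DS'.

    segment_start is the instruction-pointer value given in the ASSUME line.
    Returns the label for the effective location plus a comment that keeps
    the original offset, so it can be changed back if it was a constant.
    """
    counter = segment_start + offset
    if counter > 0xFFFF:
        raise ValueError("effective address is beyond the 16-bit instruction counter")
    return "L" + format(counter, "04X"), "; " + format(offset, "04X")
```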
B - bytes
The bytes encountered in the source file are assumed to have
meaning as single byte values.
C - code
The bytes encountered in the source file are assumed to be valid
8088 machine language instructions.
D - generate data operand
The operand of the instructions is changed to immediate data.
Subsequent bytes are interpreted as "C" (code follows).
--Appears to not work correctly if the /H switch is set--
I - initial value for IP
The hexadecimal value on this line overrides the instruction
pointer value at the beginning of the file - not to be confused
with the address at which execution begins. The default values
are 0000 for EXE files and 0100H for COM and other files. The
execution address following the END statement is omitted if this
option is invoked.
S - strings
The bytes encountered in the source file are assumed to form
text. Quoted text is produced for printable ASCII characters and
byte values for others.
# - defined length strings
The first byte encountered in the source file contains the
length of the character string which begins with the next
encountered character. This length value may be overridden by a
subsequent SEQ file instruction.
$ - defined length strings
The first byte encountered in the source file contains the
length of the character string which begins with the next
encountered character plus the length byte itself. This length
value may be overridden by a subsequent SEQ file instruction.
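The only difference between the # and $ forms is whether the length byte counts itself. A minimal Python sketch, with the hypothetical name read_counted_string:

```python
from typing import Tuple

def read_counted_string(data: bytes, start: int,
                        count_includes_length_byte: bool) -> Tuple[bytes, int]:
    """Return (text, next_offset) for a length-prefixed string at data[start].

    '#' strings: the length byte counts only the characters that follow.
    '$' strings: the length byte also counts itself.
    """
    n = data[start]
    if count_includes_length_byte:
        n -= 1
    if n < 0 or start + 1 + n > len(data):
        raise ValueError("length byte does not fit the data")
    end = start + 1 + n
    return data[start + 1:end], end
```

The same five characters HELLO are therefore preceded by 05 in a # string and by 06 in a $ string.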
W - words
Pairs of bytes encountered in the source file are assumed to
have meaning as word values.
X - repeating data structure
A cyclic data structure is assumed to begin at the specified
instruction pointer value. The structure definition may follow
and is prefixed by an ampersand (&) to indicate the continuation
of this instruction. If the definition does not follow, then the
most recent definition is used. If no structure is yet defined,
then an error message is displayed.
The following elements may be used to define the structure:
& NNNN S - The next NNNN bytes are defined as string characters
& NNNN B - The next NNNN bytes are defined as byte values
& NNNN W - The next NNNN bytes are defined as word values
& XXNN $ - The next sequence of bytes is defined as NN fields.
Each field consists of a length byte and a string
of characters. The length of each field is
contained in the first encountered byte. The high
byte (XX), if non-zero, is a bit mask of the
length field within the byte. The length field is
right-justified within the byte after the byte
value is sent to the output file.
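The mask handling for an "& XXNN $" element can be sketched in Python. The names split_field_spec and masked_length are hypothetical. The example below (0E02) uses mask 0E to select bits 3, 2 and 1 of the length byte; the selected bits are shifted right so the field reads as a plain number. Whether that number includes the length byte itself follows the example file's comment, which the sketch does not try to decide.

```python
from typing import Tuple

def split_field_spec(value: int) -> Tuple[int, int]:
    """Split an '& XXNN $' value into (mask, field_count), e.g. 0E02H -> (0EH, 2)."""
    return value >> 8, value & 0xFF

def masked_length(length_byte: int, mask: int) -> int:
    """Extract and right-justify the length bits selected by mask.

    A zero mask means the whole byte is the length.
    """
    if mask == 0:
        return length_byte
    shift = (mask & -mask).bit_length() - 1   # index of the lowest set bit in mask
    return (length_byte & mask) >> shift
```

With mask 0E, a length byte of 0A or FA both yield 5, since only bits 3 to 1 are kept.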
* EXAMPLES OF .SEQ COMMANDS *
This example .SEQ file shows all the possible instructions in
the appropriate format.
;All switches are on at the beginning.
0     /T         ;no object code as comments in output
0     /M         ;no macro library in output
0     /H         ;append "H" to all numbers
00H   A          ;assume the following segment values
;Note that the ampersand (&) indicates the extended ASSUME
&     380 DS     ;the data segment starts at 380 hex
&     380 ES     ;the extra segment starts at 380 hex
0200  I          ;initialize the instruction pointer to 200
0200  /F         ;introduce 8087 mnemonics (not ESC)
0200  /E         ;no embedded references
0200  C          ;code begins at 200
0203H W          ;words are at 203
0207  C          ;more code starting here
220   X          ;complex data structure begins here
&     3 W        ;words
&     1 B        ;byte
&     0E02 $     ;2 strings starting with the 2nd byte follow
                 ;bits 3,2,1 of the first byte contain the length of the
                 ;string including the length byte.
                 ;the high byte (0E) is the mask.
                 ;see also # in summary below
&     1 B        ;byte
                 ;the structure repeats until 351
358   C          ;more code
380   S          ;strings - list of messages
4FD   /B         ;no further byte references
502   /R         ;garbage here - turn off reference generation
600H  /O+        ;valid code - turn output back on
1A60  /O-        ;output file about to fill diskette - turn output
                 ;off but keep scanning for references. another run
                 ;will be needed to get the remaining code.
1B00  D          ;treat operand as immediate data
1DFD  /B+        ;continue with byte references
1F45  W user_prt ;user provided labels will translate
2256  S $MSG     ;to upper case
^Z               ;Ctrl-Z, End-of-file marker is necessary
Comments may be included if preceded by a semicolon.
Alphabetic characters may be either upper or lower case.
An "H" may follow the hex address.
* SAMPLE SESSION *
The external command CHKDSK.COM will serve as an example for
this sample session because it is short. The .SEQ file is also
short and easy to generate. Only these few instructions are
needed:
0100 /T ;include object code as comments in .ASM file
0100 /E ;simpler output without references
04F7H S ;messages
04F7H /H ;append "H" to numeric values
Using DEBUG, browse through CHKDSK.COM to see how this was
arrived at. Usually, but not always, the best procedure is to
assume code. If the code appears unintelligible, display it in
hex/ASCII. If it is not text, assume bytes. Label positions in
the first disassembly may indicate that some locations should be
words. Next, start ASMGEN by typing
    ASMGEN CHKDSK.COM
and enter A at the command prompt.
The assembly code can be viewed on the screen. Then type
    A CHKDSK.ASM
to save the assembly source code to a file. Then type
    R CHKDSK.TBL
to save the cross-reference table to disk.
The Macro Assembler, Link.exe and Exe2bin could now be used to
assemble CHKDSK.ASM, link it to .EXE and convert it to a .COM
file. No modification should be necessary in this case.
If working with code that is to be modified, the symbol types
must be correctly specified as locations or as constants. If
they are constants, place them outside of any segment. The label
names may then be changed to make the code more readable.
There are several disassembly issues and errors that must be corrected.
1.) put the PAGE directive "PAGE 56,132" on a line near the top of
the .ASM listing.
2.) With the /H switch set, the number 0FFFF is not always provided
with an "H" suffix. It must be corrected by hand.
2a.) With the /H switch set, the immediate operand value generated
with the D command will need to be patched with an H suffix.
3.) .SYS type device driver files start at 0000, and need an ASSUME
directive. They also need a "0000 I" line to start disassembly
at 0000H. Two pointers to executable code are at 0006 and 0008.
4.) Override operators BYTE PTR or WORD PTR are sometimes needed.
5.) Most Microsoft MASM versions refuse to assemble certain
disassembled instructions (usually LES or LDS instructions with
an 8-bit register specified). They could assemble the code
correctly, but they do not like the construct and declare an
assembly error. In such cases, try IBM's MASM 1.00, which is not
as fussy about assembling such questionable code.
6.) If a message 'Hex value error in /SEQ file' is displayed upon
attempting to disassemble a file, it may be necessary to add a
Ctrl-Z, End-of-file marker, at the end of the .SEQ file.