Category : Assembly Language Source Code
Archive   : TOASM30A.ZIP
Filename : TOASM.DOC
TOASM
Version 4.2a
Shareware Version 3.0a
Copyright (c) 1988,1989
PRINCE WILLIAM CUSTOM COMPUTERS
P.O.B. 2106
Woodbridge, VA 22193
(703) 590-3360
TABLE OF CONTENTS
INTRODUCTION
INVOKING TOASM
HOW TOASM WORKS
OUTPUT DESCRIBED
BUILDING A .USR FILE
NOTES
TIPS
KNOWN ERRORS
SHAREWARE NOTES
This Shareware version of TOASM is fully functional but somewhat
limited in the size of the files that can be converted. The symbol
table is restricted to 1024 entries to reduce the memory required.
The number of possible user comments and user labels is also reduced.
This is to allow users with limited available memory to use TOASM.
Registered users will receive two copies of TOASM.EXE: one with
capabilities similar to this version and one optimized for machines
equipped with 640 KB of memory. Registered users will also receive one
free update and discounts on further updates.
Registration is only $45.00! Just send your check or money order to the
above address.
INTRODUCTION
TOASM is a program which converts .EXE and .COM files to assembly
language. The generated assembly language is fully compatible with
MASM 4.0 from Microsoft. While many programs can be converted without
intervention by the user, the user can also provide input through a
.USR file to improve the results. Several programs are included with
TOASM to aid the user in developing a .USR file. Also included are
several programs which demonstrate the operation of TOASM.
DUMP is a program which will dump any file in both hexadecimal and
ASCII to the screen, a printer, or a file.
FREF is a program which looks through a specified file for references
to addresses provided by the user.
TESTEXE.EXE is an .EXE-type program which is being used in the
development of TOASM. Each time a problem is discovered converting an
.EXE-type program, that problem is added to this file. It has no
practical value other than as a test for TOASM and as a demonstration
of the operation of TOASM with .EXE-type files.
TESTEXE.USR is a sample .USR file which is used to convert
TESTEXE.EXE.
TESTCOM.COM is a .COM-type program which is being used in the
development of TOASM. Each time a problem is discovered converting a
.COM-type program, that problem is added to this file. It has no
practical value other than as a test for TOASM and as a demonstration
of the operation of TOASM with .COM-type files.
TESTCOM.USR is a sample .USR file which is used to convert
TESTCOM.COM.
TST.BAT is a batch file which will convert both TESTEXE.EXE and
TESTCOM.COM to assembly language. The user may need to change some
commands in this file for different assemblers and/or linkers.
TOASM - 1
INVOKING TOASM
TOASM is started from the DOS command prompt. The only required
argument is the file name of the file to be converted. Two optional
parameters are available.
TOASM InFil.ext [OtFil] [UsrFil]
InFil is the name of the file to be converted. The file name extension
(.ext) is required.
OtFil is an optional name for the generated output file. If not
provided, then the .ASM file will be named InFil.ASM where InFil is
the same as the input file name. It is usually a good idea to provide
the OtFil name so that the original InFil is not overwritten when the
.ASM file is subsequently assembled and linked.
UsrFil is an optional name for the .USR input file. If not provided,
the .USR file (if any) will be presumed to be named InFil.USR where
InFil is the same as the input file name.
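The default file names described above can be sketched in Python as follows. This is an illustration, not TOASM's code: the function name resolve_names is hypothetical, and it assumes that OtFil and UsrFil are used exactly as typed while the defaults replace InFil's extension with .ASM and .USR.

```python
import os

def resolve_names(args):
    """Return (input, output, usr) names for 'TOASM InFil.ext [OtFil] [UsrFil]'.

    Hypothetical sketch: args excludes the program name; OtFil and UsrFil
    are taken verbatim, and the defaults are derived from InFil's base name.
    """
    if not 1 <= len(args) <= 3:
        raise ValueError("usage: TOASM InFil.ext [OtFil] [UsrFil]")
    in_fil = args[0]
    base, ext = os.path.splitext(in_fil)
    if not ext:
        raise ValueError("the input file name extension is required")
    out_fil = args[1] if len(args) > 1 else base + ".ASM"
    usr_fil = args[2] if len(args) > 2 else base + ".USR"
    return in_fil, out_fil, usr_fil
```

For example, resolve_names(["TESTEXE.EXE", "TEXE.ASM"]) keeps TESTEXE.EXE safe from being overwritten when TEXE.ASM is later assembled and linked.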
HOW TOASM WORKS
TOASM first initializes several internal tables and builds an empty
symbol table. It next reads the .USR file (if provided) and installs
corresponding symbols in the symbol table. If a user provided symbol
includes a label, the label is placed in an internal label table. The
same is true for user provided comments. Next, TOASM reads any
relocation information contained within the file to be converted and
places corresponding entries in the symbol table. Then, TOASM builds
the rest of the symbol table by reading through the entire file to be
converted looking for references. Once the symbol table is fully
built, TOASM reads the entire file one more time while generating the
.ASM output.
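The order in which the symbol table is filled can be modelled roughly as below. This is a hypothetical sketch, not TOASM's implementation: it assumes that a label supplied in the .USR file always wins, that every other address receives a generated 'L' name of 4 hex digits, and that the reference scan has already produced a list of target addresses. The final output pass is not shown.

```python
def build_symbol_table(user_labels, reloc_targets, ref_targets):
    """Merge symbols in the documented order: .USR file, relocations, references.

    user_labels maps address -> label (None when the .USR line has no label).
    reloc_targets and ref_targets are iterables of addresses.
    """
    table = {}
    # 1) .USR entries are installed first; user labels take priority.
    for addr, label in user_labels.items():
        table[addr] = label if label else "L%04X" % addr
    # 2) relocation entries, then 3) references found by scanning the file.
    for addr in list(reloc_targets) + list(ref_targets):
        table.setdefault(addr, "L%04X" % addr)   # never overwrite an existing name
    return dict(sorted(table.items()))          # ordered by address for output
```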
OUTPUT DESCRIBED
Please execute the TST.BAT file included with TOASM and refer to the
output file (TEXE.ASM) for the following discussion.
The first line generated by TOASM is a title line made up from the
file name of the file to be converted.
Next, any macros used will be found. All "RET" instructions are coded
as macros so that "PROC" statements are not needed.
Following the macros, several "EQU" statements will be found which are
included to reduce the size of the generated file.
The next line is pretty interesting! It defines a label called junk!
This label is only needed if an XLAT instruction with segment override
is found in the file to be converted - MASM gets upset if no label
follows the segment override.
The "Initial Seg Values" comment line shows the segment values found
in the EXE header for .EXE files or the presumed values for .COM
files.
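For .EXE files, the initial values come from the standard MS-DOS "MZ" header: the header size in paragraphs is the word at offset 08H, SS:SP are the words at 0EH and 10H, and IP and CS are the words at 14H and 16H. The sketch below (the function name is hypothetical) decodes them; CS and SS are relative to the start of the load image. How TOASM presumes values for .COM files is not modelled.

```python
import struct

def initial_seg_values(header: bytes):
    """Decode the initial register values stored in an .EXE (MZ) header."""
    if header[:2] != b"MZ":
        raise ValueError("not an .EXE file")
    (header_paragraphs,) = struct.unpack_from("<H", header, 0x08)
    ss, sp = struct.unpack_from("<HH", header, 0x0E)   # initial SS, SP
    ip, cs = struct.unpack_from("<HH", header, 0x14)   # initial IP, CS
    return {"CS": cs, "IP": ip, "SS": ss, "SP": sp,
            # the load image begins right after the header
            "image_start": header_paragraphs * 16}
```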
While none occurred within the test files, any line containing code
which TOASM does not understand will be flagged with '?' followed by a
comment describing what TOASM did not like about the instruction.
Lines flagged in this manner are not preceded with a ';' so that MASM
will generate an error when the line is assembled. The number of
lines flagged in this manner is displayed in the 'ERRORS' line of the
execution screen. The most important part of any file conversion will
be to include instructions in the .USR file to resolve any of these
errors.
Segment labels are prefixed with 'S' while all other TOASM generated
labels are prefixed with 'L'. The remaining characters within labels
are the 4 hex characters which relate to the offset within the input
file of the label. For .COM input files, the hex value will be the
actual loaded memory offset (file offset + 100H) of the label. For
.EXE files, the hex value denotes the offset within the load image
contained within the .EXE file.
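The 'L' label rule above amounts to a small address conversion. The following sketch assumes the value always fits in 4 hex digits and that, for .EXE files, the load image begins header_paragraphs * 16 bytes into the file; 'S' segment labels are not modelled, and the function name is hypothetical.

```python
def toasm_label(file_offset, is_com, header_paragraphs=0):
    """Build an 'L' + 4-hex-digit label from an offset within the input file."""
    if is_com:
        value = file_offset + 0x100                     # loaded memory offset
    else:
        value = file_offset - header_paragraphs * 16    # offset in the load image
    if not 0 <= value <= 0xFFFF:
        raise ValueError("offset does not fit in 4 hex digits")
    return "L%04X" % value
```

For example, file offset 3 in a .COM file gives L0103, the same offset DEBUG would show.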
The flag <...> marks a line which contains a relocatable symbol. These
symbols are those which were found in the relocation entries during
pass 1. Note that this only applies to .EXE files, as there is no
relocation table in .COM files.
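In the MZ format, the relocation count is the word at offset 06H and the table's file offset is the word at 18H. Each entry is an offset:segment pair, and DOS patches the word at segment * 16 + offset within the load image. A minimal sketch (hypothetical function name) that lists those patch sites, which is the kind of information a converter can use to mark relocatable symbols:

```python
import struct

def relocation_sites(exe: bytes):
    """Return the load-image offsets of every word listed in the relocation table."""
    if exe[:2] != b"MZ":
        raise ValueError("not an .EXE file")
    (count,) = struct.unpack_from("<H", exe, 0x06)   # number of entries
    (table,) = struct.unpack_from("<H", exe, 0x18)   # file offset of the table
    sites = []
    for i in range(count):
        offset, segment = struct.unpack_from("<HH", exe, table + 4 * i)
        sites.append(segment * 16 + offset)
    return sites
```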
'Ofs' precedes labels whenever TOASM cannot tell whether the value is
a constant or an actual offset. It is up to the user to determine which
of the labels should be treated as constants and which should be
treated as labels. If it is determined that the value is a constant,
a 'D' type entry can be made for the instruction.
BUILDING A .USR FILE
The .USR file is a list of special instructions to TOASM which is
optionally created by the user. The format of each line in the .USR
file must be one of the following...
addr command
or
addr command label
or
addr command label comment
"addr" represents the loaded memory offset at which to define the
command. All "addr" values must be in ascending numerical order in the
file. For .COM files, the loaded memory offset is the file offset plus
100H (the same offset that DEBUG will display). For .EXE files, it is
the offset within the load image that follows the EXE header. EXE files
can be confusing in this regard; the best way to handle them is to
run TOASM first without a .USR file and look at the resulting .ASM
file to determine the base offsets to use.
"command" may be 'C', 'B', 'W', 'S', 'D', 'V'.
C - The bytes encountered are assumed to be code (machine
instructions).
B - The bytes encountered are assumed to be data byte (DB) values
until changed by another "command". The value found will be
treated as a constant.
W - Two byte values encountered are assumed to be data word (DW)
references until changed by another "command". The value found
will be treated as a label.
V - Two byte values encountered are assumed to be data word (DW)
values until changed by another "command". The value found will
be treated as a constant.
S - The bytes encountered are assumed to form a string of ASCII
byte values. Quoted text will be produced for valid ASCII
characters and byte values for others.
D - The operand of the instruction is assumed to be a constant.
This command is for one instruction only. Code is automatically
assumed for the next instruction.
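The data commands can be illustrated with a short sketch of how B, V, W and S regions might be rendered as MASM data. The function names and exact output layout are hypothetical, not TOASM's. The sketch assumes MASM-style hex constants (a leading 0 when the first digit is A-F), the 'L' + 4-hex-digit label format for W, printable ASCII as 20H through 7EH, and that the quote character itself is emitted as a byte value to avoid escaping.

```python
def hex_const(value, digits):
    """MASM hex constant, e.g. 0DH or 0FFFFH."""
    text = "%0*XH" % (digits, value)
    return "0" + text if text[0] in "ABCDEF" else text

def render_bytes(data):
    """'B': each byte becomes a constant."""
    return "DB " + ",".join(hex_const(b, 2) for b in data)

def render_words(data, as_label):
    """'W' (as_label=True) or 'V' (as_label=False); a trailing odd byte is ignored."""
    words = [data[i] | data[i + 1] << 8 for i in range(0, len(data) - 1, 2)]
    if as_label:
        return "DW " + ",".join("L%04X" % w for w in words)
    return "DW " + ",".join(hex_const(w, 4) for w in words)

def render_string(data):
    """'S': quote runs of printable ASCII, emit other bytes as values."""
    parts, run = [], ""
    for b in data:
        if 0x20 <= b <= 0x7E and b != 0x27:   # printable, but not the quote
            run += chr(b)
        else:
            if run:
                parts.append("'" + run + "'")
                run = ""
            parts.append(hex_const(b, 2))
    if run:
        parts.append("'" + run + "'")
    return "DB " + ",".join(parts)
```

For example, render_string(b"Hi\r\n$") yields a line that mixes quoted text with the byte values for the carriage return and line feed.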
"label" is optional and will be used for references to this "addr"
instead of the label TOASM would generate.
"comment" is optional and must be preceded by the "command" and
"label" fields unless a ":" is used. The comment field may include
spaces and tabs. User-defined comments override any TOASM-generated
comments.
Notes...
When converting .EXE files, the address of the data segment is unknown
to TOASM. Since this can cause problems, the user can include a line
in the .USR file which specifies the data segment address. To do this
use: @xxxx where xxxx is the address to be assumed for the data
segment. Only one such line should be included within the .USR file.
TESTEXE.USR makes use of this option.
While reading lines from a .USR file, TOASM first discards all
characters following any ';' found in a line. It then copies all
characters following any ':' found in a line to the internal comment
table.
Upper and/or lower case is acceptable. TOASM converts all characters
read from the .USR file to uppercase as they are read.
Blank lines are considered to be comments and are ignored by TOASM.
Any number of spaces or tabs may separate fields as long as at least
one of either is used.
The address field may contain a trailing 'H' but it is not required.
All addresses are presumed to be in hex.
All addresses must be in ascending numeric order.
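The .USR reading rules can be sketched as a small parser. This follows the rules as written (';' discarded first, text after ':' taken as the comment, everything uppercased, hex addresses with an optional trailing 'H'). However, parse_usr is a hypothetical name, and it also assumes that addresses must be strictly ascending, that the '@xxxx' line is exempt from that check, and that a fourth field is the comment when no ':' is present.

```python
def parse_usr(lines):
    """Return (entries, data_segment); entries are (addr, command, label, comment)."""
    entries, data_segment, last = [], None, -1
    for raw in lines:
        line = raw.split(";", 1)[0].upper()      # drop ';' comments, then uppercase
        fields_part, colon, comment = line.partition(":")
        fields = fields_part.split(None, 3)      # spaces/tabs separate fields
        if not fields:
            continue                             # blank or comment-only line
        if fields[0].startswith("@"):            # @xxxx: assumed data segment
            data_segment = int(fields[0][1:].rstrip("H"), 16)
            continue
        if len(fields) < 2:
            raise ValueError("missing command: %s" % raw.strip())
        addr = int(fields[0].rstrip("H"), 16)    # always hex, 'H' optional
        if addr <= last:
            raise ValueError("addresses must be ascending: %s" % fields[0])
        last = addr
        command = fields[1]
        if command not in ("C", "B", "W", "V", "S", "D"):
            raise ValueError("unknown command: %s" % command)
        label = fields[2] if len(fields) > 2 else None
        if not colon:                            # comment as the fourth field
            comment = fields[3] if len(fields) > 3 else ""
        entries.append((addr, command, label, comment.strip() or None))
    return entries, data_segment
```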
NOTES
Given the opcode and succeeding bytes of 05h,ffh,ffh, TOASM will
generate "ADD AX,0FFFFh". This is correct. However, the same
instruction will also be generated for 83h,c0h,ffh and
81h,c0h,ffh,ffh! These three code sequences all cause the same final
outcome when executed, and there is no separate assembly language
mnemonic for each of them. This would not be a problem if all programs
were written using MASM. But since most programs are written using one
of several different high-level language compilers, this problem comes
up quite often.
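Why the three sequences collapse to one line can be seen by decoding them. Opcode 05h is ADD AX,imm16. Opcodes 81h and 83h followed by ModRM byte C0h (reg field /0 = ADD, r/m field = AX) are the general ADD forms with a 16-bit immediate and a sign-extended 8-bit immediate. A minimal decoder for just these cases (hypothetical name, not TOASM's decoder):

```python
def decode_add_ax_imm(code: bytes):
    """Return ("ADD", "AX", imm16) for the three encodings above, else None."""
    if len(code) >= 3 and code[0] == 0x05:                       # 05 iw
        return ("ADD", "AX", code[1] | code[2] << 8)
    if len(code) >= 4 and code[0] == 0x81 and code[1] == 0xC0:   # 81 /0 iw
        return ("ADD", "AX", code[2] | code[3] << 8)
    if len(code) >= 3 and code[0] == 0x83 and code[1] == 0xC0:   # 83 /0 ib
        imm = code[2] - 0x100 if code[2] & 0x80 else code[2]     # sign-extend
        return ("ADD", "AX", imm & 0xFFFF)
    return None
```

All three return the same tuple, so the source text alone cannot record which encoding the original used.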
Some lazy compiler writers also cause other problems. The most
notable of these is the use of 3-byte instead of 2-byte JMPs.
Assemblers generally generate the shorter 2-byte version if the
distance to the target is known when the instruction is encountered.
This problem has been handled in TOASM by using a JMPL macro in cases
where the assembler will generate code differently than the original.
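The test for when a forced long jump is needed can be sketched as follows. A short JMP (EBh) is 2 bytes and its signed 8-bit displacement is measured from the end of the instruction; a near JMP (E9h) is 3 bytes. Assumption, labeled as such: the assembler picks the 2-byte form only when the target is already known, meaning a backward (or self) jump within range. needs_jmpl is a hypothetical name, and only the 3-byte case is modelled.

```python
def needs_jmpl(addr, target, length):
    """True if a 3-byte JMP at addr would likely be re-assembled as 2 bytes."""
    short_disp = target - (addr + 2)        # displacement of the 2-byte form
    assembler_picks_short = target <= addr and -128 <= short_disp <= 127
    return length == 3 and assembler_picks_short
```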
Another compiler-generated difficulty is unneeded segment override
bytes. The instruction LEA SI,ES:[DI+1] is identical to the
instruction LEA SI,[DI+1] (LEA loads only the offset, so the segment
never matters), but some compilers include the segment override byte
anyway. When found, the segment override should be marked as a byte
in the .USR file; later versions of TOASM will handle this problem in
this way.
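Candidate overrides can be located by looking for a segment override prefix (26h ES, 2Eh CS, 36h SS, 3Eh DS) immediately followed by the LEA opcode 8Dh; LEA SI,ES:[DI+1], for instance, encodes as 26h 8Dh 75h 01h. The hypothetical helper below turns each match into a 'B' line for the prefix and a 'C' line for the LEA. It scans raw bytes, so it can also match inside data; treat its output as suggestions to check against the generated listing.

```python
SEGMENT_OVERRIDES = {0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS"}
LEA = 0x8D

def usr_lines_for_lea_overrides(code, start_addr):
    """Suggest .USR lines marking an override prefix before LEA as a data byte."""
    lines = []
    for i in range(len(code) - 1):
        if code[i] in SEGMENT_OVERRIDES and code[i + 1] == LEA:
            addr = start_addr + i
            lines.append("%04X B" % addr)          # the override byte as data
            lines.append("%04X C" % (addr + 1))    # back to code at the LEA
    return lines
```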
TIPS
Generating useful assembly language output depends on how the
generated code will be used. If it is done only to see how something
works, a minimal .USR file (or none at all) will be required. However,
if the goal is to be able to make changes to a program, more extensive
work will probably be required. Usually, a few commands in the .USR
file are all that is needed to produce output that can be assembled
and that will compare exactly with the original. This does not mean
that changes can be made at this point! For instance, if the
instruction mov ax,ofs L2020 is found at address 100h and a change
that alters the code size is made at 1000h, label L2020 moves and the
value moved into AX will be different in the new program. This may be
correct, but probably not! The instruction mov ax,ofs L2020 probably
means the same thing as mov ax,'  ' (mov ax,2020h, two spaces)! The
following is the suggested method for creating a changeable .ASM file.
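One quick way to spot immediates like 2020h that may really be character constants is to check whether both bytes are printable ASCII. This is a hypothetical heuristic, not something TOASM is documented to do. A match is only a hint that a 'D' entry might be appropriate, since many genuine offsets will also pass.

```python
def looks_like_text(value):
    """True if a 16-bit immediate consists of two printable ASCII bytes."""
    value &= 0xFFFF                          # treat as a 16-bit immediate
    low, high = value & 0xFF, value >> 8
    return all(0x20 <= b <= 0x7E for b in (low, high))
```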
1) Find as much data (strings, bytes, words, tables) as possible in
the original and create a .USR file.
2) Run TOASM.
3) Correct any errors found.
4) Run TOASM.
5) Find all constants in instructions and add them to the .USR file.
6) Run TOASM.
7) Assemble and link the new program.
8) Compare the original and new programs and if they don't compare go
back to step 3.
9) Look through the generated source and start assigning labels and
comments in the .USR file. Be on the lookout for sections of code
that cannot be reached. If any are found, go back to step 3.
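Step 8 can be scripted with a byte-by-byte comparison that reports differing addresses in the same form the .USR file uses. In this sketch (hypothetical name), base=0x100 reports .COM load offsets. For .EXE files, it assumes you compare the load images with base=0, since headers of re-linked files may differ. A length mismatch is reported at the end of the shorter file.

```python
def first_differences(original: bytes, rebuilt: bytes, base=0x100, limit=10):
    """Return up to `limit` addresses where two program images differ."""
    diffs = [base + i for i, (a, b) in enumerate(zip(original, rebuilt)) if a != b]
    if len(original) != len(rebuilt):
        diffs.append(base + min(len(original), len(rebuilt)))
    return diffs[:limit]
```

Each reported address is a starting point for step 3.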
KNOWN ERRORS
At the time of this writing, there are no known errors in TOASM. This
is not to say that there are no errors; there are probably many!
Please let us know if you find any.