** GETTING STARTED **
Use DEBUG to look at DISASMBL.COM:
note that CX = file size = 2C32, so the highest address that you need to
consider now is 2D32 (100 + file length, where 100 is the beginning of the
.COM file). Also note that there is a valid mnemonic (CLD) being
to see if this is really code. You will see that it is. Now do a general
dump just to get a feel for the code and data areas in the program:
too fast. You are looking for data, so you should concentrate on the ASCII
portion of the dump. You should see that there is data at 2B8. These are
the switches and the command instructions you saw in the program
documentation. Make a note of the address, and the length of the data.
Confirm that the data length is correct:
U 2D0 to confirm that code follows at 2D0
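The address arithmetic used so far is easy to check outside DEBUG. What follows is only a sketch in Python, not part of DISASMBL. The function names are mine, and it assumes the data block runs from 2B8 up to (but not including) the code at 2D0, which is inferred from the text rather than read from a dump.

```python
# A .COM image is loaded at offset 100h in its segment.
LOAD_OFFSET = 0x100
FILE_SIZE = 0x2C32          # value of CX reported by DEBUG in the text


def com_end_address(file_size, load_offset=LOAD_OFFSET):
    """Return the first address past the loaded image."""
    return load_offset + file_size


def span_length(start, next_start):
    """Bytes from start up to, but not including, next_start."""
    return next_start - start


end = com_end_address(FILE_SIZE)
print(f"image occupies {LOAD_OFFSET:X}..{end - 1:X}")
print(f"data length: {span_length(0x2B8, 0x2D0):X}h")
```

Note that 2D32 is the first address past the image, so the last byte that belongs to the file sits at 2D31.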
Make a note of the code address (2D0) and then resume the dump:
You will watch lots of garbage go by before you see a repeating pattern in
the ASCII dump at about 1C41. This repeating pattern is a good indicator
that what you are viewing is data rather than code, as are blocks of NULs
and, of course, alphabetic and numeric characters. You must now find where
the code ends and the data begins. First, stop the display with
Unfortunately, you will find this isn't much help, but look for C2, C3 or
CA, (RETurns) and also for E9, EA or EB, (JuMPs). Working backwards from
1C41, you will find a JMP at 1C39, another at 1C37 and a RET at 1C2E. One
of these is probably the end of the code. Try an unassembly to see if it
provides better help than the data dump:
I chose those addresses because the first few instructions are garbage
unless you happen to hit the exact byte where an instruction begins, so any
junk has scrolled off the screen. The ending address selection is
self-evident. The data begins at 1C2F and the clue is the LOOP instruction
following the RET because it refers to a higher address rather than a lower
address, and that is almost never done. Well, I have never seen it. Make
a note of the address 1C2F as the start of the data. Then dump the rest of
the file. You are looking for more code, just in case it might exist:
It should be reasonably apparent that the rest of the file is data. Exit
DEBUG.
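The hunt for RETurns and JuMPs described earlier in this section can also be done mechanically. Here is a rough sketch in Python; the function name, the window size and the sample bytes are all made up for illustration. It only flags bytes that LOOK like the op codes named above. A flagged byte may just as easily be an operand or data, which is why the text warns that this isn't much help on its own.

```python
RET_OPCODES = {0xC2, 0xC3, 0xCA}   # the RETurns named in the text
JMP_OPCODES = {0xE9, 0xEA, 0xEB}   # the JuMPs named in the text


def candidates_before(image, address, window=32, load_offset=0x100):
    """List (address, kind) for RET/JMP-looking bytes in the window
    just before `address`, nearest first. `image` is the raw file."""
    end = address - load_offset
    start = max(0, end - window)
    hits = []
    for pos in range(end - 1, start - 1, -1):
        byte = image[pos]
        if byte in RET_OPCODES:
            hits.append((pos + load_offset, "RET"))
        elif byte in JMP_OPCODES:
            hits.append((pos + load_offset, "JMP"))
    return hits


# Made-up bytes, NOT taken from DISASMBL.COM
sample = bytes([0x90, 0xC3, 0x00, 0xEB, 0x05, 0x41])
for addr, kind in candidates_before(sample, 0x105):
    print(f"{addr:04X} {kind}")
```

With the real file you would read DISASMBL.COM in binary mode and pass 0x1C41 as the address.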
** CREATING THE SEQUENTIAL FILE **
I prefer to keep all reference information separate from the .ASM code
because it makes the code more legible if it isn't all "junked up" and I
also disable the trace output for the same reason, so every .SEQ file I
write begins with:
Since this is a .COM file, and because it was checked with DEBUG, I know
that the code begins at offset 100. Although it is the default, I set the
instruction mode for code for clarity anyway. Using the information found
during the dump, the rest of the .SEQ file looks like this:
100 C ;for clarity
2B8 S ;(remember the data was mostly letters)
I find it a help when first starting to disassemble a file to say that all
data is defined as byte strings because then DISASMBL will show all data
that is alphanumeric in ASCII (so it is legible if it is indeed string
data) and it can't hurt to assume all data is bytes, at least not yet.
Therefore, I set the data type in the last line of the .SEQ file to "S".
Create the above file with your text editor and then do a disassembly.
** FIRST DISASSEMBLY - CREATE A HELP FILE **
The purpose of this disassembly is to create a help file which will be
integrated into the .SEQ file you just wrote:
When you are prompted for X, A, R or Q, type:
This sends the .ASM file to the NUL device (the garbage can) and creates
the DISASMBL.S_Q help file. When prompted, quit DISASMBL:
The help file data is in the order DISASMBL found the references to the
data and contains multiple occurrences of the same information because the
executable program refers to most data several times. To put the help
information in sequential order for its integration into the .SEQ file, it
must be sorted:
The above says to "sort from the file DISASMBL.S_Q to the file
DISASMBL.SRT". When the sort is finished, remove the multiple references
using the BASIC program ELIMDUP:
When prompted, enter the sorted file's name, then the help file's name:
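If you do not have SORT or ELIMDUP handy, the sort-and-remove-duplicates step can be approximated in a few lines of Python. This is a sketch only. It assumes each help file line starts with a hexadecimal address (a guess based on the .SEQ examples in this tutorial), and it sorts by the numeric value of that address, which is not necessarily what DOS SORT does with text lines.

```python
def sort_and_dedupe(lines):
    """Sort help-file lines by their leading hex address and drop
    exact duplicates. Lines without a leading hex address are kept
    at the front in their original order."""
    def address(line):
        try:
            return int(line.split()[0], 16)
        except (IndexError, ValueError):
            return -1

    seen = set()
    unique = []
    for line in lines:
        text = line.rstrip()
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    # sorted() is stable, so equal addresses keep their first-seen order
    return sorted(unique, key=address)


# Hypothetical help-file lines, for illustration only
print(sort_and_dedupe(["28DE W", "2B8 S", "28DE B", "28DE W"]))
```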
Print the help file so you can make notes on it. Then:
Look at the ASCII part of the dump. You are hunting for strings and for
buffers. You will see that 2177 through 288A are the mnemonics for the op
codes, register names, macros, messages and the like. Your computer may
make the mnemonics difficult to see because the last byte's high bit is on,
so the last letter of the mnemonic may not display. You may want to
disassemble DEBUG to correct that; then again, it may be better to suppress
the display of data with the high bit on. You could rewrite DEBUG so that
a switch would decide how to dump such data.
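Short of rewriting DEBUG, strings that end in a byte with the high bit on can be made legible with a small script. The following Python sketch assumes that bit 7 marks the last character of each string, as described above. The function name and the sample bytes are mine and are not taken from DISASMBL.COM.

```python
def split_high_bit_strings(data):
    """Split bytes into strings whose final character has bit 7 set."""
    strings, current = [], []
    for byte in data:
        current.append(chr(byte & 0x7F))    # strip the marker bit
        if byte & 0x80:                     # marker: end of this string
            strings.append("".join(current))
            current = []
    if current:
        strings.append("".join(current))    # unterminated tail
    return strings


# Made-up sample: "CLD" and "STI" with the last letter's high bit on
sample = b"CL" + bytes([ord("D") | 0x80]) + b"ST" + bytes([ord("I") | 0x80])
print(split_high_bit_strings(sample))
```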
There is another group of string data at 28AE through 28BD, a possible
buffer at 2922 through 29AF, string data at 29B0 through 2B37 and a
probable buffer beginning at 2B38 and lasting until the end of the program.
Be careful when buffers appear at the end of a program because, in order to
minimize the size of the executable file, the actual end of the buffer may
be set using an EQUate rather than by defining it as data, so the apparent
program size (as indicated by the BX:CX register pair) may not be the
actual program size in memory! As a matter of fact, that is the case with
Write the information acquired from the dump on the help file printout in
the appropriate locations. You will see that much of the .SEQ file
information may be deleted because it is in areas where there are strings.
Cross out those superfluous lines now. Also, you will see that lines 1
through 39 of the help file printout contain references to addresses in the
code of DISASMBL, so cross those out too. Exit DEBUG:
** EDITING THE HELP FILE **
Load the DISASMBL.HLP file into your text editor and delete the lines you
crossed out on the printout. Then delete the lines with addresses higher
than 2D31; they are immediate data. Insert the following lines in the
correct sequential locations in the help file, as indicated on the
0 /Z ;no help file this time
1C2F W ;data still begins at 1C2F, but now I think it is WORD type
2046 X ;(I give you this as a present)
& 1 B
& 1 W
20A6 W ;end of repeating data structure (the "W" isn't redundant!)
2177 S ;the beginning of the string data as found with DEBUG
28AE S ;string data found with DEBUG
29B0 S ;string data found with DEBUG
2B38 B ;buffer area
Save the file. Load the DISASMBL.SEQ file into your text editor and delete
the last line. Append the DISASMBL.HLP file to the end of the .SEQ file.
Now comes the hard work. You must analyze the information with respect to
the information around it and with respect to a data item's size.
Moreover, since redundancies take up memory which you may need in order to
disassemble a file, only those entries which are needed should be kept.
Note that redundancies don't hurt anything when you have memory to burn.
On the other hand, the simplest possible .SEQ file should be used because
it is easier to understand, and you certainly don't need to add to the
difficulties of disassembling a program!
At this point, you should have a .SEQ file of 108 lines. Delete the lines
with address 2020 on them; they probably represent 2 spaces rather than a
data location. You can always put new information in a .SEQ file, so be a
bit ruthless in deleting. Just save the original printout of the help file
so you know how the program referred to the data should you need to restore
Delete the lines with the addresses 20B6 20D2 20DA 213A 2152 and 215A
because they all refer to WORD items and are therefore redundant. So is
the 28AC line, so delete it. Similarly, delete 28C0 and 28C6.
At 28DE, the data is referred to both as BYTE and as WORD. Because the
following address (28DF) would not be accessible if this were to be defined
as a WORD, it MUST be a BYTE. Perhaps that is not clear, so let me attempt
to explain a bit better. Every time you define a data item as WORD, that
data item consumes 2 bytes, so the next possible address in this case would
be 28E0 if 28DE is defined as a WORD, and that passes right by 28DF.
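The same rule can be stated as a small test: an address that the program refers to must not fall inside a larger data item. The Python sketch below is only an illustration. The function name is mine, and the set of referenced addresses is an example built from the addresses mentioned here.

```python
SIZES = {"B": 1, "W": 2}    # BYTE and WORD sizes in bytes


def hidden_addresses(address, kind, referenced):
    """Return referenced addresses that fall inside the item defined
    at `address` (past its first byte), and so could not get a label."""
    size = SIZES[kind]
    return [a for a in sorted(referenced)
            if address < a < address + size]


refs = {0x28DE, 0x28DF, 0x28E0}     # example reference list
for kind in ("W", "B"):
    hidden = [f"{a:X}" for a in hidden_addresses(0x28DE, kind, refs)]
    print(kind, hidden)
```

Defined as a WORD, 28DE swallows 28DF; defined as a BYTE, nothing is hidden.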
Delete the line "28DE
Delete 28DF 28E0 and "28E1
accessible, and delete "28E2
redundant so delete it. Delete 28E5 28E7 and 28E9 because they are also
At 28EF, the following addresses increment by 2. This means that the
program accesses the data as WORDs in spite of what the help file says.
Further, if there is an option, the WORD data type is preferred because
WORD data types are predominant in most programs. Finally, since no
reference is made to 28F0, assume 28EF is a WORD and delete the BYTE
reference to it.
Delete 28F1 and 28F4 because they are redundant. As with 28EF, delete
Although it is not as apparent as some of the previous occurrences, 28F9
must be a BYTE or else 28FC is not accessible, so delete "28F9
count the three bytes from 28F9 to 28FC to satisfy yourself that an odd
number of bytes is involved. You will either give up disassembling or
become proficient at hexadecimal counting...
The lines with 28F9 28FC 28FD 28FE 28FF 2900 and 2901 are redundant; delete
them. Delete "2901
through 2904 redundant. Delete "2904
and delete the BYTE reference to it. Delete 2916 and 2918. Delete "291A
too. 291C must be a BYTE, so delete the WORD line. Delete 291D and the
BYTE reference to 291E. Decide why and delete the next 9 lines, beginning
The area from 292A to 29AA is a 128-byte buffer. Many buffers in DOS are
multiples of 128 bytes. If you subtract 292A from 29AA the result is 80
hex, which is 128 in decimal. This can be done in DEBUG by typing
H29AA 292A. The sum is the first number displayed and the difference is
the second number displayed. In most cases, buffers should be defined as
BYTEs, so delete the WORD reference to 292A.
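If DEBUG is not at hand, the hex arithmetic is easy to imitate. This Python sketch mimics the sum and difference that the H command reports, truncated to 16 bits. The function name is my own, and the exact spacing of DEBUG's output is not reproduced.

```python
def debug_h(a, b):
    """Mimic DEBUG's H command: return (sum, difference) of two
    values, each truncated to 16 bits."""
    return (a + b) & 0xFFFF, (a - b) & 0xFFFF


total, diff = debug_h(0x29AA, 0x292A)
print(f"sum {total:04X}, difference {diff:04X} ({diff} decimal)")
```

The difference comes out as 0080 hex, or 128 decimal, which matches the buffer size given above.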
Delete the BYTE reference to 29AA, the WORD reference to 29AC and delete
29AD 29AE and 29AF because they are either redundant or impossible. Hey!
You're done, so save the .SEQ file, being sure there are 35 lines in it.
** SECOND DISASSEMBLY **
Now that a more complete sequential file exists, disassemble the executable
file again and save the source code it generates. This source code will be
examined for problems due to possible inadequacies of the sequential file:
Load the DISASMBL.ASM file into your text editor and look it over. You
will find that it "falls apart" after label L126D and again after L1429.
The first clue that something is wrong at L126D is the lack of a label
where one would be expected. DISASMBL only generates a label if there is a
reference to it, so a missing label may not be a disaster, but it sure as
hell is a strong warning that something is wrong! You will notice that
other labels are missing, and you would normally check to find out why, but
in the interest of your sanity, I'm telling you that DISASMBL is a bitch of
a program and that the missing labels are hidden in the program's data and
may be ignored for this demonstration.
The problem at L1429 is more obvious because of the question marks in the
comment area, the ESCape op code and the defined bytes. Because the code
went to hell, all the following code (and some of the preceding code as
well) is suspect. Skip over it for now. However, the data area should be
OK, so it should be examined now to see if there are any obvious problems
there. It too has a problem at L1C46, as you shall see later. Right now,
the problems in the code need to be corrected. To do that:
First unassemble the code starting where it is probably OK, which is at
126D. Note that there is a CALL 7677, which is outside the program address
limits. Also note that there are four ADD instructions in a row and that
two will normally do any math needed. The object here is to find the
correct place where the code resumes and to segregate it from (what must
be) the data. Begin by unassembling at 1272, and note that there is a
reference to the stack pointer without any subsequent STI instruction.
Nobody in his or her right mind messes with the stack pointer without first
disabling interrupts and then reenabling them after the instruction has
completed, so that can't be the correct new start of the code. Now try
unassembling at 1275, the next probable place, and note that it may indeed
be the correct new start. Finally, examine the hex code in the suspect
area beginning at 126F. If you convert the hex to decimal, you will see
that 2710 equals 10,000 decimal, that 03E8 = 1,000, 0064 = 100, 000A = 10
and, of course 0001 = 1. This should finally resolve the matter for you;
the code starts at 1279. Make a note that 126F is a WORD reference and
that 1279 is code.
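The table of powers of ten can be checked with a few lines of Python. This is a sketch under one assumption: the ten bytes below are reconstructed from the values quoted above (stored low byte first, as the 8086 does), not copied from a dump of DISASMBL.COM.

```python
import struct


def decode_words(data):
    """Decode little-endian 16-bit words, as the 8086 stores them."""
    count = len(data) // 2
    return list(struct.unpack(f"<{count}H", data[:count * 2]))


# 2710, 03E8, 0064, 000A, 0001 as they would sit in memory
table = bytes.fromhex("1027E80364000A000100")
print(decode_words(table))
print(f"next address: {0x126F + len(table):X}")
```

Five WORDs starting at 126F take ten bytes, so the next address is 1279, which agrees with the code start found by unassembling.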
Now unassemble at 1429. You should immediately notice a zero at 142A, the
next byte following the RET. Try unassembling at 142B, and the result is
an instantaneous OK! Make a note that 142A is a defined BYTE and that the
code starts at 142B, then exit DEBUG.
Enter the corrections in the .SEQ file, observing the necessity of keeping
the addresses sequential:
126F W ;10000, 1000, 100, 10 & 1
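Since the addresses must stay sequential, it may be worth checking the edited file before running DISASMBL again. The Python sketch below is a guess at a checker and is not part of DISASMBL. It assumes that an address line starts with a hexadecimal number and that anything else (such as the "&" lines) can be skipped.

```python
def out_of_order(seq_lines):
    """Return (line_number, text) for address lines whose hex address
    is lower than the highest address seen before them."""
    problems, previous = [], -1
    for number, line in enumerate(seq_lines, start=1):
        fields = line.split()
        if not fields:
            continue
        try:
            address = int(fields[0], 16)
        except ValueError:
            continue            # e.g. "& 1 B" lines or comment lines
        if address < previous:
            problems.append((number, line))
        else:
            previous = address
    return problems


# Example with one entry deliberately misplaced
sample = ["100 C", "2B8 S", "1279 C", "126F W", "142A B"]
print(out_of_order(sample))
```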
** THIRD DISASSEMBLY **
Disassemble using the revised .SEQ file, save the DISASMBL.ASM output and,
just for the hell of it, MASM, LINK and EXE2BIN it. You do not need an
assembly listing yet. There are 66 warnings from the assembly, but no
errors. When you EXE2BIN, do not specify a destination filename or you
will probably ruin the existing DISASMBL.COM file! Now, compare the files
DISASMBL.COM and DISASMBL.BIN. Depending on your DOS, the file to be used
to do the comparison may be "FC.EXE" or it may be "COMP.COM". Type, for
example, "FC DISASMBL.COM DISASMBL.BIN" and get ready to press the
key. You will see that the files do not match beginning at 1B58 (which
really means 1C58 because the program starts at 100). This is in the data
area and is your best clue that the data at L1C46 needs to be corrected.
Remember that I previously referred to the possible problem there. Now
that you know it exists, look at the DISASMBL.ASM file and you will see
that the data is one byte too long to match the label. To correct that,
specify that 1C45 is a BYTE instead of a WORD:
Place these corrections in the sequential file and repeat the disassembly,
reassemble, link and do the file compare as before. This time, although
there are still 66 warnings, there is no difference between the files.
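If neither FC nor COMP is available, the comparison and the offset adjustment described above (file offset 1B58 is address 1C58) can be sketched in Python. The function name is mine, and the byte strings in the example are made up just to show the arithmetic.

```python
def first_mismatch(original, rebuilt, load_offset=0x100):
    """Return (file_offset, address) of the first differing byte,
    or None if the two byte strings are identical."""
    for offset, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return offset, offset + load_offset
    if len(original) != len(rebuilt):
        offset = min(len(original), len(rebuilt))
        return offset, offset + load_offset
    return None


# Made-up data that differs at file offset 1B58
old = b"\x00" * 0x1B58 + b"\x01"
new = b"\x00" * 0x1B58 + b"\x02"
offset, address = first_mismatch(old, new)
print(f"file offset {offset:X} = address {address:X}")

# With real files you would read both in binary mode, for example:
#   first_mismatch(open("DISASMBL.COM", "rb").read(),
#                  open("DISASMBL.BIN", "rb").read())
```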
** WHAT NOW, COACH? **
If you are satisfied with the results now, this is as far as you must go.
But if you want to get rid of the errors indicated by MASM, you have lots
more work to do. I will tell you what is required and I will make some
suggestions, but this is the end of the tutorial.
** POLISHING UP PROCEDURES **
The basic steps to beautiful source code from the mess you currently have
are as follows:
1. Remove the 0 /Z from the .SEQ file and go through the steps
needed to create and print a help file. This will assist in
deciding how data is to be defined as well as identifying the
2. Redirect the error messages displayed by MASM to a file and print
them. This will help you find the data items incorrectly defined
in the .SEQ file.
3. You may wish to create a .TBL file on disk, edit out the things you
don't care about, and print it. I never use it.
4. Armed with the above, make all possible corrections to the .SEQ
file until you have reached the minimum possible number of warnings
generated by MASM.
5. Output the .ASM code to disk. Using the information at the end of
the .ASM file and the information from the help file, edit the .ASM
code to get rid of the offset references where immediate values belong.
Then search and remove all EQUates not referenced. Using the MASM
warnings list, resolve the rest of the discrepancies in the .ASM
code that cause the warnings. These are usually resolved by adding
"BYTE PTR" or "WORD PTR" as required, as DISASMBL tends to leave
them out in certain circumstances. Search out and resolve any
question marks in the source code.
6. Print the edited source code. Read it and pencil in any comments
you are able to make from the reading. DEBUG the program, tracing
its execution. Pencil in what you find out about it from tracing
it through. Try to devise descriptive label names as you
understand more about how the program functions.
7. Edit in your pencil comments. Edit in your label names too, but
put the original label in the comment field so that the address
will be available during subsequent DEBUG sessions. You may also
want to edit in the label names that reference this label using the
CALL and JMP instructions so you know how the program gets to this
particular place in the code. As you affix more names to labels,
these can be changed to reflect the name of the "calling" area, or
you may want to leave them as addresses if that helps clarify
things for you.
8. Repeat steps 6 and 7 until you are satisfied.
9. Resolve everything that would prevent the program from being
relocatable. That is, make sure that if you change anything in the
program, it will not cause the program to execute the wrong code or
access the wrong data. When you are satisfied that the program is
relocatable, compare it again to the original to make sure you
haven't made any mistakes in the editing process.
10. Make the changes you want in the program and enjoy! After all,
that is why you went through all this aggravation, isn't it?
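The search for unreferenced EQUates in step 5 is tedious by hand, so here is a rough Python sketch of one way to do it. It assumes the usual MASM form "name EQU value" and treats everything after a semicolon as a comment, which would misfire on a string containing a semicolon. The label names in the example are invented.

```python
import re

EQU_LINE = re.compile(r"^\s*([A-Za-z_@$?][\w@$?]*)\s+EQU\b", re.IGNORECASE)


def unreferenced_equates(source):
    """Return EQU names that appear nowhere except on their own
    defining line (comments after ';' are ignored)."""
    lines = [line.split(";", 1)[0] for line in source.splitlines()]
    names = [m.group(1) for line in lines if (m := EQU_LINE.match(line))]
    unused = []
    for name in names:
        pattern = re.compile(
            r"(?<![\w@$?])" + re.escape(name) + r"(?![\w@$?])",
            re.IGNORECASE)
        uses = sum(len(pattern.findall(line)) for line in lines)
        if uses <= 1:           # only the defining line mentions it
            unused.append(name)
    return unused


# Invented example, not real DISASMBL output
sample = """L2D40 EQU 2D40H
L2D50 EQU 2D50H
        MOV AX,L2D40    ;uses the first equate
"""
print(unreferenced_equates(sample))
```

Treat the result as a list of candidates to inspect, not as lines that are safe to delete blindly.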