Dec 11, 2017
Explanation of .com and .exe files.
File Name | File Size (bytes) | Zip Size (bytes) | Zip Type |
---|---|---|---|
COM-EXE.TXT | 15360 | 4181 | deflated |
Contents of the COM-EXE.TXT file
Appendix A
COM and EXE Files Explained
Michael A. Covington
Department of Computer Science
University of Georgia
Copyright 1986 Michael A. Covington
PREPUBLICATION DRAFT.
DISTRIBUTED FOR EDUCATIONAL USE ONLY.
March 4, 1986
Under MS-DOS (PC-DOS) on the IBM PC, there are two types of
executable object programs, COM files and EXE files, whose names
end in '.COM' and '.EXE' respectively. COM files can be loaded into
memory much more quickly than EXE files, but EXE files can contain
larger, more complex programs. This appendix will discuss the
differences between COM and EXE files in detail.
Segment registers
The 8088 microprocessor in the IBM PC uses 20-bit (5-hex-digit)
addresses to identify memory locations. However, none of the
registers in the processor contains more than 16 bits (4 hex
digits). To get around this limitation, the 8088 allows the
programmer to describe addresses as 16-bit offsets. Each offset is
converted into a complete 20-bit address by combining it with the
value contained in the appropriate segment register.
A block of memory small enough to be addressed by varying only the
offset, without changing the segment register, is called a segment.
The maximum size of a segment is 64K bytes.
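The sizes mentioned above follow directly from the bit widths. The short Python sketch below is only an illustration (the constant names are invented for this example and are not part of DOS or the assembler):

```python
ADDRESS_BITS = 20   # width of a complete 8088 address
OFFSET_BITS = 16    # width of an offset held in a register

address_space = 2 ** ADDRESS_BITS   # bytes reachable with a 20-bit address
segment_size = 2 ** OFFSET_BITS     # bytes reachable by varying only the offset

# Number of segments needed to cover all of memory without overlapping.
non_overlapping_segments = address_space // segment_size

print(address_space, segment_size, non_overlapping_segments)
```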
Offsets of data items are combined with the DS (Data Segment)
register unless instructions specify otherwise. For instance, the
instruction
MOV AL,[200H]
copies the contents of memory location 200H into the AL register.
Suppose the DS register contains 0410H. The full address of the
byte that is copied into AL is obtained as follows:
  0410      Segment
+  0200     Offset
--------
  04300     Complete (absolute) address
Note that the segment value is shifted one hex digit (4 bits) to the left
before the addition is performed. Offsets of program instructions are
combined with the CS (Code Segment) register. For example, the
statement
JMP 0100H
says to jump to offset 0100H relative to the current code segment.
If the CS register contains 0215H, the complete address to which
the processor will jump is obtained as follows:
  0215      Segment
+  0100     Offset
--------
  02250     Complete (absolute) address
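The shift-and-add rule can be checked with a few lines of Python. This is an illustrative sketch rather than DOS code, and the function name physical_address is made up for the example. It reproduces the JMP calculation above and shows that two different segment:offset pairs can name the same byte.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address.

    The segment value is shifted left one hex digit (4 bits) and the
    offset is added; the result is truncated to 20 bits, as on the 8088.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must fit in 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF


# The JMP example from the text: CS = 0215H, offset = 0100H.
jump_target = physical_address(0x0215, 0x0100)
print(f"{jump_target:05X}")

# A different segment:offset pair that refers to the same byte.
same_byte = physical_address(0x0225, 0x0000) == jump_target
print(same_byte)
```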
The two other segment registers are SS (Stack Segment), which
specifies the location of an area of memory used as a pushdown
stack, and ES (Extra Segment), which the programmer is free to use
for any purpose.
The programmer can specify that a particular segment register is to
be used with a particular offset. For instance, to retrieve the
byte whose offset is 0200H relative to the ES rather than the DS
register, use the instruction:
MOV AL,ES:[0200H]
The 'ES:' prefix here is called a segment override and generates an
extra byte in the object program. The most common use of segment
overrides is to access small data areas which for some reason are
stored in the code segment rather than the data segment.
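The effect of a segment override can be modeled as choosing which register supplies the segment half of the address. The Python sketch below is a simplified illustration: the function name data_address and the register contents are hypothetical, and it covers only the plain data reference described above.

```python
DEFAULT_DATA_SEGMENT = "DS"  # used when the instruction has no override


def data_address(registers, offset, override=None):
    """Return the 20-bit address used by a simple data reference."""
    segment_name = override or DEFAULT_DATA_SEGMENT
    segment = registers[segment_name]
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical register contents, chosen only for illustration.
registers = {"CS": 0x0215, "DS": 0x0500, "ES": 0x0800, "SS": 0x0900}

plain = data_address(registers, 0x0200)                # MOV AL,[0200H]
with_override = data_address(registers, 0x0200, "ES")  # MOV AL,ES:[0200H]
print(f"{plain:05X} {with_override:05X}")
```

The same offset reaches two different bytes, depending only on which segment register is selected.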
COM files
A COM file contains a program that fits entirely into one segment.
In order to load a COM file into memory, DOS constructs a small
control area called a program segment prefix, 100H bytes long, and
copies the COM file into memory immediately after it.
DOS then sets the CS, DS, and ES registers to point to the
beginning of the program segment prefix, automatically sets aside a
small stack, and begins execution at offset 0100H in the code
segment (which is the first byte that was copied in from the COM
file).
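The loading steps just described can be sketched as a small Python model. It is a simplification under stated assumptions: the function load_com is invented for this example, the program segment prefix is filled with zeroes instead of its real contents, and the stack is ignored.

```python
PSP_SIZE = 0x100  # the program segment prefix is 100H bytes long


def load_com(com_image: bytes, psp_segment: int):
    """Model how a COM image is placed in its single segment."""
    if len(com_image) > 0x10000 - PSP_SIZE:
        raise ValueError("COM image does not fit in one 64K segment")
    # Placeholder PSP (all zeroes here), followed immediately by the file.
    segment = bytearray(PSP_SIZE) + bytearray(com_image)
    registers = {
        "CS": psp_segment,
        "DS": psp_segment,
        "ES": psp_segment,
        "IP": PSP_SIZE,  # execution starts at offset 0100H
    }
    return segment, registers


# A one-instruction program: INT 20H (bytes CD 20), i.e. "return to DOS".
segment, registers = load_com(bytes([0xCD, 0x20]), psp_segment=0x1000)
entry = registers["IP"]
print(hex(entry), segment[entry:entry + 2].hex())
```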
With some assemblers, such as Turbo Editasm (TASM), you can create
a COM file by simply writing a series of assembly language
instructions, without any SEGMENT, ENDS, or other
pseudo-instructions. With the Microsoft Macro Assembler, however, you must
follow the outline shown here:
; File SAMPLCOM.ASM
; Framework for a program that is to be made into a .COM file
; Assemble the program onto an .OBJ file, use LINK to create
; an .EXE file, and then use EXE2BIN to make it into a .COM file.
; The error message 'No stack segment' is normal when linking.
; Do not try to run the .EXE file.
CODE    SEGMENT PUBLIC
        ORG     100H            ; this must precede the ASSUME stmt
        ASSUME  CS:CODE,DS:CODE
MAIN    PROC    FAR
;
; Main program goes here
;
        INT     20H             ; return to DOS
MAIN    ENDP
;
; Other PROCs, if any, go here
;
;
; Data goes here
;
CODE    ENDS
        END     MAIN            ; specifies that execution is
                                ; to start at beginning of MAIN
Notes on creating COM files
Notice the following important things about the outline:
(1) There is only one segment. (Here it is named CODE, but it could
have any name.)
(2) Immediately after the SEGMENT instruction is an ORG 100H
instruction. This tells the assembler that the program will be
loaded at offset 0100H relative to the segment registers, and all
addresses should therefore be figured from that starting point.
(3) The ASSUME statement tells the assembler to compute addresses
on the assumption that the CS and DS registers both point to the
segment called CODE. (The segment address of CODE will actually be
placed in the CS and DS registers by DOS when the program is loaded
into memory.)
(4) The main program is declared to be a PROC FAR. Actually, this
is not required, but it contributes to readability. Here it is
named MAIN, but it could have any name.
(5) The main program terminates by performing interrupt 20H.
(6) The data goes after the program, within the code segment.
(7) After the end of the code segment, the instruction END MAIN
informs the assembler and linker that execution is to begin at the
beginning of the procedure called MAIN.
There is no data segment because the data goes inside the code
segment. There is no stack segment because DOS automatically
reserves a 256-byte stack at the end of the program's segment.
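To see why the 100H starting offset matters, it may help to assemble a tiny COM image by hand. The Python sketch below builds the bytes of a four-instruction program followed by its data. It goes beyond the outline above in one respect: it uses DOS function 09H (INT 21H with AH = 09H, which prints a string ending in '$'), and the message text is made up for the example.

```python
ORG = 0x100  # offset at which the first byte of a COM file is loaded

message = b"Hello from a COM file$"  # function 09H strings end with '$'

CODE_LENGTH = 9                      # total size of the instructions below
message_offset = ORG + CODE_LENGTH   # where the data lands in memory

code = bytes([
    0xB4, 0x09,                                        # MOV AH,09H
    0xBA, message_offset & 0xFF, message_offset >> 8,  # MOV DX,offset of message
    0xCD, 0x21,                                        # INT 21H (print string)
    0xCD, 0x20,                                        # INT 20H (return to DOS)
])
assert len(code) == CODE_LENGTH

# The data goes after the program, within the same segment.
com_image = code + message
print(f"message is at offset {message_offset:04X}H")
```

The message sits 9 bytes into the file, but the instruction must refer to it by its offset in memory, which is 100H higher. Supplying that adjustment is the job of the ORG 100H line.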
The process of creating a COM file is shown below. We'll assume
that MASM, LINK, and EXE2BIN are on drive A, and the user's
program, called MYPROG.ASM, is on drive B.
How to create a COM file with the Macro Assembler
A>masm
The IBM Personal Computer Macro Assembler Version 1.10
(c) Copyright IBM Corp. 1982
Source filename [.ASM]: b:myprog.asm
Object filename [B:MYPROG.OBJ]:
Source listing [NUL.LST]:
Cross reference [NUL.CRF]:
Warning Severe
Errors Errors
0 0
A>link
The IBM Personal Computer Linker Version 2.00
(c) Copyright IBM Corp. 1983
Object modules [.OBJ]: b:myprog.obj
Run file [B:MYPROG.EXE]:
List file [NUL.MAP]:
Libraries [.LIB]:
Warning: No STACK segment
There was 1 error detected. ;;; This message is normal!!!
A>exe2bin b:myprog.exe b:myprog.com
A>erase b:myprog.exe
After creating the COM file, delete the EXE file so that, when you
type myprog, DOS will know which of the two to execute. (Remember
that both COM files and EXE files can be executed by typing their
names.)
EXE files
An EXE file can consist of any number of segments. Two of these
segments are special: the starting code segment (which contains the
starting point label to which the END statement refers) and the
stack segment (the only segment that is declared as SEGMENT STACK
rather than SEGMENT PUBLIC).
When the program is loaded, the segment registers are given values
as follows:
CS points to the starting code segment.
SS points to the stack segment.
DS and ES point to the program segment prefix.
Execution begins at the beginning of the starting code segment (not
at offset 0100H).
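The register assignments above can be summarized in a small Python model. It is a sketch under assumptions not stated in the text: the program image is taken to be loaded immediately after the 100H-byte program segment prefix, and the function name, parameter names, and segment positions are hypothetical.

```python
PSP_PARAGRAPHS = 0x10  # the 100H-byte PSP occupies 10H sixteen-byte paragraphs


def exe_initial_registers(psp_segment, code_paragraph, stack_paragraph,
                          entry_offset=0):
    """Return the register values for a freshly loaded EXE program.

    code_paragraph and stack_paragraph give the positions of those segments
    within the loaded image, in 16-byte paragraphs from its start.
    """
    load_segment = psp_segment + PSP_PARAGRAPHS  # image follows the PSP
    return {
        "CS": load_segment + code_paragraph,   # starting code segment
        "IP": entry_offset,                    # not 0100H as in a COM file
        "SS": load_segment + stack_paragraph,  # stack segment
        "DS": psp_segment,                     # DS and ES point to the PSP
        "ES": psp_segment,
    }


# Hypothetical layout: stack at the start of the image, code 21H paragraphs in.
registers = exe_initial_registers(0x1000, code_paragraph=0x21,
                                  stack_paragraph=0x00)
print({name: f"{value:04X}" for name, value in registers.items()})
```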
The following is an outline of an assembly language program that is
to be made into an EXE file:
; File SAMPLEXE.ASM
; Framework for a program to be made into an .EXE file
; Assemble the program onto an .OBJ file,
; and then use LINK to create an .EXE file from it.
STACK   SEGMENT STACK
; The stack should be at least 256 bytes.
; Here we fill it with a byte pattern that will
; be easy to recognize when memory is displayed.
        DB      64 DUP('STACK...')
STACK   ENDS
DATA    SEGMENT PUBLIC
;
; Data goes here
DATA    ENDS
CODE    SEGMENT PUBLIC
        ASSUME  CS:CODE,DS:DATA,SS:STACK
MAIN    PROC    FAR
        PUSH    DS
        SUB     AX,AX
        PUSH    AX              ; set up for return to DOS
        MOV     AX,DATA
        MOV     DS,AX           ; put correct value into DS
;
; Main program goes here
        RET                     ; return to DOS
MAIN    ENDP
;
; Other PROCs, if any, go here
CODE    ENDS
        END     MAIN            ; specifies that execution is
                                ; to start at beginning of MAIN
Notes on creating EXE files
Note the following points about the framework:
(1) There are several segments. Here they are called STACK, DATA,
and CODE, but each of them could have any name.
(2) The ASSUME instruction (at the beginning of the code segment)
tells the assembler which segments are being used for instructions,
data, and stack space.
(3) An EXE file cannot terminate execution with INT 20H because
INT 20H expects the CS register to point to the program segment
prefix, and in an EXE file CS points to the starting code segment.
The program therefore ends with a far return instruction (i.e., a
RET embedded in a PROC FAR).
(4) The purpose of the RET instruction is to make the program give
control back to the operating system. To do this, it must jump to
offset 0 of the segment whose value was in DS at the beginning of
execution (the program segment prefix, which begins with an INT 20H
instruction). Recall that a far RET instruction causes the CPU to
pop an offset and a segment off the stack and jump to that location.
The program must therefore begin by pushing the original value of DS
onto the stack, followed by a word of zeroes (the offset). (SUB
AX,AX is a quick way to place 0 into AX.)
(5) It is up to the programmer to decide which segment is being
used for data and place its segment address into DS. This is done
with the instructions:
MOV AX,DATA
MOV DS,AX
Two MOV instructions are required because there is no way to move a
constant (immediate) value directly into a segment register.
Because the label DATA appears on a SEGMENT pseudo-instruction, it
stands for the segment address of the data segment. All other
labels (on data items, program instructions, and the like) stand
for offsets.
(6) The END MAIN statement tells the assembler and linker that
execution should begin at the beginning of the procedure labeled
MAIN. (The procedure containing the main program could have any
name; it does not have to be called MAIN.)
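The return mechanism in note (4) can be traced with a short Python simulation of the stack. This is an illustrative model only: the function name and the segment values are hypothetical, and the main program is assumed to leave the stack balanced.

```python
def run_exit_sequence(psp_segment: int):
    """Simulate PUSH DS / SUB AX,AX / PUSH AX ... far RET."""
    stack = []
    ds = psp_segment    # DS points to the PSP when the program starts
    stack.append(ds)    # PUSH DS
    ax = 0              # SUB AX,AX
    stack.append(ax)    # PUSH AX
    ds = 0x2000         # MOV AX,DATA / MOV DS,AX (hypothetical data segment)
    # ... main program runs here, leaving the stack balanced ...
    ip = stack.pop()    # the far RET pops the offset first,
    cs = stack.pop()    # then the segment
    return cs, ip


cs, ip = run_exit_sequence(psp_segment=0x1000)
print(f"{cs:04X}:{ip:04X}")
```

Control ends up at offset 0 of the program segment prefix even though DS has been changed in the meantime, because the original value was saved on the stack before DS was reloaded.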
The process of creating an EXE file is shown below. As before, we
assume that MASM and LINK are on drive A and your program (called
MYPROG.ASM) is on drive B.
How to create an EXE file with the Macro Assembler
A>masm
The IBM Personal Computer Macro Assembler Version 1.10
(c) Copyright IBM Corp. 1982
Source filename [.ASM]: b:myprog.asm
Object filename [B:MYPROG.OBJ]:
Source listing [NUL.LST]:
Cross reference [NUL.CRF]:
Warning Severe
Errors Errors
0 0
A>link
The IBM Personal Computer Linker Version 2.00
(c) Copyright IBM Corp. 1983
Object modules [.OBJ]: b:myprog.obj
Run file [B:MYPROG.EXE]:
List file [NUL.MAP]:
Libraries [.LIB]:
The finished result is in the file B:MYPROG.EXE.
-end-