Dec 06 2017
Text file on assembly language for beginners, by a beginner. Covers TSRs, high level interfacing, registers, addressing, memory and more.

File Name | File Size | Zip Size | Zip Type |
---|---|---|---|
ASSEMBLY.TXT | 45350 | 15748 | deflated |
TPCREAD.ME | 199 | 165 | deflated |
Contents of the ASSEMBLY.TXT file
/****************************** ASMTXT01.TXT *******************************/
Welcome to the world of assembly language!
I will be speaking in terms of the DOS operating system using 8086
assembly language.
First off, I will admit I am no professional on Assembly Language. I am
a beginner. Why is a beginner writing a text on assembly language ??
Well, I'll tell ya. I looked and looked and looked on all kinds of boards
for text files on assembly when I wanted to get into it. I found nothing, or
a lot of nothing. A lot of the stuff I mention in this text file is based on
my observations from things I've tried, things I've read, and things I've
done. I have learned all of this stuff from experimenting, with no
experienced assistance. So, if some of my explanations are way off
base, I apologize in advance. If you are an assembly guru reading this,
feel free to make the necessary corrections. All I ask is you rename the
file to ASMTXT02.TXT or whatever, accordingly. If you can't upload the
revised file using an updated form of the original .ZIP filename, just
be clever with the description.
A quick glance at terminology and other arcane wording found here:
(These aren't necessarily standards)
MASM = Microsoft Macro Assembler
TASM = Turbo Assembler (Borland International)
PWB = Programmers Work Bench (Microsoft's MASM editor v6.1)
BC = Borland International's C compiler and programming environment.
HLL = High Level Language
asm = abbreviation for assembly.
cpu = Central Processing Unit. Your processor _chip_ (286, 386, 486)
vram = Video Ram (video memory)
int = short for "interrupt". Think of interrupts as a "tug-on-the-pantleg"
of the operating system, or the CPU, or the BIOS, etc.
mem = RAM on a pc. Memory.
regs = referring to the registers inside the CPU. reg, regs, etc.
ivt = the Interrupt Vector Table.
IDE = Integrated Development Environment
... = more code here
DOS call = any DOS interrupt. Defined by "Int 21h"
BIOS call = any BIOS interrupt. Int 10h, Int 13h, etc.
comfile = a "Core Image", or .COM file.
executable = any file you can execute through DOS. .EXE, .COM and maybe
.BAT .
Books I have found to be of high caliber, excellent reading, the
most informative, helpful, and worth the bucks. These are not in any
certain order. You _will_ need more than one book to learn asm. I have,
uh lessee, about 10 books on assembly. I've read most all of them, but usually
resort to the following three:
Mastering Turbo Assembler - Tom Swan - Hayden Books 1989.
isbn#: 0-672-48435-8
This guy is an excellent author. He keeps things simple, and covers a
wide area of asm.
$29.95
PC Principles - Gunnar Forst - The M.I.T. press 1992.
isbn#: 0-262-06134-1 (hd. cover?) and 0-262-56053-4 (paperback)
This book was first published in Copenhagen, Denmark under the name
"PC Principper".
The best way to explain this book is "bitchin". This is my 'black-book'.
It covers EVERYTHING. Hardware, Software, how DOS works, how memory works,
.EXE headers, Boot sector info, FAT, Machine Language, blah, blah.
A must. This book beats anything Norton can throw.
$32.50
Using Assembly Language 3rd ed. - Allen L. Wyatt Sr. - Que 1992.
isbn#: 0-88022-884-9
My copy is falling apart. The cover is falling off, and chapter 20 has
been detached from the binding. This book is great because it covers all
the interrupts with snippets of examples. The author keeps things
light-hearted, and throws in some humor here and there.
Excellent reference for HLL interfacing and interrupts in general.
$29.95
-Why learn Assembly?
This is largely a matter of opinion. Some people will irritatingly
state the following:
"assembly is archaic and outdated"
"with all the High Level languages available today, why bother ?"
"There is nothing you can do in assembly that I can't do in a HLL"
The truth of the matter is, human nature tells us to take the easy
way out. HLL's are very forgiving these days with their integrated
compilers which warn you of things like suspicious pointers and such.
Take a look at some COBOL code sometime; it's almost like talking to
something.
A friend and I bought MASM 6.1 a couple years ago. The online
help was EXCELLENT, but the editor drove me up the wall. You could
(I did) sit for six hours customizing every detail of the editor.
When I would assemble and link the programs from within MASM'S
Programmers Work Bench, I would get weird errors I felt had nothing
to do with the code. I would have to shell to DOS, assemble and link
the code, but I couldn't step thru it with the PWB because it would
try and re-assemble the code again with the same grating errors. I
was probably doing something wrong, or had something set wrong, but
finally, DEL C:\MASM\*.* .
I purchased Borland C v3.1 a year later, and wow, BC comes with Turbo
Assembler! There is no 'editor' or workbench or anything. You
can run TASM from inside BC's 'editor', but I don't think you can
step thru the code; Well not TRUE assembly code. You can regurgitate
assembly code within BC with "asm" statements and step thru it that way
(I learned a lot doing that). It is a kinda ass pain, but it makes a
real nice code stepper for assembly.
With TASM, what you get is a command line assembler and
linker. Plain Vanilla. That's about what I ended up doing with
MASM, but now there is no temptation to try and figure out the
PWB.
If you don't have a good debugger to double check things, running
your assembly programs in a watch-and-see type fashion can produce some
very interesting cataclysmic reactions. I have blown up my File Allocation
Tables in the past, and screwed up the video bios. But, I never
"permanently" wrecked anything. And I never had a debugger until about
2 years ago (Borland's Turbo Debugger). I'm not patient enough to
analyze the hell out of a chunk of code. That is also why I am not
a professional.
If you have the patience for assembly, the payoffs (for me anyway)
can be very rewarding. I have learned a lot about memory, the
processor, hardware, floppy drives, boot sectors, interrupts and
TSR's. Ah yes, TSRs. Here is something I would like to see done
in a HLL. Code a TSR in your favorite HLL, but I would like to
see the resulting .COM file size under 1000 bytes.
Here is mine using Turbo Assembler (I'll discuss some of this later).
This advances a character across the monochrome screen and is toggled
on and off with the scroll lock key.
(The .COM file linked to 183 bytes, yes a 183 byte executable file)
1 .model tiny ;/* tiny mem model for comfiles */
2 intnum equ 1ch ;/* hook the timer interrupt */
3 .code
4 org 100h
5
6 first:
7 jmp loadprog ;/* load the TSR into memory */
8
9 prog proc ;/* tsr code in mem starts here */
10 pushf ;/* save flags for old handler */
11
12 call dword ptr cs:OldInt ;/* chain to old handler */
13
14 push ax ;/* save all the registers first, */
15 push bx ;/* so every path to "exit" pops */
16 push cx ;/* exactly what was pushed */
17 push dx ;
18 push di ;
19 push si ;
20 push ds ;
21 push es ;
22 pushf ;/**/
23
24 cmp cs:inproc, 1 ;/* flag to check if processing */
25 je exit ;/* get out if true */
26
27 mov cs:inproc, 1 ;/* set flag */
28
29 cli ;/* disable ints while fooling */
30 mov ax, 0040h ;/* with segment registers */
31 mov es, ax ;
32 mov al, byte ptr es:[0017h] ;
33 sti ;/* turn interrupts on again */
34
35 test al, 00010000b ;/* check scroll lock key */
36 jz xorcounter
37
38 cmp cs:counter, 4000d ;/* bottom of screen in vram */
39 je xorcounter ;/* reset counter if there */
40
41 mov ax, 0B000h ;/* point to monochrome vram */
42 mov es, ax ;/* should really cli here */
43 mov si, cs:counter ;/* use counter for character*/
44 mov byte ptr es:[si], 0ffh ;/* erase color attribute */
45
46 inc si ;/* counter ++ */
47
48 mov byte ptr es:[si], 01h ;/* display character */
49
50 mov cs:counter, si ;/* store counter */
51 mov cs:inproc, 0
52 jmp exit ;/* that's enough for one tick */
53
54 xorcounter:
55 xor cx, cx ;/* reset counter */
56 mov cs:counter, cx ;/**/
57 mov cs:inproc, 0
58
59
60 exit:
61 popf ;/* restore registers */
62 pop es ;
63 pop ds ;
64 pop si ;
65 pop di ;
66 pop dx ;
67 pop cx ;
68 pop bx ;
69 pop ax ;/**/
70
71 iret ;/* return from handler */
72 oldint label word
73 oldaddr dd 0000
74 counter dw 0
75 inproc db 0
76
77 prog endp
78
79 loadprog proc
80
81 mov ah, 35h ;/* DOS function for getting */
82 mov al, intnum ;/* interrupt vector */
83 int 21h
84
85 mov oldint, bx ;/* save old vector */
86 mov oldint[2], es ;/**/
87
88 mov ah, 25h
89 lea dx, prog ;/* "prog" process goes TSR */
90 int 21h
91
92 exit2:
93
94 mov dx, offset succesmsg ;/* display message */
95 mov ah, 9 ;
96 int 21h ;/**/
97
98 mov dx, offset loadprog ;/* release memory used by */
99 int 27h ;/* loading process */
100
101 succesmsg db 0dh, 0ah, 'TSR succesfully loaded.', 0dh, 0ah, '$'
102
103 loadprog endp
104 end first
-Assembler? Assembly?:
An assembler is the program used for assembling your 'text-file'
assembly code. TASM (Borland Intl.), MASM (Microsoft), etc.
It converts the "text" into a file called an object or .OBJ file.
The .OBJ file is then used by the LINKER to create, or link,
the final executable file from the .OBJ:
HELLO.ASM -> Assembler = HELLO.OBJ
HELLO.OBJ -> Linker = HELLO.EXE or HELLO.COM
Assembly is the language. Everyone seems to call assembly language
'assembler'; "When did you first learn Assembler ?"
You can pretty much figure the person asking this question is talking
about programming and not writing an assembler, but if you are a stickler
for terminology, the above statement is misleading.
-Switches
The assembler and linker interpret what are called 'switches' to
handle the whole process in different ways. These are usually designated
by a '-' or '/' (slash). I haven't used all of them, just a couple to
either link to a .COM file or just assemble to an .OBJ for incorporating
the code into a higher level language. (more on that later)
-High Level? Low Level?
Assembly is a low level language. This means it is very close to the
native 'tongue' of the processor. The language of the processor is
actually voltage deviations simply meaning True or False; 1 or 0, on and
off, etc. I guess the step up from the voltage pulses would be binary,
1 and 0 and then machine language, byte values 0 thru 255, then assembly,
then higher level languages like Basic, Pascal, C, Fortran, and a whole
slew of others. If you have ever tried to type a 'binary' file
(executable) from the DOS command line to the screen, you have probably
heard all kinds of beeping from the speaker and a garbled mess of
characters flying all over. This is the machine language you see. All
those little chars. mean something different in the world of the
processor. You could probably write a program using these chars., but
that would be a tedious process. I guess they actually used to do that,
but then along came asm. (assembly).
-Registers ?
Registers are little chunks of memory inside the CPU. All these registers
do is hold numbers. That's it.
Using these registers is a lot faster (for the CPU) than using a variable
in standard memory. The values in these registers are plain binary
numbers; they are usually written in base 16, or "Hexadecimal", because
that is shorter to read and type.
There are (for practical purposes) 14 registers in an 8086 CPU (later
chips like the 286 and 386 add a few more, mostly related to protected
mode programming). All of them are 16 bits wide. This means ONLY values
from 0 - 65535 can be stored in them. The exception is FLAGS: it is 16
bits wide too, but it is used as a set of one-bit flags, not one number.
AX, BX, CX, DX
CS, IP
DS, ES
SI, DI
SS, SP
BP, FLAGS
AX is called the Accumulator, BX is called the Base, CX is called the Count
and DX is called the Data register.
The two bytes, in AX for example, are referred to as the High Byte and
Low byte: AH and AL. You can only stick a byte value in each half.
(8 bits, a "char" in C). Decimal values are from 0 - 255. (the ascii
table values for all the characters on a PC)
-= The AX register =-
-= One Word, an Int, or 2 bytes =-
-= high byte =-> A H A L <-= low byte of ax =-
|- 16 bits wide-|
These 4 registers above will hold no more than the value 65535 each.
That is as high as the binary numbers will go for a "Word"
(16 bits or 2 bytes). This is also, incidentally, the highest offset
inside one segment of memory.
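To make the AH/AL business concrete, here is a minimal sketch in plain C (nothing Turbo C specific). The function names high_byte, low_byte, make_word and ax_halves_demo are made up for illustration; the only thing taken from the text is that a 16 bit register is two 8 bit halves sitting side by side.

```c
#include <assert.h>
#include <stdint.h>

/* High byte of a 16-bit word: what AH holds when the word is in AX. */
uint8_t high_byte(uint16_t word) { return (uint8_t)(word >> 8); }

/* Low byte of a 16-bit word: what AL holds. */
uint8_t low_byte(uint16_t word) { return (uint8_t)(word & 0xFF); }

/* Rebuild the word from its halves, like loading AH and AL separately. */
uint16_t make_word(uint8_t high, uint8_t low)
{
    return (uint16_t)(((unsigned)high << 8) | low);
}

/* Hypothetical walk-through: AX = 1234h means AH = 12h and AL = 34h. */
void ax_halves_demo(void)
{
    uint16_t ax = 0x1234;
    assert(high_byte(ax) == 0x12);
    assert(low_byte(ax) == 0x34);
    assert(make_word(0xFF, 0xFF) == 65535u); /* both halves full = max word */
}
```

Calling ax_halves_demo() does nothing visible; it just stops the program if one of the three claims is wrong.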
The way you use these four registers is completely up to you. Stick
whatever values you want in them. The only time you must adhere to any
rules are when you are using an interrupt, or an Instruction such as
ADD, MUL, MOVSB, etc.
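The Word registers (ax, bx, cx, dx) can be loaded directly with a value.
The same goes for si, di, bp and sp.
The Segment registers (ds, es, ss) must be loaded from another
register, or from a word sized variable. You can't say:
'mov ds, 0000'.
- The 8086 simply has no instruction for moving an immediate value
into a segment register.
You can, however, say:
'mov ds, ax'
or
'mov ds, [0000]'.
The CS and IP registers are a gnarly thing to mess with. There is no
supported way to "load" these regs with MOV; they only change when the
CPU jumps, calls, returns, or services an interrupt. SS can be loaded
like DS and ES, but changing it moves the stack, so SP has to be set
right along with it.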
CS is called the Code Segment register and holds the memory segment
address of your code which is executing away in memory. Your program
can look at the CS reg at any time to see where it is.
IP is called the Instruction Pointer. This is the offset, inside the
segment pointed to by the CS reg, of the next instruction the CPU will
execute. If the CS reg has the value 1000, and IP has the
value 0001, that means your program resides in memory segment 1000
and the CPU is about to execute the instruction at offset 0001 of that
segment. It is not a cool thing to change these values by hand. The IP
reg only changes through jumps, calls, returns and interrupts.
DS is called the Data Segment. Like the CS reg, this points to a
segment of memory also. DS, however, points to where all (most of,
more on that later) your variables are stored.
This segment of memory is also referred to as DGROUP (Data Group ??).
DGROUP (for all practical purposes) only has enough volume to hold
65536 bytes (offsets 0 thru 65535). The variables in your program are
actually assembled
into memory offsets from the beginning of DGROUP. This tells the CPU
what chunk of memory it is currently working with when you work with
a variable. Kinda like the IP reg.
ES is called the Extra Segment. I'm not really sure on this one.
I've used it for accessing BIOS memory locations like the byte
that tells you if the scroll lock is activated (as in the above program).
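SI is called the Source Index. It is mostly for transferring chunks
of memory with instructions such as MOVSB (MOVe a String Byte), which
copies the byte at DS:SI to ES:DI. If you've ever
heard of a BitBlit, I've a sneaky hunch they use MOVSB to do it.
I've used SI for indexing through a string to make all the letters
lower case. Something sorta like this:
mov si, StringPointer ;/* offset of a 0-terminated string */
caploop:
cmp byte ptr [si], 0 ;/* check for the end BEFORE changing it */
je done
or byte ptr [si], 20h ;/* set bit 5: 'A' (41h) becomes 'a' (61h) */
inc si
jmp caploop
done:
(This sets bit 5 on every byte, so a few symbols like '@' get changed
too. Letters, digits and spaces come out fine.)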
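For comparison, here is the same idea as a minimal C sketch. It assumes an ASCII, 0-terminated string, and the names to_lower_ascii and to_lower_demo are invented for this example. Unlike a blind OR with 20h, it only touches 'A' thru 'Z', so symbols like '@' are left alone.

```c
#include <assert.h>
#include <string.h>

/* Lowercase a 0-terminated ASCII string in place.
   Only 'A'..'Z' are changed, by setting bit 5 (the "or 20h" trick). */
void to_lower_ascii(char *s)
{
    for (; *s != '\0'; s++) {          /* stop at the terminating 0 */
        if (*s >= 'A' && *s <= 'Z') {
            *s = (char)(*s | 0x20);
        }
    }
}

/* Hypothetical walk-through with a made-up test string. */
void to_lower_demo(void)
{
    char text[] = "Hello @ DOS 6";
    to_lower_ascii(text);
    assert(strcmp(text, "hello @ dos 6") == 0);
}
```

The pointer s plays the part of SI here: it starts at the first byte and gets bumped by one each time around the loop.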
DI is called the Destination Index. The only times I've used it
was when I was comparing two strings. I had a pointer to each string
and just stepped through them comparing as I went. DI is also used
in a MassMove or BitBlit or whatever you wanna call it.
SS is called the Stack Segment and points to an area of memory where
the Stack resides (more on that later).
When the stack tosses-it's-cookies, you can get errors like
"stack overflow, system halted" and stuff like that. The stack can
be a nasty thing to tweak on.
SP is called the Stack Pointer. It is the offset into the area of
memory where the stack is, and keeps track of the last thing "pushed"
onto, or "popped" from, the stack.
BP is called the Base Pointer and geez, know what ?
I have no damn clue what this is for. Wait, here it is (thanx to
Tom Swan).... says it's used for accessing variables inside the stack.
FLAGS or the flags register, has only 9 bits of the 16 which are used.
(yeah, right). Somewhere, sometime, someone will probably find out
the other 7 bits are actually undocumented features.
the flags:
O = Overflow Flag.
signed math on regs where the result doesn't fit in -32768 to +32767
will trip the overflow flag to a 1. (Unsigned math where the total
exceeds 65535 trips the Carry flag instead.) When you need numbers
bigger than 65535, there are ways of doing it by using 2 regs.
D = Direction Flag
which direction to index bytes in a string. Forward or Backward.
0 is forward. I've always used CLD to Clear Direction
Flag before doing string stuff.
I = Interrupt Flag
0 = Interrupts are disabled and will not be valid.
1 = Interrupts are enabled and will be carried out.
you can toggle this flag with CLI (clear int flag) and STI
(set int flag)
T = Trap Flag
I think this is used with debuggers for executing one command
at a time when you are stepping thru code.
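S = Sign Flag
A copy of the top bit of the last result, so 1 means the result is
negative when you treat it as a signed number.
Haven't used it much.
Z = Zero Flag
set to 1 when the result of an operation is zero. CMP CX, 1 subtracts
1 from CX (without storing the answer), so Z = 1 means they were equal.
Haven't used it much directly. I always use JE or JNE, and these
instructions actually test the zero flag.
A = Auxiliary Flag
carry out of the low 4 bits of a result. Used by the BCD (decimal)
adjust instructions like DAA and AAA.
P = Parity Flag
1 if the low byte of the result has an even number of 1 bits.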
C = Carry Flag
I've found DOS usually sets this flag to a 1 if there was an
error while attempting a DOS call. Like opening a file that
doesn't exist.
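Here is a rough C model of what the Carry, Zero, Sign and Overflow flags look like after a 16 bit ADD. It is only a sketch: the struct add16_result and the functions add16 and add16_demo are made-up names, and the real CPU sets a few more flags (Auxiliary, Parity) that are left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Result of a 16-bit add plus four of the flags it would leave behind. */
struct add16_result {
    uint16_t value;
    int carry;    /* C: unsigned total did not fit in 16 bits */
    int zero;     /* Z: the 16-bit result is 0 */
    int sign;     /* S: top bit (bit 15) of the result */
    int overflow; /* O: signed total did not fit in -32768..32767 */
};

/* Model of "add ax, bx" on 16-bit registers. */
struct add16_result add16(uint16_t a, uint16_t b)
{
    struct add16_result r;
    uint32_t wide = (uint32_t)a + (uint32_t)b; /* room for the 17th bit */

    r.value = (uint16_t)(wide & 0xFFFFu);
    r.carry = wide > 0xFFFFu;
    r.zero = r.value == 0;
    r.sign = (r.value & 0x8000u) != 0;
    /* Signed overflow: both inputs share a sign that the result lacks. */
    r.overflow = (((a ^ r.value) & (b ^ r.value)) & 0x8000u) != 0;
    return r;
}

/* Hypothetical walk-through of the two classic edge cases. */
void add16_demo(void)
{
    struct add16_result r = add16(0xFFFF, 1);   /* 65535 + 1 */
    assert(r.value == 0 && r.carry && r.zero && !r.overflow);

    r = add16(0x7FFF, 1);                       /* 32767 + 1 */
    assert(r.value == 0x8000 && !r.carry && r.sign && r.overflow);
}
```

The first case is the unsigned wrap (Carry set, Overflow clear); the second is the signed wrap (Overflow set, Carry clear). Same ADD instruction, two different ways of reading the result.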
-JMP, INT, CLI, NOP, DB, DW ??
Assembly is extremely brief syntactically speaking, and executes like
wildfire compared to a HLL. The syntax seems cryptic, and is about as
pleasing to the eye as staring at the sun. The syntax, you will pick
up in no time flat (with some practice coding, of course).
JMP, JE, JNZ, JC, etc are 'JUMPS'. They instruct the processor
to go execute somewhere else (Like go JUMP in the lake).
JMP is an unconditional jump instruction.
It just means drop everything and JMP here or there.
JE is a conditional jump. It jumps on a true, or "equal" expression:
cmp cx, 1
je EXIT
translates into:
if the cx register is equal to 1, jump to the label called EXIT.
JC jumps if the carry flag (of the processor) is set to 1.
The only times I've used this is when I use a DOS call like open a file,
or write to a file. DOS will usually set the carry flag on errors.
JNZ jumps if the Zero Flag is NOT set in the FLAGS reg.
JZ jumps if the Zero Flag IS set.
There is a myriad of jumps. This is where the books come in handy.
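One way to picture the CMP-then-jump pairs is as two separate steps: CMP does a subtraction and throws the answer away, keeping only the flags, and the jump then looks at one flag. The C sketch below models just the Zero and Carry flags; cmp_flags, cmp16, je_taken, jne_taken, jc_taken and cmp_demo are names invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Flags left behind by "cmp a, b", which computes a - b and discards it. */
struct cmp_flags {
    int zero;  /* Z = 1 when a == b */
    int carry; /* C = 1 when a < b as unsigned numbers (a borrow happened) */
};

struct cmp_flags cmp16(uint16_t a, uint16_t b)
{
    struct cmp_flags f;
    uint16_t diff = (uint16_t)(a - b);  /* the throw-away subtraction */
    f.zero = diff == 0;
    f.carry = a < b;
    return f;
}

/* Each jump just looks at a flag; a return of 1 means the jump is taken. */
int je_taken(struct cmp_flags f)  { return f.zero == 1; }  /* same as JZ  */
int jne_taken(struct cmp_flags f) { return f.zero == 0; }  /* same as JNZ */
int jc_taken(struct cmp_flags f)  { return f.carry == 1; }

/* Hypothetical walk-through using cx = 1. */
void cmp_demo(void)
{
    uint16_t cx = 1;
    assert(je_taken(cmp16(cx, 1)));    /* cmp cx, 1 / je EXIT: taken */
    assert(jne_taken(cmp16(cx, 2)));   /* not equal, so JNE is taken */
    assert(jc_taken(cmp16(cx, 2)));    /* 1 - 2 borrows, so C is set */
}
```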
CLI clears the Interrupt flag in the FLAGS reg. It disables (maskable)
hardware interrupts while your code is fooling with something. STI
enables ints again.
NOP stands for No Operation. It instructs the cpu to do nothing.
The times I normally use it are when I modify some code
in Turbo Debugger and need to "blank-out" some lines. NOP takes up
one byte in memory. You can blank out a four byte instruction with four
NOPs.
DB stands for Define Byte; DW stands for Define Word. A byte is 8 bits,
a "char" in C, one ascii character. DB tells the assembler (the cpu
never sees it) the stuff following DB is data, and not code.
The difference in DW is the memory allocated is one 16 bit chunk, an "int"
in 16 bit C, or a numerical value 0 - 65535, or -32768 to +32767. "Signed"
numbers use the 16th (top) bit in a word to designate "-" or "+".
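The same 16 bits in a DW can be read either way, signed or unsigned; nothing in memory says which one you meant. A small C sketch of the signed reading follows (the names as_signed16 and signed_demo are made up, and two's complement is assumed, which is what the 8086 uses):

```c
#include <assert.h>
#include <stdint.h>

/* Read a 16-bit pattern as a signed two's-complement number. */
int32_t as_signed16(uint16_t word)
{
    if (word & 0x8000u) {               /* top bit set: negative */
        return (int32_t)word - 65536;
    }
    return (int32_t)word;               /* top bit clear: same as unsigned */
}

/* Hypothetical walk-through of the boundary values. */
void signed_demo(void)
{
    assert(as_signed16(0x7FFF) == 32767);   /* biggest positive value */
    assert(as_signed16(0x8000) == -32768);  /* most negative value    */
    assert(as_signed16(0xFFFF) == -1);      /* 65535 unsigned         */
}
```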
-Memory: The way I understand it:
I have asked people, teachers, hackers, and read books. I still don't
quite know how all this is laid out. I will give you my interpretation
of how I see it.
Think of the memory on your PC as being several pages of graph paper taped
together, edge to edge.
Each page of this graph paper has 65,536 little squares printed on it,
numbered 0 thru 65535.
Each one of these squares will hold 1 byte, or 8 bits, of information.
The whole page comprised of 65,536 squares is called one "Segment" of
memory.
Let's just say you coded a program in the small memory model where
you are allowed one segment of memory for code, and one segment of
memory for data. DGROUP has its own page of graph paper, and the
code segment has its own page.
The offsets for both will start at 0 and end at 65535.
Memory is addressed in terms of SEGMENT:OFFSET. This could be expressed
as DS:offset. When addressing a variable, the number left of the
colon is the Segment Address, and to the right of the colon is the
offset from the 0th byte of DS:0000.
When your program runs off to use a variable, the CPU will
use the offset which the assembler or linker derived from your variable
name, and stick that offset in one of the registers. When DOS loaded
your program, it found the next available chunk of memory, and set the
DS register equal to the beginning of it. The CPU looks at the DS register
and then gets the offset to find the Address.
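One detail worth adding: the 8086 turns a SEGMENT:OFFSET pair into a real memory address by multiplying the segment by 16 and adding the offset. The C sketch below shows that arithmetic. The names linear_address and address_demo are invented, and the 20 bit mask models the 8086's 20 address lines (later CPUs behave differently up there).

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address arithmetic: segment * 16 + offset. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    uint32_t address = ((uint32_t)segment << 4) + offset;
    return address & 0xFFFFFu;   /* the 8086 has only 20 address lines */
}

/* Hypothetical walk-through with addresses used elsewhere in this text. */
void address_demo(void)
{
    assert(linear_address(0xB800, 0x0000) == 0xB8000u); /* color text vram */
    assert(linear_address(0x0040, 0x0017) == 0x00417u); /* keyboard flags  */
    /* Two different SEGMENT:OFFSET pairs can name the very same byte. */
    assert(linear_address(0x1000, 0x0010) == linear_address(0x1001, 0x0000));
}
```

The last check is why the graph paper picture is only an approximation: a new segment can start every 16 bytes, so the "pages" overlap rather than sitting strictly edge to edge.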
Let's say the register used for the offset is the DX register.
Now, the DX is a word register meaning it will only hold up to 16 bits,
or two bytes. Meaning what, class?
Yes, that's right, the maximum attainable number with 16 binary bits is
65535. That suckz, right ? So, what this means is that it is
impossible for a 16 bit register to hold an offset greater than
65535. You have just learned the age old burden of the DOS operating
system. This is the same reasoning behind
Incidentally, if you add 1 to 65535, you don't get 65536. You get
zero, 0, NULL.
Let's say when you coded your program, you defined a variable called
FirstName, and made room for 20 bytes of storage. Now, when you
assembled and linked your program, FirstName got turned into a 16.
The address of FirstName will now be addressed at the 16th byte from
the 0th byte of DGROUP.
Now, whenever FirstName is used in your program while it runs, the
CPU knows FirstName starts at the 16th byte from 0, or DS:0010 (in hex)
- DGROUP:0010.
You can get at "variables" outside of your program's DGROUP. I have only
done this with stuff in the BIOS area, and vram. Let's say you want to
put a character into, or read a character from, video memory. I'll use
VGA text memory. This memory (for color text modes) starts at B800:0000h.
Vram is a little different. B800:0000 is the first character at 0,0
(upper left corner) on the screen. The second byte is the color for
that character. This means, if you want to print a string of characters
on the screen, you will have to make provisions for printing the character,
and then the attribute (color) for that character. I've done this by
using the ES register as a pointer to B800h (monochrome vram is at B000h).
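The character/attribute pairing boils down to a little arithmetic. This C sketch assumes the usual 80 column by 25 row text screen and writes into an ordinary array instead of real video memory, so it is safe to run anywhere. TEXT_COLS, TEXT_ROWS, CELL_BYTES, cell_offset, put_cell and vram_demo are all names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

enum { TEXT_COLS = 80, TEXT_ROWS = 25, CELL_BYTES = 2 };

/* Byte offset of the cell at (row, col) from the start of text vram. */
unsigned cell_offset(unsigned row, unsigned col)
{
    assert(row < TEXT_ROWS && col < TEXT_COLS);
    return (row * TEXT_COLS + col) * CELL_BYTES;
}

/* Write one character and its attribute into a vram-shaped buffer. */
void put_cell(uint8_t *vram, unsigned row, unsigned col, char ch, uint8_t attr)
{
    unsigned off = cell_offset(row, col);
    vram[off] = (uint8_t)ch;    /* even byte: the character */
    vram[off + 1] = attr;       /* odd byte: the color attribute */
}

/* Hypothetical walk-through: put a white-on-blue 'H' in column 1. */
void vram_demo(void)
{
    static uint8_t screen[TEXT_ROWS * TEXT_COLS * CELL_BYTES]; /* 4000 bytes */

    put_cell(screen, 0, 1, 'H', 0x1F);
    assert(cell_offset(0, 1) == 2);
    assert(cell_offset(1, 0) == 160);       /* one row = 80 cells = 160 bytes */
    assert(screen[2] == 'H' && screen[3] == 0x1F);
}
```

On a real DOS machine the buffer would be the memory at B800:0000 (or B000:0000 for monochrome) reached through ES, and cell_offset would be the value sitting in DI.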
-= example .com program to write Hello World to video memory =-
P8086 ;/* 8086 instructions only */
model tiny ;/* use TINY model for .com */
dataseg ;/* declare the data */
string db 'Hello World', '$'
codeseg ;/* declare code */
startupcode ;/* have tasm generate startup code */
call StartUp ;/* call the starting procedure */
mov ah, 4Ch ;/* dos function call to exit to DOS */
mov al, 00h ;/* could return a 00 to a batch file */
int 21h ;/* call dos */
ret
StartUp proc near
mov ax, 0B800h ;/* point to vga text memory */
mov es, ax ;/* with es */
xor di, di ;/* set di to 0 */
mov ax, cs ;/* make sure ds is set to */
mov ds, ax ;/* our DGROUP same as cs in */
;/* .COM file */
mov si, offset string ;/* point to string to display */
sloop:
mov al, [si] ;/* char to print into al */
cmp al, '$' ;/* check for end of string */
je done ;/* get out if true */
mov byte ptr es:[di], al ;/* write char to vram */
inc di ;/* inc the 'counter' */
mov byte ptr es:[di], 1Fh ;/* color white on blue */
inc di
inc si ;/* set up for next char */
jmp sloop ;/* do it again */
done:
ret ;/* back to caller */
StartUp endp ;/* end of process */
end ;/* end of program */
-The Stack:
The stack is a storage thing. Imagine it as a bullet clip. Yeah, like
in DOOM.... You "push" the first bullet in, and then the second. You
have to "pop" the second bullet out to get at the first. You are also
limited to the number of bullets you can "push". "popping" after the
clip is empty is a bad thing, so you must use the stack carefully.
The stack holds things like addresses of places DOS is going, or went.
Addresses of variables your program is using, values of registers, and
just about anything else.
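The bullet clip can be sketched in C too. This is only a toy model: the stack is a 16 byte array (a size picked for the example), and stack16, stack_init, push16, pop16 and stack_demo are invented names. What it does copy from the real thing is that a push moves SP down by 2 before storing, and a pop reads and then moves SP back up by 2.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_BYTES = 16 };   /* tiny on purpose; a hypothetical size */

struct stack16 {
    uint8_t mem[STACK_BYTES];
    uint16_t sp;             /* offset of the last word pushed */
};

/* An empty stack has SP sitting just past the top of its memory. */
void stack_init(struct stack16 *s) { s->sp = STACK_BYTES; }

/* push: SP moves down by 2, then the word is stored low byte first. */
void push16(struct stack16 *s, uint16_t value)
{
    assert(s->sp >= 2);                  /* otherwise: stack overflow */
    s->sp = (uint16_t)(s->sp - 2);
    s->mem[s->sp] = (uint8_t)(value & 0xFF);
    s->mem[s->sp + 1] = (uint8_t)(value >> 8);
}

/* pop: the word at SP is read, then SP moves back up by 2. */
uint16_t pop16(struct stack16 *s)
{
    uint16_t value;
    assert(s->sp <= STACK_BYTES - 2);    /* otherwise: popping an empty stack */
    value = (uint16_t)(s->mem[s->sp] | (s->mem[s->sp + 1] << 8));
    s->sp = (uint16_t)(s->sp + 2);
    return value;
}

/* Hypothetical walk-through: two pushes, two pops. */
void stack_demo(void)
{
    struct stack16 s;
    stack_init(&s);
    push16(&s, 0x1111);              /* like: push ax */
    push16(&s, 0x2222);              /* like: push bx */
    assert(pop16(&s) == 0x2222);     /* pop bx: last in, first out */
    assert(pop16(&s) == 0x1111);     /* pop ax */
    assert(s.sp == STACK_BYTES);     /* back where we started */
}
```

This is also why saved registers have to be popped in the reverse order they were pushed: pop them in the same order and the values land in the wrong registers.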
To make matters worse, DOS or another TSR may be using your stack
for its own stuff too. Not at the same time, just while it's
servicing an interrupt. When an interrupt occurs, the CPU stops everything
and pushes the flags and the return address onto the stack. It then
proceeds to let the handler service the interrupt. When the interrupt
is done, and your program picks up where it left off, the stack is
normally the way it was before it was bothered. I mention this because
a confused stack may not always be your program's fault.
There are internal stacks for DOS, so I've heard. I don't know if
they are a variable size at any given time, or always at a set value.
But DOS always has a stack to play with.
Your program can set up its own stack at any size you need, well,
just about any size. The size of stack you need is determined by
how many bytes your program might have pushed on it at any given
time. Apparently you don't always have to set one up yourself: a .COM
file gets a stack at the top of its one segment when DOS loads it. I
have been coding assembly (successfully) for about a year, and never
set up my own stack.
-Interfacing with HLLs
One of the major benefits of learning assembly, is your ability to
code assembly routines you can call from a HLL. Many people will
argue you don't need to rewrite the runtime libraries, but I say
that is bull. There is a real useless function call in C called
scanf(). If you have ever used it, it has pissed you off. When
your program expects a number, the user can also enter letters.
This screws everything up. I never found a way to get scanf to reject
letters when I want numbers and vice versa. When you hit escape,
the cursor moves down a line. The peeves go on and on. If there
are any commercial software houses implementing scanf() in their
code, I'd swallow an elephant. My opinion is the same on that
damn '\t' tab character in the runtimes. Take that outta there!!!
okay, okay, I'm chilln. But imagine how much simpler life would
be if you coded a routine for keyboard input that would be useful.
A routine that knew the difference between a number and a letter.
A routine that would limit the number of characters typed so your
buffer wouldn't overflow, and you wouldn't need a stupid prompt
like:
Enter your name (no more than 15 characters):
A routine that might print in color. A routine that would abort if
the ESCAPE key was hit.....
Couple that routine with the speed and size of executable assembly
code, and the peasants will rejoice. You might not need a half-
nanosecond keyboard input routine, but if you get into computer
graphics and animation..... you will appreciate the speed.
Anyway, back on topic. The only interfacing I've done was with
Borland C. The best way to get started in assembly, (in my opinion)
is to get a C compiler with a built in assembler. Borland's
Turbo C and C++ for DOS would be the one I recommend. It will
let you step thru assembly code, view variables, view the registers,
do math in a popup window, set breakpoints, tell you about illegal
memory references, and mistakes in general. The best part is, it's
probably about 50 bucks now. Maybe less with a student discount.
WARNING: TurboC does not come with TASM. At least, last time I checked.
I heard they don't include TASM with Borland C anymore. The only
difference I could tell from Turbo C and Borland C v3.1 was BC 3.1
did Windoze apps, and came with TASM. There is other stuff I never
used, like Turbo Profiler, that came with BC 3.1 . If you can't get
TASM, there are other assemblers (shareware) out there:
A86V322.ZIP 173335 01-25-90
Excellent shareware Assembler, Version 3.22.
You don't need to include any header files with the assembly code
in TC. The resulting .EXE file will be comparatively gargantuan
because there is a lotta extra junk compiled into your 'inline'
assembly code. I suppose there is probably a way to get it smaller,
but, eh, what the hell. It's just a learning tool.
-= regurgitated assembly code in Turbo C's IDE =-
void main(void) /* this is needed for the compiler */
{ /* opening brace for start of 'main' */
char color = 10; /* a 'byte' variable in C. set to 10 */
asm { /* opening brace for start of inline asm */
mov ah, 9h /* BIOS call 09h */
mov al, '*' /* print a star */
mov bh, 0 /* assume video page 0 */
mov bl, color /* byte value brite green into byte reg */
mov cx, 10h /* print 16 characters */
int 10h /* BIOS video service interrupt */
} /* closing brace for end of inline asm */
return; /* this is needed for the compiler */
} /* this too, and I ain't gonna get */
/* into void prototypes with a return */
This is all you need to do assembly code in C. The pain comes in
when you need to jump to different labels. Since C knows what a
'label' is, the label must be outside of any block of assembly code.
Otherwise C doesn't know it's there:
...
asm { /* opening brace for start of inline asm */
mov ah, 9h /* BIOS call 09h */
mov al, '*' /* print a star */
mov bh, 0 /* assume video page 0 */
mov bl, color /* byte value brite green into byte reg */
mov cx, 10h /* print 16 characters */
jmp DOINT /* jump to a label */
} /* closing brace for end of inline asm */
DOINT: /* label to jump to */
asm {
int 10h /* BIOS video service interrupt */
}
...
Some of the cool stuff in the IDE:
ALT + D, I = Inspect a variable using its name
ALT + W, W = Open the watch window
ALT + C = Close current window
CTRL + F7 = Add a variable to the 'watch' window
CTRL + F8 = Toggle a breakpoint
CTRL + F4 = Evaluate and modify a variable on-the-fly
So, now you have a way to scope-out your assembly code while it's
running. I've caught a lot of gnarly errors doing this, and it's
a real neat way to debug a routine as you code it. When you are
done, you need to "port" it into raw assembly code.
You will need a "skeleton" interface file you can cut and paste
your finished code into. If you need function calls, you will
have to implement them accordingly. This is what I use:
-= CSTART.ASM - skeleton file for C interface with assembly =-
-= NOTE: You must include the '_' underscores for C =-
PUBLIC _test ;/* name of function call here */
.MODEL small, c ;/* small memory model, c calling style */
.CODE
_test proc ;/* name of function here also */
push ax ;/* save any regs you are using */
push bx
push cx
mov ah, 9h ;/* BIOS call 09h */
mov al, '*' ;/* print a star */
mov bh, 0 ;/* assume video page 0 */
mov bl, 0Ah ;/* brite green (the C variable "color" doesn't exist here) */
mov cx, 10h ;/* print 16 characters */
int 10h ;/* BIOS video service interrupt */
pop cx ;/* restore pushed reg values */
pop bx ;/* in the reverse order they were pushed */
pop ax
exit:
ret
_test endp ;/* name of function here also */
end
Now that your routine is coded, you will have to assemble the
.ASM file with TASM:
tasm /ml /z /zn test.asm
ml = case sensitivity
z = display the source line along with the error message
zn = no debugging info
You now have a .OBJ file to throw into your custom library. (TEST.OBJ)
To build the library, go into TC's IDE and select PROJECT from
the menu bar. OPEN a project. The name of the project will be the
name of your library. You should get a project window at the bottom
of the screen. If you don't, hit ALT + W, P.
Hit the insert key and type in the name of your .OBJ file you are
putting into the library.
Go to OPTIONS on the menu, and select MAKE. Set the radio button
to RUN LIBRARIAN in the window marked AFTER COMPILING.
( or complaining! )
Go to OPTIONS again and SAVE the current setup. This saves all your
settings to the project configuration.
Now hit F9. If you don't get any errors, you have a library laying
somewhere, depending on your directory setup.
Now all that is left to do is declare a prototype for the new
function in your C source code.
void test(void);
This should go where you normally declare functions. You could
also make a header file for your new library and add new protos as
you make them. Then, just #include <> the header file whenever you
use your custom library. Oh, yeah. From now on, you will need to
use a project file whenever you use your new library. You do this
so the linker knows to pull in your library when the .EXE gets
built. There may be another way....
In your project file for using your new library, put the path to
the C source code, and the path to the library in the project window.
That should do it. Now, every time you call test() in your C code,
you'll get 16 stars on the screen in brite green. Yeah, it's a real
useless function, like scanf(), but it's a start. I wonder if scanf()
was Borland's first function ???
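To make the prototype-and-header idea a little more concrete, here is
a minimal sketch in plain C. The header name MYLIB.H, the guard name
MYLIB_H, the screen buffer and call_test() are all made up for
illustration. The real test() lives in the assembly library, so a
plain C stand-in takes its place here to let the sketch build on its
own. The stand-in only fills a buffer with 16 stars; it does not
touch the BIOS.

```c
#include <string.h>

/* ---- what would go in a header such as MYLIB.H (hypothetical name) ---- */
#ifndef MYLIB_H
#define MYLIB_H
void test(void);                 /* the routine from the custom library */
#endif

/* ---- stand-in for the assembly routine, so this sketch links without
        the library. The real _test calls BIOS int 10h; this one only
        writes 16 stars into a buffer. ---- */
char screen[17];                 /* 16 characters plus a terminating NUL */

void test(void)
{
    memset(screen, '*', 16);     /* "print" 16 stars */
    screen[16] = '\0';
}

/* ---- what the calling C code would look like ---- */
const char *call_test(void)
{
    test();                      /* the call reads the same either way */
    return screen;
}
```

The C side says test() while the assembly side says _test because
the C compiler sticks an underscore on the front of every C name.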
-TSRs
Well, this last section will probably be the last in this textfile.
I'll be close to 40k of text, and 1000 lines. Besides, TSRs are
still a goofy subject to me. I'll tell you what I know, but it
might not be right, my knowledge of coding TSRs is very small.
A TSR is a program that gets executed, sets up what it needs, gets rid
of what it doesn't, and stays in memory processing away until you
unload it. All of mine require the computer be turned off. I haven't
figured out how to unload one without ending up rebooting anyway, so
I figure the hell with it.
Anyway, most TSRs are activated thru an interrupt. The piece of the
TSR that handles the interrupt is what's known as an ISR (Interrupt
Service Routine). They attach themselves to the operating system
interrupts and get to execute every time their interrupt is called.
There is a table in low, low memory (starting at address 0000:0000)
called the Interrupt Vector Table. This is a fancy name for a list of
4-byte addresses, one per interrupt number, that the CPU jumps to
when responding to an interrupt.
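Here is a rough sketch of how that table is laid out, written as
plain C arithmetic instead of real DOS code. Each slot is 4 bytes,
the offset word first and the segment word second, and the table
starts at address 0, so the slot for interrupt n sits at n * 4. The
names ivt_entry, ivt_slot_address and linear_address are made up for
this example.

```c
#include <stdint.h>

/* One IVT slot as it sits in memory: offset word first, then segment. */
struct ivt_entry {
    uint16_t offset;
    uint16_t segment;
};

/* Linear address of the slot for interrupt number n (table starts at 0). */
uint32_t ivt_slot_address(uint8_t n)
{
    return (uint32_t)n * 4u;
}

/* Real-mode segment:offset to a 20-bit linear address:
   segment * 16 + offset. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

So the slot for the timer interrupt 1Ch is at 0000:0070, and the
slot for Print Screen (05h) is at 0000:0014.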
Your TSR will "hook-into" an interrupt by getting the address of
the current handler for that interrupt from a simple DOS call.
(line 81 thru 83 in the TSR code at the top of this file)
DOS returns the address of the handler for the interrupt you specify
in ES:BX (segment in ES, offset in BX). Your TSR must "remember" this
address so it can "chain" to it, or pass control back to it, whenever.
Line 12 in the above code:
call dword ptr cs:OldInt ;/* chain to old handler */
You aren't overwriting the memory where the original routine is
stored, just overwriting its address in the IVT. When the
interrupt you're hooked into fires, YOUR program gets the call instead
of the original one.
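The hook-and-chain idea can be sketched in plain C with function
pointers. This is only a simulation: the array ivt stands in for the
real table, get_vector and set_vector stand in for DOS functions 35h
and 25h, and every name here is made up for illustration.

```c
#include <stddef.h>

#define NUM_VECTORS 256

typedef void (*handler_t)(void);

handler_t ivt[NUM_VECTORS];      /* pretend IVT: one handler per interrupt */
handler_t old_handler;           /* the address the "TSR" remembers */
int old_calls, new_calls;        /* counters so the effect is visible */

/* Stand-ins for DOS functions 35h (get vector) and 25h (set vector). */
handler_t get_vector(int n)              { return ivt[n]; }
void      set_vector(int n, handler_t h) { ivt[n] = h; }

/* What happens on "int n": look up the table, call whoever is there. */
void raise_interrupt(int n)
{
    if (ivt[n] != NULL)
        ivt[n]();
}

void original_handler(void) { old_calls++; }

void tsr_handler(void)
{
    if (old_handler != NULL)
        old_handler();           /* chain to the old handler first */
    new_calls++;                 /* then do the TSR's own work */
}

/* Hook interrupt n: remember the old address, then overwrite the slot. */
void hook(int n)
{
    old_handler = get_vector(n);
    set_vector(n, tsr_handler);
}
```

After hook(0x1C), every raise_interrupt(0x1C) lands in tsr_handler,
which still lets the original handler run. The original routine
never moved; only its slot in the table changed.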
Let's say you hit the Print Screen key. This is interrupt 05h.
When you hit it, the BIOS keyboard handler issues an int 05h. The
CPU looks up the address of int 05h in the IVT and passes control
to whoever is at that address. Nobody checks whether you screwed
up. The CPU is all ready to process away at the new address of
execution. When the interrupt has finished its gig, it returns
control back to whatever was running, which goes about its
business like nothing ever happened (ideally, that is).
Things don't always work that way, and TSRs are the most frustrating
of all applications to code, for the novice I suppose. Sometimes
they work, and sometimes they don't. I rained a whole barrage of
TSR questions down on an unsuspecting Computer prof. a year ago.
This dude really knew his stuff about C, but didn't want to say
jack about TSRs. Maybe he didn't know, or maybe he didn't want
to get into it. I don't know.
Anyway, there are a couple of important things that happen when
an interrupt gets serviced. It's actually the CPU, not DOS, that
does the bookkeeping. The CPU pushes the flags register onto the
stack first. Then it pushes the return address of where execution
currently is. That would be the CS:IP registers. This is a book
mark for the interrupted program.
The CPU then loads the new address of execution into CS:IP, and
the handler runs until it returns control. When the interrupt is
done, which is signaled by an IRET (Interrupt Return), the CPU pops
the address of where it left off before the interrupt occurred,
and then pops the flags off the stack.
Nothing checks any values to see if they are correct. IRET
figures the stack is just the way the CPU left it. Execution will
continue at the CS:IP it just popped. It does not care if your
ISR forgot to clean up after itself. This is why you need to
leave the stack just the way you found it.
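A small simulation in plain C shows why an unbalanced stack is so
nasty. The struct cpu and the functions below are made up for this
sketch, and the pretend stack grows upward in an array, while the
real one grows downward in memory. The order of the pushes and pops
is the part that matters.

```c
#include <stdint.h>

struct cpu {
    uint16_t flags, cs, ip;
    uint16_t stack[16];
    int sp;                      /* index of the next free stack slot */
};

void     push(struct cpu *c, uint16_t v) { c->stack[c->sp++] = v; }
uint16_t pop(struct cpu *c)              { return c->stack[--c->sp]; }

/* Entering an interrupt: FLAGS first, then CS, then IP, then jump. */
void enter_interrupt(struct cpu *c, uint16_t handler_cs, uint16_t handler_ip)
{
    push(c, c->flags);
    push(c, c->cs);
    push(c, c->ip);
    c->cs = handler_cs;
    c->ip = handler_ip;
}

/* IRET: pop in the reverse order, trusting whatever is on top. */
void iret(struct cpu *c)
{
    c->ip    = pop(c);
    c->cs    = pop(c);
    c->flags = pop(c);
}
```

If the handler pushes one extra word and never pops it, iret() loads
that word into IP, the old IP into CS, and the old CS into the flags.
Execution then continues at an address nobody intended.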
The flags register is a slightly different story. Since the flags
were pushed on the way in, IRET gives the interrupted program its
old flags back, as long as your ISR left the stack alone. The other
registers get no such help. If you do some math in AX, and didn't
save it, the program that got interrupted is going to think IT
changed AX.
Let's say you're playing a game. You want to save your game, or
restore one or something where the disk drive will be accessed.
You move the mouse to "SAVE GAME", point and grunt, and the
information starts dumping to disk. In the meantime, you are
jigging the mouse back-and-forth while the game is being saved,
waiting for it to finish. Every time you move the mouse, the mouse
hardware is triggering an interrupt, and the mouse driver is the
ISR that services it (int 33h is the software interface programs
use to talk to the driver). This means the game is writing to the
disk, thru DOS, and the CPU keeps jumping off calling the mouse ISR.
I mentioned before, the flags are used to signal an error during
a write to a file. If the mouse driver wrecks the stack so the wrong
flags get popped, or trashes a register and doesn't restore it the
way it came, your game will think IT screwed up, and give you an
error, or crash, or whatever.
In the above TSR code, not all the registers are pushed.
The SS, SP, and BP registers are not pushed. Sometimes you get
errors from too many pushes, and sometimes from not enough. The
safe rule is to push every register your ISR changes, and pop them
in reverse order before the IRET.
THEY SAY you can't issue any DOS calls inside a TSR. As far as
I'm concerned, this is BULL, or at least only half true. DOS is not
reentrant, so the catch is you can't call DOS while DOS is already
in the middle of a call for somebody else. Think of all the screen
grabbing programs out there. These programs are writing to a file when
you hit the hot key, so there IS a way to do it. I had a TSR that
did that and saved it in a Borland C getimage() format. The TSR
managed to save the image, but would puke right after. That's how
I blew up my FAT (for the second time). Remember to back up the
important stuff on your hard drive!
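One way around this, as far as I know, is for the TSR to note the
hot key and put off the file write until DOS is idle. DOS keeps a
busy flag (the InDOS flag; int 21h function 34h returns its address
in ES:BX) that a TSR can look at first. The sketch below is plain C
and only a simulation of that idea: in_dos, pending, files_written
and both functions are made-up names, and no real DOS call happens.

```c
int in_dos;            /* stand-in for DOS's busy flag (nonzero = busy) */
int pending;           /* hot key was hit, work is still waiting */
int files_written;     /* counts how many times the "DOS call" was made */

/* Called when the hot key is hit: just note the request. */
void hotkey_pressed(void)
{
    pending = 1;
}

/* Called on every timer tick: do the work only when DOS is idle. */
void timer_tick(void)
{
    if (pending && !in_dos) {
        files_written++;      /* the real TSR would call DOS here */
        pending = 0;
    }
}
```

The hot key handler stays tiny, and the risky part only runs on a
tick where nobody else is inside DOS.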
So, anyway, this is going to conclude this text on assembly language.
You are going to have to get some books to clarify the things I've
covered here. I am still learning this stuff too, so don't be
surprised if some of it is not quite right.
L8r,
SwmgT
/****************************** ASMTXT01.TXT *******************************/
Welcome to the world of assembly language!
I will be speaking in terms of the DOS operating system using 8086
assembly language.
First off, I will admit I am no professional on Assembly Language. I am
a beginner. Why is a beginner writing a text on assembly language ??
Well, I'll tell ya. I looked and looked and looked on all kinds of boards
for text files on assembly when I wanted to get into it. I found nothing, or
a lot of nothing. A lot of the stuff I mention in this text file is based on
my observations from things I've tried, things I've read, and things I've
done. I have learned all of this stuff from experimenting and no
experienced assistance. So, if Some of my explanations are way off-
base, I apologize in advance. If you are an assembly guru reading this,
feel free to make the neccesary corrections. All I ask is you rename the
file to ASMTXT02.TXT or whatever, accordingly. If you can't upload the
revised file using an updated form of the original .ZIP filename, just
be clever with the description.
A quick glance at terminology and other arcane wording found here:
(These aren't neccesarily standards)
MASM = Microsoft Macro Assembler
TASM = Turbo Assembler (Borland International)
PWB = Programmers Work Bench (Microsoft's MASM editor v6.1)
BC = Borland International's C compiler and programming environment.
HLL = High Level Language
asm = acronym pertaining to assembly.
cpu = Central Processing Unit. Your processor _chip_ (286, 386, 486)
vram = Video Ram (video memory)
int = short for "interrupt". Think of interrupts as a "tug-on-the-pantleg"
of the operating system, or the CPU, or the BIOS, etc.
mem = RAM on a pc. Memory.
regs = referring to the registers inside the CPU. reg, regs, etc.
ivt = the Interrupt Vector Table.
IDE = Integrated Developement Environment
... = more code here
DOS call = any DOS interrupt. Defined by "Int 21h"
BIOS call = any BIOS interrupt. Int 10h, Int 13h, etc.
comfile = a "Core Image", or .COM file.
executable = any file you can execute through DOS. .EXE, .COM and maybe
.BAT .
Books I have found to be of high caliber, excellent reading, the
most informative, helpful, and worth the bucks. These are not in any
certain order. You _will_ need more than one book to learn asm. I have,
uh lessee, -10 books on assembly. I've read most all of them, but usually
resort to the following three:
Mastering Turbo Assembler - Tom Swan - Hayden Books 1989.
isbn#: 0-672-48435-8
This guy is an excellent author. He keeps things simple, and covers a
wide area of asm.
$29.95
PC Principles - Gunnar Forst - The M.I.T. press 1992.
isbn#: 0-262-06134-1 (hd. cover?) and 0-262-56053-4 (paperback)
This book was first published in Copenhagen, Denmark under the name
"PC Principper".
The best way to explain this book is "bitchin". This is my 'black-book'.
It covers EVERYTHING. Hardware, Software, how DOS works, how memory works,
.EXE headers, Boot sector info, FAT, Machine Language, blah, blah.
A must. This book beats anything Norton can throw.
$32.50
Using Assembly Language 3rd ed. - Allen L. Wyatt Sr. - Que 1992.
isbn#: 0-88022-884-9
My copy is falling apart. The cover is falling off, and chapter 20 has
been detached from the binding. This book is great because it covers all
the interrupts with snippets of examples. The author keeps things
light-hearted, and throws in some humor here and there.
Excellent reference for HLL interfacing and interrupts in general.
$29.95
-Why learn Assembly?
This is largely a matter of opinion. Some people will irritatingly
state the following:
"assembly is archaic and outdated"
"with all the High Level languages available today, why bother ?"
"There is nothing you can do in assembly that I can't do in a HLL"
The truth of the matter is, human nature tells us to take the easy
way out. HLL's are very forgiving these days with their integrated
compilers wich warn you of things like suspicios pointers and such.
Take a look at some COBOL code sometime; it's almost like talking to
something.
A friend and I bought MASM 6.1 a couple years ago. The online
help was EXCELLENT, but the editor drove me up the wall. You could
(I did) sit for six hours customizing every detail of the editor.
When I would assemble and link the programs from within MASM'S
Programmers Work Bench, I would get weird errors I felt had nothing
to do with the code. I would have to shell to DOS, assemble and link
the code, but I couldn't step thru it with the PWB because it would
try and re-assemble the code again with the Same grating erros. I
was probably doing something wrong, or had something set wrong, but
finally, DEL C:\MASM\*.* .
I purchased Borland C v3.1 a year later, and wow, BC comes with Turbo
Assembler! There is no 'editor' or workbench or anything. You
can run TA from inside BC's 'editor', but I don't think you can
step thru the code; Well not TRUE assembly code. You can regurgitate
assembly code within BC with "asm" statements and step thru it that way
(I learned a lot doing that). It is a kinda ass pain, but it makes a
real nice code stepper for assembly.
With TASM, what you get is a command line assembler and
linker. Plain Vanilla. That's about what I ended up doing with
MASM, but now there is no temptation to try and figure out the
PWB.
If you don't have a good debugger to double check things, running
your assembly programs in a watch-and-see type fashion can produce some
very interesting cataclysmic reactions. I have blown up my File Allocation
Tables in the past, and screwed up the video bios. But, I never
"permanently" wrecked anything. And I never had a debugger until about
2 years ago (Borland's Turbo Debugger). I'm not patient enough to
analyze the hell out of a chunk of code. That is also why I am not
a professional.
If you have the patience for assembly, the payoffs (for me anyway)
can be very rewarding. I have learned a lot about memory, the
processor, hardware, floppy drives, boot sectors, interrupts and
TSR's. Ah yes, TSRs. Here is something I would like to see done
in a HLL. Code a TSR in your favorite HLL, but I would like to
see the resulting .COM file size under 1000 bytes.
Here is mine using Turbo Assembler (I'll discuss some of this later).
This advances a character across the monochrome screen and is toggled
on and off with the scroll lock key.
(The .COM file linked to 183 bytes, yes a 183 byte executable file)
1 .model tiny ;/* tiny mem model for comfiles */
2 intnum equ 1ch ;/* hook the timer interrupt */
3 .code
4 org 100h
5
6 first:
7 jmp loadprog ;/* load the TSR into memory */
8
9 prog proc ;/* tsr code in mem starts here */
10 pushf ;/* save flags for old handler */
11
12 call dword ptr cs:OldInt ;/* chain to old handler */
13
14 cmp cs:inproc, 1 ;/* flag to check if processing */
15 je exit ;/* get out if true */
16
17 mov cs:inproc, 1 ;/* set flag */
18
19 push ax ;/* save all the registers */
20 push bx ;
21 push cx ;
22 push dx ;
23 push di ;
24 push si ;
25 push ds ;
26 push es ;
27 pushf ;/**/
28
29 cli ;/* disable ints while fooling */
30 mov ax, 0040h ;/* with segment registers */
31 mov es, ax ;
32 mov al, byte ptr es:[0017h] ;
33 sti ;/* turn interrupts on again */
34
35 test al, 00010000b ;/* check scroll lock key */
36 jz xorcounter
37
38 cmp cs:counter, 4000d ;/* bottom of screen in vram */
39 je xorcounter ;/* reset counter if there */
40
41 mov ax, 0B000h ;/* point to monochrome vram */
42 mov es, ax ;/* should really cli here */
43 mov si, cs:counter ;/* use counter for character*/
44 mov byte ptr es:[si], 0ffh ;/* erase color attirbute */
45
46 inc si ;/* counter ++ */
47
48 mov byte ptr es:[si], 01h ;/* display character */
49
50 mov cs:counter, si ;/* store counter */
51 mov cs:inproc, 0
52 jmp exit ;/* that's enough for one tick */
53
54 xorcounter:
55 xor cx, cx ;/* reset counter */
56 mov cs:counter, cx ;/**/
57 mov cs:inproc, 0
58
59
60 exit:
61 popf ;/* restore registers */
62 pop es ;
63 pop ds ;
64 pop si ;
65 pop di ;
66 pop dx ;
67 pop cx ;
68 pop bx ;
69 pop ax ;/**/
70
71 iret ;/* return from handler */
72 oldint label word
73 oldaddr dd 0000
74 counter dw 0
75 inproc db 0
76
77 prog endp
78
79 loadprog proc
80
81 mov ah, 35h ;/* DOS function for getting */
82 mov al, intnum ;/* interrupt vector */
83 int 21h
84
85 mov oldint, bx ;/* save old vector */
86 mov oldint[2], es ;/**/
87
88 mov ah, 25h
89 lea dx, prog ;/* "prog" process goes TSR */
90 int 21h
91
92 exit2:
93
94 mov dx, offset succesmsg ;/* display message */
95 mov ah, 9 ;
96 int 21h ;/**/
97
98 mov dx, offset loadprog ;/* release memory used by */
99 int 27h ;/* loading process */
100
101 succesmsg db 0dh, 0ah, 'TSR succesfully loaded.', 0dh, 0ah, '$'
102
103 loadprog endp
104 end first
-Assembler? Assembly?:
An assembler is the program used for assembling your 'text-file'
assembly code. TASM (Borland Intl.), MASM (Microsoft), etc.
It converts the "text" into a file called an object or .OBJ file.
The .OBJ file is then used by the LINKER to create, or link,
the final executable file from the .OBJ:
HELLO.ASM -> Assembler = HELLO.OBJ
HELLO.OBJ -> Linker = HELLO.EXE or HELLO.COM
Assembly is the language. Everyone seems to call assembly language
'assembler'; "When did you first learn Assembler ?"
You can pretty much figure the person asking this question is talking
about programming and not writing an assembler, but if you are a stickler
for terminology, the above statement is misleading.
-Switches
The assembler and linker interpret what are called 'switches' to
handle the whole process in different ways. these are usually designated
by a '-' or '/' (slash). I haven't used all of them, just a couple to
either link to a .COM file or just assemble to an .OBJ for incorporating
the code into a higher level language. (more on that later)
-High Level? Low Level?
Assembly is a low level language. This means it is very close to the
native 'tongue' of the processor. The language of the processor is
actually voltage deviations simply meaning True of False; 1 or 0, on and
off, etc. I guess the step up from the voltage pulses would be binary,
1 and 0 and then machine language, 0 thru 256, then assembly, then
higher level languages like Basic, Pascal, C, Fortran, and a whole
slough of others. If you have ever tried to type a 'binary' file
(executable) from the DOS command line to the screen, you have probably
heard all kinds of beeping from the speaker and a garbled mess of
characters flying all over. This is the machine language you see. All
those little chars. mean something different in the world of the
processor. You could probably write a program using these chars., but
that would be a tedious process. I guess they actually used to do that,
but then along came asm. (assembly).
-Registers ?
Registers are little chunks of memory inside the CPU. All these registers
do is hold numbers. That's it.
Using these registers is a lot faster (for the CPU) than using a variable
in standard memory. All values in these registers are expressed in base 16,
or "Hexidecimal".
There are (for practical purposes) 14 registers in a 8086 CPU (There are
a few more, but they are related to protected mode programming). All of
them, in themselves, are 16 bits wide. This means values ONLY from
0 - 65535 can be stored in them. With the exception of the FLAGS.
AX, BX, CX, DX
CS, IP
DS, ES
SI, DI
SS, SP
BP, FLAGS
AX is called the Accumulator, BX is called the Base, CX is called the Count
and DX is called the Data Index.
the two bytes, in AX for example, are referred to as the High Byte and
Low byte. AH and AL. You can only stick a byte value in each register.
(8 bits, a "char" in C). Decimal values are from 0 - 255. (the ascii
table values for all the characters on a PC)
-= The AX register =-
-= One Word, an Int, or 2 bytes =-
-= high byte =-> A H A L <-= low byte of ax =-
|- 16 bits wide-|
These 4 registers above will hold no more than the value 65535 each.
That is as high as the binary numbers will go for a "Word"
(16 bits or 2 bytes). This is also, incidentally, the max for
one segment of memory.
The way you use these four registers is completely up to you. Stick
whatever values you want in them. The only time you must adhere to any
rules are when you are using an interrupt, or an Instruction such as
ADD, MUL, MOVSB, etc.
The Word registers (ax, bx, cx, dx) can be loaded directly with a value.
The same goes for si and di.
The Segment registers (cs, ds, es, bp, sp) must be loaded from another
register, or from a word sized variable. You can't say:
'mov ds, 0000'.
- I don't know why.
You can, however, say:
'mov ds, ax'
or
'mov ds, [0000]'.
The SS and IP registers could be a gnarly thing to mess with. I don't
think you can "load" these regs with values. (??)
CS is called the Code Segment register and holds the memory segment
address of your code which is executing away in memory. Your program
can look at the CS reg at any time to see where it is.
IP is called the Instruction Pointer. This is the offset of the
current "line of code" the CPU is executing in the address pointed
to by the CS reg. If the CS reg has the value 1000, and IP has the
value 0001, that means your program resides in memory segment 1000
and the CPU is currently executing intstruction 1. It is not a
cool thing to change these values. I don't think you can change the
IP reg.
DS is called the Data Segment. Like the CS reg, this points to a
segment of memory also. DS, however, points to where all (most of,
more on that later) your variables are stored.
This segment of memory is also referred to as DGROUP (Data Group ??).
DGROUP (for all practical purposes) only has enough volume to hold
65535 bytes. The variables in your program are actually assembled
into memory offsets from the beginning of DGROUP. This tells the CPU
what chunk of memory it is currently working with when you work with
a variable. Kinda like the IP reg.
ES is called the Extra Segment. I'm not really sure on this one.
I've used it for accessing BIOS memory locations like the byte
that tells you if the scroll lock is activated (as in the above program).
SI is called the String Index. It is mostly for transferring chunks
of memory with instructions such as MOVSB (MOVe a String Byte).
ES and DI also come into play with that instruction. If you've ever
heard of a BitBlit, I've a sneaky hunch they use MOVSB to do it.
I've used SI for indexing through a string to make all the letters
small caps. Something sorta like this:
mov si, StringPointer
:caploop
or byte ptr [si], 20h
cmp byte ptr [si], 0
je done
inc si
jmp caploop
DI is called the Destination Index. The only times I've used it
was when I was comparing two strings. I had a pointer to each string
and just stepped through them comparing as I went. DI is also used
in a MassMove or BitBlit or whatever you wanna call it.
SS is called the Stack Segment and points to an area of memory where
the Stack resides (more on that later).
When the stack tosses-it's-cookies, you can get errors like
"stack overflow, system halted" and stuff like that. The stack can
be a nasty thing to tweak on.
SP is called the Stack Pointer. It is the offset into the area of
memory where the stack is, and keeps track of the last thing "pushed"
onto, or "popped" from, the stack.
BP is called the Base Pointer and geez, know what ?
I have no damn clue what this is for. Wait, here it is (thanx to
Tom Swan).... says it's used for accessing variables inside the stack.
FLAGS or the flags register, has only 9 bits of the 16 which are used.
(yeah, right). Somewhere, sometime, someone will probably find out
the other 7 bits are actually undocumented features.
the flags:
O = Overflow Flag.
math done on regs where the total exceeds 65535 will trip the
overflow flag to a 1. When you need numbers bigger than 65535,
there are ways of doing it by using 2 regs.
D = Direction Flag
which direction to index bytes in a string. Forward or Backward.
0 is probably forward. I've always used CLD to Clear Direction
Flag before doing string stuff.
I = Interrupt Flag
0 = Interrupts are disabled and will not be valid.
1 = Interrupts are enabled and will be carried out.
you can toggle this flag with CLI (clear int flag) and STI
(set int flag)
T = Trap Flag
I think this is used with debuggers for executing one command
at a time when you are stepping thru code.
S = Sign Flag
Signifies whether a math operation changed the sign of a number.
Haven't used it much.
Z = Zero Flag
indicates result of a comparison operation. CMP CX, 1
Haven't used it much. I always use JE or JNE. I think this
instructions actually test the zero flag for 0 or 1.
A = Auxillary Flag
??
P = Parity Flag
??
C = Carry Flag
I've found DOS usually sets this flag to a 1 if there was an
error while attempting a DOS call. Like opening a file that
doesn't exist.
-JMP, INT, CLI, NOP, DB, DW ??
Assembly is extremely brief syntactically speaking, and executes like
wildfire compared to a HLL. The syntax seems cryptic, and is about as
pleasing to the eye as staring at the sun. The syntax, you will pick
up in no time flat (with some practice coding, of course).
JMP, JE, JNZ, JC, etc are 'JUMPS'. They instruct the processor
to go execute somewhere else (Like go JUMP in the lake).
JMP is a non conditional jump instruction.
It just means drop everything and JMP here or there.
JE is a conditional jump. It jumps on a true, or "equal" expression:
cmp cx, 1
je EXIT
translates into:
if the cx register is equal to 1, jump to the label called EXIT.
JC jumps if the carry flag (of the processor) is set to 1.
The only times I've used this is when I use a DOS call like open a file,
or write to a file. DOS will usually set the carry flag on errors.
JNZ jumps if the Zero Flag is NOT set in the FLAGS reg.
JZ jumps if the Zero Flag IS set.
There is a myriad of jumps. This is where the books come in handy.
CLI clears the Interrupt flag in the FLAGS reg. It disables all interrupts
while your code is fooling with something. -STI enables ints again.
NOP stands for No Operation. It instructs the cpu to do nothing.
The times I normally use it are when I modify some code
in Turbo Debugger and need to "blank-out" some lines. NOP takes up
one byte in memory. You can blank out a four byte instruction with four
NOPs.
DB stands for Define Bytes; DW stands for Define Words. A byte is 8 bits,
a "char" in C, one ascii character. DB tells the cpu, and maybe the
assembler, the stuff following DB is a memory variable, and not code.
The difference in DW is the memory allocated is one 16 bit chunk, a "int"
in C, or a numerical value 0 - 65535, or -32767 to +32767. "Signed"
numbers use the 16th bit in a word to designate "-" or "+".
-Memory: The way I understand it:
I have asked people, teachers, hackers, and read books. I still don't
quite know how all this is laid out. I will give you my interpretation
of how I see it.
Think of the memory on your PC as being several pages of graph paper taped
together, edge to edge.
Each page of this graph paper has 65,535 little squares printed on it.
Each one of these squares will hold 1 byte, or 8 bits, of information.
The whole page comprised of 65,535 squares is called one "Segment" of
memory.
Let's just say you coded a program in the small memory model where
you are allowed one segment of memory for code, and one segment of
memory for data.DGROUP has it's own page of graph paper, and the
code segment has it's own page.
The segments for both will start at 0 and end at 65535.
memory is used in terms of SEGMENT:OFFSET. This could be expressed
as DS:offset. When addressing a variable, the number left of the
colon is the Segment Address, and to the right of the colon is the
offset from the 0th byte of DS:0000.
When your program runs off to use a variable, the CPU will
use the offset which the assembler or linker derived from your variable
name, and stick that offset in one of the registers. When DOS loaded
your program, it found the next available chunk of memory, and set the
DS register equal to the beginning of it. The CPU looks at the DS register
and then gets the offset to find the Address.
Let's say the register used for the offset is the DX register.
Now, the DX is a word register meaning it will only hold up to 16 bits,
or two bytes. Meaning what, class?
Yes, that's right, the maximum attainable number with 16 binary bits is
65535. That suckz, right ? So, what this means is that is just about
impossible for a 16 bit register to access an offset greater than
65535. You have just learned the age old burden of the DOS operating
system. This is the same reasoning behind
Incidentally, if you add 1 to 65535, you don't get 65536. You get
zero, 0, NULL.
Let's say when you coded your program, you defined a variable called
FirstName, and made room for 20 bytes of storage. Now, when you
assembled and linked your program, FirstName got turned into a 16.
The address of FirstName will now be addressed at the 16th byte from
the 0th byte of DGROUP.
Now, whenever FirstName is used in your program while it runs, the
CPU knows FirstName starts at the 16th byte from 0, or DS:0010 (in hex)
- DGROUP:0010.
You can get at "variables" outside of your programs DGROUP. I have only
done this with stuff in the BIOS area, and vram. Lets say you want to
put a character into, or read a character from, video memory. I'll use
VGA text memory. This memory (for most PCs) starts at B800:0000h.
Vram is a little different. B800:0000 is the first character at 0,0
(upper left corner) on the screen. The second byte is the color for
that character. This means, if you want to print a string of characters
on the screen, you will have to make provisions for printing the character,
and then the attribute (color) for that character. I've done this by
using the ES register as a pointer to B800h (monochrome vram is at B000h).
-= example .com program to write Hello World to video memory =-
P8086 ;/* 286 code */
model tiny ;/* use TINY model for .com */
dataseg ;/* declare the data */
string db 'Hello World', '$'
codeseg ;/* declare code */
startupcode ;/* have tasm generate startup code */
call StartUp ;/* call the starting procedure */
mov ah, 4Ch ;/* dos function call to exit to DOS */
mov al, 00h ;/* could return a 00 to a batch file */
int 21h ;/* call dos */
ret
StartUp proc near
mov ax, 0B800h ;/* point to vga text memory */
mov es, ax ;/* with es */
xor di, di ;/* set di to 0 */
mov ax, cs ;/* make sure ds is set to */
mov ds, ax ;/* our DGROUP same as cs in */
;/* .COM file */
mov si, offset string ;/* point to string to display */
sloop:
mov al, [si] ;/* char to print into al */
cmp al, '$' ;/* check for end of string */
je done ;/* get out if true */
mov byte ptr es:[di], al ;/* write char to vram */
inc di ;/* inc the 'counter' */
mov byte ptr es:[di], 1Fh ;/* color white on bleu */
inc di
inc si ;/* set up for next char */
jmp sloop ;/* do it again */
done:
ret ;/* back to caller */
StartUp endp ;/* end of process */
end ;/* end of program */
-The Stack:
The stack is a storage thing. Imagine it as a bullet clip. Yeah, like
in DOOM.... You "push" the first bullet in, and then the second. You
have to "pop" the second bullet out to get at the first. You are also
limited to the number of bullets you can "push". "popping" after the
clip is empty is a bad thing, so you must use the stack carefully.
The stack holds things like addresses of places DOS is going, or went.
Addresses of variables your program is using, values of registers, and
just about anything else.
To make matters worse, DOS or another TSR may be using your stack
for it's own stuff too. Not at the same time, just while it's
servicing an interrupt. When an interrupt occurs, DOS stops everything
and pushes some stuff onto the stack. DOS then proceeds to let the
handler service the interrupt. When the interrupt is done, and DOS
picks up where it left off, the stack is normally the way DOS left
it before it was bothered. I mention this because a confused stack
may not always be your programs fault.
There are internal stacks for DOS, so I've heard. I don't know if
they are a variable size at any given time, or always at a set value.
But DOS always has a stack to play with.
Your program can set up it's own stack at any size you need, well,
just about any size. The size of stack you need is determined by
how many bytes your program might have pushed on it at any given
time. Apparently you don't always need a stack. I have been coding
assembly (succesfully) for about a year, and never set up my own
stack.
-Interfacing with HLLs
One of the major benefits of learning assembly, is your ability to
code assembly routines you can call from a HLL. Many people will
argue you don't need to rewrite the runtime libraries, but I say
that is bull. There is a real useless function call in C called
scanf(). If you have ever used it, it has pissed you off. When
your program expects a number, the user can also enter letters.
This screws everthing up. I never found a way to get scanf to reject
letters when I want numbers and versa-vicea. When you hit escape,
the cursor moves down a line. The peeves go on and on. If there
are any commercial software houses implementing scanf() in their
code, I'd swallow an elephant. My opinion is the same on that
damn '\t' tab character in the runtimes. Take that outta there!!!
okay, okay, I'm chilln. But imagine how much simpler life would
be if you coded a routine for keyboard input that would be useful.
A routine that knew the difference between a number and a letter.
A routine that would limit the number of characters typed so your
buffer wouldn't overflow, and you wouldn't need a stupid prompt
like:
Enter you name (no more than 15 characters):
A routine that might print in color. A routine that would abort if
the ESCAPE key was hit.....
Couple that routine with the speed and size of executable assembly
code, and the peasants will rejoice. You might not need a half-
nanosecond keyboard input routine, but if you get into computer
graphics and animation..... you will appreciate the speed.
Anyway, back on topic. The only interfacing I've done was with
Borland C. The best way to get started in assembly (in my opinion)
is to get a C compiler with a built-in assembler. Borland's
Turbo C and C++ for DOS would be the one I recommend. It will
let you step thru assembly code, view variables, view the registers,
do math in a popup window, set breakpoints, and it will tell you
about illegal memory references and mistakes in general. The best
part is, it's probably about 50 bucks now. Maybe less with a student
discount.
WARNING: Turbo C does not come with TASM. At least, it didn't last
time I checked. I heard they don't include TASM with Borland C
anymore. The only difference I could tell between Turbo C and
Borland C v3.1 was BC 3.1 did Windoze apps, and came with TASM.
There is other stuff I never used, like Turbo Profiler, that came
with BC 3.1. If you can't get TASM, there are other assemblers
(shareware) out there:
A86V322.ZIP 173335 01-25-90
Excellent shareware Assembler, Version 3.22.
You don't need to include any header files with the assembly code
in TC. The resulting .EXE file will be comparatively gargantuan
because the C startup code and runtime get linked in along with
your 'inline' assembly code. I suppose there is probably a way to
get it smaller, but, eh, what the hell. It's just a learning tool.
-= regurgitated assembly code in Turbo C's IDE =-
void main(void) /* this is needed for the compiler */
{ /* opening brace for start of 'main' */
char color = 10; /* a 'byte' variable in C. set to 10 */
asm { /* opening brace for start of inline asm */
mov ah, 9h /* BIOS call 09h */
mov al, '*' /* print a star */
mov bh, 0 /* assume video page 0 */
mov bl, color /* byte value brite green into byte reg */
mov cx, 10h /* print 16 characters */
int 10h /* BIOS video service interrupt */
} /* closing brace for end of inline asm */
return; /* this is needed for the compiler */
} /* this too, and I ain't gonna get */
/* into void prototypes with a return */
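The 10 in 'color' isn't a magic number. In a color text mode, BIOS
function 09h treats BL as an attribute byte, with the foreground color
in the low four bits and the background in the next three. The sketch
below packs one in plain C. The name text_attr and the enum are made up
for this example, and it assumes text mode (graphics modes use BL
differently).

```c
#include <assert.h>

/* A few numbers from the standard 16-color text palette. */
enum { BLACK = 0, BLUE = 1, GREEN = 2, RED = 4, BRIGHT = 8 };

/* Packs a foreground (0-15) and background (0-7) color into one
   attribute byte: bits 0-3 = foreground, bits 4-6 = background.
   Bit 7 (blink) is left at 0. */
unsigned char text_attr(unsigned fg, unsigned bg)
{
    assert(fg <= 15 && bg <= 7);
    return (unsigned char)((bg << 4) | fg);
}
```

So text_attr(GREEN | BRIGHT, BLACK) comes out as 10, the same brite
green used above.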
This is all you need to do assembly code in C. The pain comes in
when you need to jump to different labels. Since C knows what a
'label' is, the label must be outside of any block of assembly code.
Otherwise C doesn't know it's there:
...
asm { /* opening brace for start of inline asm */
mov ah, 9h /* BIOS call 09h */
mov al, '*' /* print a star */
mov bh, 0 /* assume video page 0 */
mov bl, color /* byte value brite green into byte reg */
mov cx, 10h /* print 16 characters */
jmp DOINT /* jump to a label */
} /* closing brace for end of inline asm */
DOINT: /* label to jump to */
asm {
int 10h /* BIOS video service interrupt */
}
...
Some of the cool stuff in the IDE:
ALT + D, I = Inspect a variable using its name
ALT + W, W = Open the watch window
ALT + F3 = Close current window
CTRL + F7 = Add a variable to the 'watch' window
CTRL + F8 = Toggle a breakpoint
CTRL + F4 = Evaluate and modify a variable on-the-fly
So, now you have a way to scope-out your assembly code while it's
running. I've caught a lot of gnarly errors doing this, and it's
a real neat way to debug a routine as you code it. When you are
done, you need to "port" it into raw assembly code.
You will need a "skeleton" interface file you can cut and paste
your finished code into. If you need function calls, you will
have to implement them accordingly. This is what I use:
-= CSTART.ASM - skeleton file for C interface with assembly =-
-= NOTE: You must include the '_' underscores for C =-
.MODEL small ;/* small memory model. No ', c' here: that */
;/* would add the '_' for you a second time */
.CODE
PUBLIC _test ;/* name of function call here */
_test proc ;/* name of function here also */
push ax ;/* save any regs you are using */
push bx
push cx
mov ah, 9h ;/* BIOS call 09h: write char + attribute */
mov al, '*' ;/* print a star */
mov bh, 0 ;/* assume video page 0 */
mov bl, 10 ;/* brite green. No C variable out here */
mov cx, 10h ;/* print 16 characters */
int 10h ;/* BIOS video service interrupt */
pop cx ;/* restore pushed reg values */
pop bx ;/* in the REVERSE of the order pushed */
pop ax
exit:
ret
_test endp ;/* name of function here also */
end
Now that your routine is coded, you will have to assemble the
.ASM file with TASM (here the skeleton was saved as TEST.ASM):
tasm /ml /z /zn test.asm
/ml = case sensitivity on all symbols
/z = display the source line along with each error message
/zn = no debugging info
You now have a .OBJ file to throw into your custom library. (TEST.OBJ)
To build the library, go into TC's IDE and select PROJECT from
the menu bar. OPEN a project. The name of the project will be the
name of your library. You should get a project window at the bottom
of the screen. If you don't, hit ALT + W, P.
Hit the insert key and type in the name of your .OBJ file you are
putting into the library.
Go to OPTIONS on the menu, and select MAKE. Set the radio button
to RUN LIBRARIAN in the window marked AFTER COMPILING.
(or complaining!)
Go to OPTIONS again and SAVE the current setup. This saves all your
settings to the project configuration.
Now hit F9. If you don't get any errors, you have a library lying
somewhere, depending on your directory setup.
Now all that is left to do is declare a prototype for the new
function in your C source code.
void test(void);
This should go where you normally declare functions. You could
also make a header file for your new library and add new protos as
you make them. Then, just #include "" the header file whenever you
use your custom library. Oh, yeah. From now on, you will need to
use a project file whenever you use your new library. You do this
so C knows it is going to link in a library when it builds the
.EXE. There may be another way....
In your project file for using your new library, put the path to
the C source code, and the path to the library in the project window.
That should do it. Now, every time you call test() in your C code,
you'll get 16 stars on the screen in brite green. Yeah, it's a real
useless function, like scanf(), but it's a start. I wonder if scanf()
was Borland's first function ???
-TSRs
Well, this section will probably be the last in this textfile.
I'll be close to 40k of text, and 1000 lines. Besides, TSRs are
still a goofy subject to me. I'll tell you what I know, but it
might not be right. My knowledge of coding TSRs is very small.
A TSR is a program that gets executed, sets up what it needs, gets rid
of what it doesn't, and stays in memory processing away until you
unload it. All of mine require the computer be turned off. I haven't
figured out how to unload one without ending up rebooting anyway, so
I figure the hell with it.
Anyway, most TSRs are activated thru an interrupt. The routine that
runs when the interrupt fires is what's known as an ISR (Interrupt
Service Routine). TSRs attach themselves to the operating system
interrupts and get to execute every time their interrupt is called.
There is a table in low, low memory called the Interrupt Vector
Table. This is a fancy name for a list of addresses, one for each
interrupt number, that the CPU jumps to when an interrupt happens.
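For the curious, the table is laid out very simply. It starts at
address 0000:0000, and each interrupt gets a 4-byte entry: the offset
word first, then the segment word. The sketch below does that
arithmetic in plain C. The function names are made up for this
example, and this is real-mode 8086 math only.

```c
#include <assert.h>

/* Real-mode linear address of segment:offset = segment * 16 + offset. */
unsigned long linear_address(unsigned segment, unsigned offset)
{
    assert(segment <= 0xFFFFu && offset <= 0xFFFFu);
    return ((unsigned long)segment << 4) + offset;
}

/* Where interrupt number n keeps its entry in the table.
   Entries are 4 bytes each, starting at address 0. */
unsigned long ivt_entry_address(unsigned int_number)
{
    assert(int_number <= 0xFFu);
    return (unsigned long)int_number * 4;
}
```

So the entry for int 05h sits at address 14h, and the entry for
int 21h sits at 84h.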
Your TSR will "hook-into" an interrupt by getting the address of
that interrupt's current handler from a simple DOS call (int 21h,
function 35h).
(line 81 thru 83 in the TSR code at the top of this file)
DOS returns the address of the handler for the interrupt you
specify in ES:BX, with the segment in ES and the offset in BX. The
TSR then puts its own address in the table with another DOS call
(int 21h, function 25h). Your TSR must "remember" the old address
so it can "chain" to it, or pass control back to it, whenever.
Line 12 in the above code:
call dword ptr cs:OldInt ;/* chain to old handler */
You aren't overwriting the memory where the original routine is
stored, just overwriting its address in the IVT. When DOS calls
the interrupt you're hooked into, YOUR program gets the call instead
of the original one.
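The hook-and-chain idea can be tried out without risking your FAT.
The sketch below is a toy model in plain C: the "vector table" is just
an array of function pointers, and nothing here touches the real IVT.
All the names (vector_table, hook_vector, raise_interrupt and so on)
are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*handler_fn)(void);

/* Stand-in for the real Interrupt Vector Table: 256 slots. */
handler_fn vector_table[256];

handler_fn old_int;          /* the saved original, like OldInt */
int original_calls = 0;      /* how often the old handler ran   */
int hooked_calls = 0;        /* how often our handler ran       */

/* Pretend this is the handler that was there first. */
void original_handler(void)
{
    original_calls++;
}

/* Our replacement: do our own thing, then chain to the old one. */
void my_handler(void)
{
    hooked_calls++;
    if (old_int != NULL)
        old_int();
}

/* Remember the current entry, then overwrite it with ours.
   Only the table entry changes; the old routine stays put. */
void hook_vector(unsigned n, handler_fn new_handler)
{
    assert(n < 256);
    old_int = vector_table[n];
    vector_table[n] = new_handler;
}

/* What happens when interrupt n fires: look up and call. */
void raise_interrupt(unsigned n)
{
    assert(n < 256 && vector_table[n] != NULL);
    vector_table[n]();
}
```

Put original_handler in slot 05h, hook the slot with my_handler, fire
the interrupt once, and both counters go up by one: your handler got
the call first and then passed it along.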
Let's say you hit the Print Screen key. This is interrupt 05h.
When you hit it, DOS gets the message that an interrupt has occurred.
It knows it's interrupt 05h, so it looks up the address of int 05h
in the IVT. DOS then passes control to whoever is at the address of
int 05h. DOS does not care if you screwed up. DOS is all ready to
process away at the new address of execution. When the interrupt
has finished its gig, it returns control back to DOS, and DOS goes
about its business like nothing ever happened (ideally, that is).
Things don't always work that way, and TSRs are the most frustrating
of all applications to code, for the novice I suppose. Sometimes
they work, and sometimes they don't. I rained a whole barrage of
TSR questions down on an unsuspecting Computer prof. a year ago.
This dude really knew his stuff about C, but didn't want to say
jack about TSRs. Maybe he didn't know, or maybe he didn't want
to get into it. I don't know.
Anyway, there are a couple of important things that happen when
an interrupt gets serviced. (Strictly speaking it's the CPU doing
this part, not DOS.) The flags register is pushed onto the stack
first. Then the return address of where execution currently is
gets pushed. That would be the CS:IP registers. This is the book
mark. The new address of execution is then loaded into CS:IP from
the IVT, and the ISR runs. When the ISR is done, which is signaled
by an IRET (Interrupt Return), the return address is popped back
into CS:IP, and then the flags are popped off the stack, so
execution picks up where it left off before the interrupt
occurred.
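For reference, on the 8086 the order is FLAGS, then CS, then IP going
on, and IP, CS, FLAGS coming off. Here is a toy model of that in plain
C. It is only a sketch: the cpu_model struct and the function names are
made up, the model stack grows upward to keep the code short (the real
one grows downward), and the real CPU also clears the interrupt and
trap flags on the way in, which is left out here.

```c
#include <assert.h>

typedef struct {
    unsigned short stack[16];
    int sp;                        /* index of the next free slot */
    unsigned short flags, cs, ip;
} cpu_model;

void push16(cpu_model *cpu, unsigned short value)
{
    assert(cpu->sp < 16);
    cpu->stack[cpu->sp++] = value;
}

unsigned short pop16(cpu_model *cpu)
{
    assert(cpu->sp > 0);
    return cpu->stack[--cpu->sp];
}

/* Interrupt entry: push FLAGS, CS, IP, then jump to the handler. */
void enter_interrupt(cpu_model *cpu, unsigned short handler_cs,
                     unsigned short handler_ip)
{
    push16(cpu, cpu->flags);
    push16(cpu, cpu->cs);
    push16(cpu, cpu->ip);
    cpu->cs = handler_cs;
    cpu->ip = handler_ip;
}

/* IRET: pop IP, CS, FLAGS. Whatever is on top of the stack gets
   used, right or wrong. Nothing is checked. */
void iret(cpu_model *cpu)
{
    cpu->ip = pop16(cpu);
    cpu->cs = pop16(cpu);
    cpu->flags = pop16(cpu);
}
```

If the handler leaves one extra word on the stack before the IRET,
that word ends up in IP and execution "returns" to garbage.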
Nobody checks any of these values to see if they are correct. IRET
figures the stack is just the way the interrupt left it. Execution
will continue at whatever CS:IP it just popped. It does not care if
your ISR forgot to clean up after itself. This is why you need to
leave the stack just the way you found it.
The flags register is a little different. Since IRET pops the saved
flags back off the stack, the interrupted program gets its own
flags back even if your ISR did some math and changed them. The
catch is you have to return with IRET, and the saved copy on the
stack has to still be intact. If you return some other way and
leave changed flags behind, the program that got interrupted is
going to think IT changed the flags.
Let's say you're playing a game. You want to save your game, or
restore one or something where the disk drive will be accessed.
You move the mouse to "SAVE GAME", point and grunt, and the
information starts dumping to disk. In the meantime, you are
jigging the mouse back-and-forth while the game is being saved,
waiting for it to finish. Every time you move the mouse, you
are triggering a hardware interrupt, and the mouse driver (the
same driver programs talk to thru int 33h) is the ISR for it.
This means the game is writing to the disk, thru DOS, and the
CPU keeps jumping off to the mouse ISR.
I mentioned before, the flags are used to signal an error during
a write to a file. IRET takes care of putting the flags back, but
nothing does that for the other registers. If the mouse driver
screws up AX or BX or whatever, and doesn't restore them the way
they came, your game will think IT screwed them up, and give you
an error, or crash, or whatever.
In the above TSR code, not all the registers are pushed.
The SS, SP, and BP registers are not pushed. Sometimes you get
errors from too many pushes, and sometimes from not enough. It
seems you end up with a Happy-Medium somewhere.
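THEY SAY you can't issue any DOS calls inside a TSR. As far as
I'm concerned, this is BULL, or at least not the whole story. Think
of all the screen grabbing programs out there. These programs are
writing to a file when you hit the hot key, so there IS a way to do
it. The catch is that DOS is not reentrant. If your TSR pops up
while DOS is in the middle of one of its own calls, and you call
DOS again, things get trashed. The TSRs that get away with it check
DOS's "InDOS" flag first and wait until DOS is idle. I had a TSR
that did a screen grab and saved it in a Borland C getimage()
format. The TSR managed to save the image, but would puke right
after. That's how I blew up my FAT (for the second time). Remember
to back up the important stuff on your hard drive!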
So, anyway, this is going to conclude this text on assembly language.
You are going to have to get some books to clarify the things I've
covered here. I am still learning this stuff too, so don't be
surprised if some of it is not quite right.
L8r,
SwmgT
/****************************** ASMTXT01.TXT *******************************/