
CHAPTER 19 - STRINGS


Sometimes we want to deal with long strings of information. Here
long means hundreds or thousands of bytes, not tens of bytes. The
8086 provides a group of instructions to move and compare
strings. These instructions have a rigid structure, but with a
little bit of effort we can get them to work easily for us. We
will start with SCAS, since it is simple, yet embodies all the
rigid features of these instructions.

SCAS (scan string) compares either a byte to AL or a word to AX.
The byte or word must be in memory, and the register must be AL
or AX. SCAS also increments or decrements the pointer. First, the
size:

scasb

compares a byte to AL, while:

scasw

compares a word to AX. But where's the pointer? You have no
choice, it's DI. Not only is it DI, but it MUST be ES:DI. The ES
segment is coded into the 8086 microcode; the DI register is
coded into the 8086 microcode; there is nothing you can do to
change it. What about incrementing or decrementing? In the flags
register, there is a flag called the direction flag. It is set
manually by the program. If DF = 0, SCAS increments DI; if DF =
1, SCAS decrements DI.{1} The equivalent software for the
instruction would be:

                 (scasb)                  (scasw)
    DF = 0       cmp   al, es:[di]        cmp   ax, es:[di]
                 pushf                    pushf
                 add   di, 1              add   di, 2
                 popf  {2}                popf


    DF = 1       cmp   al, es:[di]        cmp   ax, es:[di]
                 pushf                    pushf
                 sub   di, 1              sub   di, 2
                 popf                     popf

____________________

1. Every time you have called show_regs DF has been there; it
doesn't show 0 and 1, it shows + and - (+ = 0 , - = 1).

2. The microcode doesn't really push and pop the flags. This
is only to indicate that the order of operations is (1) get the
byte (word) from the string, (2) compare and set the flags, and
finally (3) increment (decrement) the pointer without changing
any of the flags.

______________________

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson





Thus, at the end of the instruction, DI is in a new
place and the flags are set according to the compare result. DI
is incremented by a byte for the byte instructions; it is
incremented by a word for the word instructions. The same pattern
holds true for decrementing DI.

We set the direction flag with the instruction STD (set direction
flag) and we clear the direction flag by using CLD (clear
direction flag). It only needs to be set or cleared once, and
this should be done before starting the operation. No
arithmetical or logical operation on the chip can change DF.
Apart from STD and CLD, only the instructions that reload the
whole flags register (POPF and IRET) can alter it.

If you have a string and you are looking for a specific number,
(27 for instance), you simply put that number in AL (or AX) and
run a loop. If:

long_string db 5000 dup (?)

contains data and we want to look for a 27d then the operation
is:

lea di, long_string{3}
mov al, 27
cld

search_loop:
scasb
jne search_loop

On exiting, DI will point 1 PAST the matching byte (word). You
move back one byte (word) to find the match. Why would anyone
want this instruction? With a 0 in AL, it will find the end of a
C (0d terminated) string quickly. Also, that number 27 is no
accident. 27d is the ASCII escape character. For a lot of
hardware, 27d indicates that the bytes that follow are not ASCII
characters but are technical information. For instance, on my
printer the sequence (27d, 65d, 0d, 11d) sets tabs every 11
columns. SCAS can find where these substrings are so the program
can operate on them.
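
The same search can be run from the other end of the string. The
following is only a sketch, assuming (as in footnote 3) that ES
already holds long_string's segment address. The labels
search_back, found_esc, no_esc and search_done are made up for
the illustration. CX limits the search to the 5000 bytes of the
buffer so the loop cannot run past the front of the string. With
DF = 1, DI ends up 1 BEFORE the matching byte, so we move forward
one byte instead of back.

```asm
        lea   di, long_string + 4999  ; ES:DI -> last byte of the buffer
        mov   cx, 5000                ; never look at more than 5000 bytes
        mov   al, 27                  ; the escape character
        std                           ; decrement (DF = 1)

search_back:
        scasb                         ; compare AL with es:[di], then DI = DI - 1
        je    found_esc
        loop  search_back

no_esc:
        cld                           ; leave DF = 0 for later code
        jmp   search_done             ; no 27d anywhere; DI is not a match

found_esc:
        cld                           ; CLD changes DF only, no other flag
        inc   di                      ; DI was 1 before the match; now on it

search_done:
        ; continue with the program
```

Clearing DF again on both paths is a matter of habit: most code
assumes DF = 0, so it is safer not to leave it set.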

In order to use string instructions, we need strings to work on.
The one we will use is called CH1STR.OBJ. It is in \XTRAFILE. It
is an object file that contains one data segment containing a
string of lower case characters. The string is several thousand
bytes long, it is terminated by 0, and it contains ONLY the lower
case letters (a-z). It is the first draft of part of chapter 0
with all punctuation, numbers, spaces, carriage returns etc.
deleted. All upper case letters have been converted to lower case
so we don't have to worry about the difference between A and a, Q
and q.

The name of the array in CH1STR.OBJ is CH1STR and it is defined:
____________________

3. Assuming that long_string's segment address is in ES.






PUBLIC ch1str

so that you can access it with the SEG and OFFSET operators.
First, let's find out how long the string is.{4}

MYPROG1.ASM
; - - - - - - - - - -
STRINGSTUFF SEGMENT PUBLIC 'DATA'
EXTRN ch1str:BYTE
STRINGSTUFF ENDS
; - - - - - - - - - -
;- - - - - - - - - - PUT CODE BELOW THIS LINE

mov ax, seg ch1str ; segment address of ch1str
mov es, ax

mov di, offset ch1str ; offset address of ch1str
mov al, 0 ; try to match zero
cld ; increment (DF = 0)

string_end_loop:
scasb
jne string_end_loop

dec di ; back up one
mov ax, di
sub ax, offset ch1str
call print_unsigned

;- - - - - - - - - - PUT CODE ABOVE THIS LINE

In all these string operations, we need to be careful about
boundary conditions. What if there is no valid data? What if
there is one valid item? What if the string is empty?

On exiting the loop, DI will point 1 past the first 0d, so we
need to back up one to point to the first 0d. Then subtracting
the starting position will give us the count.{5} Try it out and
find out how long it is. Since we now have 3 object modules, the
link instruction must read:

link myprog+ch1str+asmhelp ;

assuming that you name your program myprog.asm. Save the result
because we will need to use this number several times.

____________________

4. Just to keep it from being too easy, I have put garbage
both in front of the string and behind the string. That means
that the string length is shorter than the length of the object
file and the string does not start with the first byte of the
object file.

5. If the first byte in the string is 0d, we move one, then
move back one which gives the length zero.





You will notice that we have gotten the segment address by using
the SEG operator. You don't need to know the name of the segment.
The segment doesn't even have to be PUBLIC. As long as the
VARIABLE is either in the same file or is in another file and
PUBLIC, the linker will find the correct segment address and put
it there.


To make things a little more complicated, we will make another
infinite loop. This time you will enter a character, and the
program will find the first occurrence of that character. We need
to add some error checking here. Since you will probably be
dreaming about taking your next vacation in Hawaii while you are
entering the data, a few characters that don't exist in the
string (things like G $ ? ~ ) might creep in. It would be
possible to run way past the end of the string before you found
that character. We'll put the length of the string (from the last
program) in CX, have a regular loop so we can't go too far, and
jump out of the loop if we find a match.

MYPROG2.ASM
; - - - - - - - - - -
STRINGSTUFF SEGMENT PUBLIC 'DATA'
EXTRN ch1str:BYTE
STRINGSTUFF ENDS
; - - - - - - - - - -
;- - - - - - - - - - PUT CODE BELOW THIS LINE

mov ax, seg ch1str
mov es, ax

outer_loop:
call get_ascii_byte ; returns character in al
mov cx, $$$$$$$ ; enter string length here

mov di, offset ch1str
cld ; increment (DF = 0)

string_end_loop:
scasb
je after_loop ; if equal, we found the char
loop string_end_loop

mov ax, 0 ; we fell through the loop
call print_unsigned
jmp outer_loop

after_loop:
mov ax, di ; move for printing
sub ax, offset ch1str ; number of bytes
call print_unsigned

jmp outer_loop


;- - - - - - - - - - PUT CODE ABOVE THIS LINE






Those dollar signs are the place to enter the exact length of the
string that you got from the last program. This time we jump out
of the loop if we find a match; DI will be 1 past the matching
character, but this will give us the right count (if we find the
character in the first space, we increment once). If we can't
find a match we fall through the loop and print a 0. Remember to
link all 3 modules when you run the program. Run the program and
then we'll move forward.

This type of thing is so common with string operations that there
is a special prefix for SCAS and all other string operations
which makes the coding simpler. It has several forms:

    rep      decrement cx ; repeat if cx is not zero
    repe     decrement cx ; repeat if cx not zero and zf = 1
    repz     decrement cx ; repeat if cx not zero and zf = 1
    repne    decrement cx ; repeat if cx not zero and zf = 0
    repnz    decrement cx ; repeat if cx not zero and zf = 0

REP is for the move instructions which we will see later - it
won't work here. For each prefix, if either (or both) of the
conditions is not true, the repetition stops. For instance, with
REPE, if cx is zero, and/or if the comparison was not equal (so
the zero flag was not set), the instruction will stop. For our
program, the coding is:

repne scasb

That's it. That replaces the whole inner loop. Here is our new
coding of the last program.

MYPROG3.ASM
; - - - - - - - - - -
STRINGSTUFF SEGMENT PUBLIC 'DATA'
EXTRN ch1str:BYTE
STRINGSTUFF ENDS
; - - - - - - - - - -
;- - - - - - - - - - PUT CODE BELOW THIS LINE

mov ax, STRINGSTUFF
mov es, ax
cld ; increment (DF = 0)

outer_loop:
call get_ascii_byte ; returns character in al
mov cx, $$$$$$$ ; enter string length here
lea di, ch1str ; address of string

repne scasb

je found_the_char ; an equal comparison
mov ax, 0 ; we didn't find a match
call print_unsigned
jmp outer_loop

found_the_char:
mov ax, di ; move for printing
sub ax, offset ch1str ; number of bytes
call print_unsigned
jmp outer_loop

;- - - - - - - - - - PUT CODE ABOVE THIS LINE

There are two possibilities for exiting the 'repne scasb'
instruction. Either we found an equal comparison or we exhausted
all the characters in ch1str. If we found an equal comparison, JE
will send us to the print routine. Otherwise we print a 0 because
we finished the loop without finding anything.
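
The REPNE prefix also shortens the string length loop of MYPROG1.
This is a sketch of one common way to do it, not part of the
original program: CX is loaded with 0FFFFh (the largest possible
count) so that it never stops the scan, and the length is then
recovered from what is left in CX. It assumes, as before, that a
0 really does follow the string.

```asm
        mov   ax, seg ch1str
        mov   es, ax

        mov   di, offset ch1str
        mov   cx, 0FFFFh        ; largest count, so CX never stops us
        mov   al, 0             ; try to match zero
        cld                     ; increment (DF = 0)

        repne scasb             ; one count is used for each byte compared,
                                ; including the 0 itself

        not   cx                ; CX = number of bytes compared
        dec   cx                ; don't count the terminating 0
        mov   ax, cx
        call  print_unsigned
```

The number printed should be the same one MYPROG1 gave. One
boundary condition to keep in mind: if CX is 0 when a repeated
SCASB starts, no comparison is made at all and the flags keep
their old values, so a JE right after it would be testing stale
flags.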


STOS

We can ask the operating system to allocate memory for us while
the program is running.{6} When you get it, however, it will
contain trash. The fast way to clear it is to use STOS (store to
string). The instruction is:

stosb
or:
stosw

The equivalent action (not counting changing the value of DI) is:

mov es:[di], ax ; or AL for byte moves

Once again (1) the pointer is the ES:DI pair, which is mandatory,
and (2) DI is incremented or decremented (by 1 for byte, by 2 for
word) depending on the status of DF, the direction flag. The
instruction moves a byte (a word) from the AL (AX) register to
the memory address pointed to by ES:DI. We can use the REP{7}
prefix to speed things up a bit. If we have an 11,872 word
block of memory, we can clear it with the following instructions:

; - - - - - - -
DATASTUFF SEGMENT
my_buffer dw 11872 dup (?)
DATASTUFF ENDS
; - - - - - - -

mov ax, seg my_buffer
mov es, ax
cld ; increment (DF = 0)

mov ax, 0 ; clear the buffer with 0s
mov di, offset my_buffer
mov cx, 11872
rep stosw
____________________

6. Cf. You-know-who's Programmer's Guide to You-know-what or
"DOS Programmer's Reference."

7. There is no comparison here, so REPE or REPNE doesn't make
any sense.






That's as fast as it gets. Why does the STOS instruction use AX?
Because that's the register that port i/o uses. If you are
writing a communications program, you need speed. You can have
the following:

; - - - - - - - - -
DATASTUFF SEGMENT
port_address dw 0F2A8h ; this address is legal but
; there's nothing there.
input_buffer db 4000h dup (?)
output_buffer db 4000h dup (?)
DATASTUFF ENDS
; - - - - - - - - -

mov ax, DATASTUFF
mov es, ax
cld ; increment (DF = 0)
mov di, offset input_buffer
mov dx, port_address

input_loop:
in al, dx
stosb
jmp input_loop
; - - - - - - - - -

A real program would be much more complicated because we would
have to check to see if data was ready to come in and we might
need to check the data for errors. Also we would occasionally
have to clear the buffer. The port address F2A8h is just an
arbitrary address. It's a legal address but there's nothing
there.
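
As written, the loop never stops, so after 4000h bytes STOSB runs
off the end of input_buffer and starts writing over
output_buffer. A minimal sketch of a bounded version, using CX
and LOOP as in MYPROG2, might look like this. It still makes no
check that data is ready, and it still reads a port where nothing
is attached.

```asm
        mov   ax, DATASTUFF
        mov   es, ax
        cld                          ; increment (DF = 0)
        mov   di, offset input_buffer
        mov   dx, port_address       ; assumes DS also points to DATASTUFF
        mov   cx, 4000h              ; size of input_buffer in bytes

input_loop:
        in    al, dx                 ; read one byte from the port
        stosb                        ; es:[di] = AL, DI = DI + 1
        loop  input_loop             ; stop when the buffer is full
```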

We should write a program, so let's input a character and have it
fill the screen. We'll leave the last line of the screen alone so
you can see your input. Move your cursor to the last line before
beginning the program.

; - - - - - ENTER CODE BELOW THIS LINE

mov ax, 0B800h ; or 0B000h for a monochrome card
mov es, ax
cld ; increment (DF = 0)


outer_loop:
call get_ascii_byte ; AL = fill char from input
mov ah, 07h ; black background, white letters
sub di, di ; set di to zero
mov cx, 1920 ; 24 lines X 80 chars
rep stosw
jmp outer_loop

; - - - - - ENTER CODE ABOVE THIS LINE

If you have a monochrome card, the segment address is 0B000h. If
you have a color card and are in text mode, the segment address
should be 0B800h. This fills the first 24 lines with the input
character. The STOS instruction has no effect on the cursor.
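
Each character on the text screen takes one word: the character
in the low byte and its attribute in the high byte, which is why
the program uses STOSW with the attribute in AH. A line is
therefore 160 bytes long, and DI can be started anywhere in the
screen. As a sketch (assuming the same 80 x 25 color text mode),
this blanks only the last line, the one the program above leaves
alone:

```asm
        mov   ax, 0B800h        ; or 0B000h for a monochrome card
        mov   es, ax
        cld                     ; increment (DF = 0)

        mov   ax, 0720h         ; AH = 07h (attribute), AL = 20h (space)
        mov   di, 24 * 160      ; skip 24 lines of 160 bytes each
        mov   cx, 80            ; one line = 80 character/attribute words
        rep   stosw
```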


LODS

The opposite of STOS is LODS (load string). It moves a byte (word)
from the string to the AL (AX) register. This time, for a change,
we use the SI register as a pointer, and the default register is
DS.{8} As always, SI is incremented or decremented by a byte
(word) depending on the setting of DF, the direction flag. The
two possibilities are:

lodsb
and
lodsw

The equivalent action (not counting changing the value of SI) is:

mov ax, [si] ; or AL for byte moves

This is an instruction for people that write device drivers. You
could use it if you are sending a string of characters to the
printer, but that's about it. Code for doing that would have the
following form:

; - - - - - - -
buffer db 1000 dup (?)
; - - - - - - -
lea si, buffer ; the buffer must be in the ds segment
cld ; increment

out_loop:
lodsb
and al, al ; if 0, end of string
jz quit_loop

mov dl, al ; move character to dl {9}
mov ah, 5 ; int 21h function 5
int 21h ; print a character
jmp out_loop

quit_loop:
; continue with the program

If you actually run this program, many printers will not print
anything until they get an end of line signal ( 10d, 13d).
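
LODS becomes more useful when it is paired with STOS: load a
byte, change it, store it somewhere else. The following sketch is
not from the original text. The names lower_text, upper_text,
convert_loop and convert_done are made up, both strings are
assumed to be in the segment DS points to, and the source is
assumed to contain only the lower case letters a-z followed by
a 0.

```asm
; - - - - - - -
lower_text  db   "strings", 0
upper_text  db   8 dup (?)
; - - - - - - -
        mov   ax, ds
        mov   es, ax            ; STOS needs ES:DI, so make ES = DS
        lea   si, lower_text    ; source      (DS:SI)
        lea   di, upper_text    ; destination (ES:DI)
        cld                     ; increment both SI and DI

convert_loop:
        lodsb                   ; AL = ds:[si], SI = SI + 1
        and   al, al            ; if 0, end of string
        jz    convert_done
        sub   al, 'a' - 'A'     ; lower case to upper case (subtract 20h)
        stosb                   ; es:[di] = AL, DI = DI + 1
        jmp   convert_loop

convert_done:
        stosb                   ; AL is 0 here; store the terminator too
```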


____________________

8. Register DS can be overridden. We'll talk about that in the
second part of this chapter.

9. Int 21h (AH = 5) prints one character from DL to the
printer. Why it's DL and not AL is a mystery.


