CHAPTER 8 - SHIFT AND ROTATE

There are seven instructions that move the individual bits of a
byte or word either left or right. Each instruction works
slightly differently. We'll make a standard program and then
substitute each instruction into that program.

SAL - SHL

The instructions SHL (shift logical left) and SAL (shift
arithmetic left) are exactly the same. They have the same machine
code. They shift each bit to the left. How far? That depends.
There are two (and only two) forms of this instruction. All other
shift and rotate instructions have these two (and only these two)
forms as well. The first form is:

shl al, 1

which shifts each bit one position to the left. The number MUST be 1.
On the 8086, no other number is possible. The other form is:

shl al, cl

shifts the bits in AL to the left by the number in CL. If CL = 3,
it shifts left by 3. If CL = 7, it shifts left by 7. The count
register MUST be CL (not CX). The bits on the left are shifted
out of the register into the bit bucket, and zeros are inserted
on the right. The easy way to understand this is to fire up the
standard program. Remember, from now on we always use
template.asm.

;sal.asm
; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
mov ax_byte, 0A3h ; half reg, low reg binary
mov bx_byte, 0A4h ; half reg, low reg hex
mov cx_byte, 0A1h ; half reg, low reg signed
mov dx_byte, 0A2h ; half reg, low reg unsigned
lea ax, ax_byte
call set_reg_style

mov ax, 0 ; clear registers
mov bx, 0
mov cx, 0
mov dx, 0
mov di, 0
mov bp, 0
call show_regs

outer_loop:
call get_hex_byte ; get number and put in registers
mov bl, al
mov cl, al

______________________

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson

mov dl, al
mov si, 8 ; 8 iterations of the loop
and al, al ; set the flags
call show_regs_and_wait
shift_loop:
sal al, 1
sal bl, 1
sal cl, 1
sal dl, 1
call show_regs_and_wait
dec si
jnz shift_loop
jmp outer_loop

; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

This standard program works with bytes, not words. This is because
if we had used words we would have performed 16 individual shifts
and that would have been time consuming and boring. First we set
the style to half registers. Notice that one is binary, one is
hex, one is signed and one is unsigned. That covers all bases.
All the registers are then cleared. It would be nice to use the
loop instruction, but CX is committed, so we make our own loop
instruction. We move 8 into SI. The loop instructions are:

dec si
jnz shift_loop

DEC decrements a register or a variable by 1. Its counterpart INC
increments a register or variable by 1. JNZ (jump if not zero)
jumps to 'shift_loop' if SI is not zero.
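
Here is a minimal sketch of the same hand-made loop on its own, in
case you want to try it inside template.asm. The label 'count_loop'
and the idea of adding up the numbers 5 down to 1 are only for
illustration; they are not part of the standard program.

```asm
        mov   ax, 0             ; running total
        mov   si, 5             ; loop counter (CX is left free)
count_loop:
        add   ax, si            ; total = total + counter
        dec   si                ; counter = counter - 1, ZF set when it reaches 0
        jnz   count_loop        ; go around again while SI is not zero
        ; here AX = 5 + 4 + 3 + 2 + 1 = 15 and SI = 0
```

Any register or variable that is not otherwise in use could serve as
the counter.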

We get a hex byte in AL and put the same byte in BL, CL, and DL.
This way we will be able to see what is happening in binary, hex,
signed and unsigned. Before starting, we have:

and al, al

This is there to set the flags correctly before starting. All
four are shifted left one bit each time, and then we look at the
result.

Assemble, link and run it. Enter the number 7. In binary, that is
(0000 0111). Take a look at the flags before starting. It is a
positive number so SF shows '+'. ZF is not set. PF shows 'O'. O
stands for odd. Every time you perform an arithmetic or logical
operation, the 8086 checks parity. Parity is whether the number
contains an even or odd number of 1 bits. This contains three 1
bits, so the parity is odd. The possible settings are 'E' for
even and 'O' for odd.{1} SAL checks for parity (though some of
the other instructions don't). Now press ENTER. It will shift
left 1 and you will have (0000 1110). What does the unsigned
number say now? 14. Press ENTER again. (0001 1100) What does the
unsigned number say? 28. Again (0011 1000) 56. Again (0111 0000)
112. Notice that the signed number reads +112. Look at the CF and
OF. They are both cleared. Things are going to change now. Press
ENTER again. (1110 0000). SF is now '-'. OF, the overflow flag,
is set because you changed the number from positive to negative
(from +112 to -32). What is the unsigned number now? 224. CF is
cleared. PF is 'O'. Shift again. (1100 0000) OF is cleared
because you didn't change signs. (Remember, the leftmost bit is
the sign bit for a signed number.) PF is now 'E' because you have
two 1 bits, and two is even. CF is set because you shifted a 1
bit off the left end. Keep pressing ENTER and watch SF, OF, CF,
and PF.
____________________

1 This is for use by communications programs.

Let's look at the unsigned numbers we had until we started
shifting 1 bits off the left end. We started with 7, then had 14,
28, 56, 112, 224. This instruction is multiplying by 2. That's
right, and it is MUCH faster than multiplication (about 50 times
faster). Far and away the fastest way to multiply a register by
2, 4 or 8 is to use sal.

     ; by 2            ; by 4            ; by 8
     sal  di, 1        sal  di, 1        sal  di, 1
                       sal  di, 1        sal  di, 1
                                         sal  di, 1

For a register, it is faster to use a series of 1 shifts than to
load cl. For a variable in memory, anything over 1 shift is
faster if you load cl.

Do a few more numbers to see what is happening both with the
number and the flags. CF always signals when a 1 bit has been
shifted off the end.
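
Shifts only multiply by powers of 2, but they can be combined with
an addition to reach other small multipliers. The following is a
sketch, not part of the standard program: it multiplies the unsigned
word in AX by 10 by adding 8 times the number to 2 times the number.
It assumes the starting number is 6553 or less, so that the answer
fits in a word, and it uses BX as a scratch register. The value 25
is only an example.

```asm
        mov   ax, 25            ; example value x
        sal   ax, 1             ; ax = 2x = 50
        mov   bx, ax            ; keep a copy of 2x
        sal   ax, 1             ; ax = 4x = 100
        sal   ax, 1             ; ax = 8x = 200
        add   ax, bx            ; ax = 8x + 2x = 250
```

With larger starting numbers, 1 bits are lost off the left end
during the shifts, so a real routine would have to check CF after
each step.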

SAR and SHR

Unlike the left shift instruction, there are two completely
different right shift instructions. SHR (shift logical right)
shifts the bits to the right, setting CF if a 1 bit is pushed off
the right end. It puts 0s in the leftmost bit. Make a copy of
SAL.ASM and replace the four instructions:

sal al, 1
sal bl, 1
sal cl, 1
sal dl, 1

with SHR. We'll call the new program SHR.ASM. Run this one too.
Instead of 7, use E0h (1110 0000) which is 224d. The first time
you shift (0111 0000) the OF flag will be set because the sign
changed. Keep shifting, noting the flags and the unsigned number.
This time we have 224, 112, 56, 28, 14, 7, 3, 1. It is dividing
by two and is once again MUCH faster than division. For a single
shift, the remainder is in CF. For a shift of more than one bit,
you lose the remainder, but there is a way around this which we
will discuss in a moment. Do some more numbers till you are
comfortable with the flags and the operation.

If you want to divide by 16, you will shift right four times, so
you'll lose those 4 bits. But those bits are exactly the value of
the remainder. All we need to do is:

mov dx, ax ; copy of number to dx
and dx, 0000000000001111b ; remainder in dx
mov cl, 4 ; shift right 4 bits
shr ax, cl ; quotient in ax

Using a mask, we keep only the right four bits, which is the
remainder.
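
To see the numbers, here is the same sequence traced with a sample
value. The value 1234 is only an illustration; any unsigned word
works the same way.

```asm
        mov   ax, 1234                  ; 04D2h
        mov   dx, ax                    ; copy of number to dx
        and   dx, 0000000000001111b     ; dx = 0002h, the remainder
        mov   cl, 4                     ; shift right 4 bits
        shr   ax, cl                    ; ax = 004Dh = 77, the quotient
        ; check: 77 * 16 + 2 = 1234
```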

SAR

SAR (shift arithmetic right) is different. It shifts right like
SHR, but the leftmost bit always stays the same. This will make
more sense when you run the program. Make another copy, call it
SAR.ASM, and change the four instructions to SAR. The flags
operate the same as for SHR and SHL, except that a one-bit SAR
always clears the overflow flag, since the left bit always stays
the same.

First enter 74h (+116). We will be looking at the signed numbers
only. Copy down the signed numbers as you go along. They should
be: 116, 58, 29, 14, 7, 3, 1, 0, 0. Now try 8Ch (-116). The
numbers you should get are: -116, -58, -29, -15, -8, -4, -2, -1,
-1. They started out the same, then they got off by one. The
negative numbers are one too negative. Try 39h (+57). The
numbers here are: 57, 28, 14, 7, 3, 1, 0, 0, 0. Just as it should
be for division by 2. Now try C7h (-57). Here the numbers are:
-57, -29, -15, -8, -4, -2, -1, -1, -1. This time it went screwy
right off the bat. Once again, the negative numbers are one too
negative.

SAR is an instruction for doing signed division by 2 (sort of).
It is, however, an incomplete instruction. The rule for SAR is:
SAR gives the correct answer if the number is positive. It gives
the correct answer if the number is negative and the remainder is
zero. If the number is negative but there is a remainder, then
the answer is one too negative. The reason for this is a little
complex, but we need to add some code if we want to do signed
division.{2} For SHR, the remainder part was optional. Here it is
not. We need to know whether the remainder is zero or not. For
this example we will do a word shift right by 6. That's dividing
by 64, so remainder_mask is 63 (0000000000111111b).

call get_signed ; number in ax
mov bx, ax ; copy in bx
and bx, remainder_mask ; the remainder
mov cl, 6 ; shift right 6 bits
sar ax, cl
jns continue ; is it positive?
and bx, bx ; is the remainder zero?
jz continue
inc ax
continue:
____________________

2 Both the code and the reasons will be explained (but not
proved) in the summary.

We get the remainder, then shift right 6 bits. Upon finishing
SAR, the sign flag will be set correctly. Here is yet another
jump. JNS (jump on not sign) jumps if the sign flag is NOT set,
that is, if the number is positive. If it is positive,
then everything is ok so we skip ahead. If the number is
negative, then we check to see if there was a remainder. If there
wasn't, everything is ok, so we go ahead. If there was a
remainder, then we INC (add 1) ax.

Is the remainder correct? If the number was positive, the
remainder is correct, but if the number was negative, then we
need to do one more thing. After INC, but before 'continue' we
have a SUB instruction:

inc ax
sub bx, 64 ; correct the remainder
continue:

Why that is the correct number will be explained in the summary.
What a lot of work when we could simply write:

mov cx, 64
call get_signed
cwd ; sign extend
idiv cx ; signed division
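
To see the SAR correction at work, here is the shift version traced
with a sample value. This is only a sketch: the value -100 and the
label 'sar_done' are made up for the illustration, and the mask is
written out as a constant instead of the name remainder_mask.

```asm
        mov   ax, -100                  ; FF9Ch
        mov   bx, ax                    ; copy in bx
        and   bx, 0000000000111111b     ; bx = 001Ch = 28
        mov   cl, 6                     ; shift right 6 bits
        sar   ax, cl                    ; ax = FFFEh = -2, one too negative
        jns   sar_done                  ; negative, so no jump here
        and   bx, bx                    ; remainder is 28, not zero
        jz    sar_done
        inc   ax                        ; ax = -1, the correct quotient
        sub   bx, 64                    ; bx = 28 - 64 = -36, the correct remainder
sar_done:
        ; check: (-1 * 64) + (-36) = -100
```

IDIV gives the same pair of answers for this value, a quotient of -1
and a remainder of -36.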

Is there any advantage to using SAR this way? Not really. Remember
that the more you shift, the longer it takes. If you shift 2,
then it's about 1/3 faster than division. If you shift 14, then
it is only 15% faster than division. Considering that even a slow
PC can do 25000 divisions a second, you must be in serious need
of speed to use this. In any case, you will never or almost never
use SAR for signed division, while you will find lots of
opportunity to use SHR and SHL for unsigned multiplication and
division.

ROR and ROL

ROR (rotate right) and ROL (rotate left) rotate the bits around
the register. We will just do one program since they operate the
same way, only in opposite directions. Make another copy of
SAL.ASM and put in ROR in the appropriate spots.

Enter a number. This time you will notice that the bits, rather
than disappearing off the end, reappear on the other side. They
rotate around the register. The only flags that are defined are
OF and CF. OF is set if the high bit changes, and CF is set if a
1 bit moves off the end of the register to the other side. Do a
few more, and we'll go on to the last two instructions.
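
As a small sketch of what rotation does, rotating a byte by 4 swaps
its two hex digits, and doing it a second time puts them back. The
value 3Ch is only an example.

```asm
        mov   al, 3Ch           ; 0011 1100
        mov   cl, 4             ; rotate by half a byte
        rol   al, cl            ; al = 1100 0011 = C3h, the hex digits are swapped
        rol   al, cl            ; al = 0011 1100 = 3Ch again
```

Nothing is lost in a rotate, which is the difference from a shift:
after 8 single rotates a byte is always back where it started.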

RCR and RCL

RCR (rotate through carry right) and RCL (rotate through carry
left) rotate the same as the above instructions except that the
carry flag is involved. Rotating right, the low bit moves to CF,
the carry flag and CF moves to the high bit. Rotating left, the
high bit moves to CF and CF moves to the low bit. There are 9
bits (or 17 bits for a word) involved in the rotation. Make yet
another copy of the program, and change those 4 instructions to
RCR. Also, since we have 9 bits instead of 8, change the loop
count to 9 from 8:

mov si, 9

Enter a number and watch it move. Before you start moving, look
at CF and see if there is anything in it. There are only two
flags defined, OF and CF. Obviously, CF is set if there is
something in it. OF is weird, and is defined only for one-bit
rotates. In RCL (the opposite instruction to the one we are
using), OF operates normally, signalling a change in the top
(sign) bit. In RCR it does the same thing, though it is harder to
see: the old CF becomes the new top bit, so OF is set when the
old CF and the old top bit differ. You really have no need for
the OF flag anyway, so this is unimportant.
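
One thing the rotate through carry instructions are handy for is
passing bits from one register to another by way of CF. As a sketch
(not part of the standard program), the following reverses the order
of the bits in a byte. SHR pushes the low bit of AL into CF, and RCL
pulls CF into the low bit of BL. The register choices, the value
0B4h and the label 'reverse_loop' are only for illustration.

```asm
        mov   al, 0B4h          ; 1011 0100, the byte to reverse
        mov   bl, 0             ; the result is built up here
        mov   si, 8             ; 8 bits to move
reverse_loop:
        shr   al, 1             ; low bit of AL -> CF
        rcl   bl, 1             ; CF -> low bit of BL, the rest moves left
        dec   si                ; DEC does not touch CF
        jnz   reverse_loop
        ; here BL = 0010 1101 = 2Dh and AL = 0
```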

Well, those are the seven instructions, but what can you do with
them besides multiply and divide?

First, you can work with multiple bit data. The 8087 has a word
length register called the status register. Looking at the upper
byte:

15  14  13  12  11  10   9   8
         X   X   X

Bits 11, 12 and 13 contain a number from 0 to 7. The data in this
register is not directly accessible. You need to move the
register into memory, then into an 8086 register. If you want to
find what this number is, what do you do?

mov bx, status_register_data
mov cl, 3
ror bx, cl
and bh, 00000111b

We rotate right 3 and then mask off everything else. The number
is now in BH. We could have used SHR if we wanted. Another 8087
register is the control register. In the upper byte it has:

15  14  13  12  11  10   9   8
                 X   X

a number from 0 to 3 in bits 10 and 11. If we want the
information, we do the same thing:

mov bx, control_register_data
mov cl, 2
ror bx, cl
and bh, 00000011b

and the number is in BH.
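
As mentioned above, SHR would work too. Here is a sketch of that
version for the status register. Instead of stopping in BH, it
shifts the three bits all the way down to the bottom of BX, so the
whole register holds the number. It assumes status_register_data is
the same word variable used above. The sample value in the comments
(2A00h) is made up.

```asm
        mov   bx, status_register_data  ; say 2A00h = 0010 1010 0000 0000
        mov   cl, 11                    ; bit 11 is the lowest bit we want
        shr   bx, cl                    ; bits 11-15 are now bits 0-4
        and   bx, 0000000000000111b     ; keep bits 0-2; for 2A00h, bx = 5
```

The mask is still needed because bits 14 and 15 come down along with
the three bits we are after.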

You are now going to write a program that inputs an unsigned
number and prints out its hex representation. Here it is:

; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
mov ax_byte, 0A5h ; half regs, right ascii
mov bx_byte, 4 ; hex
mov dx_byte, 4 ; hex
lea ax, ax_byte
call set_reg_style
call show_regs

outer_loop:
call get_unsigned
mov bx, ax
mov dx, ax
mov cx, 4
inner_loop:
push cx ; save cx
mov cl, 4
rol bx, cl ; rotate left 1/2 byte
mov al, bl ; copy to al
and al, 0Fh ; mask off upper 1/2 byte
cmp al, 10 ; < 10, 0 - 9 ; > 9 A - F
jae use_letters
add al, '0' ; change to ascii
jmp print_it
use_letters:
add al, 'A' - 10 ; 10 = 'A'
print_it:
call print_ascii_byte
call show_regs_and_wait
pop cx
loop inner_loop
jmp outer_loop
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

AL will be shown in ascii while BX and DX will be in hex. We save
the original number in DX. Since the first thing we want to print
is the left hex character, we rotate left, not right. We move the
low byte to AL, mask off everything but the low hex number and
then convert to an ascii character. If it is 0 - 9, we add '0'
(the character, not the number). If it is > 9, we add "'A' - 10"
and get a letter (if the number is 10, we get 'A'). JAE means
jump if above or equal, and is an unsigned comparison.{3}
Finally, we print the ascii character that is in AL.{4}
____________________

3 You are getting inundated with conditional jump
instructions. Don't worry. As long as you understand each one
when you run across it, you don't have to remember it. All jump
instructions will be covered soon.

Another thing to notice is that just inside the loop we push CX.
That is because we use CL for the ROL instruction. It is then
POPped just before the loop instruction. This is typical. CX is
the only register that can be used for counting in indexed
instructions. It is common for indexing instructions to be
nested, so you temporarily store the old value of CX while you
are using CX for something different.

push cx ; typical code for a shift
mov cl, 7
shr si, cl
pop cx

Finally, let's multiply large numbers by 2. Here's the code:

; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
byte1 db ?
byte2 db ?
byte3 db ?
byte4 db ?
error_message db "Result is too large.", 0
; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
outer_loop:
lea ax, byte1 ; get 4 byte number
call get_unsigned_4byte

shl byte1, 1
rcl byte2, 1
rcl byte3, 1
rcl byte4, 1
jnc go_on
lea ax, error_message
call print_string
go_on:
lea ax, byte1
call print_unsigned_4byte
jmp outer_loop
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

This will require some explanation. Get_unsigned_4byte gets a
number from 1 to four billion. We put it in memory. Normally, the
following instructions would be done word by word. We are doing
them byte by byte so you can see the mechanics of the situation.
The low byte is shifted left 1 bit. This doubles it, but may
shift a 1 bit from the high bit into CF. If it does, then it will
be present when we rotate byte2. That moves CF into the low bit
and moves the high bit into CF. We do it again. And again. If
there is an unsigned overflow, it will be signalled by CF being
set after:

rcl byte4, 1

JNC (jump on not carry) will skip the error message if everything
is ok. Print_string prints a zero terminated string, that is a C
string which is terminated by the number (not the character) 0.
Finally, we print the number.
____________________

4 Any subroutine in ASMHELP.OBJ that involves a one byte input
or output has the data in AL.
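
For comparison, here is a sketch of the word by word version. The
names 'low_word' and 'high_word' are made up for the illustration.
It is an assumption, not something stated here, that the 4 byte
routines in ASMHELP.OBJ would accept the same four bytes declared as
two words with the low word first. The point is only the pattern: a
shift on the lowest piece, then a rotate through carry on each
higher piece.

```asm
; data
low_word   dw ?                 ; takes the place of byte1 and byte2
high_word  dw ?                 ; takes the place of byte3 and byte4

; code
        shl   low_word, 1       ; double the low word, old bit 15 -> CF
        rcl   high_word, 1      ; CF -> bit 0 of high word, old bit 15 -> CF
        jnc   go_on             ; CF set means the result is too large
```

Two instructions now do the work of four, and CF carries the
overflow bit across the word boundary exactly as it did across the
byte boundaries.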

A word about large numbers in ASMHELP.OBJ. It is assumed that you
would like to use commas if you could. Any data type over 1 word
long allows commas. The following are considered the same by
ASMHELP.OBJ in its input routines:

23546787
2,3,5,4,6,7,8,7
23,,5,46,,78,7
23,546787
23,546,787

It always prints commas correctly in the print routines.

SUMMARY

All shift and rotate instructions operate on either a register or
on memory. They can be either 1 bit shifts:

sal cx, 1
ror variable1, 1
shr bl, 1

or shifts indexed by CL (it must be CL):

rcl variable2, cl
sar si, cl
rol ah, cl

SHL and SAL

SHL (shift logical left) and SAL (shift arithmetic left) are
exactly the same instruction. They move bits left. 0s are
placed in the low bit. Bits are shoved off the register (or
memory data) on the left side, and CF indicates whether the
last bit shoved was a 1 or a 0. It is used for multiplying
an unsigned number by powers of 2.

SHR

SHR (shift logical right) does the same thing as SHL but in
the opposite direction. Bits are shifted right. 0s are
placed in the high bit. Bits are shoved off the register (or
memory data) on the right side and CF indicates whether the
last bit shoved off was a 0 or a 1. It is used for dividing
an unsigned number by powers of 2.

SAR

SAR (shift arithmetic right) shifts bits right. The high
(sign) bit stays the same throughout the operation. Bits are
shoved off the register (or memory data) on the right side.
CF indicates whether the last bit shoved off was a 1 or a 0.
It is used (with difficulty) for dividing a signed number by
powers of 2.

ROR and ROL

ROR (rotate right) and ROL (rotate left) rotate the bits of
a register (or memory data) right and left respectively. The
bit which is shoved off one end is moved to the other end.
CF indicates whether the last bit moved from one end to the
other was a 1 or a 0.

RCR and RCL

RCR (rotate through carry right) and RCL (rotate through
carry left) rotate the bits of a register (or of memory
data) right and left respectively. The bit which is shoved
off the register (or data) is placed in CF and the old CF is
placed on the other side of the register (or data).

INC
INC increments a register or a variable by 1.

inc ax
inc variable1

DEC
DEC decrements a register or a variable by 1.

dec ax
dec variable1

The following is fairly technical. It is only for those willing
to wade their way through a turgid explanation. If you don't
understand it, forget it.

CODE FOR SHR

If you are shifting an UNSIGNED number right by 'X' bits, it is
the same as dividing by (2 ** X): 1 bit = (2**1 = 2), 2 bits =
(2**2 = 4), 7 bits = (2**7 = 128). This is the same as dividing
by a number which is all 0s except bit X, which is 1 (for 0
we have 0000 0001, for 1 we have 0000 0010, for 3 we have 0000
1000, for 7 we have 1000 0000). The remainder mask will be this
number minus 1 (for 0 we have 0000 0000, for 1 we have 0000 0001,
for 3 we have 0000 0111, for 7 we have 0111 1111).
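
The mask can be built with a shift of its own. A minimal sketch,
where the use of BX and the count of 3 are only examples:

```asm
        mov   cl, 3             ; X, the number of bits to be shifted
        mov   bx, 1             ; 0000 0000 0000 0001
        shl   bx, cl            ; bx = 0000 0000 0000 1000 = 2**3 = 8
        dec   bx                ; bx = 0000 0000 0000 0111 = 7, the remainder mask
```

ANDing a copy of the number with BX before the SHR then leaves the
remainder in that copy.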

CODE FOR SAR

The order of numbers is important for SAR. If you start with 0
and add 1 each time, the actual sequence of signed numbers that
you get (from the bottom up) is:

-1
-2
.
.
-32767
-32768
+32767
+32766
.
.
3
2
1
0

The positive numbers are increasing in absolute value while the
negative numbers are decreasing in absolute value. If you divide
by shifting and there is no remainder, then the quotient is
exact. If there is a remainder, the quotient will truncate
towards 0 IN THE ABOVE DIAGRAM. This means that positive numbers
will truncate down, while the negative numbers will truncate
towards -32768, and will be one too negative.

If the number was positive, the remainder will be positive and
will be exactly the same as for SHR. If the number was negative,
then things are more complicated. We'll take division by 32 as an
example. If we divide by 32 (0010 0000) the remainder mask will
be 31 (0001 1111). If the number is negative, then what we get